Re: RFC: Logging plan of the running query
On 2022-02-08 01:13, Fujii Masao wrote:
AbortSubTransaction() should reset ActiveQueryDesc to save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise ActiveQueryDesc of the top-level statement will be unavailable after a subtransaction is aborted in the nested statements.

I once agreed with the above suggestion and made a v20 patch that turned save_ActiveQueryDesc into a global variable, but it caused a segfault when calling pg_log_query_plan() after FreeQueryDesc(). OTOH, some kind of reset of ActiveQueryDesc seems necessary, since leaving it alone also caused a segfault when running pg_log_query_plan() during installcheck.

There may be a better way, but resetting ActiveQueryDesc to NULL seems safe and simple. Of course, it makes pg_log_query_plan() useless after a subtransaction is aborted. However, if people do not often need to know the plan of a running query whose subtransaction has been aborted, resetting ActiveQueryDesc to NULL would be acceptable.

Attached is a patch that sets ActiveQueryDesc to NULL when a subtransaction is aborted. What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 5be784278e8e7aeeeadf60a772afccda7b59e6e4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Wed, 9 Mar 2022 18:18:06 +0900
Subject: [PATCH v21] Add function to log the plan of the query currently running on the backend.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries in production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests logging the plan of the specified backend process.

By default, only superusers are allowed to request logging of plans, because allowing any user to issue this request at an unbounded rate would produce lots of log messages, which can lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat
---
 doc/src/sgml/func.sgml | 49 +++
 src/backend/access/transam/xact.c | 13 ++
 src/backend/catalog/system_functions.sql | 2 +
 src/backend/commands/explain.c | 140 ++-
 src/backend/executor/execMain.c | 10 ++
 src/backend/storage/ipc/procsignal.c | 4 +
 src/backend/storage/lmgr/lock.c | 9 +-
 src/backend/tcop/postgres.c | 4 +
 src/backend/utils/init/globals.c | 1 +
 src/include/catalog/pg_proc.dat | 6 +
 src/include/commands/explain.h | 3 +
 src/include/miscadmin.h | 1 +
 src/include/storage/lock.h | 2 -
 src/include/storage/procsignal.h | 1 +
 src/include/tcop/pquery.h | 2 +
 src/test/regress/expected/misc_functions.out | 54 +--
 src/test/regress/sql/misc_functions.sql | 41 --
 17 files changed, 314 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8a802fb225..075056 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25461,6 +25461,25 @@ SELECT collation for ('foo' COLLATE "de_DE");
+
+
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+
+
+Requests to log the plan of the query currently running on the
+backend with the specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See
+for more information), but will not be sent to the client
+regardless of .
+
+
@@ -25574,6 +25593,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.
+
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG: query plan running on backend with PID 201116 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts (cost=0.00..52787.00 rows=200 width=97)
+ Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that when statements are executed inside a function, only the
+plan of the most deeply nested query is logged.
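The mechanism described above — the signal handler does no real work, it only sets a flag, and the plan is actually logged at the next CHECK_FOR_INTERRUPTS() — can be sketched roughly as follows. This is a simplified stand-alone illustration, not the actual PostgreSQL code; the names handle_log_plan_signal and process_interrupts are hypothetical.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical sketch of the pg_log_query_plan() signalling pattern:
 * the handler is async-signal-safe and only records the request; the
 * work happens later at a safe point (CHECK_FOR_INTERRUPTS() in
 * PostgreSQL). */
static volatile sig_atomic_t LogQueryPlanPending = 0;

/* Signal handler: just set the flag, do no real work here. */
static void handle_log_plan_signal(int signo)
{
    (void) signo;
    LogQueryPlanPending = 1;
}

/* Called at a safe point in the main loop; returns 1 if a plan was
 * logged, 0 otherwise.  In PostgreSQL this would be an
 * ereport(LOG_SERVER_ONLY, ...) so the output goes only to the server
 * log, never to the client. */
static int process_interrupts(const char *current_plan_text)
{
    if (LogQueryPlanPending)
    {
        LogQueryPlanPending = 0;
        printf("query plan running on this backend is:\n%s\n",
               current_plan_text);
        return 1;
    }
    return 0;
}
```

The key property this pattern preserves is that nothing complex (memory allocation, lock acquisition, plan printing) happens inside the handler itself.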
Re: Is it useful to record whether plans are generic or custom?
On 2020-12-04 14:29, Fujii Masao wrote:
On 2020/11/30 15:24, Tatsuro Yamada wrote:
Hi Torikoshi-san,

In this patch, exposing new columns is mandatory, but I think it's better to make it optional by adding a GUC, something like 'pgss.track_general_custom_plans'. I also feel it makes the number of columns too many. Just adding the total time may be sufficient.

I think this feature is useful for DBAs, so I hope it gets committed to PG14. IMHO, many columns are okay because a DBA can select specific columns in their query. Therefore, it would be better to go with the current design.

But that design may waste lots of memory, no? For example, when plan_cache_mode=force_custom_plan, the memory used for the columns for generic plans is not used.

Yeah. ISTM now that creating pg_stat_statements_xxx views for both generic and custom plans is better than my PoC patch. And I'm also struggling with the following:

| However, I also began to wonder how effective it would be to just
| distinguish between generic and custom plans. Custom plans can
| include all sorts of plans. And considering cache invalidation, generic
| plans can also include various plans.
| Considering this, I'm starting to feel that it would be better to
| keep not just whether the plan was generic or custom but the plan itself, as
| discussed in the thread below.

Yamada-san, do you think it's effective to just distinguish between generic and custom plans?

Regards,
Re: Get memory contexts of an arbitrary backend process
On 2020-12-03 10:36, Tom Lane wrote:
Fujii Masao writes:

I'm starting to study how this feature behaves. At first, when I executed the following query, the function never returned. ISTM that since the autovacuum launcher cannot respond to the request for a memory contexts dump, the function keeps waiting indefinitely. Is this a bug? Probably we should exclude non-backend processes from the target processes to dump? Sorry if this was already discussed.

SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;

Thanks for trying it! It was not discussed explicitly, and I was going to do it later, as commented:

+ /* TODO: Check also whether backend or not. */

FWIW, I think this patch is fundamentally unsafe. It's got a lot of the same problems that I complained about w.r.t. the nearby proposal to allow cross-backend stack trace dumping. It does avoid the trap of thinking that it can do work in a signal handler, but instead it supposes that it can do work involving very high-level objects such as shared hash tables in anyplace that might execute CHECK_FOR_INTERRUPTS. That's never going to be safe: the only real expectation the system has is that CHECK_FOR_INTERRUPTS is called at places where our state is sane enough that a transaction abort can clean up. Trying to do things like taking LWLocks is going to lead to deadlocks or worse. We need not even get into the hard questions, such as what happens when one process or the other exits unexpectedly.

Thanks for reviewing!

I may misunderstand something, but the dumper works not at CHECK_FOR_INTERRUPTS but during the client read, i.e., ProcessClientReadInterrupt(). Is it also unsafe?

BTW, since there was a comment that the shared hash table used too much memory, I'm now rewriting this patch not to use the shared hash table but a simpler static shared memory struct.

I also find the idea that this should be the same SQL function as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine only your own process's memory --- with this proposal, that means granting the right to inspect every other process as well. Beyond that, the fact that there's no way to restrict the capability to just, say, other processes owned by the same user means that it's not really safe to GRANT to non-superusers anyway. Even with such a restriction added, things are problematic, since for example it would be possible to inquire into the workings of a security-definer function executing in another process that nominally is owned by your user.

I'm going to change the function name and restrict execution to superusers. Is that enough?

Regards,
Re: Get memory contexts of an arbitrary backend process
On 2020-12-04 19:16, torikoshia wrote:
On 2020-12-03 10:36, Tom Lane wrote:
Fujii Masao writes:

I'm starting to study how this feature behaves. At first, when I executed the following query, the function never returned. ISTM that since the autovacuum launcher cannot respond to the request for a memory contexts dump, the function keeps waiting indefinitely. Is this a bug? Probably we should exclude non-backend processes from the target processes to dump? Sorry if this was already discussed.

SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;

Thanks for trying it! It was not discussed explicitly, and I was going to do it later, as commented:

+ /* TODO: Check also whether backend or not. */

FWIW, I think this patch is fundamentally unsafe. It's got a lot of the same problems that I complained about w.r.t. the nearby proposal to allow cross-backend stack trace dumping. It does avoid the trap of thinking that it can do work in a signal handler, but instead it supposes that it can do work involving very high-level objects such as shared hash tables in anyplace that might execute CHECK_FOR_INTERRUPTS. That's never going to be safe: the only real expectation the system has is that CHECK_FOR_INTERRUPTS is called at places where our state is sane enough that a transaction abort can clean up. Trying to do things like taking LWLocks is going to lead to deadlocks or worse. We need not even get into the hard questions, such as what happens when one process or the other exits unexpectedly.

Thanks for reviewing!

I may misunderstand something, but the dumper works not at CHECK_FOR_INTERRUPTS but during the client read, i.e., ProcessClientReadInterrupt(). Is it also unsafe?

BTW, since there was a comment that the shared hash table used too much memory, I'm now rewriting this patch not to use the shared hash table but a simpler static shared memory struct.

Attached a rewritten patch. Accordingly, I also slightly modified the basic design as below.
---
# Communication flow between the dumper and the requestor

- (1) When requesting a memory context dump, the requestor changes the state of the struct on shared memory from 'ACCEPTABLE' to 'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits on the latch.
- (3) When the dumper notices the signal, it changes the state to 'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to 'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user. Finally, the requestor removes the dump file and resets the shared memory state to 'ACCEPTABLE'.

# Query cancellation

- When the requestor cancels dumping, e.g. by signaling with Ctrl-C, the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to 'DONE' at (4), and if the state is 'CANCELING', it initializes the dump file and resets the shared memory state to 'ACCEPTABLE'.

# Cleanup of the dump file and the shared memory

- In the normal case, the requestor removes the dump file and resets the shared memory entry as described in (5).
- When something like query cancellation or process termination happens on the dumper after (1) and before (3), in other words while the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3) and before (4), in other words while the state is 'DUMPING', the dumper does the cleanup. Specifically, if the requestor cancels the query, it just changes the state to 'CANCELING' and the dumper notices it and cleans things up later. OTOH, when the dumper fails to dump, it cleans up the dump file and resets the shared memory state.
- When something happens on the requestor after (4), i.e. while the state is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or a power failure, all dump files are removed during crash recovery.
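The state flow described above can be sketched as a small table-driven check. This is an illustrative sketch only — the state names and the function are hypothetical, not the identifiers used in the attached patch.

```c
#include <stdbool.h>

/* Hypothetical sketch of the dump-state protocol: which transitions of
 * the shared-memory state are legal under the flow (1)-(5) plus the
 * cancellation and cleanup rules. */
typedef enum
{
    MCXTDUMP_ACCEPTABLE,   /* idle, a new request can be made           */
    MCXTDUMP_REQUESTING,   /* (1)-(2): requestor signalled the dumper   */
    MCXTDUMP_DUMPING,      /* (3): dumper noticed the signal            */
    MCXTDUMP_DONE,         /* (4): dump file is ready for the requestor */
    MCXTDUMP_CANCELING     /* requestor cancelled; dumper must clean up */
} McxtDumpState;

/* Returns true if the transition from -> to is allowed by the protocol. */
static bool
mcxtdump_transition_ok(McxtDumpState from, McxtDumpState to)
{
    switch (from)
    {
        case MCXTDUMP_ACCEPTABLE:
            return to == MCXTDUMP_REQUESTING;
        case MCXTDUMP_REQUESTING:
            /* dumper starts dumping, requestor cancels, or cleanup */
            return to == MCXTDUMP_DUMPING || to == MCXTDUMP_CANCELING ||
                   to == MCXTDUMP_ACCEPTABLE;
        case MCXTDUMP_DUMPING:
            /* dumper finishes, requestor cancels, or dumper cleans up */
            return to == MCXTDUMP_DONE || to == MCXTDUMP_CANCELING ||
                   to == MCXTDUMP_ACCEPTABLE;
        case MCXTDUMP_DONE:
        case MCXTDUMP_CANCELING:
            /* cleanup resets the slot for the next request */
            return to == MCXTDUMP_ACCEPTABLE;
    }
    return false;
}
```

Writing the transitions out this way makes the cleanup responsibilities explicit: every non-idle state has exactly one path back to 'ACCEPTABLE'.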
---

I also find the idea that this should be the same SQL function as pg_get_backend_memory_contexts to be a seriously bad decision. That means that it's not possible to GRANT the right to examine only your own process's memory --- with this proposal, that means granting the right to inspect every other process as well. Beyond that, the fact that there's no way to restrict the capability to just, say, other processes owned by the same user means that it's not really safe to GRANT to non-superusers anyway. Even with such a restriction added, things are problematic, since for example it would be possible to inquire into the workings of a security-definer function executing in another process that nominally is owned by your user.

I'm going to change the function name and restrict execution to superusers. Is that enough?

In the attached patch, I changed the fu
adding wait_start column to pg_locks
Hi,

When examining the duration of locks, we often join pg_locks and pg_stat_activity and use columns such as query_start or state_change. However, since these columns show the moment when queries started or their state changed, we cannot get the exact lock wait duration this way.

So I'm now thinking about adding a new column to pg_locks which keeps the time at which locks started waiting.

One problem with this idea would be the performance impact of calling gettimeofday repeatedly. To avoid it, I reused the result of the gettimeofday call that is already made to start the deadlock_timeout timer, as suggested in a previous discussion[1].

Attached a patch.

BTW, in this patch, for fast path locks wait_start is set to zero, because it seems such locks are never waited for. If my understanding is wrong, I would appreciate it if someone could point it out.

Any thoughts?

[1] https://www.postgresql.org/message-id/28804.1407907184%40sss.pgh.pa.us

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 1a6a7377877cc52e4b87a05bbb8ffae92cdb91ab Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Tue, 15 Dec 2020 10:55:32 +0900
Subject: [PATCH v1] Add wait_start field into pg_locks.

To examine the duration of locks, we had to join pg_locks and pg_stat_activity and use columns such as query_start or state_change. However, since they show the moment when queries started or their state changed, we could not get the exact lock wait duration this way. This patch adds a new field preserving the time at which locks started waiting.
---
 doc/src/sgml/catalogs.sgml | 9 +
 src/backend/storage/lmgr/lock.c | 10 ++
 src/backend/storage/lmgr/proc.c | 2 ++
 src/backend/utils/adt/lockfuncs.c | 9 -
 src/include/catalog/pg_proc.dat | 6 +++---
 src/include/storage/lock.h | 3 +++
 src/test/regress/expected/rules.out | 5 +++--
 7 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 62711ee83f..19af0e9af4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10567,6 +10567,15 @@ SCRAM-SHA-256$:&l
 lock table
+
+
+
+ wait_start timestamptz
+
+
+ The time at which lock started waiting
+
+

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d86566f455..7b30508f95 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1196,6 +1196,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 lock->waitMask = 0;
 SHMQueueInit(&(lock->procLocks));
 ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
 lock->nRequested = 0;
 lock->nGranted = 0;
 MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3628,6 +3629,12 @@ GetLockStatusData(void)
 instance->leaderPid = proc->pid;
 instance->fastpath = true;
+
+ /*
+ * Successfully taking fast path lock means there was no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
 el++;
 }
@@ -3655,6 +3662,7 @@ GetLockStatusData(void)
 instance->pid = proc->pid;
 instance->leaderPid = proc->pid;
 instance->fastpath = true;
+ instance->wait_start = 0;
 el++;
 }
@@ -3707,6 +3715,7 @@ GetLockStatusData(void)
 instance->pid = proc->pid;
 instance->leaderPid = proclock->groupLeader->pid;
 instance->fastpath = false;
+ instance->wait_start = lock->wait_start;
 el++;
 }
@@ -4184,6 +4193,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 lock->waitMask = 0;
 SHMQueueInit(&(lock->procLocks));
 ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
 lock->nRequested = 0;
 lock->nGranted = 0;
 MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7dc3911590..f3702cc681 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 }
 else
 enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ lock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
 }

 /*
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index f592292d06..5ee0953305 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
 } PG_Lock_Status;

 /* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16

 /*
  * VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
 BOOLOID, -1, 0);
 TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
 BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+ TIMESTAMPTZOID, -1, 0);
 funcctx->t
Re: Get memory contexts of an arbitrary backend process
On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito wrote:

Thanks for reviewing and the kind suggestions! Attached a rewritten patch.

Thanks for updating the patch. But when I applied the patch to the current HEAD and ran make, I got an error due to duplicate OIDs. You need to rebase the patch.

Assigned another OID.

Accordingly, I also slightly modified the basic design as below.

---
# Communication flow between the dumper and the requestor

- (1) When requesting a memory context dump, the requestor changes the state of the struct on shared memory from 'ACCEPTABLE' to 'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits on the latch.
- (3) When the dumper notices the signal, it changes the state to 'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to 'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user. Finally, the requestor removes the dump file and resets the shared memory state to 'ACCEPTABLE'.

# Query cancellation

- When the requestor cancels dumping, e.g. by signaling with Ctrl-C, the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to 'DONE' at (4), and if the state is 'CANCELING', it initializes the dump file and resets the shared memory state to 'ACCEPTABLE'.

# Cleanup of the dump file and the shared memory

- In the normal case, the requestor removes the dump file and resets the shared memory entry as described in (5).
- When something like query cancellation or process termination happens on the dumper after (1) and before (3), in other words while the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3) and before (4), in other words while the state is 'DUMPING', the dumper does the cleanup. Specifically, if the requestor cancels the query, it just changes the state to 'CANCELING' and the dumper notices it and cleans things up later. OTOH, when the dumper fails to dump, it cleans up the dump file and resets the shared memory state.
- When something happens on the requestor after (4), i.e. while the state is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or a power failure, all dump files are removed during crash recovery.
---

If the dumper is terminated before it dumps, the requestor will appear to enter an infinite loop, because the status of mcxtdumpShmem will not change. The following are the steps to reproduce:

- session1
  BEGIN; LOCK TABLE t;
- session2
  SELECT * FROM t; -- wait
- session3
  select pg_get_target_backend_memory_contexts(); -- wait
- session1
  select pg_terminate_backend(); -- kill session2
- session3 waits forever.

Therefore, you may need to set mcxtdumpShmem->dump_status to MCXTDUMPSTATUS_CANCELING or another status before the dumper terminates.

In this case, it may be difficult for the dumper to change dump_status, because it is waiting on the latch and dump_memory_contexts() has not been called yet. Instead, it's possible for the requestor to check the existence of the dumper process periodically while waiting. I added this logic to the attached patch.

Also, although I have not been able to reproduce it, I believe that with the current design, if the requestor disappears right after the dumper dumps the memory information, the dump file will remain. Since the current design appears to allow only one requestor per instance, when the requestor requests a dump, it might be a good idea to delete any remaining dump files.

Although I'm not sure when a dump file would remain, deleting any remaining dump files seems good for safety. I also added this idea to the attached patch.

The following are comments on the code.
+ proc = BackendPidGetProc(dst_pid);
+
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
+
+ return (Datum) 1;
+ }

For now, the background writer, checkpointer and WAL writer belong to the auxiliary processes. Therefore, if we specify the PIDs of these processes for pg_get_target_backend_memory_contexts(), "PID is not a PostgreSQL server process" would be output. This confuses the user. How about using AuxiliaryPidGetProc() to detect these processes?

Thanks, and I modified the patch to output the message below when it's an auxiliary process:

| PID %d is not a PostgreSQL backend process but an auxiliary process.

+ ereport(INFO,
+ (errmsg("The request has failed and now PID %d is requesting dumping.",
+ mcxtdumpShmem->src_pid)));
+
+ LWLockRelease(McxtDumpLock);

You can release the LWLock before ereport.

Modified to release the lock before ereport.

+ Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);

Typo? It should probably be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".

Oops, it's a serious typo
Re: adding wait_start column to pg_locks
On 2021-01-02 06:49, Justin Pryzby wrote:
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:

So I'm now thinking about adding a new column in pg_locks which keeps the time at which locks started waiting. Attached a patch.

This is failing make check-world, would you send an updated patch? I added you as an author so it shows up here. http://cfbot.cputube.org/atsushi-torikoshi.html

Thanks! Attached an updated patch.

Regards,

From 608bba31da1bc5d15db991662fa858cd4632d849 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 4 Jan 2021 09:53:17 +0900
Subject: [PATCH v2] To examine the duration of locks, we did join on pg_locks and pg_stat_activity and used columns such as query_start or state_change. However, since they are the moment when queries have started or their state has changed, we could not get the exact lock duration in this way. This patch adds a new field preserving the time at which locks started waiting.

---
 contrib/amcheck/expected/check_btree.out | 4 ++--
 doc/src/sgml/catalogs.sgml | 9 +
 src/backend/storage/lmgr/lock.c | 10 ++
 src/backend/storage/lmgr/proc.c | 2 ++
 src/backend/utils/adt/lockfuncs.c | 9 -
 src/include/catalog/pg_proc.dat | 6 +++---
 src/include/storage/lock.h | 3 +++
 src/test/regress/expected/rules.out | 5 +++--
 8 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+--+--+--+--+---++---+-+---+--++-+--+-+--+
 (0 rows)

 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..626e5672bd 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10578,6 +10578,15 @@ SCRAM-SHA-256$<iteration count>:&l
 lock table
+
+
+
+ wait_start timestamptz
+
+
+ The time at which lock started waiting
+
+

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..27969d3772 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1195,6 +1195,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 lock->waitMask = 0;
 SHMQueueInit(&(lock->procLocks));
 ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
 lock->nRequested = 0;
 lock->nGranted = 0;
 MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3627,6 +3628,12 @@ GetLockStatusData(void)
 instance->leaderPid = proc->pid;
 instance->fastpath = true;
+
+ /*
+ * Successfully taking fast path lock means there was no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
 el++;
 }
@@ -3654,6 +3661,7 @@ GetLockStatusData(void)
 instance->pid = proc->pid;
 instance->leaderPid = proc->pid;
 instance->fastpath = true;
+ instance->wait_start = 0;
 el++;
 }
@@ -3706,6 +3714,7 @@ GetLockStatusData(void)
 instance->pid = proc->pid;
 instance->leaderPid = proclock->groupLeader->pid;
 instance->fastpath = false;
+ instance->wait_start = lock->wait_start;
 el++;
 }
@@ -4183,6 +4192,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 lock->waitMask = 0;
 SHMQueueInit(&(lock->procLocks));
 ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
 lock->nRequested = 0;
 lock->nGranted = 0;
 MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57717f666d..56aa8b7f6b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 }
 else
 enable_timeout_after(DEADLOCK_TIMEOUT, Deadlo
Re: adding wait_start column to pg_locks
On 2021-02-16 16:59, Fujii Masao wrote:
On 2021/02/15 15:17, Fujii Masao wrote:
On 2021/02/10 10:43, Fujii Masao wrote:
On 2021/02/09 23:31, torikoshia wrote:
On 2021-02-09 22:54, Fujii Masao wrote:
On 2021/02/09 19:11, Fujii Masao wrote:
On 2021/02/09 18:13, Fujii Masao wrote:
On 2021/02/09 17:48, torikoshia wrote:
On 2021-02-05 18:49, Fujii Masao wrote:
On 2021/02/05 0:03, torikoshia wrote:
On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned int, so I cast the type. I may be using these functions incorrectly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is.

Thanks for updating the patch!

+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary? I think that "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+ /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal this can be synchronously done with updating
+ * lock information. Howerver, since it gives performance impacts
+ * to hold partitionLock longer time, we do it here asynchronously.
+ */

IMO it's better to comment on why we reuse the deadlock timeout timer.
 proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments.

BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there were no significant differences in latency or tps (both are below 1%).

Thanks for the test! I pushed the patch.

But I reverted the patch because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

- relation | locktype | mode
--+--+-
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR: invalid spinlock number: 0

"rorqual" reported that the above error happened in the server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for the "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested the v9 patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests have passed.

Thanks for the test! I found another bug in the patch. InitProcess() initializes "waitStart", but previously InitAuxiliaryProcess() did not.
This could cause an "invalid spinlock number" error when reading pg_locks on the standby server. I fixed that. Attached is the updated version of the patch.

I pushed this version. Thanks!

While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart" should be initialized at postmaster startup rather than at the startup of each child process. I changed "waitStart" so that it's initialized in InitProcGlobal() and also reset to 0 by using pg_atomic_write_u64() in InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above c
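The initialization discipline discussed in this sub-thread — every slot's atomic field must be initialized once (pg_atomic_init_u64, ideally in InitProcGlobal() at postmaster startup) before any reader touches it, including dummy procs for prepared transactions and auxiliary processes, after which plain atomic writes suffice for resets — can be illustrated with C11 atomics. This is a simplified stand-in for PostgreSQL's pg_atomic_* API, whose spinlock-emulated fallback is what produced "invalid spinlock number: 0" on builds with --disable-atomics --disable-spinlocks.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for PGPROC.waitStart handling: on platforms
 * without native 64-bit atomics, PostgreSQL emulates pg_atomic_uint64
 * with a spinlock, so reading a slot that was never initialized fails.
 * The fix is to initialize the field for *every* slot before use. */
typedef struct
{
    _Atomic uint64_t wait_start;  /* 0 means "not waiting" */
} ProcSlot;

/* Done once per slot, e.g. at postmaster startup in InitProcGlobal():
 * atomic_init() mirrors pg_atomic_init_u64(). */
static void
proc_slot_init(ProcSlot *slot)
{
    atomic_init(&slot->wait_start, 0);
}

/* Later updates and resets use plain atomic stores, mirroring
 * pg_atomic_write_u64() in InitProcess()/InitAuxiliaryProcess(). */
static void
proc_slot_set_wait_start(ProcSlot *slot, uint64_t now)
{
    atomic_store(&slot->wait_start, now);
}

/* Concurrent readers (GetLockStatusData() in the real patch) use an
 * atomic load, mirroring pg_atomic_read_u64(). */
static uint64_t
proc_slot_get_wait_start(ProcSlot *slot)
{
    return atomic_load(&slot->wait_start);
}
```

The distinction matters because with the spinlock fallback, init sets up the lock while a plain store assumes it already exists — which is exactly why the uninitialized dummy proc crashed only on the --disable-atomics build.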
Re: Printing backtrace of postgres processes
Hi,

I also think this feature would be useful when supporting environments that lack a debugger or debug symbols. I think such environments are not rare.

+ for more information. This
+will help in identifying where exactly the backend process is currently
+executing.

When I read this, I expected a backtrace would be generated at the moment the process receives the signal, but actually it just sets a flag that causes the next CHECK_FOR_INTERRUPTS to print a backtrace. How about explaining the timing of the backtrace generation?

+print backtrace of superuser backends. This feature is not supported
+for postmaster, logging and statistics process.

Since the current patch uses BackendPidGetProc(), it does not support this feature not only for the postmaster, logger, and statistics collector, but also for the checkpointer, background writer, and walwriter. And when I specify the PID of one of these PostgreSQL processes, it says "PID is not a PostgreSQL server process". I think this may confuse users, so it might be worth changing the message for those PostgreSQL processes. AuxiliaryPidGetProc() may help to do it.

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 54a818b..5fae328 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -57,6 +57,7 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
@@ -547,6 +548,13 @@ HandleCheckpointerInterrupts(void)
 if (ProcSignalBarrierPending)
 ProcessProcSignalBarrier();
+
+ /* Process printing backtrace */
+ if (PrintBacktracePending)
+ {
+ PrintBacktracePending = false;
+ set_backtrace(NULL, 0);
+ }
+

Although this implements a backtrace for the checkpointer, when I specified the PID of the checkpointer it was rejected by BackendPidGetProc().

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Re: Get memory contexts of an arbitrary backend process
On 2021-01-14 19:11, torikoshia wrote: Since pg_get_target_backend_memory_contexts() waits to dump memory, it could lead to a deadlock as below. - session1 BEGIN; TRUNCATE t; - session2 BEGIN; TRUNCATE t; -- wait - session1 SELECT * FROM pg_get_target_backend_memory_contexts(); --wait Thanks for notifying me, Fujii-san. Attached is a v8 patch that prohibits calling the function inside transactions. Regrettably, this modification could not cope with the advisory lock, and I haven't come up with a good way to deal with it. It seems to me that the architecture of the requestor waiting for the dumper leads to this problem and complicates things. Considering the printing-backtrace discussion [1], it seems reasonable that the requestor just sends a signal and the dumper dumps to the log file. Since I found a past discussion that was doing exactly what I thought reasonable [2], I'm going to continue that discussion if there are no objections. Any thoughts? [1] https://www.postgresql.org/message-id/flat/CALDaNm3ZzmFS-=r7oduzj7y7bgqv+n06kqyft6c3xzdoknk...@mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20171212044330.3nclev2sfrab36tf%40alap3.anarazel.de#6f28be9839c74779ed6aaa75616124f5 Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Printing backtrace of postgres processes
On 2021-03-04 21:55, Bharath Rupireddy wrote: On Mon, Mar 1, 2021 at 10:43 AM torikoshia wrote: Since the current patch uses BackendPidGetProc(), this feature is unsupported not only for the postmaster, logger, and stats collector, but also for the checkpointer, background writer, and walwriter. And when I specify the pid of one of these PostgreSQL processes, it says "PID is not a PostgreSQL server process". I think this may confuse users, so it might be worth changing the message for those PostgreSQL processes. AuxiliaryPidGetProc() may help to do it. Exactly this was the doubt I got when I initially reviewed this patch. And I felt it should be discussed in a separate thread; you may want to add your thoughts there [1]. [1] - https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com Thanks! I'm going to join the discussion there. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?
On 2021-03-07 19:16, Bharath Rupireddy wrote: On Fri, Feb 5, 2021 at 5:15 PM Bharath Rupireddy wrote: pg_terminate_backend and pg_cancel_backend with the postmaster PID produce a "PID is not a PostgreSQL server process" warning [1], which basically implies that the postmaster is not a PostgreSQL process at all. This is a bit misleading because the postmaster is the parent of all PostgreSQL processes. Should we improve the warning message if the given PID is the postmaster's PID? +1. I felt it was a bit confusing when reviewing a thread [1]. If yes, how about a generic message for both of the functions - "signalling postmaster process is not allowed" or "cannot signal postmaster process" or some other better suggestion? [1] 2471176 ---> is postmaster PID.

postgres=# select pg_terminate_backend(2471176);
WARNING:  PID 2471176 is not a PostgreSQL server process
 pg_terminate_backend
----------------------
 f
(1 row)

postgres=# select pg_cancel_backend(2471176);
WARNING:  PID 2471176 is not a PostgreSQL server process
 pg_cancel_backend
-------------------
 f
(1 row)

I'm attaching a small patch that emits a warning "signalling postmaster with PID %d is not allowed" for the postmaster and "signalling PostgreSQL server process with PID %d is not allowed" for auxiliary processes such as the checkpointer, background writer, and walwriter. However, for the stats collector and syslogger processes, we still get the "PID X is not a PostgreSQL server process" warning because they don't have PGPROC entries(??). So BackendPidGetProc and AuxiliaryPidGetProc will not help, and even pg_stat_activity does not have these processes' pids. I also ran into the same problem while creating a patch in [2]. I'm now wondering about changing the message to something like "PID is not a PostgreSQL backend process". "backend process" is now defined as "Process of an instance which acts on behalf of a client session and handles its requests." in the Appendix.
[1] https://www.postgresql.org/message-id/CALDaNm3ZzmFS-%3Dr7oDUzj7y7BgQv%2BN06Kqyft6C3xZDoKnk_6w%40mail.gmail.com [2] https://www.postgresql.org/message-id/0271f440ac77f2a4180e0e56ebd944d1%40oss.nttdata.com Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Get memory contexts of an arbitrary backend process
On 2020-10-23 13:46, Kyotaro Horiguchi wrote: Wait... Attachments: 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch For a moment I thought that the number is the patch number, but the predecessors are 0002-Enabled..collect.patch and 0001-(same name). It's not mandatory, but we usually do as follows and it's the way of git. v1-0001-Enabled...collect.patch v2-0001-Enabled...collect.patch The vn is added by the -v option of git-format-patch. Sorry for the confusion. I'll follow that way next time. At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia wrote in > > I added a shared hash table consisting of minimal members > > mainly for managing whether the file is dumped or not. > > Some members like 'loc' seem useful in the future, but I > > haven't added them since it's not essential at this point. > Yes, that would be good. > + /* > + * Since we allow only one session can request to dump > memory context at > + * the same time, check whether the dump files already exist. > + */ > + while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile, > &stat_tmp) == 0) > + { > + pg_usleep(100L); > + } > If pg_get_backend_memory_contexts() is executed by two or more > sessions at the same time, it cannot be run exclusively in this way. > Currently it seems to cause a crash when we do so. > This is easy to reproduce and can be done as follows. > [session-1] > BEGIN; > LOCK TABLE t1; > [Session-2] > BEGIN; > LOCK TABLE t1; <- waiting > [Session-3] > select * FROM pg_get_backend_memory_contexts(); > [Session-4] > select * FROM pg_get_backend_memory_contexts( session-2>); > If you issue commit or abort at session-1, you will get SEGV. > Instead of checking for the existence of the file, it might be better > to use a hash (mcxtdumpHash) entry with LWLock. Thanks! Added an LWLock and changed the approach from checking file existence to finding the hash entry.
> + if (proc == NULL) > + { > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL server > process", dst_pid))); > + return (Datum) 1; > + } > Shouldn't it clear the hash entry before return? Yeah, I added code for removing the entry. + entry = AddEntryToMcxtdumpHash(dst_pid); + + /* Check whether the target process is PostgreSQL backend process. */ + /* TODO: Check also whether backend or not. */ + proc = BackendPidGetProc(dst_pid); + + if (proc == NULL) + { + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL server process", dst_pid))); + + LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE); + + if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL) + elog(WARNING, "hash table corrupted"); + + LWLockRelease(McxtDumpHashLock); + + return (Datum) 1; + } Why do you enter a useless entry and then remove it immediately? Do you mean I should check the process existence first, since that enables us to skip entering hash entries? + PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid)); + { + SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId); "PROCSIG_DUMP_MEMORY" is somewhat misleading. How about "PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"? I'll go with "PROCSIG_DUMP_MEMCXT". I thought that the hash table would prevent multiple requestors from making a request at once, but the patch doesn't seem to do that. + /* Wait until target process finished dumping file. */ + while (entry->dump_status == MCXTDUMPSTATUS_NOTYET) This needs an LWLock. And this could read the entry after it is reused by another backend if the dumper process is gone. That isn't likely to happen, but theoretically the other backend may set it to MCXTDUMPSTATUS_NOTYET in between two successive checks on the member. Thanks for your notification. I'll use an LWLock. + /* +* Make dump file ends with 'D'. +* This is checked by the caller when reading the file. +*/ + fputc('E', fpout); Which is right? Sorry, the comment was wrong..
+ fputc('E', fpout); + + CHECK_FOR_INTERRUPTS(); This means that the process accepts another request and rewrites the file even while the first requestor is reading it. And, the file can
Re: Is it useful to record whether plans are generic or custom?
On 2020-09-29 02:39, legrand legrand wrote: Hi Atsushi, +1: Your proposal is a good answer for time-based performance analysis (even if parsing duration or blks are not differentiated). As it makes the number of pgss columns wider, maybe another solution would be to create a pg_stat_statements_xxx view with the same key as pgss (dbid, userid, queryid) and all those new counters. Thanks for your ideas and sorry for my late reply. It seems creating pg_stat_statements_xxx views both for generic and custom plans is better than my PoC patch. However, I also began to wonder how effective it would be to just distinguish between generic and custom plans. Custom plans can include all sorts of plans, and considering cache invalidation, generic plans can also include various plans. Considering this, I'm starting to feel that it would be better to keep not just whether a plan is generic or custom but the plan itself, as discussed in the thread below. https://www.postgresql.org/message-id/flat/CAKU4AWq5_jx1Vyai0_Sumgn-Ks0R%2BN80cf%2Bt170%2BzQs8x6%3DHew%40mail.gmail.com#f57e64b8d37697c808e4385009340871 Any thoughts? Regards, -- Atsushi Torikoshi
Re: Get memory contexts of an arbitrary backend process
On 2020-10-28 15:32, torikoshia wrote: On 2020-10-23 13:46, Kyotaro Horiguchi wrote: I think we might need to step back to the basic design of this feature, since this patch seems to have unhandled corner cases that are difficult to find. I've written out the basic design below and attached the corresponding patch.

# Communication flow between the dumper and the requestor
- (1) When requesting a memory context dump, the requestor adds an entry to the shared memory. The entry manages the dump state and it is set to 'REQUESTING'.
- (2) The requestor sends the signal to the dumper and waits on the latch.
- (3) The dumper looks into the corresponding shared memory entry and changes its state to 'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to 'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user. Finally, the requestor removes the dump file and resets the shared memory entry.

# Query cancellation
- When the requestor cancels dumping, e.g. signaling using ctrl-C, the requestor changes the status of the shared memory entry to 'CANCELING'.
- The dumper checks the status when it tries to change the state to 'DONE' at (4), and if the state is 'CANCELING', it removes the dump file and resets the shared memory entry.

# Cleanup of the dump file and the shared memory entry
- In the normal case, the requestor removes the dump file and resets the shared memory entry as described in (5).
- When something like query cancellation or process termination happens on the dumper after (1) and before (3), in other words, while the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3) and before (4), in other words, while the state is 'DUMPING', the dumper does the cleanup. Specifically, if the requestor cancels the query, it just changes the state to 'CANCELING' and the dumper notices it and cleans up things later.
- OTOH, when the dumper fails to dump, it cleans up the dump file and deletes the entry from the shared memory.
- When something happens on the requestor after (4), i.e., while the state is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or a power failure, all dump files are removed during crash recovery.

Although there was a suggestion that the shared memory hash table should be changed to a more efficient structure, I haven't done that in this patch. I think it can be treated separately, and I'm going to work on it later. On 2020-11-11 00:07, Georgios Kokolatos wrote: Hi, I noticed that this patch fails on the cfbot. For this, I changed the status to: 'Waiting on Author'. Cheers, //Georgios The new status of this patch is: Waiting on Author Thanks for your notification; I updated the patch. Changed the status to: 'Waiting on Author'. Regards, -- Atsushi TorikoshiFrom c6d06b11d16961acd59bfa022af52cb5fc668b3e Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 16 Nov 2020 11:49:03 +0900 Subject: [PATCH v4] Enabled pg_get_backend_memory_contexts() to collect arbitrary backend process's memory contexts. Previously, pg_get_backend_memory_contexts() could only get the local memory contexts. This patch enables getting the memory contexts of an arbitrary backend process whose PID is specified by the argument.
--- src/backend/access/transam/xlog.c| 7 + src/backend/catalog/system_views.sql | 4 +- src/backend/postmaster/pgstat.c | 3 + src/backend/replication/basebackup.c | 3 + src/backend/storage/ipc/ipci.c | 2 + src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lwlocknames.txt | 1 + src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/mcxtfuncs.c| 615 ++- src/backend/utils/init/globals.c | 1 + src/bin/initdb/initdb.c | 3 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 4 +- src/bin/pg_rewind/filemap.c | 3 + src/include/catalog/pg_proc.dat | 11 +- src/include/miscadmin.h | 1 + src/include/pgstat.h | 3 +- src/include/storage/procsignal.h | 1 + src/include/utils/mcxtfuncs.h| 52 ++ src/test/regress/expected/rules.out | 2 +- 19 files changed, 697 insertions(+), 28 deletions(-) create mode 100644 src/include/utils/mcxtfuncs.h diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index a1078a7cfc..f628fa8b53 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -73,6 +73,7 @@ #include "storage/sync.h" #include "utils/builtins.h" #include "utils/guc.h" +#include "utils/mcxtfuncs.h"
Re: Is it useful to record whether plans are generic or custom?
On 2020-11-12 14:23, Pavel Stehule wrote: yes, the plan itself is very interesting information - and the information whether the plan was generic or not is interesting too. It is another dimension of the query - maybe there can be a rule - for any query, store at most the 100 slowest plans with all attributes. The next issue is the fact that the first 5 executions of generic plans are not really generic. This fact should be visible too. Thanks! However, AFAIU, we can tell whether the plan type is generic or custom from the plan information, as described in the manual. -- https://www.postgresql.org/docs/devel/sql-prepare.html If a generic plan is in use, it will contain parameter symbols $n, while a custom plan will have the supplied parameter values substituted into it. If we can get the plan information, a case like 'the first 5 executions of generic plans are not really generic' does not arise, does it? Regards, -- Atsushi Torikoshi
[doc] adding a way to examine the plan type of prepared statements
Hi, Currently, EXPLAIN is the only way to know whether the plan is generic or custom according to the manual of PREPARE. https://www.postgresql.org/docs/devel/sql-prepare.html After commit d05b172, we can also use pg_prepared_statements view to examine the plan types. How about adding this explanation like the attached patch? Regards, -- Atsushi TorikoshiFrom 2c8f66637075fcb2f802a2b9cfd354f2ef18 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 12 Nov 2020 17:00:19 +0900 Subject: [PATCH v1] After commit d05b172, we can use pg_prepared_statements view to examine whether the plan is generic or custom. This patch adds this explanation in the manual of PREPARE. --- doc/src/sgml/ref/prepare.sgml | 7 +++ 1 file changed, 7 insertions(+) diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml index 57a34ff83c..2268c222a9 100644 --- a/doc/src/sgml/ref/prepare.sgml +++ b/doc/src/sgml/ref/prepare.sgml @@ -179,6 +179,13 @@ EXPLAIN EXECUTE name(parameter_values + + To examine how many times each prepared statement chose generic and + custom plan cumulatively in the current session, refer + pg_prepared_statements + system view. + + Although the main point of a prepared statement is to avoid repeated parse analysis and planning of the statement, PostgreSQL will -- 2.18.1
[doc] plan invalidation when statistics are updated
Hi, AFAIU, when the planner statistics are updated, generic plans are invalidated and PostgreSQL recreates them. However, the manual doesn't seem to explain this explicitly. https://www.postgresql.org/docs/devel/sql-prepare.html I guess this case is included in 'whenever database objects used in the statement have undergone definitional (DDL) changes', but I feel it's hard to infer. Since updates of the statistics can happen often, how about describing this case explicitly like the attached patch? Regards, -- Atsushi TorikoshiFrom d71dbb0b100f706f19d92175b72f9e1833a8a442 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 12 Nov 2020 17:18:29 +0900 Subject: [PATCH v1] When the planner statistics are updated, generic plans are invalidated and PostgreSQL recreates them. However, the manual didn't explain it explicitly. This patch adds this case as an example. --- doc/src/sgml/ref/prepare.sgml | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml index 57a34ff83c..4075de5689 100644 --- a/doc/src/sgml/ref/prepare.sgml +++ b/doc/src/sgml/ref/prepare.sgml @@ -185,7 +185,10 @@ EXPLAIN EXECUTE name(parameter_values changes + statement. For example, when the planner statistics of the statement + are updated, PostgreSQL re-analyzes and + re-plans the statement. + Also, if the value of changes from one use to the next, the statement will be re-parsed using the new search_path. (This latter behavior is new as of PostgreSQL 9.3.) These rules make use of a -- 2.18.1
Re: [doc] plan invalidation when statistics are updated
On 2020-11-18 11:35, Fujii Masao wrote: Thanks for your comment! On 2020/11/18 11:04, torikoshia wrote: Hi, AFAIU, when the planner statistics are updated, generic plans are invalidated and PostgreSQL recreates them. However, the manual doesn't seem to explain this explicitly. https://www.postgresql.org/docs/devel/sql-prepare.html I guess this case is included in 'whenever database objects used in the statement have undergone definitional (DDL) changes', but I feel it's hard to infer. Since updates of the statistics can happen often, how about describing this case explicitly like the attached patch? +1 to add that note. - statement. Also, if the value of changes + statement. For example, when the planner statistics of the statement + are updated, PostgreSQL re-analyzes and + re-plans the statement. I don't think "For example," is necessary. "planner statistics of the statement" sounds vague? Is the statement re-analyzed and re-planned only when the planner statistics of database objects used in the statement are updated? If yes, we should describe that to make the note a bit more explicit? Yes. As far as I confirmed, updating statistics that are not used in prepared statements doesn't trigger re-analysis and re-planning. Since plan invalidations for DDL changes and statistical changes are caused by PlanCacheRelCallback(Oid 'relid'), only the prepared statements using the 'relid' relation seem to be invalidated. Attached is the updated patch. Regards, - Atsushi TorikoshiFrom f8c051e57e1ca15e2b91d3e69fe0531c0b7bf7ca Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 19 Nov 2020 13:23:18 +0900 Subject: [PATCH v2] When the planner statistics are updated, generic plans are invalidated and PostgreSQL recreates them. However, the manual didn't explain it explicitly. This patch adds an explanation for this case.
--- doc/src/sgml/ref/prepare.sgml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml index 57a34ff83c..5a6dd481bc 100644 --- a/doc/src/sgml/ref/prepare.sgml +++ b/doc/src/sgml/ref/prepare.sgml @@ -185,7 +185,9 @@ EXPLAIN EXECUTE name(parameter_values changes + statement. Similarly, whenever the planner statistics of database + objects used in the statement have updated, re-analysis and re-planning + happen. Also, if the value of changes from one use to the next, the statement will be re-parsed using the new search_path. (This latter behavior is new as of PostgreSQL 9.3.) These rules make use of a -- 2.18.1
Re: [doc] adding a way to examine the plan type of prepared statements
On 2020-11-18 11:04, torikoshia wrote: Hi, Currently, EXPLAIN is the only way to know whether the plan is generic or custom according to the manual of PREPARE. https://www.postgresql.org/docs/devel/sql-prepare.html After commit d05b172, we can also use the pg_prepared_statements view to examine the plan types. How about adding this explanation like the attached patch? Sorry, but on second thought, since it seems better to add the explanation to the current description of pg_prepared_statements, I modified the patch. Regards, -- Atsushi TorikoshiFrom ec969fa55c2ffc71ce0b94e923e013d650de2220 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 19 Nov 2020 14:45:49 +0900 Subject: [PATCH v2] After commit d05b172, we can use the pg_prepared_statements view to examine the number of times generic and custom plans were chosen. This patch adds this explanation in the manual of PREPARE. --- doc/src/sgml/ref/prepare.sgml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml index 57a34ff83c..3cc5f9de4a 100644 --- a/doc/src/sgml/ref/prepare.sgml +++ b/doc/src/sgml/ref/prepare.sgml @@ -204,7 +204,8 @@ EXPLAIN EXECUTE name(parameter_values You can see all prepared statements available in the session by querying the pg_prepared_statements - system view. + system view. This view also shows the numbers of times generic and + custom plans were chosen. -- 2.18.1
Re: [doc] plan invalidation when statistics are updated
On 2020-11-25 14:13, Fujii Masao wrote: On 2020/11/24 23:14, Fujii Masao wrote: On 2020/11/19 14:33, torikoshia wrote: On 2020-11-18 11:35, Fujii Masao wrote: Thanks for your comment! On 2020/11/18 11:04, torikoshia wrote: Hi, AFAIU, when the planner statistics are updated, generic plans are invalidated and PostgreSQL recreates them. However, the manual doesn't seem to explain this explicitly. https://www.postgresql.org/docs/devel/sql-prepare.html I guess this case is included in 'whenever database objects used in the statement have undergone definitional (DDL) changes', but I feel it's hard to infer. Since updates of the statistics can happen often, how about describing this case explicitly like the attached patch? +1 to add that note. - statement. Also, if the value of linkend="guc-search-path"/> changes + statement. For example, when the planner statistics of the statement + are updated, PostgreSQL re-analyzes and + re-plans the statement. I don't think "For example," is necessary. "planner statistics of the statement" sounds vague? Is the statement re-analyzed and re-planned only when the planner statistics of database objects used in the statement are updated? If yes, we should describe that to make the note a bit more explicit? Yes. As far as I confirmed, updating statistics that are not used in prepared statements doesn't trigger re-analysis and re-planning. Since plan invalidations for DDL changes and statistical changes are caused by PlanCacheRelCallback(Oid 'relid'), only the prepared statements using the 'relid' relation seem to be invalidated. Attached is the updated patch. Thanks for confirming that and updating the patch! force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes since the previous use of the prepared - statement.
Similarly, whenever the planner statistics of database + objects used in the statement have updated, re-analysis and re-planning + happen. "been" should be added between "have" and "updated" in the above "objects used in the statement have updated"? You're right. I'm inclined to add "since the previous use of the prepared statement" into the second description as well, to make it clear. But if we do that, it's better to merge the above two descriptions into one, as follows? whenever database objects used in the statement have undergone - definitional (DDL) changes since the previous use of the prepared + definitional (DDL) changes or the planner statistics of them have + been updated since the previous use of the prepared statement. Also, if the value of changes Thanks, it seems better. Regards,
Re: Is it useful to record whether plans are generic or custom?
wrote in ISTM now that creating pg_stat_statements_xxx views both for generic and custom plans is better than my PoC patch. On second thought, it also makes pg_stat_statements too complicated compared to what it makes possible. I'm also worried that whether or not to record generic and custom plan execution times would be controlled by a GUC variable, and the default would be not to record them. Not many people will change the default. Since the same queryid can contain various queries (different plan, different parameter $n, etc.), I also started to feel that it is not appropriate to get the execution time of only generic/custom queries separately. I suppose it would be normal practice to store past results of pg_stat_statements for future comparisons. If this is the case, I think that if we only add the number of generic plan executions, it will give us a hint for noticing performance degradation caused by a plan change between generic and custom. For example, if there is a clear difference in the number of times the generic plan was executed before and after a performance degradation, as below, it would be natural to check whether there is a problem with the generic plan.

[after performance degradation]
=# SELECT query, calls, generic_calls FROM pg_stat_statements where query like '%t1%';
                    query                    | calls | generic_calls
---------------------------------------------+-------+---------------
 PREPARE p1 as select * from t1 where i = $1 |  1100 |            50

[before performance degradation]
=# SELECT query, calls, generic_calls FROM pg_stat_statements where query like '%t1%';
                    query                    | calls | generic_calls
---------------------------------------------+-------+---------------
 PREPARE p1 as select * from t1 where i = $1 |  1000 |             0

Attached is a patch that just adds a generic call counter to pg_stat_statements. Any thoughts?
Regards, -- Atsushi Torikoshidiff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 0f63f08f7e..7fdef315ae 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT blk_write_time float8, OUT wal_records int8, OUT wal_fpi int8, -OUT wal_bytes numeric +OUT wal_bytes numeric, +OUT generic_calls int8 ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8' diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 72a117fc19..171c39f857 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -77,10 +77,12 @@ #include "storage/fd.h" #include "storage/ipc.h" #include "storage/spin.h" +#include "tcop/pquery.h" #include "tcop/utility.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/plancache.h" #include "utils/timestamp.h" PG_MODULE_MAGIC; @@ -192,6 +194,7 @@ typedef struct Counters int64 wal_records; /* # of WAL records generated */ int64 wal_fpi; /* # of WAL full page images generated */ uint64 wal_bytes; /* total amount of WAL generated in bytes */ + int64 generic_calls; /* # of times generic plans executed */ } Counters; /* @@ -277,6 +280,9 @@ static int exec_nested_level = 0; /* Current nesting depth of planner calls */ static int plan_nested_level = 0; +/* Current plan type */ +static bool is_plan_type_generic = false; + /* Saved hook values in case of unload */ static shmem_startup_hook_type prev_shmem_startup_hook = NULL; static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL; @@ -1034,6 +1040,20 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) */ if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != 
UINT64CONST(0)) { + /* + * Since ActivePortal is not available at ExecutorEnd, we preserve + * the plan type here. + */ + Assert(ActivePortal); + + if (ActivePortal->cplan) + { + if (ActivePortal->cplan->is_generic) +is_plan_type_generic = true; + else +is_plan_type_generic = false; + } + /* * Set up to track total elapsed time in ExecutorRun. Make sure the * space is allocated in the per-query context so it will go away at @@ -1427,6 +1447,8 @@ pgss_store(const char *query, uint64 queryId, e->counters.max_time[kind] = total_time; e->counters.mean_time[kind] = total_time; } + else if (kind == PGSS_EXEC && is_plan_type_generic) + e->counters.generic_calls += 1; else { /* @@ -1510,8 +1532,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) #define PG_STAT_STATEMENTS_COLS_V1_1 18 #define PG_STAT_STATEMENTS_COLS_V1_2 19 #define PG_STAT_STATEMENTS_COLS_V1_3 23 -#define PG
Re: Get memory contexts of an arbitrary backend process
Attached is v7, which fixes recent conflicts. For simplicity, it also changes the behavior of the requestor when another requestor is already working: the v6 patch made the requestor wait, while the v7 patch makes the requestor quit. Regards, -- Atsushi TorikoshiFrom f20e48d99f2770bfec275805185aa5ce08661fce Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 12 Jan 2021 20:55:43 +0900 Subject: [PATCH v7] After commit 3e98c0bafb28de, we can display the usage of the memory contexts using pg_backend_memory_contexts system view. However, its target is limited to the process attached to the current session. This patch introduces pg_get_target_backend_memory_contexts() and makes it possible to collect memory contexts of the specified process. --- src/backend/access/transam/xlog.c| 7 + src/backend/catalog/system_views.sql | 3 +- src/backend/postmaster/pgstat.c | 3 + src/backend/replication/basebackup.c | 3 + src/backend/storage/ipc/ipci.c | 2 + src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lwlocknames.txt | 1 + src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/mcxtfuncs.c| 731 ++- src/backend/utils/init/globals.c | 1 + src/bin/initdb/initdb.c | 3 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 4 +- src/bin/pg_rewind/filemap.c | 3 + src/include/catalog/pg_proc.dat | 12 +- src/include/miscadmin.h | 1 + src/include/pgstat.h | 3 +- src/include/storage/procsignal.h | 1 + src/include/utils/mcxtfuncs.h| 44 ++ 18 files changed, 810 insertions(+), 21 deletions(-) create mode 100644 src/include/utils/mcxtfuncs.h diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ede93ad7fd..4cab47a61d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -73,6 +73,7 @@ #include "storage/sync.h" #include "utils/builtins.h" #include "utils/guc.h" +#include "utils/mcxtfuncs.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/relmapper.h" @@ -6993,6 +6994,12 @@ StartupXLOG(void)
pgstat_reset_all(); + /* + * Reset dump files in pg_memusage, because target processes do + * not exist any more. + */ + RemoveMemcxtFile(0); + /* * If there was a backup label file, it's done its job and the info * has now been propagated into pg_control. We must get rid of the diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 5d89e77dbe..7419c496b2 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -558,7 +558,8 @@ CREATE VIEW pg_backend_memory_contexts AS SELECT * FROM pg_get_backend_memory_contexts(); REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC; -REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_target_backend_memory_contexts FROM PUBLIC; -- Statistics views diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 3f24a33ef1..8eb2d062b0 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -4045,6 +4045,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_XACT_GROUP_UPDATE: event_name = "XactGroupUpdate"; break; + case WAIT_EVENT_DUMP_MEMORY_CONTEXT: + event_name = "DumpMemoryContext"; + break; /* no default case, so that compiler will warn */ } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 0f54635550..c67e71d79b 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -184,6 +184,9 @@ static const char *const excludeDirContents[] = /* Contents zeroed on startup, see StartupSUBTRANS(). */ "pg_subtrans", + /* Skip memory context dump files. 
*/ + "pg_memusage", + /* end of list */ NULL }; diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index f9bbe97b50..18a1dd5a74 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -45,6 +45,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "utils/mcxtfuncs.h" #include "utils/snapmgr.h" /* GUCs */ @@ -267,6 +268,7 @@ CreateSharedMemoryAndSemaphores(void) BTreeShmemInit(); SyncScanShmemInit(); AsyncShmemInit(); + McxtDumpShmemInit(); #ifdef EXEC_BACKEND diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 583efaecff..106e125cc2 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -28,6 +28,7 @@ #include "storage/shmem.h" #include "storage/
Re: Get memory contexts of an arbitrary backend process
Since pg_get_target_backend_memory_contexts() waits for the target process to dump memory, it could lead to a deadlock, as below:

- session1: BEGIN; TRUNCATE t;
- session2: BEGIN; TRUNCATE t; -- waits
- session1: SELECT * FROM pg_get_target_backend_memory_contexts(2); -- waits

Thanks for notifying me, Fujii-san. Attached is a v8 patch that prohibits calling the function inside a transaction block.

Regards, -- Atsushi Torikoshi
From 840185c1ad40cb7bc40333ab38927667c4d48c1d Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 14 Jan 2021 18:20:43 +0900 Subject: [PATCH v8] After commit 3e98c0bafb28de, we can display the usage of the memory contexts using pg_backend_memory_contexts system view. However, its target is limited to the process attached to the current session. This patch introduces pg_get_target_backend_memory_contexts() and makes it possible to collect memory contexts of the specified process. --- src/backend/access/transam/xlog.c| 7 + src/backend/catalog/system_views.sql | 3 +- src/backend/postmaster/pgstat.c | 3 + src/backend/replication/basebackup.c | 3 + src/backend/storage/ipc/ipci.c | 2 + src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lwlocknames.txt | 1 + src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/mcxtfuncs.c| 742 ++- src/backend/utils/init/globals.c | 1 + src/bin/initdb/initdb.c | 3 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 4 +- src/bin/pg_rewind/filemap.c | 3 + src/include/catalog/pg_proc.dat | 12 +- src/include/miscadmin.h | 1 + src/include/pgstat.h | 3 +- src/include/storage/procsignal.h | 1 + src/include/utils/mcxtfuncs.h| 44 ++ 18 files changed, 821 insertions(+), 21 deletions(-) create mode 100644 src/include/utils/mcxtfuncs.h diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b18257c198..45381c343a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -74,6 +74,7 @@ #include "storage/sync.h" #include "utils/builtins.h" #include "utils/guc.h" +#include 
"utils/mcxtfuncs.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/relmapper.h" @@ -7009,6 +7010,12 @@ StartupXLOG(void) */ pgstat_reset_all(); + /* + * Reset dump files in pg_memusage, because target processes do + * not exist any more. + */ + RemoveMemcxtFile(0); + /* * If there was a backup label file, it's done its job and the info * has now been propagated into pg_control. We must get rid of the diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 5d89e77dbe..7419c496b2 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -558,7 +558,8 @@ CREATE VIEW pg_backend_memory_contexts AS SELECT * FROM pg_get_backend_memory_contexts(); REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC; -REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_target_backend_memory_contexts FROM PUBLIC; -- Statistics views diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 3f24a33ef1..8eb2d062b0 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -4045,6 +4045,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_XACT_GROUP_UPDATE: event_name = "XactGroupUpdate"; break; + case WAIT_EVENT_DUMP_MEMORY_CONTEXT: + event_name = "DumpMemoryContext"; + break; /* no default case, so that compiler will warn */ } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 0f54635550..c67e71d79b 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -184,6 +184,9 @@ static const char *const excludeDirContents[] = /* Contents zeroed on startup, see StartupSUBTRANS(). */ "pg_subtrans", + /* Skip memory context dump files. 
*/ + "pg_memusage", + /* end of list */ NULL }; diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index f9bbe97b50..18a1dd5a74 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -45,6 +45,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "utils/mcxtfuncs.h" #include "utils/snapmgr.h" /* GUCs */ @@ -267,6 +268,7 @@ CreateSharedMemoryAndSemaphores(void) BTreeShmemInit(); SyncScanShmemInit(); AsyncShmemInit(); + McxtDumpShmemInit(); #ifdef EXEC_BACKEND diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 583efaecff..106e
Re: adding wait_start column to pg_locks
Thanks for your review and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote: Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so it always contains the value reported by the latest lock acquisition attempt.

I think you are right, and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote: On Fri, Jan 15, 2021 at 3:45, Robert Haas wrote: On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote: It looks like the logical place to store the value is in the PROCLOCK structure; ... That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK, but it would be redundant since each process can wait for at most one lock at a time. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC.

Regards, -- Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On 2021-01-15 15:23, torikoshia wrote: Thanks for your review and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote: Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so it always contains the value reported by the latest lock acquisition attempt.

I think you are right, and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote: On Fri, Jan 15, 2021 at 3:45, Robert Haas wrote: On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote: It looks like the logical place to store the value is in the PROCLOCK structure; ... That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK, but it would be redundant since each process can wait for at most one lock at a time. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC.

Attached is a patch. I noticed the previous patches left wait_start untouched even after the lock was acquired; this patch also fixes that. Any thoughts? 
Regards, -- Atsushi TorikoshiFrom 62ff3e4dba7d45c260a62a33425cb2d1e6b822c9 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 18 Jan 2021 10:01:35 +0900 Subject: [PATCH v4] To examine the duration of locks, we did join on pg_locks and pg_stat_activity and used columns such as query_start or state_change. However, since they are the moment when queries have started or their state has changed, we could not get the exact lock duration in this way. This patch adds a new field preserving the time at which locks started waiting. --- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 10 ++ src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 4 src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 9 files changed, 41 insertions(+), 8 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..c0aecb0288 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start +--+--+--+--+---++---+-+---+--++-+--+-+--+ (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 43d7a1ad90..a5ce0835a9 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ 
-10589,6 +10589,16 @@ SCRAM-SHA-256$<iteration count>:&l lock table + + + + wait_start timestamptz + + + Lock acquisition wait start time. NULL if + lock acquired. + + diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 20e50247ea..5b5fb474e0 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -3627,6 +3627,12 @@ GetLockStatusData(void) instance->leaderPid = proc->pid; instance->fastpath = true; + /* + * Successfully taking fast path lock means there were no + * conflicting locks. + */ + instance->wait_start = 0; + el++; } @@ -3
TOAST condition for column size
Hi,

When I created a table consisting of 400 VARCHAR columns and tried to INSERT a record whose columns were all the same size, there were cases where I got an error due to exceeding the size limit per row.

=# -- create a table consisting of 400 VARCHAR columns
=# CREATE TABLE t1 (c1 VARCHAR(100), c2 VARCHAR(100), ... c400 VARCHAR(100));
=# -- insert one record whose columns are all 20 bytes
=# INSERT INTO t1 VALUES (repeat('a', 20), repeat('a', 20), ... repeat('a', 20));
ERROR: row is too big: size 8424, maximum size 8160

What is interesting is that it failed only when the size of each column was 20 to 23 bytes, as shown below.

size of each column | result
--------------------+--------
18 bytes            | success
19 bytes            | success
20 bytes            | failure
21 bytes            | failure
22 bytes            | failure
23 bytes            | failure
24 bytes            | success
25 bytes            | success

When the size of each column was 19 bytes or less, the INSERT succeeds because the row size fits within a page. When the size of each column was 24 bytes or more, it also succeeds because the columns are TOASTed and the row size is reduced to less than one page. OTOH, when each column is 20 to 23 bytes, the columns aren't TOASTed because they don't meet the condition of the following if statement.

--src/backend/access/table/toast_helper.c
toast_tuple_find_biggest_attribute(ToastTupleContext *ttc, bool for_compression, bool check_main)
...(snip)...
int32 biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
...(snip)...
if (ttc->ttc_attr[i].tai_size > biggest_size) // <- here
{
    biggest_attno = i;
    biggest_size = ttc->ttc_attr[i].tai_size;
}

Since TOAST_POINTER_SIZE is 18 bytes but MAXALIGN(TOAST_POINTER_SIZE) is 24 bytes, columns are not TOASTed until their size becomes larger than 24 bytes. I confirmed these sizes in my environment, but AFAIU they would be the same in any environment. So, as a result of this alignment adjustment, columns of 20 to 23 bytes fail. 
I wonder if it might be better not to apply the alignment here, as in the attached patch; with it, inserting records of 20 to 23 bytes per column succeeded. Or is there a reason the alignment is needed here? I understand that TOAST is not effective for small data and that creating a table with hundreds of columns is not recommended, but I think cases that can succeed should succeed. Any thoughts?

Regards, -- Atsushi Torikoshi
diff --git a/src/backend/access/table/toast_helper.c b/src/backend/access/table/toast_helper.c index fb36151ce5..e916c0f95c 100644 --- a/src/backend/access/table/toast_helper.c +++ b/src/backend/access/table/toast_helper.c @@ -183,7 +183,7 @@ toast_tuple_find_biggest_attribute(ToastTupleContext *ttc, TupleDesc tupleDesc = ttc->ttc_rel->rd_att; int numAttrs = tupleDesc->natts; int biggest_attno = -1; - int32 biggest_size = MAXALIGN(TOAST_POINTER_SIZE); + int32 biggest_size = TOAST_POINTER_SIZE; int32 skip_colflags = TOASTCOL_IGNORE; int i;
Re: TOAST condition for column size
On 2021-01-19 19:32, Amit Kapila wrote: On Mon, Jan 18, 2021 at 7:53 PM torikoshia wrote: Because no benefit is to be expected by compressing it. The size will be mostly the same. Also, even if we somehow try to fit this data via toast, I think reading speed will be slower because an extra fetch from toast would be required for all such columns. Another thing is that you or others can still face the same problem with 17-byte column data. I don't think this is the right way to fix it. I don't have many good ideas, but I think you can try (a) increasing the block size during configure, (b) reducing the number of columns, or (c) creating char columns of a somewhat bigger size, say greater than 24 bytes, to accommodate your case. I know none of these are good workarounds, but at this moment I can't think of better alternatives.

Thanks for your explanation and workarounds!

On 2021-01-20 00:40, Tom Lane wrote: Dilip Kumar writes: On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila wrote: Won't it be safe because we don't align individual attrs of type varchar where the length is less than or equal to 127? Yeah right, I just missed that point. Yeah, the minimum on biggest_size has nothing to do with alignment decisions. It's just a filter to decide whether it's worth trying to toast anything. Having said that, I'm pretty skeptical of this patch: I think its most likely real-world effect is going to be to waste cycles (and create TOAST-table bloat) on the way to failing anyway. I do not think that toasting a 20-byte field down to 18 bytes is likely to be a productive thing to do in typical situations. The given example looks like a cherry-picked edge case rather than a useful case to worry about.

I agree with you; it seems to help only when there are many columns with 19 to 23 bytes of data, which is not a normal case. I'm not sure, but a rare exception might be some geographic data. That is the situation in which I heard the problem happened.

Regards, -- Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On 2021-01-21 12:48, Fujii Masao wrote: Thanks for updating the patch! I think that this is a really useful feature!!

Thanks for reviewing!

I have two minor comments. + role="column_definition"> + wait_start timestamptz The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc. + Lock acquisition wait start time. NULL if + lock acquired.

Agreed. I also changed the variable name "wait_start" in struct PGPROC and LockInstanceData to "waitStart" for the same reason.

There seems to be a case where the wait start time is NULL even when "granted" is false. Would it be better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case?

Thanks, this is because I set 'waitstart' only under the following condition.

---src/backend/storage/lmgr/proc.c
> 1250 if (!InHotStandby)

As far as I can tell, the startup process would be the only case. I also think that, in the case of the startup process, it is possible to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did that in the attached patch. Any thoughts?

Regards, -- Atsushi Torikoshi
From 6beb1c61e72c797c915427ae4e36d6bab9e0594c Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 22 Jan 2021 13:51:00 +0900 Subject: [PATCH v5] To examine the duration of locks, we did join on pg_locks and pg_stat_activity and used columns such as query_start or state_change. However, since they are the moment when queries have started or their state has changed, we could not get the exact lock duration in this way. 
--- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 10 ++ src/backend/storage/ipc/standby.c| 8 ++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 4 src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 47 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 43d7a1ad90..ba003ce393 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10589,6 +10589,16 @@ SCRAM-SHA-256$:&l lock table + + + + waitstart timestamptz + + + Lock acquisition wait start time. 
NULL if + lock acquired + + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..819e00e4ab 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -539,13 +539,17 @@ ResolveRecoveryConflictWithDatabase(Oid dbid) void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { - TimestampTz ltime; + TimestampTz ltime, now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0) + if (MyProc->waitStart == 0) + MyProc->waitStart = now; + + if (now >= ltime && ltime != 0) { /* * We're already behind, so clear a path as quickly as possible. diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 20e50247ea..ffad4e94bc 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -3627,6 +3627,12 @@ GetLockStatusData(void) i
Re: adding wait_start column to pg_locks
On 2021-01-25 23:44, Fujii Masao wrote: Another comment: doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that lengthens the time the lwlock is held. So maybe we need another way to do it.

Thanks for your comments!

It would be ideal for the consistency of the view to record "waitstart" while holding the table partition lock. However, as you pointed out, that would have a non-negligible performance impact. I may be missing something, but as far as I can see, the consequence of not holding the lock is that "waitstart" can be NULL even though "granted" is false. I think people want to know the start time of a lock wait when locks are held for a long time, and in that case "waitstart" should already have been recorded. If this is true, I think the current implementation may be enough, on the condition that users understand that it can happen that "waitStart" is NULL while "granted" is false.

Attached is a patch describing this in the docs and comments. Any thoughts?

Regards, -- Atsushi Torikoshi
From 03c6e1ed6ffa215ee898b5a6a75d77277fb8e672 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 2 Feb 2021 21:32:36 +0900 Subject: [PATCH v6] To examine the duration of locks, we did join on pg_locks and pg_stat_activity and used columns such as query_start or state_change. However, since they are the moment when queries have started or their state has changed, we could not get the lock duration in this way. This patch adds a new field "waitstart" preserving lock acquisition wait start time. Note that updating this field and lock acquisition are not performed synchronously for performance reasons. Therefore, depending on the timing, it can happen that waitstart is NULL even though granted is false. 
Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com --- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 14 ++ src/backend/storage/ipc/standby.c| 16 ++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 10 ++ src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 65 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 865e826fb0..d81d6e1c52 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10592,6 +10592,20 @@ SCRAM-SHA-256$:&l lock table + + + + waitstart timestamptz + + + Lock acquisition wait start time. + Note that updating this field and lock acquisition are not performed + synchronously for performance reasons. 
Therefore, depending on the + timing, it can happen that waitstart is + NULL even though + granted is false. + + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..2282229568 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -539,13 +539,25 @@ ResolveRecoveryConflictWithDatabase(Oid dbid) void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { - TimestampTz ltime; + TimestampTz ltime, now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0
Re: Is it useful to record whether plans are generic or custom?
Chengxi Sun, Yamada-san, Horiguchi-san, Thanks for all your comments. Adding only the number of generic plan executions seems acceptable.

On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi wrote: Note that ActivePortal is the closest nested portal. So it gives the wrong result for nested portals.

I may be wrong, but I thought it was OK, since the closest nested portal is the portal to be executed. ActivePortal is used in the ExecutorStart hook in the patch, and as far as I read PortalStart(), ActivePortal is changed to the portal to be executed before ExecutorStart(). If possible, could you tell me a specific case that causes wrong results?

Regards, -- Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On 2021-02-03 11:23, Fujii Masao wrote: 64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations such as pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments.

As for the documentation, your suggestion seems better than v6, so I used it as is.

Regards, -- Atsushi Torikoshi
From 38a3d8996c4b1690cf18cdb1015e270201d34330 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 4 Feb 2021 23:23:36 +0900 Subject: [PATCH v7] To examine the duration of locks, we did join on pg_locks and pg_stat_activity and used columns such as query_start or state_change. However, since they are the moment when queries have started or their state has changed, we could not get the lock duration in this way. This patch adds a new field "waitstart" preserving lock acquisition wait start time. Note that updating this field and lock acquisition are not performed synchronously for performance reasons. Therefore, depending on the timing, it can happen that waitstart is NULL even though granted is false. 
Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com modifies --- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 13 + src/backend/storage/ipc/standby.c| 17 +++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 14 ++ src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 69 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 865e826fb0..7df4c30a65 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10592,6 +10592,19 @@ SCRAM-SHA-256$:&l lock table + + + + waitstart timestamptz + + + Time when the server process started waiting for this lock, + or null if the lock is held. + Note that this can be null for a very short period of time after + the wait started even though granted + is false. 
+ + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..1c8135ba74 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -539,13 +539,26 @@ ResolveRecoveryConflictWithDatabase(Oid dbid) void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { - TimestampTz ltime; + TimestampTz ltime, now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0) + /* + * Record waitStart using the current time obtained for comparison + * with ltime. + * + * It would be ideal this can be synchronously done with updating + * lock information. Howerver, since it gives performance impacts + * to hold partitionLock longer time, we do it here asynchronously. + */ + if (pg_atomic_read_u64(&MyProc->waitStart) ==
Re: Is it useful to record whether plans are generic or custom?
On 2021-02-04 11:19, Kyotaro Horiguchi wrote: At Thu, 04 Feb 2021 10:16:47 +0900, torikoshia wrote in Chengxi Sun, Yamada-san, Horiguchi-san, Thanks for all your comments. Adding only the number of generic plan executions seems acceptable. On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi wrote: > Note that ActivePortal is the closest nested portal. So it gives the > wrong result for nested portals. I may be wrong, but I thought it was OK, since the closest nested portal is the portal to be executed.

After executing the inner-most portal, is_plan_type_generic holds the value for the inner-most portal, and it is never changed afterwards. At ExecutorEnd, all the upper portals see the value for the inner-most portal left behind in is_plan_type_generic, even though the portals at each nest level are independent.

ActivePortal is used in the ExecutorStart hook in the patch. And as far as I read PortalStart(), ActivePortal is changed to the portal to be executed before ExecutorStart(). If possible, could you tell me a specific case that causes wrong results?

Running a plpgsql function that does PREPARE in a query that does PREPARE?

Thanks for your explanation! I confirmed that it does in fact happen. To avoid it, the attached patch saves is_plan_type_generic before changing it and restores it at the end of pgss_ExecutorEnd(). Any thoughts? 
Regards, -- Atsushi Torikoshidiff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 0f63f08f7e..7fdef315ae 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT blk_write_time float8, OUT wal_records int8, OUT wal_fpi int8, -OUT wal_bytes numeric +OUT wal_bytes numeric, +OUT generic_calls int8 ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8' diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 62cccbfa44..f5801016d6 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -77,10 +77,12 @@ #include "storage/fd.h" #include "storage/ipc.h" #include "storage/spin.h" +#include "tcop/pquery.h" #include "tcop/utility.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/plancache.h" #include "utils/timestamp.h" PG_MODULE_MAGIC; @@ -192,6 +194,7 @@ typedef struct Counters int64 wal_records; /* # of WAL records generated */ int64 wal_fpi; /* # of WAL full page images generated */ uint64 wal_bytes; /* total amount of WAL generated in bytes */ + int64 generic_calls; /* # of times generic plans executed */ } Counters; /* @@ -277,6 +280,10 @@ static int exec_nested_level = 0; /* Current nesting depth of planner calls */ static int plan_nested_level = 0; +/* Current and previous plan type */ +static bool is_plan_type_generic = false; +static bool is_prev_plan_type_generic = false; + /* Saved hook values in case of unload */ static shmem_startup_hook_type prev_shmem_startup_hook = NULL; static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL; @@ -1034,6 +1041,23 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) */ if 
(pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) { + /* + * Since ActivePortal is not available at ExecutorEnd, we preserve + * the current and previous plan type here. + * Previous one is necessary since portals can be nested. + */ + Assert(ActivePortal); + + is_prev_plan_type_generic = is_plan_type_generic; + + if (ActivePortal->cplan) + { + if (ActivePortal->cplan->is_generic) +is_plan_type_generic = true; + else +is_plan_type_generic = false; + } + /* * Set up to track total elapsed time in ExecutorRun. Make sure the * space is allocated in the per-query context so it will go away at @@ -1122,6 +1146,9 @@ pgss_ExecutorEnd(QueryDesc *queryDesc) NULL); } + /* Storing has done. Set is_plan_type_generic back to the original. */ + is_plan_type_generic = is_prev_plan_type_generic; + if (prev_ExecutorEnd) prev_ExecutorEnd(queryDesc); else @@ -1427,6 +1454,8 @@ pgss_store(const char *query, uint64 queryId, e->counters.max_time[kind] = total_time; e->counters.mean_time[kind] = total_time; } + else if (kind == PGSS_EXEC && is_plan_type_generic) + e->counters.generic_calls += 1; else { /* @@ -1510,8 +1539,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) #define PG_STAT_STATEMENTS_COLS_V1_1 18 #define PG_STAT_STATEMENTS_COLS_V1_2 19 #define PG_STAT_STAT
Re: adding wait_start column to pg_locks
On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote: 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"? Also it might be worth thinking to use 64-bit atomic operations like pg_atomic_read_u64(), for that. Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type. I may be using these functions not correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is. Thanks for updating the patch! + if (pg_atomic_read_u64(&MyProc->waitStart) == 0) + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &now)); pg_atomic_read_u64() is really necessary? I think that "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough. + deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT); + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart)); Same as above. + /* +* Record waitStart reusing the deadlock timeout timer. +* +* It would be ideal this can be synchronously done with updating +* lock information. Howerver, since it gives performance impacts +* to hold partitionLock longer time, we do it here asynchronously. +*/ IMO it's better to comment why we reuse the deadlock timeout timer. proc->waitStatus = waitStatus; + pg_atomic_init_u64(&MyProc->waitStart, 0); pg_atomic_write_u64() should be used instead? Because waitStart can be accessed concurrently there. I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version. 
Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both differences were below 1%). Regards, -- Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote: 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"? Also it might be worth thinking to use 64-bit atomic operations like pg_atomic_read_u64(), for that. Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type. I may be using these functions not correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is. Thanks for updating the patch! + if (pg_atomic_read_u64(&MyProc->waitStart) == 0) + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &now)); pg_atomic_read_u64() is really necessary? I think that "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough. + deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT); + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart)); Same as above. + /* + * Record waitStart reusing the deadlock timeout timer. + * + * It would be ideal this can be synchronously done with updating + * lock information. Howerver, since it gives performance impacts + * to hold partitionLock longer time, we do it here asynchronously. + */ IMO it's better to comment why we reuse the deadlock timeout timer. proc->waitStatus = waitStatus; + pg_atomic_init_u64(&MyProc->waitStart, 0); pg_atomic_write_u64() should be used instead? 
Because waitStart can be accessed concurrently there. I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version. Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is virtual machine(CentOS 8), so this is just for reference, but there were no significant difference in latency or tps(both are below 1%). Thanks for the test! I pushed the patch. But I reverted the patch because buildfarm members rorqual and prion don't like the patch. I'm trying to investigate the cause of this failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10 -relation | locktype |mode --+--+- - test_prepared_1 | relation | RowExclusiveLock - test_prepared_1 | relation | AccessExclusiveLock -(2 rows) - +ERROR: invalid spinlock number: 0 "rorqual" reported that the above error happened in the server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for the "waitStart" in the dummy proc for prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests. Thanks for fixing the bug, I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests have passed. BTW, while investigating this issue, I found that pg_stat_wal_receiver also could cause this error even in the current master (without the patch). I will report that in separate thread. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16 "prion" reported the following error. 
But I'm not sure how the changes of pg_locks caused this error. I found that Heikki also reported at [1] that "prion" failed with the same error but was not sure how it happened. This makes me think for now that this issue is not directly related to the pg_locks changes. Thanks! I was wondering how these errors were related to the commit. Regards, -- Atsushi Torikoshi - pg_dump: error: query failed: ERROR: missing chunk number 0 for toast value 1 in pg_toast_2619 pg_dump: error: query was: SELECT a.attnum, a.attname, a.atttypmod, a.attstattarget, a.attstorage, t.typstorage
Re: RFC: Logging plan of the running query
On 2021-11-26 12:39, torikoshia wrote: Since the patch could not be applied to HEAD anymore, I also updated it.

Updated the patch to fix a compiler warning about the format on Windows.

-- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION

From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 7 Jan 2022 12:31:03 +0900 Subject: [PATCH v15] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans, because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some code, tests and comments of pg_log_query_plan() are the same as those of pg_log_backend_memory_contexts(), this patch also refactors them to make them common.
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby --- doc/src/sgml/func.sgml | 45 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 117 ++- src/backend/executor/execMain.c | 10 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 55 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 36 +- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 3 + src/include/miscadmin.h | 1 + src/include/storage/lock.h | 2 - src/include/storage/procsignal.h | 1 + src/include/storage/signalfuncs.h| 22 src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 54 +++-- src/test/regress/sql/misc_functions.sql | 42 +-- 19 files changed, 355 insertions(+), 63 deletions(-) create mode 100644 src/include/storage/signalfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e58efce586..9804574c10 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID along with the untruncated +query string. +They will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -25543,6 +25563,31 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_query_plan can be used +to log the plan of a backend process. 
For example: + +postgres=# SELECT pg_log_query_plan(201116); + pg_log_query_plan +--- + t +(1 row) + +The format of the query plan is the same as when VERBOSE, +COSTS, SETTINGS and +FORMAT TEXT are used in the EXPLAIN +command. For example: + +LOG: plan of the query running on backend with PID 17793 is: +Query Text: SELECT * FROM pgbench_accounts; +Seq Scan on public.pgbench_accounts (cost=0.00..52787.00 rows=200 width=97) + Output: aid, bid, abalance, filler +Settings: work_mem = '1MB' + +Note that nested statements (statements executed inside a function) are not +considered for logging. Only the plan of the most deeply nested query is logged. + + diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 3a4fa9091b..173e268be3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_ls_replslotdi
Re: RFC: Logging plan of the running query
On 2022-01-07 14:30, torikoshia wrote: Updated the patch to fix a compiler warning about the format on Windows.

I got another compiler warning, so I updated the patch again.

-- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION

From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 7 Jan 2022 19:38:29 +0900 Subject: [PATCH v16] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans, because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some code, tests and comments of pg_log_query_plan() are the same as those of pg_log_backend_memory_contexts(), this patch also refactors them to make them common.
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby --- doc/src/sgml/func.sgml | 45 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 117 ++- src/backend/executor/execMain.c | 10 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 55 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 36 +- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 3 + src/include/miscadmin.h | 1 + src/include/storage/lock.h | 2 - src/include/storage/procsignal.h | 1 + src/include/storage/signalfuncs.h| 22 src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 54 +++-- src/test/regress/sql/misc_functions.sql | 42 +-- 19 files changed, 355 insertions(+), 63 deletions(-) create mode 100644 src/include/storage/signalfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e58efce586..9804574c10 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID along with the untruncated +query string. +They will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -25543,6 +25563,31 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_query_plan can be used +to log the plan of a backend process. 
For example: + +postgres=# SELECT pg_log_query_plan(201116); + pg_log_query_plan +--- + t +(1 row) + +The format of the query plan is the same as when VERBOSE, +COSTS, SETTINGS and +FORMAT TEXT are used in the EXPLAIN +command. For example: + +LOG: plan of the query running on backend with PID 17793 is: +Query Text: SELECT * FROM pgbench_accounts; +Seq Scan on public.pgbench_accounts (cost=0.00..52787.00 rows=200 width=97) + Output: aid, bid, abalance, filler +Settings: work_mem = '1MB' + +Note that nested statements (statements executed inside a function) are not +considered for logging. Only the plan of the most deeply nested query is logged. + + diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 3a4fa9091b..173e268be3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?
On 2021-03-16 20:51, Bharath Rupireddy wrote: On Mon, Mar 15, 2021 at 11:23 AM torikoshia wrote: On 2021-03-07 19:16, Bharath Rupireddy wrote:
> On Fri, Feb 5, 2021 at 5:15 PM Bharath Rupireddy wrote:
>> pg_terminate_backend and pg_cancel_backend with the postmaster PID produce a "PID is not a PostgreSQL server process" warning [1], which basically implies that the postmaster is not a PostgreSQL process at all. This is a bit misleading because the postmaster is the parent of all PostgreSQL processes. Should we improve the warning message if the given PID is the postmaster's PID?

+1. I felt it was a bit confusing when reviewing a thread [1]. Hmmm.

> I'm attaching a small patch that emits a warning "signalling postmaster with PID %d is not allowed" for the postmaster and "signalling PostgreSQL server process with PID %d is not allowed" for auxiliary processes such as the checkpointer, background writer, and walwriter.
>
> However, for the stats collector and sys logger processes, we still get the "PID X is not a PostgreSQL server process" warning because they don't have PGPROC entries(??). So BackendPidGetProc and AuxiliaryPidGetProc will not help, and even pg_stat_activity does not have these processes' PIDs.

I also ran into the same problem while creating a patch in [2]. I have not gone through that thread, though. Is there any way we can detect those child processes (stats collector, sys logger) that are forked by the postmaster from a backend process? Thoughts?

I couldn't find a good way to do that, and thus I'm now wondering about just changing the message to something like "PID is not a PostgreSQL backend process". "backend process" is now defined as "Process of an instance which acts on behalf of a client session and handles its requests." in the Appendix.

Yeah, that looks good to me.
IIUC, we can just change the message from "PID is not a PostgreSQL server process" to "PID is not a PostgreSQL backend process", and we don't need to look for AuxiliaryProcs or PostmasterPid. Changing log messages can affect operations, especially when people monitor the log message strings, but improving "PID is not a PostgreSQL server process" does not seem to cause such problems. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Get memory contexts of an arbitrary backend process
On 2021-03-05 14:22, Fujii Masao wrote: On 2021/03/04 18:32, torikoshia wrote: On 2021-01-14 19:11, torikoshia wrote:

Since pg_get_target_backend_memory_contexts() waits to dump memory, it could lead to a deadlock as below.

- session1
BEGIN; TRUNCATE t;
- session2
BEGIN; TRUNCATE t; -- wait
- session1
SELECT * FROM pg_get_target_backend_memory_contexts(); -- wait

Thanks for notifying me, Fujii-san. Attached is a v8 patch that prohibits calling the function inside transactions. Regrettably, this modification could not cope with the advisory lock, and I haven't come up with a good way to deal with it. It seems to me that the architecture of the requestor waiting for the dumper leads to this problem and complicates things. Considering the discussion about printing backtraces [1], it seems reasonable that the requestor just sends a signal and the dumper dumps to the log file.

+1

Thanks! I remade the patch and introduced a function pg_print_backend_memory_contexts(PID) which prints the memory contexts of the specified PID through elog.

=# SELECT pg_print_backend_memory_contexts(450855);

** log output **
2021-03-17 15:21:01.942 JST [450855] LOG: Printing memory contexts of PID 450855
2021-03-17 15:21:01.942 JST [450855] LOG: level: 0 TopMemoryContext: 68720 total in 5 blocks; 16312 free (15 chunks); 52408 used
2021-03-17 15:21:01.942 JST [450855] LOG: level: 1 Prepared Queries: 65536 total in 4 blocks; 35088 free (14 chunks); 30448 used
2021-03-17 15:21:01.942 JST [450855] LOG: level: 1 pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
..(snip)..
2021-03-17 15:21:01.942 JST [450855] LOG: level: 2 CachedPlanSource: 4096 total in 3 blocks; 680 free (0 chunks); 3416 used: PREPARE hoge_200 AS SELECT * FROM pgbench_accounts WHERE aid = 1...
2021-03-17 15:21:01.942 JST [450855] LOG: level: 3 CachedPlanQuery: 4096 total in 3 blocks; 464 free (0 chunks); 3632 used
..(snip)..
2021-03-17 15:21:01.945 JST [450855] LOG: level: 1 Timezones: 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used
2021-03-17 15:21:01.945 JST [450855] LOG: level: 1 ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
2021-03-17 15:21:01.945 JST [450855] LOG: Grand total: 2802080 bytes in 1399 blocks; 480568 free (178 chunks); 2321512 used

As above, the output is almost the same as MemoryContextStatsPrint() except for the way the level is expressed. MemoryContextStatsPrint() uses indents, but pg_print_backend_memory_contexts() writes it as "level: %d". Since there was discussion that enlarging a StringInfo may cause errors on OOM [1], this patch calls elog for each context. As with MemoryContextStatsPrint(), each context shows 100 children at most. I once thought this should be configurable, but something like pg_print_backend_memory_contexts(PID, num_children) needs to send the 'num_children' from the requestor to the dumper, and that seems to require another infrastructure. Creating a new GUC for this seems overkill. If showing at most 100 children, as MemoryContextStatsPrint() does, is enough, this hard limit may be acceptable. Only superusers can call pg_print_backend_memory_contexts(). I'm going to add documentation and regression tests. Any thoughts?
[1] https://www.postgresql.org/message-id/CAMsr%2BYGh%2Bsso5N6Q%2BFmYHLWC%3DBPCzA%2B5GbhYZSGruj2d0c7Vvg%40mail.gmail.com

Regards, -- Atsushi Torikoshi NTT DATA CORPORATION

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c6a8d4611e..e116f4a1be 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -30,6 +30,7 @@ #include "storage/shmem.h" #include "storage/sinval.h" #include "tcop/tcopprot.h" +#include "utils/memutils.h" /* * The SIGUSR1 signal is multiplexed to support signaling multiple event @@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void) /* latch will be set by procsignal_sigusr1_handler */ } +/* + * HandleProcSignalPrintMemoryContext + * + * Handle receipt of an interrupt indicating print memory context. + * Signal handler portion of interrupt handling. + */ +static void +HandleProcSignalPrintMemoryContext(void) +{ + InterruptPending = true; + PrintMemoryContextPending = true; + /* latch will be set by procsignal_sigusr1_handler */ +} + /* * Perform global barrier related interrupt checking. * @@ -580,6 +595,25 @@ ProcessProcSignalBarrier(void) ConditionVariableBroadcast(&MyProcSignalSlot->pss_barrierCV); } +/* + * ProcessPrintMemoryContextInterrupt + * The portion of print memory context interrupt handling that runs + * outside of the signal handler. + */ +void +ProcessPrintMemoryContextInterrupt(void) +{ + PrintMemoryContextPending = false; +
Re: Get memory contexts of an arbitrary backend process
On 2021-03-18 15:09, Fujii Masao wrote: Thanks for your comments! On 2021/03/17 22:24, torikoshia wrote: I remade the patch and introduced a function pg_print_backend_memory_contexts(PID) which prints the memory contexts of the specified PID to elog. Thanks for the patch! =# SELECT pg_print_backend_memory_contexts(450855); ** log output ** 2021-03-17 15:21:01.942 JST [450855] LOG: Printing memory contexts of PID 450855 2021-03-17 15:21:01.942 JST [450855] LOG: level: 0 TopMemoryContext: 68720 total in 5 blocks; 16312 free (15 chunks); 52408 used 2021-03-17 15:21:01.942 JST [450855] LOG: level: 1 Prepared Queries: 65536 total in 4 blocks; 35088 free (14 chunks); 30448 used 2021-03-17 15:21:01.942 JST [450855] LOG: level: 1 pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used ..(snip).. 2021-03-17 15:21:01.942 JST [450855] LOG: level: 2 CachedPlanSource: 4096 total in 3 blocks; 680 free (0 chunks); 3416 used: PREPARE hoge_200 AS SELECT * FROM pgbench_accounts WHERE aid = 1... 2021-03-17 15:21:01.942 JST [450855] LOG: level: 3 CachedPlanQuery: 4096 total in 3 blocks; 464 free (0 chunks); 3632 used ..(snip).. 2021-03-17 15:21:01.945 JST [450855] LOG: level: 1 Timezones: 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used 2021-03-17 15:21:01.945 JST [450855] LOG: level: 1 ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used 2021-03-17 15:21:01.945 JST [450855] LOG: Grand total: 2802080 bytes in 1399 blocks; 480568 free (178 chunks); 2321512 used As above, the output is almost the same as MemoryContextStatsPrint() except for the way of expression of the level. MemoryContextStatsPrint() uses indents, but pg_print_backend_memory_contexts() writes it as "level: %d". This format looks better to me. Since there was discussion about enlarging StringInfo may cause errors on OOM[1], this patch calls elog for each context. As with MemoryContextStatsPrint(), each context shows 100 children at most. 
I once thought it should be configurable, but something like pg_print_backend_memory_contexts(PID, num_children) needs to send the 'num_children' from the requestor to the dumper, and that seems to require another infrastructure. Creating a new GUC for this seems overkill. If showing at most 100 children, as MemoryContextStatsPrint() does, is enough, this hard limit may be acceptable.

Can't this number be passed via shared memory?

The attached patch uses static shared memory to pass the number. As documented, the current implementation allows that, when multiple pg_print_backend_memory_contexts() calls happen in succession or simultaneously, max_children can be the one from another pg_print_backend_memory_contexts() call. I had tried to avoid this by adding some state information and using before_shmem_exit() to clean up that state on process termination, as in the patch I presented earlier, but since kill() returns success before the dumper calls its signal handler, there seemed to be times when we couldn't clean up the state. Since this happens only when multiple pg_print_backend_memory_contexts() are called with different numbers of children, and the effect is just an unintended number of children being printed, it might be acceptable. Or it might be better to wait for some seconds if num_children in shared memory is not the initialized value (meaning some other process is requesting to print memory contexts).

Only superusers can call pg_print_backend_memory_contexts().

+ /* Only allow superusers to signal superuser-owned backends. */
+ if (superuser_arg(proc->roleId) && !superuser())

The patch seems to allow even a non-superuser to request printing the memory contexts if the target backend is owned by a non-superuser. Is this intentional? I think that only superusers should be allowed to execute pg_print_backend_memory_contexts(), whoever owns the target backend, because that function can cause lots of log messages.

Thanks, it's not intentional; modified it.
I'm going to add documentation and regression tests. Added them. Regards,diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9492a3c6b9..e834b923e4 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24781,6 +24781,33 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_print_backend_memory_contexts + +pg_print_backend_memory_contexts ( + pid integer, + max_children integer ) +boolean + + +Prints the memory contexts whose backend process has the specified +process ID. +max_children limits the max number of children +to print per one parent context. +Note that when multiple +pg_print_
Re: Is it useful to record whether plans are generic or custom?
On 2021-03-05 17:47, Fujii Masao wrote: Thanks for your comments! I just tried this feature. When I set plan_cache_mode to force_generic_plan and executed the following queries, I found that pg_stat_statements.generic_calls and pg_prepared_statements.generic_plans were not the same. Is this behavior expected? I was thinking that they are basically the same. It's not expected behavior, fixed. DEALLOCATE ALL; SELECT pg_stat_statements_reset(); PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1; EXECUTE hoge(1); EXECUTE hoge(1); EXECUTE hoge(1); SELECT generic_plans, statement FROM pg_prepared_statements WHERE statement LIKE '%hoge%'; generic_plans | statement ---+ 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1; SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE '%hoge%'; calls | generic_calls | query ---+---+--- 3 | 2 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1 When I executed the prepared statements via EXPLAIN ANALYZE, I found pg_stat_statements.generic_calls was not incremented. Is this behavior expected? Or we should count generic_calls even when executing the queries via ProcessUtility()? I think prepared statements via EXPLAIN ANALYZE also should be counted for consistency with pg_prepared_statements. Since ActivePortal did not keep the plan type in the ProcessUtility_hook, I moved the global variables 'is_plan_type_generic' and 'is_prev_plan_type_generic' from pg_stat_statements to plancache.c. 
DEALLOCATE ALL; SELECT pg_stat_statements_reset(); PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1; EXPLAIN ANALYZE EXECUTE hoge(1); EXPLAIN ANALYZE EXECUTE hoge(1); EXPLAIN ANALYZE EXECUTE hoge(1); SELECT generic_plans, statement FROM pg_prepared_statements WHERE statement LIKE '%hoge%'; generic_plans | statement ---+ 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1; SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE '%hoge%'; calls | generic_calls | query ---+---+--- 3 | 0 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1 3 | 0 | EXPLAIN ANALYZE EXECUTE hoge(1) Regards,diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 16158525ca..887c4b2be8 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C"; UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 |3 | t | t | t (7 rows) +-- +-- Track the number of generic plan +-- +CREATE TABLE pgss_test (i int, j int, k int); +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +SET plan_cache_mode TO force_generic_plan; +SET pg_stat_statements.track_utility = TRUE; +PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1; +EXECUTE pgss_p1(1); + i +--- +(0 rows) + +-- EXPLAIN ANALZE should be recorded +PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1; +EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1); + QUERY PLAN +--- + Seq Scan on pgss_test (actual rows=0 loops=1) + Filter: (j = $1) +(2 rows) + +-- Nested Portal +PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1; +BEGIN; +DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements; +FETCH IN pgss_c1; + name +- + pgss_p2 +(1 row) + +EXECUTE pgss_p3(1); + k +--- +(0 rows) + +FETCH IN pgss_c1; + name +- + 
pgss_p1 +(1 row) + +COMMIT; +SELECT calls, generic_calls, query FROM pg_stat_statements; + calls | generic_calls | query +---+---+-- + 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements + 0 | 0 | SELECT calls, generic_calls, query FROM pg_stat_statements + 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1 + 2 | 0 | FETCH IN pgss_c1 + 1 | 0 | BEGIN + 1 | 0 | SELECT pg_stat_statements_reset() + 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE p
Re: Get memory contexts of an arbitrary backend process
On 2021-03-23 17:24, Kyotaro Horiguchi wrote: Thanks for reviewing and suggestions! At Mon, 22 Mar 2021 15:09:58 +0900, torikoshia wrote in >> If MemoryContextStatsPrint(), i.e. showing 100 children at most is >> enough, this hard limit may be acceptable. > Can't this number be passed via shared memory? The attached patch uses static shared memory to pass the number. "pg_print_backend_memory_contexts" That name looks as if it returns the result as text when used on the command line. We could have pg_get_backend_memory_context(bool dump_to_log (or where to dump), int limit). Or couldn't we name it differently even in the case we add a separate function? Redefined pg_get_backend_memory_contexts() as pg_get_backend_memory_contexts(pid, int max_children). When pid equals 0, pg_get_backend_memory_contexts() prints local memory contexts as the original pg_get_backend_memory_contexts() does. In this case, 'max_children' is ignored. When 'pid' does not equal 0 and it is the PID of a client backend, memory contexts are logged through elog(). +/* + * MaxChildrenPerContext + * Max number of children to print per one parent context. + */ +int *MaxChildrenPerContext = NULL; Perhaps it'd be better to have a struct even if it consists only of one member. (Aligned) C-int values are atomic so we can omit the McxtPrintLock. (I don't think it's a problem even if it is modified while reading^^:) Fixed them. + if(max_children <= 0) + { + ereport(WARNING, + (errmsg("%d is invalid value", max_children), + errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1"))); It's annoying to choose a number large enough when I want to dump children unlimitedly. Couldn't we use 0 to specify "unlimited"? Modified as you suggested.
+ (errmsg("%d is invalid value", max_children), + errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1"))); For the main message, (I think) we usually spell the "%d is invalid value" as "maximum number of children must be positive" or such. For the hint, we don't need a copy of the primary section of the documentation here. Modified it to "The maximum number of children must be greater than 0". I think we should ERROR out for invalid parameters, at least for max_children. I'm not sure about pid since we might call it based on pg_stat_activity.. Changed to ERROR out when the 'max_children' is less than 0. Regarding pid, I left it untouched considering the consistency with other signal sending functions such as pg_cancel_backend(). + if(!SendProcSignal(pid, PROCSIG_PRINT_MEMORY_CONTEXT, InvalidBackendId)) We know the backendid of the process here. Added it. + if (is_dst_stderr) + { + for (i = 0; i <= level; i++) + fprintf(stderr, " "); The fprintf path is used nowhere in the patch at all. It can be used while attaching a debugger but I'm not sure we need that code. The footprint of this patch is largely shrunk by removing it. According to the past discussion[1], people wanted MemoryContextStats as it was, so I think it's better that MemoryContextStats can be used as before. + strcat(truncated_ident, delimiter); strcpy is sufficient here. And we don't need the delimiter to be a variable. (we can copy a string literal into truncate_ident, then count the length of truncate_ident, instead of the delimiter variable.) True. + $current_logfiles = slurp_file($node->data_dir . '/current_logfiles'); ... +my $lfname = $current_logfiles; +$lfname =~ s/^stderr //; +chomp $lfname; $node->logfile is the current log file name. + 'target PID is not PostgreSQL server process'); Maybe "check if PID check is working" or such? And, we can do something like the following to exercise it in a more practical way.
select pg_print_backend...(pid,) from pg_stat_activity where backend_type = 'checkpointer'; It seems better. As documented, the current implementation allows that when multiple pg_print_backend_memory_contexts() calls are made in succession or simultaneously, max_children can be that of another pg_print_backend_memory_contexts(). I had tried to avoid this by adding some state information and using before_shmem_exit() in case of process termination for cleaning up the state information as in the patch I presented earlier, but since kill() returns success before the dumper calls the signal handler, it seemed there were time
Re: Get memory contexts of an arbitrary backend process
On 2021-03-25 22:02, Fujii Masao wrote: On 2021/03/25 0:17, torikoshia wrote: On 2021-03-23 17:24, Kyotaro Horiguchi wrote: Thanks for reviewing and suggestions! The patched version failed to be compiled as follows. Could you fix this issue? Sorry, it included a header file that's not contained in the current version patch. Attached new one. mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or directory #include "utils/mcxtfuncs.h" ^~~ compilation terminated. make[4]: *** [: mcxtfuncs.o] Error 1 make[4]: *** Waiting for unfinished jobs make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2 make[3]: *** Waiting for unfinished jobs make[2]: *** [common.mk:39: utils-recursive] Error 2 make[1]: *** [Makefile:42: all-backend-recurse] Error 2 make: *** [GNUmakefile:11: all-src-recurse] Error 2 https://cirrus-ci.com/task/4621477321375744 Regards, diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1d3429fbd9..a4017a0760 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24821,6 +24821,37 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_get_backend_memory_contexts + +pg_get_backend_memory_contexts ( + pid integer, + max_children integer ) +setof record + + +Get memory contexts whose backend process has the specified process ID. +max_children limits the max number of children +to print per one parent context. 0 means unlimited. +When pid equals 0, +pg_get_backend_memory_contexts displays all +the memory contexts of the local process regardless of +max_children. +When pid does not equal 0, +memory contexts will be printed based on the log configuration set. +See for more information. +Only superusers can call this function even when the specified process +is non-superuser backend. +Note that when multiple +pg_get_backend_memory_contexts called in +succession or simultaneously, max_children can +be the one of another +pg_get_backend_memory_contexts. 
+ + + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 0dca65dc7b..48a1a0e958 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; CREATE VIEW pg_backend_memory_contexts AS -SELECT * FROM pg_get_backend_memory_contexts(); +SELECT * FROM pg_get_backend_memory_contexts(0, 0); REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC; -REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC; -- Statistics views diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 3e4ec53a97..ed5393324a 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -46,6 +46,7 @@ #include "storage/sinvaladt.h" #include "storage/spin.h" #include "utils/snapmgr.h" +#include "utils/memutils.h" /* GUCs */ int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE; @@ -269,6 +270,7 @@ CreateSharedMemoryAndSemaphores(void) BTreeShmemInit(); SyncScanShmemInit(); AsyncShmemInit(); + McxtLogShmemInit(); #ifdef EXEC_BACKEND diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c6a8d4611e..c61d5079e2 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -30,6 +30,7 @@ #include "storage/shmem.h" #include "storage/sinval.h" #include "tcop/tcopprot.h" +#include "utils/memutils.h" /* * The SIGUSR1 signal is multiplexed to support signaling multiple event @@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void) /* latch will be set by procsignal_sigusr1_handler */ } +/* + * HandleProcSignalLogMemoryContext + * + * Handle receipt of an interrupt indicating logging memory context. + * Signal handler portion of interrupt handling. 
+ */ +static void +HandleProcSignalLogMemoryContext(void) +{ + InterruptPending = true; + LogMemoryContextPending = true; + /* latch will be set by procsignal_sigusr1_handler */ +} + /* * Perform global barrier related interrupt checking. * @@ -580,6 +595,27 @@ ProcessProcSignalBarrier(void) ConditionVariableBroadcast(&MyProcSignalSlot->pss_barrierCV); } +/* + * ProcessLogMemoryContextInterrupt + * The portion of logging memory context interrupt handling that runs + * outside of the signal handler. + */ +void +ProcessLogMemoryContextInterrupt(void) +{ + int max_childre
Re: Is it useful to record whether plans are generic or custom?
On 2021-03-25 22:14, Fujii Masao wrote: On 2021/03/23 16:32, torikoshia wrote: On 2021-03-05 17:47, Fujii Masao wrote: Thanks for your comments! Thanks for updating the patch! PostgreSQL Patch Tester reported that the patched version failed to be compiled at Windows. Could you fix this issue? https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238 It seems PGDLLIMPORT was necessary.. Attached a new one. Regards.diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 16158525ca..887c4b2be8 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C"; UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 |3 | t | t | t (7 rows) +-- +-- Track the number of generic plan +-- +CREATE TABLE pgss_test (i int, j int, k int); +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +SET plan_cache_mode TO force_generic_plan; +SET pg_stat_statements.track_utility = TRUE; +PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1; +EXECUTE pgss_p1(1); + i +--- +(0 rows) + +-- EXPLAIN ANALZE should be recorded +PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1; +EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1); + QUERY PLAN +--- + Seq Scan on pgss_test (actual rows=0 loops=1) + Filter: (j = $1) +(2 rows) + +-- Nested Portal +PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1; +BEGIN; +DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements; +FETCH IN pgss_c1; + name +- + pgss_p2 +(1 row) + +EXECUTE pgss_p3(1); + k +--- +(0 rows) + +FETCH IN pgss_c1; + name +- + pgss_p1 +(1 row) + +COMMIT; +SELECT calls, generic_calls, query FROM pg_stat_statements; + calls | generic_calls | query +---+---+-- + 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements + 0 | 
0 | SELECT calls, generic_calls, query FROM pg_stat_statements + 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1 + 2 | 0 | FETCH IN pgss_c1 + 1 | 0 | BEGIN + 1 | 0 | SELECT pg_stat_statements_reset() + 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1) + 1 | 0 | COMMIT + 1 | 1 | PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1 +(9 rows) + +SET pg_stat_statements.track_utility = FALSE; +DEALLOCATE ALL; +DROP TABLE pgss_test; -- -- pg_stat_statements.track = none -- diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 0f63f08f7e..7fdef315ae 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT blk_write_time float8, OUT wal_records int8, OUT wal_fpi int8, -OUT wal_bytes numeric +OUT wal_bytes numeric, +OUT generic_calls int8 ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8' diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 62cccbfa44..b14919c989 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -81,6 +81,7 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/plancache.h" #include "utils/timestamp.h" PG_MODULE_MAGIC; @@ -192,6 +193,7 @@ typedef struct Counters int64 wal_records; /* # of WAL records generated */ int64 wal_fpi; /* # of WAL full page images generated */ uint64 wal_bytes; /* total amount of WAL generated in bytes */ + int64 generic_calls; /* # of times generic plans executed */ } Counters; /* @@ -1446,6 +1448,10 @@ pgss_store(const char *query, uint64 queryId, if (e->counters.max_time[kind] < total_time) e->counters.max_time[kind] = total_time; } + + if (kind 
== PGSS_EXEC && is_plan_type_generic) + e->counters.generic_calls += 1; + e->counters.rows += rows; e->counters.shared_blks_hit += bufusage->shared_blks_hit; e->counters.shared_blks_read += bufusage->shared_blks_read; @@ -1510,8 +1516,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) #de
Re: Get memory contexts of an arbitrary backend process
On 2021-03-26 14:08, Kyotaro Horiguchi wrote: At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao wrote in On 2021/03/26 13:28, Kyotaro Horiguchi wrote: >> "some contexts are omitted" >> "n child contexts: total_bytes = ..." > Sorry I missed that is already implemented. So my opinion is I agree > with limiting with a fixed-number, and preferablly sorted in > descending order of... totalspace/nblocks? This may be an improvement, but makes us modify MemoryContextStatsInternal() very much. I'm afraid that it's too late to do that at this stage... What about leaving the output order as it is in the first version? So I said "preferably":p (with a misspelling...) I'm fine with that. regards. Thanks for the comments! Attached a new patch. It adds pg_log_backend_memory_contexts(pid) which logs memory contexts of the specified backend process. The number of child contexts to be logged per parent is limited to 100 as with MemoryContextStats(). As written in commit 7b5ef8f2d07, which limits the verbosity of memory context statistics dumps, it supposes that practical cases where the dump gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. Thoughts? Regards.From e5ab553c1e5b7fa53c51e0e4fa4472bdaeced4e1 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 29 Mar 2021 09:30:23 +0900 Subject: [PATCH] After commit 3e98c0bafb28de, we can display the usage of memory contexts using pg_backend_memory_contexts system view. However, its target process is limited to the backend which is showing the view. This patch introduces pg_log_backend_memory_contexts(pid) which logs memory contexts of the specified backend process. Currently the number of child contexts to be logged per parent is limited to 100.
As with MemoryContextStats(), it supposes that practical cases where the dump gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. --- doc/src/sgml/func.sgml| 20 +++ src/backend/storage/ipc/procsignal.c | 37 src/backend/tcop/postgres.c | 3 + src/backend/utils/adt/mcxtfuncs.c | 2 +- src/backend/utils/init/globals.c | 1 + src/backend/utils/mmgr/aset.c | 8 +- src/backend/utils/mmgr/generation.c | 8 +- src/backend/utils/mmgr/mcxt.c | 164 ++ src/backend/utils/mmgr/slab.c | 9 +- src/include/catalog/pg_proc.dat | 6 + src/include/miscadmin.h | 1 + src/include/nodes/memnodes.h | 6 +- src/include/storage/procsignal.h | 3 + src/include/utils/memutils.h | 3 +- .../t/002_log_memory_context_validation.pl| 31 15 files changed, 257 insertions(+), 45 deletions(-) create mode 100644 src/test/modules/test_misc/t/002_log_memory_context_validation.pl diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 19285ae136..7a80607366 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24871,6 +24871,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_backend_memory_contexts + +pg_log_backend_memory_contexts ( pid integer ) +boolean + + +Log the memory contexts whose backend process has the specified +process ID. +Memory contexts will be printed based on the log configuration set. +See for more information. +The number of child contexts per parent is limited to 100. +For contexts with more than 100 children, summary will be shown. +Only superusers can log the memory contexts even when the specified +process is non-superuser backend. 
+ + + diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c6a8d4611e..550aa2ffea 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -30,6 +30,7 @@ #include "storage/shmem.h" #include "storage/sinval.h" #include "tcop/tcopprot.h" +#include "utils/memutils.h" /* * The SIGUSR1 signal is multiplexed to support signaling multiple event @@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void) /* latch will be set by procsignal_sigusr1_handler */ } +/* + * HandleProcSignalLogMemoryContext + * + * Handle receipt of an interrupt indicating log memory context. + * Signal handler portion of interrupt handling. + */ +static void +HandleProcSignalLogMemoryContext(void) +{ + InterruptPending = true; + LogMemoryContextPending = true; + /* latch will be set by procsignal_sigusr1_handler */ +} + /* * Perform global ba
Re: Get memory contexts of an arbitrary backend process
On 2021-03-30 02:28, Fujii Masao wrote: Thanks for reviewing and kind suggestions! It adds pg_log_backend_memory_contexts(pid) which logs memory contexts of the specified backend process. The number of child contexts to be logged per parent is limited to 100 as with MemoryContextStats(). As written in commit 7b5ef8f2d07, which limits the verbosity of memory context statistics dumps, it supposes that practical cases where the dump gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. Thoughts? I'm OK with 100. We should comment why we chose 100 for that. Added following comments. + /* +* When a backend process is consuming huge memory, logging all its +* memory contexts might overrun available disk space. To prevent +* this, we limit the number of child contexts per parent to 100. +* +* As with MemoryContextStats(), we suppose that practical cases +* where the dump gets long will typically be huge numbers of +* siblings under the same parent context; while the additional +* debugging value from seeing details about individual siblings +* beyond 100 will not be large. +*/ + MemoryContextStatsDetail(TopMemoryContext, 100, false); Here are some review comments. Isn't it better to move HandleProcSignalLogMemoryContext() and ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c (like the functions for notify interrupt are defined in async.c) because they are the functions for memory contexts? Agreed. Also renamed HandleProcSignalLogMemoryContext to HandleLogMemoryContextInterrupt. + * HandleProcSignalLogMemoryContext + * + * Handle receipt of an interrupt indicating log memory context. + * Signal handler portion of interrupt handling. 
IMO it's better to comment why we need to separate the function into two, i.e., HandleProcSignalLogMemoryContext() and ProcessLogMemoryContextInterrupt(), like the comment for other similar functions explains. What about the followings? Thanks! Changed them to the suggested one. --- HandleLogMemoryContextInterrupt Handle receipt of an interrupt indicating logging of memory contexts. All the actual work is deferred to ProcessLogMemoryContextInterrupt(), because we cannot safely emit a log message inside the signal handler. --- ProcessLogMemoryContextInterrupt Perform logging of memory contexts of this backend process. Any backend that participates in ProcSignal signaling must arrange to call this function if we see LogMemoryContextPending set. It is called from CHECK_FOR_INTERRUPTS(), which is enough because the target process for logging of memory contexts is a backend. --- + if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT)) + HandleProcSignalLogMemoryContext(); + if (CheckProcSignal(PROCSIG_BARRIER)) HandleProcSignalBarrierInterrupt(); The code for the memory context logging interrupt came after the barrier interrupt in other places, e.g., procsignal.h. Why is this order of code different? Fixed. +/* + * pg_log_backend_memory_contexts + * Print memory context of the specified backend process. Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c from mcxt.c because this is the SQL function for memory contexts? Agreed. IMO we should comment why we allow only superusers to call this function. What about the following? Thanks! Modified the patch according to the suggestions. - Signal a backend process to log its memory contexts. Only superusers are allowed to signal to log the memory contexts because allowing any users to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service. - + PGPROC *proc = BackendPidGetProc(pid); + + /* Check whether the target process is PostgreSQL backend process.
*/ + if (proc == NULL) What about adding more comments as follows? - + /* + * BackendPidGetProc returns NULL if the pid isn't valid; but by the time + * we reach kill(), a process for which we get a valid proc here might + * have terminated on its own. There's no way to acquire a lock on an + * arbitrary process to prevent that. But since this mechanism is usually +* used to debug a backend running and consuming lots of memory, +* that it might end on its own first and its memory contexts are not +* logged is not a problem. +*/ + if (proc == NULL) + { + /* +* This is just a warning so a loop-through-resultset will not abort +* if one backend logged its memory contexts during the run. +*/ +
Re: Get memory contexts of an arbitrary backend process
On 2021-03-31 04:36, Fujii Masao wrote: On 2021/03/30 22:06, torikoshia wrote: Modified the patch according to the suggestions. Thanks for updating the patch! I applied the cosmetic changes to the patch and added the example of the function call into the document. Attached is the updated version of the patch. Could you check this version? Thanks a lot! +The memory contexts will be logged in the log file. For example: When 'log_destination = stderr' and 'logging_collector = off', it does not log to the file but to stderr. A description like the one below would be a bit more accurate, but I'm wondering if it's repeating the same words. + The memory contexts will be logged based on the log configuration set. For example: What do you think? + +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid()); + pg_log_backend_memory_contexts + + t +(1 row) + +The memory contexts will be logged in the log file. For example: +LOG: logging memory contexts of PID 10377 +STATEMENT: SELECT pg_log_backend_memory_contexts(pg_backend_pid()); +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used Since the line "The memory contexts will be logged in the log file. For example:" is neither an SQL command nor its output, it might be better to differentiate it. What about the following, as in the attached patch? + +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid()); + pg_log_backend_memory_contexts + + t +(1 row) + +The memory contexts will be logged in the log file. For example: + +LOG: logging memory contexts of PID 10377 +STATEMENT: SELECT pg_log_backend_memory_contexts(pg_backend_pid()); +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used ...(snip)...
+LOG: level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used +LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used + Regards.diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fbf6062d0a..ce01d51b21 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24917,6 +24917,23 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_backend_memory_contexts + +pg_log_backend_memory_contexts ( pid integer ) +boolean + + +Logs the memory contexts whose backend process has the specified +process ID. +Memory contexts will be logged based on the log configuration set. +See for more information. +Only superusers can log the memory contexts. + + + @@ -24987,6 +25004,36 @@ SELECT collation for ('foo' COLLATE "de_DE"); pg_stat_activity view. + +pg_log_backend_memory_contexts can be used +to log the memory contexts of the backend process. For example, + +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid()); + pg_log_backend_memory_contexts + + t +(1 row) + +The memory contexts will be logged in the log file. 
For example: + +LOG: logging memory contexts of PID 10377 +STATEMENT: SELECT pg_log_backend_memory_contexts(pg_backend_pid()); +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used +LOG: level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used +LOG: level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used +LOG: level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used +LOG: level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used +LOG: level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used +LOG: level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used +... +LOG: level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used +LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used + +For more than 100 child contexts under the same parent one, +100 child contexts and a summary of the remaining ones will be logged. + + diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c6a8d4611e..eac6895141 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -30,6 +30,7 @@ #include "stora
Re: Get memory contexts of an arbitrary backend process
On 2021-04-01 19:13, Fujii Masao wrote: On 2021/03/31 15:16, Kyotaro Horiguchi wrote: + The memory contexts will be logged based on the log configuration set. For example: How do you think? How about "The memory contexts will be logged in the server log" ? I think "server log" doesn't suggest any concrete target. Or just using "logged" is enough? Also I'd like to document that one message for each memory context is logged. So what about the following? One message for each memory context will be logged. For example, Agreed. BTW, there was a conflict since c30f54ad732(Detect POLLHUP/POLLRDHUP while running queries), attached v9. Regards,diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3cf243a16a..a20be435ca 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24913,6 +24913,23 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_backend_memory_contexts + +pg_log_backend_memory_contexts ( pid integer ) +boolean + + +Logs the memory contexts whose backend process has the specified +process ID. +Memory contexts will be logged based on the log configuration set. +See for more information. +Only superusers can log the memory contexts. + + + @@ -24983,6 +25000,36 @@ SELECT collation for ('foo' COLLATE "de_DE"); pg_stat_activity view. + +pg_log_backend_memory_contexts can be used +to log the memory contexts of the backend process. For example, + +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid()); + pg_log_backend_memory_contexts + + t +(1 row) + +One message for each memory context will be logged. 
For example: + +LOG: logging memory contexts of PID 10377 +STATEMENT: SELECT pg_log_backend_memory_contexts(pg_backend_pid()); +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used +LOG: level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used +LOG: level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used +LOG: level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used +LOG: level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used +LOG: level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used +LOG: level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used +... +LOG: level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used +LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used + +For more than 100 child contexts under the same parent one, +100 child contexts and a summary of the remaining ones will be logged. 
+ + diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c6a8d4611e..eac6895141 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -30,6 +30,7 @@ #include "storage/shmem.h" #include "storage/sinval.h" #include "tcop/tcopprot.h" +#include "utils/memutils.h" /* * The SIGUSR1 signal is multiplexed to support signaling multiple event @@ -657,6 +658,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS) if (CheckProcSignal(PROCSIG_BARRIER)) HandleProcSignalBarrierInterrupt(); + if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT)) + HandleLogMemoryContextInterrupt(); + if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE)) RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index ad351e2fd1..330ec5b028 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3327,6 +3327,9 @@ ProcessInterrupts(void) if (ParallelMessagePending) HandleParallelMessages(); + + if (LogMemoryContextPending) + ProcessLogMemoryContextInterrupt(); } diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index c02fa47550..fe9b7979e2 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -18,6 +18,8 @@ #include "funcapi.h" #include "miscadmin.h" #include "mb/pg_wchar.h" +#include "storage/proc.h" +#include "storage/procarray.h" #include "utils/builtins.h" /* -- @@ -61,7 +63,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, /* Examine the context itself */ memset(&stat, 0, sizeof(stat)); - (*context->methods->stats) (context, NULL, (void *) &level, &stat); + (*context->methods->stats) (context, NULL, (void *) &level, &stat, true); memset(values, 0, sizeof(values)); memset(nulls, 0, sizeof(nulls)); @@ -155,3 +157,59 @@ pg_
Re: Is it useful to record whether plans are generic or custom?
On 2021-03-26 17:46, Fujii Masao wrote: On 2021/03/26 0:33, torikoshia wrote: On 2021-03-25 22:14, Fujii Masao wrote: On 2021/03/23 16:32, torikoshia wrote: On 2021-03-05 17:47, Fujii Masao wrote: Thanks for your comments! Thanks for updating the patch! PostgreSQL Patch Tester reported that the patched version failed to be compiled at Windows. Could you fix this issue? https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238 It seems PGDLLIMPORT was necessary.. Attached a new one. Thanks for updating the patch! In my test, generic_calls for a utility command was not incremented before PL/pgSQL function was executed. Maybe this is expected behavior. But it was incremented after the function was executed. Is this a bug? Please see the following example. Thanks for reviewing! It's a bug and regrettably it seems difficult to fix it during this commitfest. Marked the patch as "Withdrawn". Regards,
Re: Get memory contexts of an arbitrary backend process
On 2021-04-05 12:59, Fujii Masao wrote: On 2021/04/05 12:20, Zhihong Yu wrote: Thanks for reviewing! + * On receipt of this signal, a backend sets the flag in the signal + * handler, and then which causes the next CHECK_FOR_INTERRUPTS() I think the 'and then' is not needed: Although I wonder either would be fine, removed the words. + * This is just a warning so a loop-through-resultset will not abort + * if one backend logged its memory contexts during the run. The pid given by arg 0 is not a PostgreSQL server process. Which other backend could it be ? This is the comment that I added wrongly. So the comment should be "This is just a warning so a loop-through-resultset will not abort if one backend terminated on its own during the run.", like pg_signal_backend(). Thought? +1. Attached v10 patch. Regards,From 8931099cbf3d6e6ef24150496cb795413785f808 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 5 Apr 2021 20:40:12 +0900 Subject: [PATCH v10] After commit 3e98c0bafb28de, we can display the usage of memory contexts using pg_backend_memory_contexts system view. However, its target process is limited to the backend which is showing the view. This patch introduces pg_log_backend_memory_contexts(pid) which logs memory contexts of the specified backend process. Currently the number of child contexts to be logged per parent is limited to 100. As with MemoryContextStats(), it supposes that practical cases where the dump gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. 
--- doc/src/sgml/func.sgml | 47 + src/backend/storage/ipc/procsignal.c | 4 + src/backend/tcop/postgres.c | 3 + src/backend/utils/adt/mcxtfuncs.c| 60 ++- src/backend/utils/init/globals.c | 1 + src/backend/utils/mmgr/aset.c| 8 +- src/backend/utils/mmgr/generation.c | 8 +- src/backend/utils/mmgr/mcxt.c| 171 +++ src/backend/utils/mmgr/slab.c| 9 +- src/include/catalog/pg_proc.dat | 6 + src/include/miscadmin.h | 1 + src/include/nodes/memnodes.h | 6 +- src/include/storage/procsignal.h | 1 + src/include/utils/memutils.h | 5 +- src/test/regress/expected/misc_functions.out | 13 ++ src/test/regress/sql/misc_functions.sql | 9 + 16 files changed, 305 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3cf243a16a..a20be435ca 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24913,6 +24913,23 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_backend_memory_contexts + +pg_log_backend_memory_contexts ( pid integer ) +boolean + + +Logs the memory contexts whose backend process has the specified +process ID. +Memory contexts will be logged based on the log configuration set. +See for more information. +Only superusers can log the memory contexts. + + + @@ -24983,6 +25000,36 @@ SELECT collation for ('foo' COLLATE "de_DE"); pg_stat_activity view. + +pg_log_backend_memory_contexts can be used +to log the memory contexts of the backend process. For example, + +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid()); + pg_log_backend_memory_contexts + + t +(1 row) + +One message for each memory context will be logged. 
For example: + +LOG: logging memory contexts of PID 10377 +STATEMENT: SELECT pg_log_backend_memory_contexts(pg_backend_pid()); +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used +LOG: level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used +LOG: level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used +LOG: level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used +LOG: level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used +LOG: level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used +LOG: level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used +... +LOG: level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used +LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used + +For more than 100 child contexts under the same parent one, +100 child contexts and a summary of the remaining ones will be logged.
Re: Get memory contexts of an arbitrary backend process
On 2021-04-06 00:08, Fujii Masao wrote: On 2021/04/05 21:03, torikoshia wrote: On 2021-04-05 12:59, Fujii Masao wrote: On 2021/04/05 12:20, Zhihong Yu wrote: Thanks for reviewing! + * On receipt of this signal, a backend sets the flag in the signal + * handler, and then which causes the next CHECK_FOR_INTERRUPTS() I think the 'and then' is not needed: Although I wonder either would be fine, removed the words. + * This is just a warning so a loop-through-resultset will not abort + * if one backend logged its memory contexts during the run. The pid given by arg 0 is not a PostgreSQL server process. Which other backend could it be ? This is the comment that I added wrongly. So the comment should be "This is just a warning so a loop-through-resultset will not abort if one backend terminated on its own during the run.", like pg_signal_backend(). Thought? +1. Attached v10 patch. Thanks for updating the patch! I updated the patch as follows. Could you check the attached patch? Thanks a lot! I don't have any objections to your improvements. Regards,
Re: Is it useful to record whether plans are generic or custom?
On 2020-07-20 13:57, torikoshia wrote: As I proposed earlier in this thread, I'm now trying to add information about generic/custom plan to pg_stat_statements. I'll share the idea and the poc patch soon. Attached a poc patch. Main purpose is to decide (1) the user interface and (2) the way to get the plan type from pg_stat_statements. (1) the user interface I added a new boolean column 'generic_plan' to both pg_stat_statements view and the member of the hash key of pg_stat_statements. This is because as Legrand pointed out the feature seems useful under the condition of differentiating all the counters for a queryid using a generic plan and the one using a custom one. I thought it might be preferable to make a GUC to enable or disable this feature, but changing the hash key makes it harder. (2) way to get the plan type from pg_stat_statements To know whether the plan is generic or not, I added a member to CachedPlan and get it in the ExecutorStart_hook from ActivePortal. I wished to do it in the ExecutorEnd_hook, but the ActivePortal is not available on executorEnd, so I keep it on a global variable newly defined in pg_stat_statements. Any thoughts? This is a poc patch and I'm going to do below things later: - update pg_stat_statements version - change default value for the newly added parameter in pg_stat_statements_reset() from -1 to 0 (since default for other parameters are all 0) - add regression tests and update docs Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 793eafad8e988b6754c9d89e0ea14b64b07eef81 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 22 Jul 2020 16:00:04 +0900 Subject: [PATCH] [poc] Previously the number of custom and generic plans are recorded only in pg_prepared_statements, meaning we could only track them regarding current session. This patch records them in pg_stat_statements and it enables to track them regarding all sessions of the PostgreSQL instance.
--- .../pg_stat_statements--1.6--1.7.sql | 3 +- .../pg_stat_statements--1.7--1.8.sql | 1 + .../pg_stat_statements/pg_stat_statements.c | 44 +++ src/backend/utils/cache/plancache.c | 2 + src/include/utils/plancache.h | 1 + 5 files changed, 41 insertions(+), 10 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql index 6fc3fed4c9..5ab0a26b77 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql @@ -12,7 +12,8 @@ DROP FUNCTION pg_stat_statements_reset(); /* Now redefine */ CREATE FUNCTION pg_stat_statements_reset(IN userid Oid DEFAULT 0, IN dbid Oid DEFAULT 0, - IN queryid bigint DEFAULT 0 + IN queryid bigint DEFAULT 0, + IN generic_plan int DEFAULT -1 ) RETURNS void AS 'MODULE_PATHNAME', 'pg_stat_statements_reset_1_7' diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 0f63f08f7e..0d7c4e7343 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -15,6 +15,7 @@ DROP FUNCTION pg_stat_statements(boolean); CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT userid oid, OUT dbid oid, +OUT generic_plan bool, OUT queryid bigint, OUT query text, OUT plans int8, diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 14cad19afb..5d74dc04cd 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -78,9 +78,11 @@ #include "storage/ipc.h" #include "storage/spin.h" #include "tcop/utility.h" +#include "tcop/pquery.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/plancache.h" PG_MODULE_MAGIC; @@ -156,6 +158,7 @@ typedef struct pgssHashKey Oid userid; /* 
user OID */ Oid dbid; /* database OID */ uint64 queryid; /* query identifier */ + bool is_generic_plan; } pgssHashKey; /* @@ -266,6 +269,9 @@ static int exec_nested_level = 0; /* Current nesting depth of planner calls */ static int plan_nested_level = 0; +/* Current plan type */ +static bool is_generic_plan = false; + /* Saved hook values in case of unload */ static shmem_startup_hook_type prev_shmem_startup_hook = NULL; static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL; @@ -367,7 +373,7 @@ static char *qtext_fetch(Size query_offset, int query_len, char *buffer, Size buffer_size); static bool need_gc_qtexts(void); static void gc_qtexts(void); -static void entry_reset(Oid userid, Oid dbid, uint64 queryid); +stati
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-07-14 20:24, Julien Rouhaud wrote: On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote: Hi, v9 patch fails to apply to HEAD, could you check and rebase it? Thanks for the notice, v10 attached! And here are minor typos. 79 +* utility statements. Note that we don't compute a queryId for prepared 80 +* statemets related utility, as those will inherit from the underlying 81 +* statements's one (except DEALLOCATE which is entirely untracked). statemets -> statements statements's -> statements' or statement's? Thanks! I went with "statement's". Thanks for updating! I tested the patch setting log_statement = 'all', but %Q in log_line_prefix was always 0 even when pg_stat_statements.queryid and pg_stat_activity.queryid are not 0. Is this an intentional behavior? ``` $ initdb --no-locale -D data $ edit postgresql.conf shared_preload_libraries = 'pg_stat_statements' logging_collector = on log_line_prefix = '%m [%p] queryid:%Q ' log_statement = 'all' $ pg_ctl start -D data $ psql =# CREATE EXTENSION pg_stat_statements; =# CREATE TABLE t1 (i int); =# INSERT INTO t1 VALUES (0),(1); =# SELECT queryid, query FROM pg_stat_activity; -- query ids are all 0 on the log $ view log 2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG: statement: CREATE TABLE t1 (i int); 2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG: statement: INSERT INTO t1 VALUES (0),(1); 2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG: statement: SELECT * FROM t1; -- on pg_stat_activity and pgss, query ids are not 0 $ psql =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; queryid|query --+-- 1109063694563750779 | SELECT * FROM t1; -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; (2 rows) =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE '%t1%'; queryid| query --+- -5028988130796701553 | CREATE TABLE t1 (i int) 1109063694563750779 | SELECT * FROM t1 2726469050076420724 | INSERT INTO t1 VALUES ($1),($2) ``` And here 
is a minor typo. optionnally -> optionally 753 + /* query identifier, optionnally computed using post_parse_analyze_hook */ Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Creating a function for exposing memory usage of backend process
On 2020-07-30 15:13, Kasahara Tatsuhito wrote: Hi, On Fri, Jul 10, 2020 at 5:32 PM torikoshia wrote: - whether information for identifying parent-child relation is necessary or it's an overkill I think it's important to understand the parent-child relationship of the context. Personally, I often want to know the following two things .. - In which life cycle is the target context? (Remaining as long as the process is living? per query?) - Does the target context belong to the correct (parent) context? - if this information is necessary, memory address is suitable or other means like assigning unique numbers are required IMO, If each context can be uniquely identified (or easily guessed) by "name" and "ident", then I don't think the address information is necessary. Instead, I like the way that directly shows the context name of the parent, as in the 0005 patch. Thanks for your opinion! I also feel it'll be sufficient to know not the exact memory context of the parent but the name of the parent context. And as Fujii-san told me in person, exposing memory address seems not preferable considering there are security techniques like address space layout randomization. On 2020-07-10 08:30:22 +0900, torikoshia wrote: On 2020-07-08 22:12, Fujii Masao wrote: Another comment about parent column is: dynahash can be parent? If yes, its indent instead of name should be displayed in parent column? I'm not sure yet, but considering the changes in the future, it seems better to do so. Attached a patch which displays ident as parent when dynahash is a parent. I could not find the case when dynahash can be a parent so I tested it using attached test purposed patch. Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 31 Jul 2020 16:20:29 +0900 Subject: [PATCH] Add a function exposing memory usage of local backend. 
This patch implements a new SQL-callable function pg_get_backend_memory_contexts which exposes memory usage of the local backend. It also adds a new view pg_backend_memory_contexts for exposing local backend memory contexts. --- doc/src/sgml/catalogs.sgml | 122 +++ src/backend/catalog/system_views.sql | 3 + src/backend/utils/mmgr/mcxt.c| 140 +++ src/include/catalog/pg_proc.dat | 9 ++ src/test/regress/expected/rules.out | 10 ++ 5 files changed, 284 insertions(+) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 26fda20d19..5bfc983a90 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9266,6 +9266,11 @@ SCRAM-SHA-256$<iteration count>:&l materialized views + + pg_backend_memory_contexts + backend memory contexts + + pg_policies policies @@ -10544,6 +10549,123 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx + + pg_backend_memory_contexts + + + pg_backend_memory_contexts + + + + The view pg_backend_memory_contexts displays all + the local backend memory contexts. + + + pg_backend_memory_contexts contains one row + for each memory context. + + + + pg_backend_memory_contexts Columns + + + + + Column Type + + + Description + + + + + + + + name text + + + Name of the memory context + + + + + + ident text + + + Identification information of the memory context. 
This field is truncated at 1024 bytes + + + + + + parent text + + + Name of the parent of this memory context + + + + + + level int4 + + + Distance from TopMemoryContext in context tree + + + + + + total_bytes int8 + + + Total bytes allocated for this memory context + + + + + + total_nblocks int8 + + + Total number of blocks allocated for this memory context + + + + + + free_bytes int8 + + + Free space in bytes + + + + + + free_chunks int8 + + + Total number of free chunks + + + + + + used_bytes int8 + + + Used space in bytes + + + + + + + + pg_matviews diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 8625cbeab6..ba5a23ac25 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS REVOKE ALL ON pg_shmem_allocations FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; +CREAT
Re: Is it useful to record whether plans are generic or custom?
On 2020-07-30 14:31, Fujii Masao wrote: On 2020/07/22 16:49, torikoshia wrote: On 2020-07-20 13:57, torikoshia wrote: As I proposed earlier in this thread, I'm now trying to add information about generic/cudstom plan to pg_stat_statements. I'll share the idea and the poc patch soon. Attached a poc patch. Thanks for the POC patch! With the patch, when I ran "CREATE EXTENSION pg_stat_statements", I got the following error. ERROR: function pg_stat_statements_reset(oid, oid, bigint) does not exist Oops, sorry about that. I just fixed it there for now. Main purpose is to decide (1) the user interface and (2) the way to get the plan type from pg_stat_statements. (1) the user interface I added a new boolean column 'generic_plan' to both pg_stat_statements view and the member of the hash key of pg_stat_statements. This is because as Legrand pointed out the feature seems useful under the condition of differentiating all the counters for a queryid using a generic plan and the one using a custom one. I don't like this because this may double the number of entries in pgss. Which means that the number of entries can more easily reach pg_stat_statements.max and some entries will be discarded. I thought it might be preferable to make a GUC to enable or disable this feature, but changing the hash key makes it harder. What happens if the server was running with this option enabled and then restarted with the option disabled? Firstly two entries for the same query were stored in pgss because the option was enabled. But when it's disabled and the server is restarted, those two entries should be merged into one at the startup of server? If so, that's problematic because it may take a long time. Therefore I think that it's better and simple to just expose the number of times generic/custom plan was chosen for each query. 
Regards, Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 793eafad8e988b6754c9d89e0ea14b64b07eef81 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 31 Jul 2020 17:52:14 +0900 Subject: [PATCH] Previously the number of custom and generic plans are recoreded only in pg_prepared_statements, meaning we could only track them regarding current session. This patch records them in pg_stat_statements and it enables to track them regarding all sessions of the PostgreSQL instance. --- .../pg_stat_statements--1.6--1.7.sql | 5 ++- .../pg_stat_statements--1.7--1.8.sql | 1 + .../pg_stat_statements/pg_stat_statements.c | 44 +++ src/backend/utils/cache/plancache.c | 2 + src/include/utils/plancache.h | 1 + 5 files changed, 42 insertions(+), 11 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql index 6fc3fed4c9..fd7aa05c92 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql @@ -12,11 +12,12 @@ DROP FUNCTION pg_stat_statements_reset(); /* Now redefine */ CREATE FUNCTION pg_stat_statements_reset(IN userid Oid DEFAULT 0, IN dbid Oid DEFAULT 0, - IN queryid bigint DEFAULT 0 + IN queryid bigint DEFAULT 0, + IN generic_plan int DEFAULT -1 ) RETURNS void AS 'MODULE_PATHNAME', 'pg_stat_statements_reset_1_7' LANGUAGE C STRICT PARALLEL SAFE; -- Don't want this to be available to non-superusers. 
-REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) FROM PUBLIC; +REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, int) FROM PUBLIC; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 0f63f08f7e..0d7c4e7343 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -15,6 +15,7 @@ DROP FUNCTION pg_stat_statements(boolean); CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT userid oid, OUT dbid oid, +OUT generic_plan bool, OUT queryid bigint, OUT query text, OUT plans int8, diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6b91c62c31..14c580a95e 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -78,9 +78,11 @@ #include "storage/ipc.h" #include "storage/spin.h" #include "tcop/utility.h" +#include "tcop/pquery.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/plancache.h" PG_MODULE_MAGIC; @@ -156,6 +158,7 @@ typedef struct pgssHashKey Oid userid; /* user OID */ Oid dbid; /* database OID */ uint64 queryid; /* query identifier */
Re: Creating a function for exposing memory usage of backend process
On 2020-08-08 10:44, Michael Paquier wrote: On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote: On Fri, Jul 31, 2020 at 4:25 AM torikoshia wrote: And as Fujii-san told me in person, exposing memory address seems not preferable considering there are security techniques like address space layout randomization. Yeah, exactly. ASLR wouldn't do anything to improve security if there were no other security bugs, but there are, and some of those bugs are harder to exploit if you don't know the precise memory addresses of certain data structures. Similarly, exposing the addresses of our internal data structures is harmless if we have no other security bugs, but if we do, it might make those bugs easier to exploit. I don't think this information is useful enough to justify taking that risk. FWIW, this is the class of issues where it is possible to print some areas of memory, or even manipulate the stack so as it was possible to pass down a custom pointer, so exposing the pointer locations is a real risk, and this has happened in the past. Anyway, it seems to me that if this part is done, we could just make it superuser-only with restrictive REVOKE privileges, but I am not sure that we have enough user cases to justify this addition. Thanks for your comments! I'm convinced that exposing pointer locations introduces security risks and it seems better not to do so. And I now feel that identifying the exact memory context by exposing its address or other means seems overkill. Showing just the name of the parent context would be sufficient, and the 0007 patch takes this approach.
On 2020-08-07 16:38, Kasahara Tatsuhito wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:tested, passed I tested the latest patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch) with the latest PG-version (199cec9779504c08aaa8159c6308283156547409) and test was passed. It looks good to me. The new status of this patch is: Ready for Committer Thanks for your testing! Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Creating a function for exposing memory usage of backend process
On 2020-08-17 21:19, Fujii Masao wrote: On 2020/08/17 21:14, Fujii Masao wrote: On 2020-08-07 16:38, Kasahara Tatsuhito wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: tested, passed I tested the latest patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch) with the latest PG-version (199cec9779504c08aaa8159c6308283156547409) and test was passed. It looks good to me. The new status of this patch is: Ready for Committer Thanks for your testing! Thanks for updating the patch! Here are the review comments. Thanks for reviewing! + + linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts + backend memory contexts + The above is located just after pg_matviews entry. But it should be located just after pg_available_extension_versions entry. Because the rows in the table "System Views" should be located in alphabetical order. + + pg_backend_memory_contexts Same as above. Modified both. + The view pg_backend_memory_contexts displays all + the local backend memory contexts. This description seems a bit confusing because maybe we can interpret this as "... displays the memory contexts of all the local backends" wrongly. Thought? What about the following description, instead? The view pg_backend_memory_contexts displays all the memory contexts of the server process attached to the current session. Thanks! it seems better. + const char *name = context->name; + const char *ident = context->ident; + + if (context == NULL) + return; The above check "context == NULL" is useless? If "context" is actually NULL, "context->name" would cause segmentation fault, so ISTM that the check will never be performed. If "context" can be NULL, the check should be performed before accessing to "contect". 
OTOH, if "context" must not be NULL per the specification of PutMemoryContextStatsTupleStore(), assertion test checking "context != NULL" should be used here, instead? Yeah, "context" cannot be NULL because "context" must be TopMemoryContext or it is already checked as not NULL as follows(child != NULL). I added the assertion check. | for (child = context->firstchild; child != NULL; child = child->nextchild) | { | ... | PutMemoryContextsStatsTupleStore(tupstore, tupdesc, | child, parentname, level + 1); | } Here is another comment. + if (parent == NULL) + nulls[2] = true; + else + /* + * We labeled dynahash contexts with just the hash table name. + * To make it possible to identify its parent, we also display + * parent's ident here. + */ + if (parent->ident && strcmp(parent->name, "dynahash") == 0) + values[2] = CStringGetTextDatum(parent->ident); + else + values[2] = CStringGetTextDatum(parent->name); PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context, but uses only the name of "parent" memory context. So isn't it better to use "const char *parent" instead of "MemoryContext parent", as the argument of the function? If we do that, we can simplify the above code. Thanks, the attached patch adopted the advice. However, since PutMemoryContextsStatsTupleStore() used not only the name but also the ident of the "parent", I could not help but adding similar codes before calling the function. The total amount of codes and complexity seem not to change so much. Any thoughts? Am I misunderstanding something? Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 18 Aug 2020 18:17:42 +0900 Subject: [PATCH] Add a function exposing memory usage of local backend. pg_get_backend_memory_contexts which exposes memory usage of the local backend. It also adds a new view pg_backend_memory_contexts for exposing local backend memory contexts. 
--- doc/src/sgml/catalogs.sgml | 122 ++ src/backend/catalog/system_views.sql | 3 + src/backend/utils/mmgr/mcxt.c| 147 +++ src/include/catalog/pg_proc.dat | 9 ++ src/test/regress/expected/rules.out | 10 ++ 5 files changed, 291 insertions(+) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index fc329c5cff..1232b24e74 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9226,6 +9226,11 @@ SCRAM-SHA-256$:&l available versions of extensions + + pg_backend_memory_contexts + backend memory contexts + + pg_config compile-time configuration parameters @@ -9577
Re: Creating a function for exposing memory usage of backend process
On 2020-08-18 22:54, Fujii Masao wrote: On 2020/08/18 18:41, torikoshia wrote: On 2020-08-17 21:19, Fujii Masao wrote: On 2020/08/17 21:14, Fujii Masao wrote: On 2020-08-07 16:38, Kasahara Tatsuhito wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: tested, passed I tested the latest patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch) with the latest PG-version (199cec9779504c08aaa8159c6308283156547409) and test was passed. It looks good to me. The new status of this patch is: Ready for Committer Thanks for your testing! Thanks for updating the patch! Here are the review comments. Thanks for reviewing! + + linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts + backend memory contexts + The above is located just after pg_matviews entry. But it should be located just after pg_available_extension_versions entry. Because the rows in the table "System Views" should be located in alphabetical order. + + pg_backend_memory_contexts Same as above. Modified both. + The view pg_backend_memory_contexts displays all + the local backend memory contexts. This description seems a bit confusing because maybe we can interpret this as "... displays the memory contexts of all the local backends" wrongly. Thought? What about the following description, instead? The view pg_backend_memory_contexts displays all the memory contexts of the server process attached to the current session. Thanks! it seems better. + const char *name = context->name; + const char *ident = context->ident; + + if (context == NULL) + return; The above check "context == NULL" is useless? If "context" is actually NULL, "context->name" would cause segmentation fault, so ISTM that the check will never be performed. If "context" can be NULL, the check should be performed before accessing to "contect". 
OTOH, if "context" must not be NULL per the specification of PutMemoryContextStatsTupleStore(), assertion test checking "context != NULL" should be used here, instead? Yeah, "context" cannot be NULL because "context" must be TopMemoryContext or it is already checked as not NULL as follows(child != NULL). I added the assertion check. Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead? Thanks, that's better. | for (child = context->firstchild; child != NULL; child = child->nextchild) | { | ... | PutMemoryContextsStatsTupleStore(tupstore, tupdesc, | child, parentname, level + 1); | } Here is another comment. + if (parent == NULL) + nulls[2] = true; + else + /* + * We labeled dynahash contexts with just the hash table name. + * To make it possible to identify its parent, we also display + * parent's ident here. + */ + if (parent->ident && strcmp(parent->name, "dynahash") == 0) + values[2] = CStringGetTextDatum(parent->ident); + else + values[2] = CStringGetTextDatum(parent->name); PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context, but uses only the name of "parent" memory context. So isn't it better to use "const char *parent" instead of "MemoryContext parent", as the argument of the function? If we do that, we can simplify the above code. Thanks, the attached patch adopted the advice. However, since PutMemoryContextsStatsTupleStore() used not only the name but also the ident of the "parent", I could not help but adding similar codes before calling the function. The total amount of codes and complexity seem not to change so much. Any thoughts? Am I misunderstanding something? I was thinking that we can simplify the code as follows. That is, we can just pass "name" as the argument of PutMemoryContextsStatsTupleStore() since "name" indicates context->name or ident (if name is "dynahash"). 
for (child = context->firstchild; child != NULL; child = child->nextchild) { - const char *parentname; - - /* -* We labeled dynahash contexts with just the hash table name. -* To make it possible to identify its parent, we also use -* the hash table as its context name. -*/ - if (context->ident && strcmp(context->name, "dynahas
Re: Creating a function for exposing memory usage of backend process
Thanks for all your comments! Thankfully it seems that this feature is regarded as not meaningless one, I'm going to do some improvements. On Wed, Aug 19, 2020 at 10:56 PM Michael Paquier wrote: On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote: On 2020/08/19 17:40, torikoshia wrote: Yes, I didn't add regression tests because of the unstability of the output. I thought it would be OK since other views like pg_stat_slru and pg_shmem_allocations didn't have tests for their outputs. You're right. If you can make a test with something minimal and with a stable output, adding a test is helpful IMO, or how can you make easily sure that this does not get broken, particularly in the event of future refactorings, or even with platform-dependent behaviors? OK. Added a regression test on sysviews.sql. (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch) Fujii-san gave us an example, but I added more simple one considering the simplicity of other tests on that. On Thu, Aug 20, 2020 at 12:02 AM Tom Lane wrote: Michael Paquier writes: > By the way, I was looking at the code that has been committed, and I > think that it is awkward to have a SQL function in mcxt.c, which is a > rather low-level interface. I think that this new code should be > moved to its own file, one suggestion for a location I have being > src/backend/utils/adt/mcxtfuncs.c. I agree with that, Thanks for pointing out. Added a patch for relocating the codes to mcxtfuncs.c. (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch) On Thu, Aug 20, 2020 at 11:09 AM Fujii Masao wrote: On 2020/08/20 0:01, Tom Lane wrote: Given the lack of clear use-case, and the possibility (admittedly not strong) that this is still somehow a security hazard, I think we should revert it. If it stays, I'd like to see restrictions on who can read the view. For example, allowing only the role with pg_monitor to see this view? Attached a patch adding that restriction. 
(0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch) Of course, this restriction makes pg_backend_memory_contexts hard to use when the user of the target session is not granted pg_monitor because the scope of this view is session local. In this case, I imagine additional operations something like temporarily granting pg_monitor to that user. Thoughts? Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 23fe541d5bd3cead787bb7c638f0086b9c2e13eb Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 21 Aug 2020 21:22:10 +0900 Subject: [PATCH] Added a regression test for pg_backend_memory_contexts. --- src/test/regress/expected/sysviews.out | 7 +++ src/test/regress/sql/sysviews.sql | 3 +++ 2 files changed, 10 insertions(+) diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 06c4c3e476..06e09fd10b 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -19,6 +19,13 @@ select count(*) >= 0 as ok from pg_available_extensions; t (1 row) +-- There will surely be at least one context. +select count(*) > 0 as ok from pg_backend_memory_contexts; + ok + + t +(1 row) + -- At introduction, pg_config had 23 entries; it may grow select count(*) > 20 as ok from pg_config; ok diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql index 28e412b735..2c3b88c855 100644 --- a/src/test/regress/sql/sysviews.sql +++ b/src/test/regress/sql/sysviews.sql @@ -12,6 +12,9 @@ select count(*) >= 0 as ok from pg_available_extension_versions; select count(*) >= 0 as ok from pg_available_extensions; +-- There will surely be at least one context. 
+select count(*) > 0 as ok from pg_backend_memory_contexts; + -- At introduction, pg_config had 23 entries; it may grow select count(*) > 20 as ok from pg_config; -- 2.18.1 From 4eee73933874fbab91643e7461717ba9038d8d76 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 21 Aug 2020 19:01:38 +0900 Subject: [PATCH] Rellocated the codes for pg_backend_memory_contexts in mcxt.c to src/backend/utils/adt/mcxtfuncs.c as they are low low-level interface. --- src/backend/utils/adt/Makefile| 1 + src/backend/utils/adt/mcxtfuncs.c | 157 ++ src/backend/utils/mmgr/mcxt.c | 137 -- 3 files changed, 158 insertions(+), 137 deletions(-) create mode 100644 src/backend/utils/adt/mcxtfuncs.c diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 5d2aca8cfe..54d5c37947 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -57,6 +57,7 @@ OBJS = \ lockfuncs.o \ mac.o \ mac8.o \ + mcxtfuncs.o \ misc.o \ name.o \ network.o \ diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c new file mode 100644 index 00..50e1b07ff0 --- /de
Re: Creating a function for exposing memory usage of backend process
On 2020-08-22 21:18, Michael Paquier wrote: Thanks for reviewing! On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote: OK. Added a regression test on sysviews.sql. (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch) Fujii-san gave us an example, but I added more simple one considering the simplicity of other tests on that. What you have sent in 0001 looks fine to me. A small test is much better than nothing. Added a patch for relocating the codes to mcxtfuncs.c. (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch) The same code is moved around line-by-line. Of course, this restriction makes pg_backend_memory_contexts hard to use when the user of the target session is not granted pg_monitor because the scope of this view is session local. In this case, I imagine additional operations something like temporarily granting pg_monitor to that user. Hmm. I am not completely sure either that pg_monitor is the best fit here, because this view provides information about a bunch of internal structures. Something that could easily be done though is to revoke the access from public, and then users could just set up GRANT permissions post-initdb, with pg_monitor as one possible choice. This is the safest path by default, and this stuff is of a caliber similar to pg_shmem_allocations in terms of internal contents. I think this is a better way than what I did in 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch. Attached a patch. It seems to me that you are missing one "REVOKE ALL on pg_backend_memory_contexts FROM PUBLIC" in patch 0003. By the way, if that was just for me, I would remove used_bytes, which is just a computation from the total and free numbers. I'll defer that point to Fujii-san. -- Michael On 2020/08/20 2:59, Kasahara Tatsuhito wrote: I totally agree that it's not *enough*, but in contrast to you I think it's a good step. Subsequently we should add a way to get any backends memory usage. 
It's not too hard to imagine how to serialize it in a way that can be easily deserialized by another backend. I am imagining something like sending a procsignal that triggers (probably at CFR() time) a backend to write its own memory usage into pg_memusage/ or something roughly like that. Sounds good. Maybe we can also provide the SQL-callable function or view to read pg_memusage/, to make the analysis easier. +1 I'm thinking about starting a new thread to discuss exposing other backends' memory contexts. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION From dc4fade9111dc3f91e992c4d5af393dd5ed03270 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 24 Jul 2020 11:14:32 +0900 Subject: [PATCH] Previously pg_backend_memory_contexts didn't have any restriction and anyone could access it. However, this view contains some internal information of the memory context. This policy could cause security issues. This patch revokes all on pg_backend_memory_contexts from public so that only superusers can access it. --- doc/src/sgml/catalogs.sgml | 4 src/backend/catalog/system_views.sql | 3 +++ 2 files changed, 7 insertions(+) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 1232b24e74..9fe260ecff 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9697,6 +9697,10 @@ SCRAM-SHA-256$<iteration count>:&l + + By default, the pg_backend_memory_contexts view can be + read only by superusers. 
+ diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index ba5a23ac25..a2d61302f9 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -557,6 +557,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; CREATE VIEW pg_backend_memory_contexts AS SELECT * FROM pg_get_backend_memory_contexts(); +REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC; + -- Statistics views CREATE VIEW pg_stat_all_tables AS -- 2.18.1
Re: Creating a function for exposing memory usage of backend process
On 2020-08-24 13:13, Fujii Masao wrote: On 2020/08/24 13:01, torikoshia wrote: On 2020-08-22 21:18, Michael Paquier wrote: Thanks for reviewing! On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote: OK. Added a regression test on sysviews.sql. (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch) Fujii-san gave us an example, but I added more simple one considering the simplicity of other tests on that. What you have sent in 0001 looks fine to me. A small test is much better than nothing. +1 But as I proposed upthread, what about a bit complicated test as follows, e.g., to confirm that the internal logic for level works expectedly? SELECT name, ident, parent, level, total_bytes >= free_bytes FROM pg_backend_memory_contexts WHERE level = 0; OK! Attached an updated patch. Added a patch for relocating the codes to mcxtfuncs.c. (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch) Thanks for the patch! Looks good to me. Barring any objection, I will commit this patch at first. The same code is moved around line-by-line. Of course, this restriction makes pg_backend_memory_contexts hard to use when the user of the target session is not granted pg_monitor because the scope of this view is session local. In this case, I imagine additional operations something like temporarily granting pg_monitor to that user. Hmm. I am not completely sure either that pg_monitor is the best fit here, because this view provides information about a bunch of internal structures. Something that could easily be done though is to revoke the access from public, and then users could just set up GRANT permissions post-initdb, with pg_monitor as one possible choice. This is the safest path by default, and this stuff is of a caliber similar to pg_shmem_allocations in terms of internal contents. I think this is a better way than what I did in 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch. 
You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch? Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch. Attached a patch. Thanks for updating the patch! This also looks good to me. It seems to me that you are missing one "REVOKE ALL on pg_backend_memory_contexts FROM PUBLIC" in patch 0003. By the way, if that was just for me, I would remove used_bytes, which is just a computation from the total and free numbers. I'll defer that point to Fujii-san. Yeah, I was just thinking that displaying used_bytes as well was useful, but this might be inconsistent with the other views' ways. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION From 335b9eb0c60a7f12debd4c45d435888109b2bfcf Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 24 Aug 2020 21:28:20 +0900 Subject: [PATCH] Added a regression test for pg_backend_memory_contexts. --- src/test/regress/expected/sysviews.out | 9 + src/test/regress/sql/sysviews.sql | 5 + 2 files changed, 14 insertions(+) diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 06c4c3e476..1cffc3349d 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -19,6 +19,15 @@ select count(*) >= 0 as ok from pg_available_extensions; t (1 row) +-- The entire output of pg_backend_memory_contexts is not stable, +-- we test only the existence and basic condition of TopMemoryContext. +select name, ident, parent, level, total_bytes >= free_bytes + from pg_backend_memory_contexts where level = 0; + name | ident | parent | level | ?column? 
+--+---++---+-- + TopMemoryContext | || 0 | t +(1 row) + -- At introduction, pg_config had 23 entries; it may grow select count(*) > 20 as ok from pg_config; ok diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql index 28e412b735..ac4a0e1cbb 100644 --- a/src/test/regress/sql/sysviews.sql +++ b/src/test/regress/sql/sysviews.sql @@ -12,6 +12,11 @@ select count(*) >= 0 as ok from pg_available_extension_versions; select count(*) >= 0 as ok from pg_available_extensions; +-- The entire output of pg_backend_memory_contexts is not stable, +-- we test only the existence and basic condition of TopMemoryContext. +select name, ident, parent, level, total_bytes >= free_bytes + from pg_backend_memory_contexts where level = 0; + -- At introduction, pg_config had 23 entries; it may grow select count(*) > 20 as ok from pg_config; -- 2.18.1
Get memory contexts of an arbitrary backend process
Hi, After commit 3e98c0bafb28de, we can display the usage of the memory contexts using the pg_backend_memory_contexts system view. However, its target is limited to the process attached to the current session. As discussed in the thread[1], it'll be useful to make it possible to get the memory contexts of an arbitrary backend process. The attached PoC patch makes pg_get_backend_memory_contexts() display the memory contexts of the process with the specified PID.

=# -- PID of the target process is 17051
=# SELECT * FROM pg_get_backend_memory_contexts(17051);
         name          | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes
-----------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
 TopMemoryContext      |       |                  |     0 |       68720 |             5 |      16816 |          16 |      51904
 RowDescriptionContext |       | TopMemoryContext |     1 |        8192 |             1 |       6880 |           0 |       1312
 MessageContext        |       | TopMemoryContext |     1 |       65536 |             4 |      19912 |           1 |      45624
 ...

It doesn't display the contexts of all the backends, only the contexts of the specified process. I think that would be enough, because I suppose this function is used after investigation with the ps command or other OS-level utilities. The rough idea of the implementation is as below:
1. send a signal to the specified process
2. the signaled process dumps its memory contexts to a file
3. read the dumped file and display it to the user
Any thoughts? [1] https://www.postgresql.org/message-id/72a656e0f71d0860161e0b3f67e4d771%40oss.nttdata.com Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
From 7decfda337bbc422fece4c736a719b6fcfdc5cf3 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 31 Aug 2020 18:20:34 +0900 Subject: [PATCH] Enabled pg_get_backend_memory_contexts() to collect an arbitrary backend process's memory contexts. Previously, pg_get_backend_memory_contexts() could only get the memory contexts of the process that called it. This patch makes it possible to get the memory contexts of an arbitrary process whose PID is specified by the argument. 
--- doc/src/sgml/func.sgml | 13 + src/backend/catalog/system_views.sql | 4 +- src/backend/replication/basebackup.c | 3 + src/backend/storage/ipc/procsignal.c | 4 + src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/mcxtfuncs.c| 315 ++- src/backend/utils/init/globals.c | 1 + src/bin/initdb/initdb.c | 3 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +- src/bin/pg_rewind/filemap.c | 3 + src/include/catalog/pg_proc.dat | 7 +- src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 1 + src/include/utils/mcxtfuncs.h| 21 ++ src/test/regress/expected/rules.out | 2 +- 15 files changed, 364 insertions(+), 21 deletions(-) create mode 100644 src/include/utils/mcxtfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b9f591296a..cc9a458334 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -21062,6 +21062,19 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); + + + + pg_get_backend_memory_contexts + +pg_get_backend_memory_contexts ( integer ) +setof records + + +Returns all the memory contexts of the specified process ID. 
+ + + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index a2d61302f9..88fb837ecd 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; CREATE VIEW pg_backend_memory_contexts AS -SELECT * FROM pg_get_backend_memory_contexts(); +SELECT * FROM pg_get_backend_memory_contexts(-1); REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC; -REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC; -- Statistics views diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 6064384e32..f69d851b6b 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -184,6 +184,9 @@ static const char *const excludeDirContents[] = /* Contents zeroed on startup, see StartupSUBTRANS(). */ "pg_subtrans", + /* Skip memory context dumped files. */ + "pg_memusage", + /* end of list */ NULL }; diff --git a/src/backend/storage/ipc/pro
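The signal → dump-file → read flow from the PoC above can be modeled outside the server. The toy sketch below runs both roles in one process with made-up context data; in the real patch the requester sends SendProcSignal() and the target writes under pg_memusage/ from CHECK_FOR_INTERRUPTS(), so take the file names and record format here as illustrative assumptions only.

```c
#include <stdio.h>

/* step 2: the "target backend" dumps its contexts, one line per context
 * (hypothetical sample data, tab-separated: name, parent, level, total) */
static void dump_contexts(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return;
    fprintf(f, "TopMemoryContext\t(null)\t0\t68720\n");
    fprintf(f, "MessageContext\tTopMemoryContext\t1\t65536\n");
    fclose(f);
}

/* step 3: the "requesting backend" reads the dump back, deletes it,
 * and returns the number of context rows, or -1 if there is no dump */
static int read_dump(const char *path)
{
    char line[1024];
    int  rows = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL)
        rows++;
    fclose(f);
    remove(path);   /* hand the dump over and delete, see the later mails */
    return rows;
}
```

A second read after the handoff fails, which is exactly the "no leftover files" property the thread later argues for.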
Re: Get memory contexts of an arbitrary backend process
On 2020-09-01 03:29, Pavel Stehule wrote: Hi On Mon, Aug 31, 2020 at 5:03 PM Kasahara Tatsuhito wrote: Hi, On Mon, Aug 31, 2020 at 8:22 PM torikoshia wrote: As discussed in the thread[1], it'll be useful to make it possible to get the memory contexts of an arbitrary backend process. +1 Attached PoC patch makes pg_get_backend_memory_contexts() display the memory contexts of the process with the specified PID. Thanks, it's a very good patch for discussion. It doesn't display contexts of all the backends but only the contexts of the specified process. or we can "SELECT (pg_get_backend_memory_contexts(pid)).* FROM pg_stat_activity WHERE ...", so I don't think it's a big deal. The rough idea of implementation is like below: 1. send a signal to the specified process 2. signaled process dumps its memory contexts to a file 3. read the dumped file and display it to the user I agree with the overview of the idea. Here are some comments and questions. Thanks for the comments!
- Currently, "the signal transmission for dumping memory information" and "the read & output of dump information" are on the same interface, but I think it would be better to separate them. How about providing the following three types of functions for users?
- send a signal to the specified pid
- check the status of the signal sent and received
- read the dumped information
Is this for future extensibility, to make it possible to get other information like the current execution plan, which was suggested by Pavel? If so, I agree with considering extensibility, but I'm not sure whether it's necessary to provide these types of functions to 'users'.
- How about managing the status of signal send/receive and dump operations in a shared hash or similar? Sending and receiving signals, dumping memory information, and referencing dump information all work asynchronously. Therefore, it would be good to have management information to check the status of each process. A simple idea is that .. 
- send a signal to dump to a PID; it first records the following information into the shared hash:
  pid (specified pid)
  loc (dump location, currently might be ASAP)
  recv (did the pid process receive a signal? initially false)
  dumped (did the pid process dump its memory information? initially false)
- the specified process receives the signal, updates the status in the shared hash, then dumps at the specified location.
- the specified process finishes dumping the memory information and updates the status in the shared hash.
Adding management information on shared memory seems necessary when we want more control over dumping, like 'dump location' or any other information such as 'current execution plan'. I'm going to consider this.
- Does it allow one process to output multiple dump files? It appears to be a specification to overwrite at present, but I thought it would be good to be able to generate multiple dump files in different phases (e.g., planning phase and execution phase) in the future.
- How is the dump file cleaned up?
For a very long time there has been similar discussion about taking the session query and session execution plans from other sessions. I am not sure how much necessary information is in the memory dump, but I am sure that taking the current execution plan and the complete text of the current query is pretty necessary information. It can be great if this infrastructure can be used for any debugging purpose. Thanks! It would be good if some part of this effort can be an infrastructure for other debugging. It may be hard, but I will keep your comment in mind. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION Regards Pavel Best regards, -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
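The per-request bookkeeping Kasahara-san sketches above (pid / loc / recv / dumped in a shared hash) could look roughly like this. The field names follow the mail; the struct and helper functions are hypothetical illustrations, not PoC code, and real shared-hash entries would of course need locking.

```c
#include <stdbool.h>

/* one entry per outstanding dump request, keyed by target pid */
typedef struct McxtDumpEntry
{
    int   pid;      /* key: target backend's pid */
    int   loc;      /* dump location; currently always "ASAP" (0) */
    bool  recv;     /* has the target received the signal? */
    bool  dumped;   /* has the target finished writing the dump? */
} McxtDumpEntry;

/* requester registers an entry before sending the signal */
static void entry_init(McxtDumpEntry *e, int pid)
{
    e->pid = pid;
    e->loc = 0;
    e->recv = false;
    e->dumped = false;
}

/* target marks progress as it handles the signal, then writes the file */
static void entry_mark_received(McxtDumpEntry *e) { e->recv = true; }
static void entry_mark_dumped(McxtDumpEntry *e)   { e->dumped = true; }

/* requester polls until both steps have completed */
static bool entry_ready(const McxtDumpEntry *e)
{
    return e->recv && e->dumped;
}
```

Having both flags lets the requester distinguish "signal not yet delivered" from "dump still in progress", which matters for timeouts.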
Re: Get memory contexts of an arbitrary backend process
Thanks for reviewing! I'm going to modify the patch according to your comments. On 2020-09-01 10:54, Andres Freund wrote: Hi, On 2020-08-31 20:22:18 +0900, torikoshia wrote: After commit 3e98c0bafb28de, we can display the usage of the memory contexts using pg_backend_memory_contexts system view. However, its target is limited to the process attached to the current session. As discussed in the thread[1], it'll be useful to make it possible to get the memory contexts of an arbitrary backend process. Attached PoC patch makes pg_get_backend_memory_contexts() display meory contexts of the specified PID of the process. Awesome! It doesn't display contexts of all the backends but only the contexts of specified process. I think it would be enough because I suppose this function is used after investigations using ps command or other OS level utilities. It can be used as a building block if all are needed. Getting the infrastructure right is the big thing here, I think. Adding more detailed views on top of that data later is easier. diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index a2d61302f9..88fb837ecd 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; CREATE VIEW pg_backend_memory_contexts AS -SELECT * FROM pg_get_backend_memory_contexts(); +SELECT * FROM pg_get_backend_memory_contexts(-1); -1 is odd. Why not use NULL or even 0? + else + { + int rc; + int parent_len = strlen(parent); + int name_len = strlen(name); + + /* +* write out the current memory context information. +* Since some elements of values are reusable, we write it out. Not sure what the second comment line here is supposed to mean? 
+*/ + fputc('D', fpout); + rc = fwrite(values, sizeof(values), 1, fpout); + rc = fwrite(nulls, sizeof(nulls), 1, fpout); + + /* write out information which is not resuable from serialized values */ s/resuable/reusable/ + rc = fwrite(&name_len, sizeof(int), 1, fpout); + rc = fwrite(name, name_len, 1, fpout); + rc = fwrite(&idlen, sizeof(int), 1, fpout); + rc = fwrite(clipped_ident, idlen, 1, fpout); + rc = fwrite(&level, sizeof(int), 1, fpout); + rc = fwrite(&parent_len, sizeof(int), 1, fpout); + rc = fwrite(parent, parent_len, 1, fpout); + (void) rc; /* we'll check for error with ferror */ + + } This format is not descriptive. How about serializing to json or something? Or at least having field names? Alternatively, build the same tuple we build for the SRF, and serialize that. Then there's basically no conversion needed. @@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, Datum pg_get_backend_memory_contexts(PG_FUNCTION_ARGS) { + int pid = PG_GETARG_INT32(0); + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; TupleDesc tupdesc; Tuplestorestate *tupstore; @@ -147,11 +189,258 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS) MemoryContextSwitchTo(oldcontext); - PutMemoryContextsStatsTupleStore(tupstore, tupdesc, - TopMemoryContext, NULL, 0); + if (pid == -1) + { + /* +* Since pid -1 indicates target is the local process, simply +* traverse memory contexts. +*/ + PutMemoryContextsStatsTupleStore(tupstore, tupdesc, + TopMemoryContext, "", 0, NULL); + } + else + { + /* +* Send signal for dumping memory contexts to the target process, +* and read the dumped file. +*/ + FILE *fpin; + chardumpfile[MAXPGPATH]; + + SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId); + + snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid); + + while (true) + { + CHECK_FOR_INTERRUPTS(); + + pg_usleep(1L); + Need better signalling back/forth here. 
Do you mean I should also send another signal from the dumped process to the caller of the pg_get_backend_memory_contexts() when it finishes dumping? Regards, -- Atsushi Torikoshi NTT DATA CORPORATION +/* + * dump_memory_contexts + * Dumping local
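Andres's complaint above is that raw fwrite()s of struct fields produce a dump a reader cannot interpret without knowing the exact field order. One minimal self-describing alternative is a "key=value" line per field with a blank line terminating each record; the field set below is an illustrative subset of the view's columns, not the patch's actual format.

```c
#include <stdio.h>

/* Write one memory-context record in a self-describing text format, so a
 * future reader does not depend on struct layout or field order. */
static void write_context_record(FILE *out, const char *name,
                                 const char *parent, int level,
                                 long total_bytes)
{
    fprintf(out, "name=%s\n", name);
    fprintf(out, "parent=%s\n", parent ? parent : "");
    fprintf(out, "level=%d\n", level);
    fprintf(out, "total_bytes=%ld\n", total_bytes);
    fprintf(out, "\n");         /* blank line terminates the record */
}
```

Serializing the already-built SRF tuple, as Andres also suggests, would avoid even this conversion step.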
Re: Get memory contexts of an arbitrary backend process
On 2020-09-04 21:46, Tomas Vondra wrote: On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote: On Fri, Sep 4, 2020 at 2:40 AM Tom Lane wrote: Kasahara Tatsuhito writes: > Yes, but it's not only for future expansion, but also for the > usability and the stability of this feature. > For example, if you want to read one dumped file multiple times and analyze it, > you will want the ability to just read the dump. If we design it to make that possible, how are we going to prevent disk space leaks from never-cleaned-up dump files? In my thought, with features such as a view that allows us to see a list of dumped files, it would be better to have a function that simply deletes the dump files associated with a specific PID, or to delete all dump files. Some files may be dumped with unexpected delays, so I think the cleaning feature will be necessary. ( Also, as the pgsql_tmp file, it might better to delete dump files when PostgreSQL start.) Or should we try to delete the dump file as soon as we can read it? IMO making the cleanup a responsibility of the users (e.g. by exposing the list of dumped files through a view and expecting users to delete them in some way) is rather fragile. I don't quite see what's the point of designing it this way. It was suggested this improves stability and usability of this feature, but surely making it unnecessarily complex contradicts both points? IMHO if the user needs to process the dump repeatedly, what's preventing him/her from storing it in a file, or something like that? At that point it's clear it's up to them to remove the file. So I suggest to keep the feature as simple as possible - hand the dump over and delete. +1. If there are no other objections, I'm going to accept this suggestion. Regards
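Besides the "hand the dump over and delete" path, Kasahara-san also suggests removing leftover dump files at server start, as is done for pgsql_tmp. A sketch of that cleanup pass (the pg_memusage directory name is taken from the PoC patch; everything else is an illustrative assumption):

```c
#include <stdio.h>
#include <string.h>
#include <dirent.h>

/* Remove every regular entry under a pg_memusage/-style directory.
 * Returns the number of files removed, or -1 if the directory
 * cannot be opened. */
static int cleanup_dump_dir(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int  removed = 0;
    char path[1024];

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (remove(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

Running this once at startup bounds the disk-space leak to files orphaned within a single server lifetime, even if a requester crashes before reading its dump.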
Re: RFC: Logging plan of the running query
On 2021-10-13 23:28, Ekaterina Sokolova wrote: Hi, hackers! • The latest version of the patch applies correctly. It changes 8 files from /src/backend, and 9 other files. • I get 1 error and 1 warning during compilation on Mac. explain.c:4985:25: error: implicit declaration of function 'GetLockMethodLocalHash' is invalid in C99 [-Werror,-Wimplicit-function-declaration] hash_seq_init(&status, GetLockMethodLocalHash()); explain.c:4985:25: warning: incompatible integer to pointer conversion passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *') [-Wint-conversion] hash_seq_init(&status, GetLockMethodLocalHash()); This error doesn't appear on my second machine with Ubuntu. I found the reason. You deleted #ifdef USE_ASSERT_CHECKING from the implementation of GetLockMethodLocalHash(void), but this ifdef still exists around the function declaration. There can be a situation where the implementation exists without the declaration, so files using the function produce errors. I created a new version of the patch with a fix for this problem. Thanks for fixing that! I agree that seeing the details of a query is a useful feature, but I have several doubts: 1) There are lots of changes to core code. But not all users need this functionality. So adding this functionality as an extension seemed more reasonable. It would be good if we could implement this feature in an extension, but as the pg_query_state extension needs patches applied to PostgreSQL, I think this kind of feature needs PostgreSQL core modification. IMHO extensions which need core modification are not easy to use in production environments.. 2) There are many tools available to monitor the status of a query. How much do we need another one? For example: • pg_stat_progress_* is a set of views with the current status of ANALYZE, CREATE INDEX, VACUUM, CLUSTER, COPY, Base Backup. You can find it in the PostgreSQL documentation [1]. 
• pg_query_state is a contrib module with 2 patches for core (I hope someday the community will accept adding these patches to PostgreSQL). It contains a function that prints a table with the pid, full query text, plan, and current progress of every node, like a momentary EXPLAIN ANALYZE for SELECT, UPDATE, INSERT, and DELETE. So it supports all EXPLAIN flags and formats. You can find the current version of pg_query_state on GitHub [2]. I also found an old discussion about its first version in the community [3]. Thanks for introducing the extension! I only took a quick look at pg_query_state, and I have some questions. pg_query_state seems to use shm_mq to expose the plan information, but there was a discussion that this kind of architecture would be tricky to do properly [1]. Does pg_query_state handle the difficulties listed in that discussion? It seems the caller of pg_query_state() has to wait until the target process pushes the plan information into shared memory; can this lead to deadlock situations? I came up with this question because when trying to make a view for memory contexts of other backends, we encountered deadlock situations. In the end, we gave up on the view design and adopted sending a signal and logging. Some of the comments of [3] seem useful for my patch; I'm going to consider them. Thanks! 3) Have you measured the overhead of your feature? It would be really interesting to know the changes in speed and performance. I haven't measured it yet, but I believe that the overhead for backends on which pg_log_current_plan() is not called would be slight, since the patch just adds the logic for saving the QueryDesc in ExecutorRun(). The overhead for backends on which pg_log_current_plan() is called might not be slight, but since the target process is assumed to be dealing with a long-running query whose plan the user wants to know, the overhead would be worth the cost. Thank you for working on this issue. I would be glad to continue to follow the development of this issue. Thanks for your help! 
-- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On 2021-10-15 15:17, torikoshia wrote: I only took a quick look at pg_query_state, I have some questions. pg_query_state seems using shm_mq to expose the plan information, but there was a discussion that this kind of architecture would be tricky to do properly [1]. Does pg_query_state handle difficulties listed on the discussion? Sorry, I forgot to add the URL. [1] https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com It seems the caller of the pg_query_state() has to wait until the target process pushes the plan information into shared memory, can it lead to deadlock situations? I came up with this question because when trying to make a view for memory contexts of other backends, we encountered deadlock situations. After all, we gave up view design and adopted sending signal and logging. Discussion at the following URL. https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On 2021-11-02 20:32, Ekaterina Sokolova wrote: Thanks for your response! Hi! I'm here to answer your questions about contrib/pg_query_state. I only took a quick look at pg_query_state, and I have some questions. pg_query_state seems to use shm_mq to expose the plan information, but there was a discussion that this kind of architecture would be tricky to do properly [1]. Does pg_query_state handle the difficulties listed in that discussion? [1] https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com I doubt that it was the right link. Sorry for the confusion; here is the correct link. https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com But on the topic, I will say that the extension really does use shared memory; the interaction is implemented by sending/receiving messages. This architecture provides the required reliability and convenience. As described in the link, using shared memory for this kind of work would need DSM, and it would also be necessary to exchange information between the requestor and the responder. For example, when I took a quick look at the pg_query_state code, it looks like the size of the queue is fixed at QUEUE_SIZE, and I wonder how plans that exceed QUEUE_SIZE are handled. It seems the caller of pg_query_state() has to wait until the target process pushes the plan information into shared memory; can this lead to deadlock situations? I came up with this question because when trying to make a view for memory contexts of other backends, we encountered deadlock situations. In the end, we gave up on the view design and adopted sending a signal and logging. Discussion at the following URL. https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com Before extracting information about another process, we check its state. Information will only be retrieved from a process willing to provide it. 
Otherwise, we will receive an error message about the impossibility of getting query execution statistics, plus the process status. There is also a check against extracting your own status; this is even verified in tests. Thanks for your attention. Just in case, I am ready to discuss this topic in more detail. I imagined the following procedure. Does it cause a deadlock in pg_query_state?
- session1: BEGIN; TRUNCATE t;
- session2: BEGIN; TRUNCATE t; -- waits
- session1: SELECT * FROM pg_query_state(); -- waits, and deadlocked?
About overhead: I haven't measured it yet, but I believe that the overhead for backends on which pg_log_current_plan() is not called would be slight, since the patch just adds the logic for saving the QueryDesc in ExecutorRun(). The overhead for backends on which pg_log_current_plan() is called might not be slight, but since the target process is assumed to be dealing with a long-running query whose plan the user wants to know, the overhead would be worth the cost. I think it would be useful for us to have a couple of examples with different numbers of rows, compared to running without this functionality. Do you have any expectation that the number of rows would affect the performance of this functionality? This patch adds some code to ExecutorRun(), but I don't think the number of rows would impact the performance. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
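The deadlock worry behind the procedure above can be illustrated with a toy model, with Python threads standing in for backends (purely hypothetical, not pg_query_state's real code): if the requester waits synchronously for a reply from a target that is itself blocked on a lock the requester's transaction holds, neither side can make progress.

```python
import queue
import threading

# Toy model: "backend2" is blocked on a lock held by "backend1"'s open
# transaction, so it never reaches a point where it can answer. A
# synchronous pg_query_state()-style request from backend1 then waits
# for a reply that cannot come.
lock = threading.Lock()
response = queue.Queue()

def backend2():
    with lock:                 # blocked: backend1 holds the lock
        response.put("plan")   # the reply would be produced here

lock.acquire()                 # backend1's transaction holds the lock
threading.Thread(target=backend2, daemon=True).start()

try:
    plan = response.get(timeout=0.5)   # synchronous wait for the reply
    print("got", plan)
except queue.Empty:
    print("deadlock: requester waits for a backend that waits for us")
lock.release()
```

A real deadlock would wait forever; the timeout here only makes the toy model terminate. This is the failure mode the signal-and-log design sidesteps, since the requester never blocks on the target.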
Re: RFC: Logging plan of the running query
On 2021-11-13 22:29, Bharath Rupireddy wrote: Thanks for your review! On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova wrote: Thank you for working on this issue. I would be glad to continue to follow the development of this issue. Thanks for the patch. I'm not sure if v11 is the latest patch; if yes, I have the following comments: 1) Firstly, the v11 patch isn't getting applied on master - http://cfbot.cputube.org/patch_35_3142.log. Updated the patch. 2) I think we are moving away from if (!superuser()) checks, see the commit [1]. The goal is to let the GRANT-REVOKE system deal with who is supposed to run these system functions. Since pg_log_current_query_plan also writes the info to server logs, I think it should do the same thing as commit [1] did for pg_log_backend_memory_contexts. With v11, you are re-introducing the superuser() check in pg_log_backend_memory_contexts, which is wrong. Yeah, I removed the superuser() check and made it possible for non-superusers to execute it when they are granted permission to do so. 3) I think SendProcSignalForLogInfo can be more generic, meaning it can also send a signal to auxiliary processes if asked to; this will simplify things for pg_log_backend_memory_contexts and other patches like pg_print_backtrace. I would imagine it to be "bool SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool signal_aux_proc);". I agree with your idea. Since sending signals to auxiliary processes to dump memory contexts and pg_print_backtrace is still under discussion, IMHO it would be better to refactor SendProcSignalForLogInfo after these patches are committed. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
From 5499167a7ecc6f040d5fec817cf36a7ba0b5cbff Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 15 Nov 2021 21:20:43 +0900 Subject: [PATCH v12] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID. 
Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_current_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some codes, tests and comments of pg_log_current_query_plan() are the same with pg_log_backend_memory_contexts(), this patch also refactors them to make them common. Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova --- doc/src/sgml/func.sgml | 45 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 117 ++- src/backend/executor/execMain.c | 10 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 55 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 36 +- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 3 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 1 + src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 54 +++-- src/test/regress/sql/misc_functions.sql | 42 +-- 17 files changed, 333 insertions(+), 61 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 24447c0017..e12e1feeca 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25345,6 +25345,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + 
pg_log_current_query_plan + +pg_log_current_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID along with the untruncated +query string. +They will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -25458,6 +25478,31 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunk
Re: RFC: Logging plan of the running query
On 2021-11-13 03:37, Justin Pryzby wrote: I reviewed this version of the patch - I have some language fixes. Thanks for your review! Attached is a patch that reflects your comments. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 15 Nov 2021 22:31:00 +0900 Subject: [PATCH v13] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID. Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some codes, tests and comments of pg_log_query_plan() are the same with pg_log_backend_memory_contexts(), this patch also refactors them to make them common. 
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby --- doc/src/sgml/func.sgml | 45 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 117 ++- src/backend/executor/execMain.c | 10 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 55 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 36 +- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 3 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 1 + src/include/storage/signalfuncs.h| 22 src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 54 +++-- src/test/regress/sql/misc_functions.sql | 42 +-- 18 files changed, 355 insertions(+), 61 deletions(-) create mode 100644 src/include/storage/signalfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 24447c0017..7ffaa9a55d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25345,6 +25345,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID along with the untruncated +query string. +They will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -25458,6 +25478,31 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_query_plan can be used +to log the plan of a backend process. 
For example: + +postgres=# SELECT pg_log_query_plan(201116); + pg_log_query_plan +--- + t +(1 row) + +The format of the query plan is the same as when VERBOSE, +COSTS, SETTINGS and +FORMAT TEXT are used in the EXPLAIN +command. For example: + +LOG: plan of the query running on backend with PID 17793 is: +Query Text: SELECT * FROM pgbench_accounts; +Seq Scan on public.pgbench_accounts (cost=0.00..52787.00 rows=200 width=97) + Output: aid, bid, abalance, filler +Settings: work_mem = '1MB' + +Note that nested statements (statements executed inside a function) are not +considered for logging. Only the plan of the most deeply nested query is logged. + + diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 54c93b16c4..d7f0010e47 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -701,6 +701,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public; REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC; +REVOKE EXECUTE ON FUNCTION pg_log_query_plan(integer) F
Re: RFC: Logging plan of the running query
On 2021-11-15 23:15, Bharath Rupireddy wrote: I have another comment: wouldn't it be a good idea for an overloaded version of the new function pg_log_query_plan to take the available explain command options as a text argument? I'm not sure if it is possible to get stats like buffers, costs etc. of a running query; if yes, something like pg_log_query_plan(pid, 'buffers', 'costs');? It looks like overkill at first sight, but these can be useful to know a more detailed plan of the query. I also think the overloaded version would be useful. However, as discussed in [1], it seems to introduce other difficulties. I think it would be enough that the first version of pg_log_query_plan doesn't take any parameters. [1] https://www.postgresql.org/message-id/ce86e4f72f09d5497e8ad3a162861d33%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On 2021-11-17 22:44, Ekaterina Sokolova wrote: Hi! You forgot my last fix to build correctly on Mac. I have added it. Thanks for the notification! Since the patch could no longer be applied to HEAD, I also updated it. About our discussion of pg_query_state: torikoshia wrote on 2021-11-04 15:49: I doubt that it was the right link. Sorry for the confusion; here is the correct link. https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com Thank you. I'll see it soon. I imagined the following procedure. Does it cause a deadlock in pg_query_state?
- session1: BEGIN; TRUNCATE t;
- session2: BEGIN; TRUNCATE t; -- waits
- session1: SELECT * FROM pg_query_state(); -- waits, and deadlocked?
As far as I know, pg_query_state uses non-blocking reads and writes. I have written a few tests trying to deadlock it (on version 14), but all finished correctly. Have a nice day. Please feel free to contact me if you need any further information. Thanks for your information and help! -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 26 Nov 2021 10:31:00 +0900 Subject: [PATCH v14] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID. Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. 
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some codes, tests and comments of pg_log_query_plan() are the same with pg_log_backend_memory_contexts(), this patch also refactors them to make them common. Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby --- doc/src/sgml/func.sgml | 45 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 117 ++- src/backend/executor/execMain.c | 10 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 55 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 36 +- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 3 + src/include/miscadmin.h | 1 + src/include/storage/lock.h | 2 - src/include/storage/procsignal.h | 1 + src/include/storage/signalfuncs.h| 22 src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 54 +++-- src/test/regress/sql/misc_functions.sql | 42 +-- 19 files changed, 355 insertions(+), 63 deletions(-) create mode 100644 src/include/storage/signalfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0a725a6711..b84ead4341 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25358,6 +25358,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID along with the untruncated +query string. +They will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . 
+ + + @@ -25471,6 +25491,31 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_query_plan can be used +to log the plan of a backend process. For example: + +postgres=# SELECT pg_log_query_plan(201116); + pg_log_query_plan +--- + t +(1 row) + +The format of the query plan is the same as when VERBOSE, +COSTS, SETTINGS and +FORMAT TEXT are used in the EXPLAIN +command. For example: +
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada wrote: Yeah, I'm still thinking it's better to implement this feature incrementally. Given we're closing in on feature freeze, I think it's unlikely to get the whole feature into PG17, since there are still many design discussions we need in addition to what Torikoshi-san pointed out. Features like "ignore errors" or "logging errors" would have a higher chance. Even if we get only these parts of the whole "error table" feature into PG17, it will make it much easier to implement the "error tables" feature. +1. I'm also going to make a patch for "logging errors", since this functionality is isolated from the v7 patch. Seems promising. I'll look at the patch. Thanks a lot! Sorry for attaching v2 if you have already reviewed v1. On 2024-01-11 12:13, jian he wrote: On Tue, Jan 9, 2024 at 10:36 PM torikoshia wrote: On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada wrote: > If we want only such a feature we need to implement it together (the > patch could be split, though). But if some parts of the feature are > useful for users as well, I'd recommend implementing it incrementally. > That way, the patches can get small and it would be easy for reviewers > and committers to review/commit them. Jian, what do you think about this comment? Looking back at the discussion so far, it seems that not everyone thinks saving table information is the best idea [1], and some people think just skipping error data is useful [2]. Since there are issues to be considered in the design, such as physical/logical replication treatment, putting error information into a table is likely to take time for consensus-building and development. Wouldn't it be better to follow the advice below and develop the functionality incrementally? On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada wrote: > So I'm thinking we may be able to implement this > feature incrementally. 
The first step would be something like an > option to ignore all errors or an option to specify the maximum number > of errors to tolerate before raising an ERROR. The second step would > be to support logging destinations such as server logs and tables. Attached is a patch for this "first step", made with reference to the v7 patch; it logs errors and is simpler than the latest one. - This patch adds a new option SAVE_ERROR_TO, but currently only supports 'none', which means it just skips error data. It is expected to support 'log' and 'table'. - This patch skips just soft errors and doesn't handle other errors, such as missing column data. Hi. I made the following changes based on your patch (v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch): * When SAVE_ERROR_TO is specified, move the initialization of ErrorSaveContext to the function BeginCopyFrom. I think that's the right place to initialize the struct CopyFromState field. * I think with your patch, when N rows have malformed data, it will initialize N ErrorSaveContexts. In the struct CopyFromStateData, I changed it to ErrorSaveContext *escontext. So if an error occurred, you can just set the escontext accordingly. * doc: mention "If this option is omitted, COPY stops operation at the first error." * Since we only support 'none' for now, 'none' means we don't want ErrorSaveContext metadata, so we should set cstate->escontext->details_wanted to false. BTW I have a question and a comment about the v15 patch: > + { > + /* > + * > + * InputFunctionCall is more faster than > InputFunctionCallSafe. > + * > + */ Have you measured this? When I tested it in an older patch, there was no big difference [3]. Thanks for pointing it out, I probably was overthinking. > - SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT > + SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT There was a comment that we shouldn't add a new keyword for this [4]. Thanks for pointing it out. Thanks for reviewing! 
Updated the patch, merging your suggestions except for the points below: + cstate->num_errors = 0; Since cstate is already initialized in the lines below, this may be redundant. | /* Allocate workspace and zero all fields */ | cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData)); + Assert(!cstate->escontext->details_wanted); I'm not sure this is necessary, considering we're going to add other options like 'table' and 'log', which need details_wanted soon. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
From a3f14a0e7e9a7b5fb961ad6b6b7b163cf6534a26 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 12 Jan 2024 11:32
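The point about the redundant "cstate->num_errors = 0;" rests on palloc0() returning zero-filled memory. A small illustration, using Python's ctypes as a stand-in for a palloc0'd C struct (the field names are only illustrative, not the real CopyFromStateData layout):

```python
import ctypes

# ctypes Structure instances are zero-initialized, just as palloc0()
# returns zero-filled memory -- so an explicit "num_errors = 0" after
# allocation would be redundant.
class CopyFromState(ctypes.Structure):
    _fields_ = [("num_errors", ctypes.c_int64),
                ("processed", ctypes.c_int64)]

cstate = CopyFromState()    # like palloc0(sizeof(CopyFromStateData))
print(cstate.num_errors)    # -> 0, already zeroed, no explicit reset needed
```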
doc: add LITERAL tag to RETURNING
Hi, RETURNING is usually tagged with appropriate tags, such as <literal>, but not in the 'query' section of COPY. https://www.postgresql.org/docs/devel/sql-copy.html Would it be better to put it here as well? -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
From 3c9efe404310bf01d79b2f0f006541ebc0b170a0 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 12 Jan 2024 14:33:47 +0900 Subject: [PATCH v1] Added literal tag for RETURNING. --- doc/src/sgml/ref/copy.sgml | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 18ecc69c33..e2ffbbdf84 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -128,10 +128,10 @@ COPY { table_name [ ( For INSERT, UPDATE and - DELETE queries a RETURNING clause must be provided, - and the target relation must not have a conditional rule, nor - an ALSO rule, nor an INSTEAD rule - that expands to multiple statements. + DELETE queries a RETURNING clause + must be provided, and the target relation must not have a conditional + rule, nor an ALSO rule, nor an + INSTEAD rule that expands to multiple statements. base-commit: 08c3ad27eb5348d0cbffa843a3edb11534f9904a -- 2.39.2
Re: doc: add LITERAL tag to RETURNING
On 2024-01-12 20:56, Alvaro Herrera wrote: On 2024-Jan-12, Ashutosh Bapat wrote: On Fri, Jan 12, 2024 at 11:27 AM torikoshia wrote: > > RETURNING is usually tagged with appropriate tags, such as , > but not in the 'query' section of COPY. The patch looks good. Good catch, pushed. It has user-visible effect, so I backpatched it. Thanks for your review and push. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2024-01-16 00:17, Alexander Korotkov wrote: On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada wrote: On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov wrote: > > On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada wrote: > > Thank you for updating the patch. Here are two comments: > > > > --- > > + if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED && > > + cstate->num_errors > 0) > > + ereport(WARNING, > > + errmsg("%zd rows were skipped due to data type incompatibility", > > + cstate->num_errors)); > > + > > /* Done, clean up */ > > error_context_stack = errcallback.previous; > > > > If a malformed input is not the last data, the context message seems odd: > > > > postgres(1:1769258)=# create table test (a int); > > CREATE TABLE > > postgres(1:1769258)=# copy test from stdin (save_error_to none); > > Enter data to be copied followed by a newline. > > End with a backslash and a period on a line by itself, or an EOF signal. > > >> a > > >> 1 > > >> > > 2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped > > due to data type incompatibility > > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: "" > > COPY 1 > > > > I think it's better to report the WARNING after resetting the > > error_context_stack. Or is a WARNING really appropriate here? The > > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but > > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to > > WARNING without explanation. > > Thank you for noticing this. I think NOTICE is more appropriate here. > There is nothing to "worry" about: the user asked to ignore the errors > and we did. And yes, it doesn't make sense to use the last line as > the context. Fixed. > > > --- > > +-- test missing data: should fail > > +COPY check_ign_err FROM STDIN WITH (save_error_to none); > > +1 {1} > > +\. > > > > We might want to cover the extra data cases too. > > Agreed, the relevant test is added. Thank you for updating the patch. 
I have one minor point: + if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED && + cstate->num_errors > 0) + ereport(NOTICE, + errmsg("%zd rows were skipped due to data type incompatibility", + cstate->num_errors)); + We can use errmsg_plural() instead. Makes sense. Fixed. I have a question about the option values; do you think we need to have another value of SAVE_ERROR_TO option to explicitly specify the current default behavior, i.e. not accept any error? With the v4 patch, the user needs to omit SAVE_ERROR_TO option to accept errors during COPY FROM. If we change the default behavior in the future, many users will be affected and probably end up changing their applications to keep the current default behavior. Valid point. I've implemented the handling of CopySaveErrorToChoice in a similar way to CopyHeaderChoice. Please, check the revised patch attached. Thanks for updating the patch! Here is a minor comment: +/* + * Extract a defGetCopySaveErrorToChoice value from a DefElem. + */ Should be Extract a "CopySaveErrorToChoice"? BTW I'm thinking we should add a column to pg_stat_progress_copy that counts soft errors. I'll suggest this in another thread. -- Regards, Alexander Korotkov -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
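The errmsg_plural() change discussed above picks the singular or plural message template based on the row count. Python's gettext.ngettext() does the analogous selection, which makes for a quick self-contained illustration (this is just an analogy, not PostgreSQL's API):

```python
import gettext

def skip_notice(n: int) -> str:
    # Like errmsg_plural(): choose the message template by count.
    # With no translation catalog installed, ngettext() returns the
    # first string for n == 1 and the second otherwise.
    return gettext.ngettext(
        "%d row was skipped due to data type incompatibility",
        "%d rows were skipped due to data type incompatibility",
        n) % n

print(skip_notice(1))  # -> 1 row was skipped due to data type incompatibility
print(skip_notice(3))  # -> 3 rows were skipped due to data type incompatibility
```

The design point: hard-coding "rows were" would read wrongly for a single skipped row, and translation catalogs can only pick the right plural form when both templates are supplied.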
Add tuples_skipped to pg_stat_progress_copy
Hi, 132de9968840c introduced the SAVE_ERROR_TO option to COPY, enabling it to skip malformed data, but there is no way to watch the number of skipped rows during COPY. The attached patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of tuples skipped because the source data is malformed. If SAVE_ERROR_TO is not specified, this column remains zero. The advantage would be that users can quickly notice and stop COPYing when there is a larger amount of skipped data than expected, for example. As described in the commit log, more choices for SAVE_ERROR_TO, like 'log', are expected to be added, and using such options may also let us know the number of skipped tuples during COPY, but exposing it in pg_stat_progress_copy would make it easier to monitor. What do you think? -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
From 98e546ff2de380175708ce003f67c993299a3fb3 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 17 Jan 2024 13:41:44 +0900 Subject: [PATCH v1] Add tuples_skipped to pg_stat_progress_copy 132de9968840c enabled COPY to skip malformed data, but there is no way to watch the number of skipped rows during COPY. This patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of skipped tuple because source data is malformed. If SAVE_ERROR_TO is not specified, this column remains zero. Needs catalog bump. --- doc/src/sgml/monitoring.sgml | 10 ++ src/backend/catalog/system_views.sql | 3 ++- src/backend/commands/copyfrom.c | 5 + src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 ++- 5 files changed, 20 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b804eb8b5e..96ed774670 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5779,6 +5779,16 @@ FROM pg_stat_get_backend_idset() AS backendid; WHERE clause of the COPY command. 
+ + + + tuples_skipped bigint + + + Number of tuples skipped because they contain malformed data + (if SAVE_ERROR_TO is specified, otherwise zero). + + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..6288270e2b 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1318,7 +1318,8 @@ CREATE VIEW pg_stat_progress_copy AS S.param1 AS bytes_processed, S.param2 AS bytes_total, S.param3 AS tuples_processed, -S.param4 AS tuples_excluded +S.param4 AS tuples_excluded, +S.param7 AS tuples_skipped FROM pg_stat_get_progress_info('COPY') AS S LEFT JOIN pg_database D ON S.datid = D.oid; diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 4058b08134..fe33b0facf 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -650,6 +650,7 @@ CopyFrom(CopyFromState cstate) CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */ int64 processed = 0; int64 excluded = 0; + int64 skipped = 0; bool has_before_insert_row_trig; bool has_instead_insert_row_trig; bool leafpart_use_multi_insert = false; @@ -1012,6 +1013,10 @@ CopyFrom(CopyFromState cstate) */ cstate->escontext->error_occurred = false; + /* Report that this tuple was skipped by the SAVE_ERROR_TO clause */ + pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, + ++skipped); + continue; } diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h index a458c8c50a..73afa77a9c 100644 --- a/src/include/commands/progress.h +++ b/src/include/commands/progress.h @@ -142,6 +142,7 @@ #define PROGRESS_COPY_TUPLES_EXCLUDED 3 #define PROGRESS_COPY_COMMAND 4 #define PROGRESS_COPY_TYPE 5 +#define PROGRESS_COPY_TUPLES_SKIPPED 6 /* Commands of COPY (as advertised via PROGRESS_COPY_COMMAND) */ #define PROGRESS_COPY_COMMAND_FROM 1 diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 55f2e95352..5e846b01e6 100644 --- 
a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1988,7 +1988,8 @@ pg_stat_progress_copy| SELECT s.pid, s.param1 AS bytes_processed, s.param2 AS bytes_total, s.param3 AS tuples_processed, -s.param4 AS tuples_excluded +s.param4 AS tuples_excluded, +s.param7 AS tuples_skipped FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20) LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_progress_create_index| SELECT s.pid, base-commit: 65c5864d7fac46516f17ee89085e349a87ee5bd7 -- 2.39.2
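For illustration, once the patch above is applied, the new counter could be watched from a second session while a COPY runs. This is only a sketch: the table and file names are hypothetical, and the option spelling follows the patch as posted (SAVE_ERROR_TO, which was later renamed):

```sql
-- Session 1: load data, skipping malformed rows
COPY t1 FROM '/tmp/dirty.csv' WITH (FORMAT csv, SAVE_ERROR_TO none);

-- Session 2: poll progress; tuples_skipped is the column added by this patch
SELECT relid::regclass AS relname, tuples_processed, tuples_excluded, tuples_skipped
FROM pg_stat_progress_copy;
```

A monitoring job could cancel the COPY when tuples_skipped grows past an acceptable threshold, which is the use case described above.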
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi, Thanks for applying! + errmsg_plural("%zd row were skipped due to data type incompatibility", Sorry, I just noticed it, but 'were' should be 'was' here? BTW I'm thinking we should add a column to pg_stat_progress_copy that counts soft errors. I'll suggest this in another thread. Please do! I've started it here: https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2024-01-18 10:10, jian he wrote: On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada wrote: On Thu, Jan 18, 2024 at 6:38 AM Tom Lane wrote: > > Alexander Korotkov writes: > > On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi > > wrote: > >> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which > >> indicate "immediately error out" and 'just ignore the failure' > >> respectively, but these options hardly seem to denote a 'location', > >> and appear more like an 'action'. I somewhat suspect that this > >> parameter name intially conceived with the assupmtion that it would > >> take file names or similar parameters. I'm not sure if others will > >> agree, but I think the parameter name might not be the best > >> choice. For instance, considering the addition of the third value > >> 'log', something like on_error_action (error, ignore, log) would be > >> more intuitively understandable. What do you think? > > > Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes > > the next word will be location, not action. With some stretch we can > > assume 'error' to be location. I think it would be even more stretchy > > to think that SAVE_ERROR_TO is followed by action. > > The other problem with this terminology is that with 'none', what it > is doing is the exact opposite of "saving" the errors. I agree we > need a better name. Agreed. > > Kyotaro-san's suggestion isn't bad, though I might shorten it to > error_action {error|ignore|log} (or perhaps "stop" instead of "error")? > You will need a separate parameter anyway to specify the destination > of "log", unless "none" became an illegal table name when I wasn't > looking. I don't buy that one parameter that has some special values > while other values could be names will be a good design. Moreover, > what if we want to support (say) log-to-file along with log-to-table? > Trying to distinguish a file name from a table name without any other > context seems impossible. 
I've been thinking we can add more values to this option to log errors not only to the server logs but also to the error table (not sure about the details, but I imagined an error table is created for each table on error), without an additional option for the destination name. The values would be like error_action {error|ignore|save-logs|save-table}. Another idea: on_error {error|ignore|other_future_option} if not specified then by default ERROR. You can also specify ERROR or IGNORE for now. I agree, the parameter "error_action" is better than "location". I'm not sure whether error_action or on_error is better, but either way "error_action error" and "on_error error" seem a bit odd to me. I feel "stop" is better for both cases as Tom suggested. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2024-01-18 16:59, Alexander Korotkov wrote: On Thu, Jan 18, 2024 at 4:16 AM torikoshia wrote: On 2024-01-18 10:10, jian he wrote: > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada > wrote: >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane wrote: >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to >> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")? >> > You will need a separate parameter anyway to specify the destination >> > of "log", unless "none" became an illegal table name when I wasn't >> > looking. I don't buy that one parameter that has some special values >> > while other values could be names will be a good design. Moreover, >> > what if we want to support (say) log-to-file along with log-to-table? >> > Trying to distinguish a file name from a table name without any other >> > context seems impossible. >> >> I've been thinking we can add more values to this option to log errors >> not only to the server logs but also to the error table (not sure >> details but I imagined an error table is created for each table on >> error), without an additional option for the destination name. The >> values would be like error_action {error|ignore|save-logs|save-table}. >> > > another idea: > on_error {error|ignore|other_future_option} > if not specified then by default ERROR. > You can also specify ERROR or IGNORE for now. > > I agree, the parameter "error_action" is better than "location". I'm not sure whether error_action or on_error is better, but either way "error_action error" and "on_error error" seems a bit odd to me. I feel "stop" is better for both cases as Tom suggested. OK. What about this? on_error {stop|ignore|other_future_option} where other_future_option might be compound like "file 'copy.log'" or "table 'copy_log'". Thanks, also +1 from me. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
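To make the agreed shape concrete, the syntax under discussion would read roughly as follows. This is a sketch only; the compound form in the comment is one of the hypothetical future options mentioned above, not implemented syntax:

```sql
COPY t1 FROM STDIN WITH (ON_ERROR stop);    -- default: fail on the first malformed row
COPY t1 FROM STDIN WITH (ON_ERROR ignore);  -- skip malformed rows and keep copying
-- a possible future compound value:
-- COPY t1 FROM STDIN WITH (ON_ERROR table 'copy_log');
```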
Re: Parent/child context relation in pg_get_backend_memory_contexts()
On 2024-01-16 18:41, Melih Mutlu wrote: Hi, Thanks for reviewing. torikoshia , 10 Oca 2024 Çar, 09:37 tarihinde şunu yazdı: + + + context_id int4 + + + Current context id. Note that the context id is a temporary id and may + change in each invocation + + + + + + path int4[] + + + Path to reach the current context from TopMemoryContext. Context ids in + this list represents all parents of the current context. This can be + used to build the parent and child relation + + + + + + total_bytes_including_children int8 + + + Total bytes allocated for this memory context including its children + + These columns are currently added to the bottom of the table, but it may be better to put semantically similar items close together and change the insertion position with reference to other system views. For example, - In pg_group and pg_user, 'id' is placed on the line following 'name', so 'context_id' be placed on the line following 'name' - 'path' is similar with 'parent' and 'level' in that these are information about the location of the context, 'path' be placed to next to them. If we do this, orders of columns in the system view should be the same, I think. I've done what you suggested. Also moved "total_bytes_including_children" right after "total_bytes". 14dd0f27d have introduced new macro foreach_int. It seems to be able to make the code a bit simpler and the commit log says this macro is primarily intended for use in new code. For example: Makes sense. Done. Thanks for updating the patch! + Current context id. Note that the context id is a temporary id and may + change in each invocation + + It clearly states that the context id is temporary, but I am a little concerned about users who write queries that refer to this view multiple times without using CTE. If you agree, how about adding some description like below you mentioned before? We still need to use cte since ids are not persisted and might change in each run of pg_backend_memory_contexts. 
Materializing the result can prevent any inconsistencies due to id change. Also it can be even good for performance reasons as well. We already have additional description below the table which explains each column of the system view. For example pg_locks: https://www.postgresql.org/docs/devel/view-pg-locks.html Also giving an example query something like this might be useful. -- show all the parent context names of ExecutorState with contexts as ( select * from pg_backend_memory_contexts ) select name from contexts where array[context_id] <@ (select path from contexts where name = 'ExecutorState'); -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
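As a companion to the query above, the proposed path column can also be used in the other direction, to find all descendants of a given context. This is a sketch against the patched view (context_id and path are the columns added by the patch, and the CTE is materialized because the ids can change between scans, as noted above):

```sql
-- show all the child contexts of CacheMemoryContext
WITH contexts AS MATERIALIZED (
    SELECT * FROM pg_backend_memory_contexts
)
SELECT name, total_bytes
FROM contexts
WHERE (SELECT ARRAY[context_id] FROM contexts
       WHERE name = 'CacheMemoryContext') <@ path;
```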
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2024-01-18 23:59, jian he wrote: Hi. patch refactored based on "on_error {stop|ignore}" doc changes: --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -43,7 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * } FORCE_NOT_NULL { ( column_name [, ...] ) | * } FORCE_NULL { ( column_name [, ...] ) | * } -SAVE_ERROR_TO 'class="parameter">location' +ON_ERROR 'class="parameter">error_action' ENCODING 'class="parameter">encoding_name' @@ -375,20 +375,20 @@ COPY { table_name [ ( -SAVE_ERROR_TO +ON_ERROR - Specifies to save error information to class="parameter"> - location when there is malformed data in the input. - Currently, only error (default) and none + Specifies which + error_action to perform when there is malformed data in the input. + Currently, only stop (default) and ignore values are supported. - If the error value is specified, + If the stop value is specified, COPY stops operation at the first error. - If the none value is specified, + If the ignore value is specified, COPY skips malformed data and continues copying data. The option is allowed only in COPY FROM. - The none value is allowed only when - not using binary format. + Only stop value is allowed only when + using binary format. Thanks for making the patch! Here are some comments: - The none value is allowed only when - not using binary format. + Only stop value is allowed only when + using binary format. The second 'only' may be unnecessary. - /* If SAVE_ERROR_TO is specified, skip rows with soft errors */ + /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */ This is correct now, but considering future works which add other options like "file 'copy.log'" and "table 'copy_log'", it may be better not to limit the case to 'IGNORE'. How about something like this? 
If ON_ERROR is specified and the value is not STOP, skip rows with soft errors -COPY x from stdin (format BINARY, save_error_to none); -COPY x to stdin (save_error_to none); +COPY x from stdin (format BINARY, ON_ERROR ignore); +COPY x from stdin (ON_ERROR unsupported); COPY x to stdin (format TEXT, force_quote(a)); COPY x from stdin (format CSV, force_quote(a)); In the existing test for copy2.sql, the COPY options are written in lower case (e.g. 'format') and option values (e.g. 'BINARY') are written in upper case. It would be more consistent to align them. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2024-01-19 22:27, Alexander Korotkov wrote: Hi! On Fri, Jan 19, 2024 at 2:37 PM torikoshia wrote: Thanks for making the patch! The patch is pushed! The proposed changes are incorporated excluding this. > - /* If SAVE_ERROR_TO is specified, skip rows > with soft errors */ > + /* If ON_ERROR is specified with IGNORE, skip > rows with soft errors */ This is correct now, but considering future works which add other options like "file 'copy.log'" and "table 'copy_log'", it may be better not to limit the case to 'IGNORE'. How about something like this? If ON_ERROR is specified and the value is not STOP, skip rows with soft errors I think when we have more options, then we wouldn't just skip rows with soft errors but rather save them. So, I left this comment as is for now. Agreed. Thanks for the notification! -- Regards, Alexander Korotkov -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Add tuples_skipped to pg_stat_progress_copy
On 2024-01-17 14:47, Masahiko Sawada wrote: On Wed, Jan 17, 2024 at 2:22 PM torikoshia wrote: Hi, 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to skip malformed data, but there is no way to watch the number of skipped rows during COPY. Attached patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of skipped tuples because source data is malformed. If SAVE_ERROR_TO is not specified, this column remains zero. The advantage would be that users can quickly notice and stop COPYing when there is a larger amount of skipped data than expected, for example. As described in commit log, it is expected to add more choices for SAVE_ERROR_TO like 'log' and using such options may enable us to know the number of skipped tuples during COPY, but exposed in pg_stat_progress_copy would be easier to monitor. What do you think? +1 The patch is pretty simple. Here is a comment: + (if SAVE_ERROR_TO is specified, otherwise zero). + + To be precise, this counter only advances when a value other than 'ERROR' is specified to SAVE_ERROR_TO option. Thanks for your comment and review! Updated the patch according to your comment and the option name change by b725b7eec. BTW, based on this patch, I think we can add another option which specifies the maximum tolerable number of malformed rows. I remember this was discussed in [1], and feel it would be useful when loading 'dirty' data but there is a limit to how dirty it can be. Attached 0002 is a WIP patch for this (I haven't added docs yet). This may be better discussed in another thread, but any comments (e.g. necessity of this option, option name) are welcome. 
[1] https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 571ada768bdb68a31f295cbcb28f4348f253989d Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 22 Jan 2024 23:57:24 +0900 Subject: [PATCH v2 1/2] Add tuples_skipped to pg_stat_progress_copy 132de9968840c enabled COPY to skip malformed data, but there is no way to watch the number of skipped rows during COPY. This patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of skipped tuple because source data is malformed. This column only advances when a value other than stop is specified to ON_ERROR. Needs catalog bump. --- doc/src/sgml/monitoring.sgml | 11 +++ src/backend/catalog/system_views.sql | 3 ++- src/backend/commands/copyfrom.c | 5 + src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 ++- 5 files changed, 21 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6e74138a69..cfc13b3580 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5780,6 +5780,17 @@ FROM pg_stat_get_backend_idset() AS backendid; WHERE clause of the COPY command. + + + + tuples_skipped bigint + + + Number of tuples skipped because they contain malformed data. + This counter only advances when a value other than + stop is specified to ON_ERROR. 
+ + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..6288270e2b 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1318,7 +1318,8 @@ CREATE VIEW pg_stat_progress_copy AS S.param1 AS bytes_processed, S.param2 AS bytes_total, S.param3 AS tuples_processed, -S.param4 AS tuples_excluded +S.param4 AS tuples_excluded, +S.param7 AS tuples_skipped FROM pg_stat_get_progress_info('COPY') AS S LEFT JOIN pg_database D ON S.datid = D.oid; diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 173a736ad5..8ab3777664 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -650,6 +650,7 @@ CopyFrom(CopyFromState cstate) CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */ int64 processed = 0; int64 excluded = 0; + int64 skipped = 0; bool has_before_insert_row_trig; bool has_instead_insert_row_trig; bool leafpart_use_multi_insert = false; @@ -1012,6 +1013,10 @@ CopyFrom(CopyFromState cstate) */ cstate->escontext->error_occurred = false; + /* Report that this tuple was skipped by the ON_ERROR clause */ + pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, + ++skipped); + continue; } diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h index a458c8c50a..73afa77a9c 100644 --- a/src/include/commands/progress.h +++ b/src/include/commands/progress.h @@ -142,6 +142,7 @@ #define PROGRESS_COPY_TUPLES_EXCLUDED 3 #define PROGRESS_COPY_COMMAND 4 #define PROGRESS_
Re: Add tuples_skipped to pg_stat_progress_copy
On 2024-01-24 17:05, Masahiko Sawada wrote: On Tue, Jan 23, 2024 at 1:02 AM torikoshia wrote: On 2024-01-17 14:47, Masahiko Sawada wrote: > On Wed, Jan 17, 2024 at 2:22 PM torikoshia > wrote: >> >> Hi, >> >> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to >> skip malformed data, but there is no way to watch the number of >> skipped >> rows during COPY. >> >> Attached patch adds tuples_skipped to pg_stat_progress_copy, which >> counts the number of skipped tuples because source data is malformed. >> If SAVE_ERROR_TO is not specified, this column remains zero. >> >> The advantage would be that users can quickly notice and stop COPYing >> when there is a larger amount of skipped data than expected, for >> example. >> >> As described in commit log, it is expected to add more choices for >> SAVE_ERROR_TO like 'log' and using such options may enable us to know >> the number of skipped tuples during COPY, but exposed in >> pg_stat_progress_copy would be easier to monitor. >> >> >> What do you think? > > +1 > > The patch is pretty simple. Here is a comment: > > + (if SAVE_ERROR_TO is specified, otherwise > zero). > + > + > > To be precise, this counter only advances when a value other than > 'ERROR' is specified to SAVE_ERROR_TO option. Thanks for your comment and review! Updated the patch according to your comment and option name change by b725b7eec. Thanks! The patch looks good to me. I'm going to push it tomorrow, barring any objections. Thanks! BTW, based on this patch, I think we can add another option which specifies the maximum tolerable number of malformed rows. I remember this was discussed in [1], and feel it would be useful when loading 'dirty' data but there is a limit to how dirty it can be. Attached 0002 is WIP patch for this(I haven't added doc yet). Yeah, it could be a good option. This may be better discussed in another thread, but any comments(e.g. necessity of this option, option name) are welcome. 
I'd recommend forking a new thread for this option. As far as I remember, there also was an opinion that "reject limit" stuff is not very useful. OK, I'll make another thread for this. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Add new error_action COPY ON_ERROR "log"
Hi, As described in 9e2d870119, COPY ON_ERROR is expected to have more "error_action". (Note that the option name was changed by b725b7eec.) I'd like to have a new option "log", which skips soft errors and logs information that should have resulted in errors to the PostgreSQL log. I think this option has some advantages like below: 1) We can know which line of the input data was not loaded and the reason. Example: =# copy t1 from stdin with (on_error log); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 >> 2 >> 3 >> z >> \. LOG: invalid input syntax for type integer: "z" NOTICE: 1 row was skipped due to data type incompatibility COPY 3 =# \! tail data/log/postgresql*.log LOG: 22P02: invalid input syntax for type integer: "z" CONTEXT: COPY t1, line 4, column i: "z" LOCATION: pg_strtoint32_safe, numutils.c:620 STATEMENT: copy t1 from stdin with (on_error log); 2) Easier maintenance than storing error information in tables or proprietary log files. For example, in case a large number of soft errors occur, some mechanism is needed to prevent an infinite increase in the size of the destination data, but we can leave it to PostgreSQL's log rotation. Attached a patch. This basically comes from the previous discussion[1], which handled both "ignore" and "log" for soft errors. As shown in the example above, the log output to the client does not contain CONTEXT, so I'm a little concerned that the client cannot see what line of the input data had a problem without looking at the server log. What do you think? 
[1] https://www.postgresql.org/message-id/c0fb57b82b150953f26a5c7e340412e8%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 04e643facfea4b4e8dd174d22fbe5e008747a91a Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 26 Jan 2024 01:17:59 +0900 Subject: [PATCH v1] Add new error_action "log" to ON_ERROR option Currently ON_ERROR option only has "ignore" to skip malformed data and there are no ways to know where and why COPY skipped them. "log" skips malformed data as well as "ignore", but it logs information that should have resulted in errors to PostgreSQL log. --- doc/src/sgml/ref/copy.sgml | 8 ++-- src/backend/commands/copy.c | 4 +++- src/backend/commands/copyfrom.c | 24 src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 14 +- src/test/regress/sql/copy2.sql | 9 + 6 files changed, 48 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..9662c90a8b 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -380,12 +380,16 @@ COPY { table_name [ ( Specifies which error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. + Currently, only stop (default), ignore + and log values are supported. If the stop value is specified, COPY stops operation at the first error. If the ignore value is specified, COPY skips malformed data and continues copying data. + If the log value is specified, + COPY behaves the same as ignore, except that + it logs information that should have resulted in errors to PostgreSQL log at + INFO level. The option is allowed only in COPY FROM. Only stop value is allowed when using binary format. 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cc0786c6f4..812ca63350 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore" or "log" values. */ sval = defGetString(def); if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "log") == 0) + return COPY_ON_ERROR_LOG; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 1fe70b9133..7886bd5353 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1013,6 +1013,23 @@ CopyFrom(CopyFromState cstate) */ cstate->escontext->error_occurred = false; + else if (cstate->opts.on_error == COPY_ON_ERROR_LOG) + { +/* Adjust elevel so we don't jump out */ +cstate->escontext->error_data->elevel = LOG; + +/* + * Despite the name, this won't raise an error since elevel is + * LOG now. + */ +ThrowErrorData(cstate->escontext->error_data); + +/* Initialize escontext in preparation for next soft error */
Add new COPY option REJECT_LIMIT
Hi, 9e2d870 enabled the COPY command to skip soft error, and I think we can add another option which specifies the maximum tolerable number of soft errors. I remember this was discussed in [1], and feel it would be useful when loading 'dirty' data but there is a limit to how dirty it can be. Attached a patch for this. What do you think? [1] https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 7f111e98e21654c4ca338c93d7cbb4ec9acaabcb Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 26 Jan 2024 18:32:40 +0900 Subject: [PATCH v1] Add new COPY option REJECT_LIMIT REJECT_LIMIT specifies the maximum tolerable number of malformed rows. If input data has more malformed errors than this value, entire COPY fails. This option must be used with ON_ERROR to be set to other than stop. --- doc/src/sgml/ref/copy.sgml | 13 + src/backend/commands/copy.c | 16 src/backend/commands/copyfrom.c | 6 ++ src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 10 ++ src/test/regress/sql/copy2.sql | 21 + 6 files changed, 67 insertions(+) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..8982e8464a 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -393,6 +393,19 @@ COPY { table_name [ ( + +REJECT_LIMIT + + + Specifies the maximum tolerable number of malformed rows. + If input data has caused more malformed errors than this value, entire + COPY fails. + This option must be used with ON_ERROR to be set to + other than stop. 
+ + + + ENCODING diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cc0786c6f4..ca5263d588 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -615,6 +615,22 @@ ProcessCopyOptions(ParseState *pstate, on_error_specified = true; opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from); } + else if (strcmp(defel->defname, "reject_limit") == 0) + { + int64 reject_limit = defGetInt64(defel); + + if (!opts_out->on_error) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("REJECT_LIMIT requires ON_ERROR to be set to other than stop"))); + if (opts_out->reject_limit > 0) +errorConflictingDefElem(defel, pstate); + if (reject_limit <= 0) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("REJECT_LIMIT must be greater than zero"))); + opts_out->reject_limit = reject_limit; + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 1fe70b9133..15066887ea 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1017,6 +1017,12 @@ CopyFrom(CopyFromState cstate) pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, ++skipped); + if (cstate->opts.reject_limit > 0 && skipped > cstate->opts.reject_limit) +ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("exceeded the number specified by REJECT LIMIT \"%d\"", +cstate->opts.reject_limit))); + continue; } diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3da3cb0be..8f8dab9524 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -73,6 +73,7 @@ typedef struct CopyFormatOptions bool *force_null_flags; /* per-column CSV FN flags */ bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ + int reject_limit; /* tolerable number of malformed rows */ List *convert_select; /* list of column names (can be NIL) */ } CopyFormatOptions; diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 25c401ce34..28de7a2685 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -108,6 +108,10 @@ COPY x to stdin (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdin (format BINARY, on_error unsupported); ^ +COPY x from stdin with (reject_limit 3); +ERROR: REJECT_LIMIT requires ON_ERROR to be set to other than stop +COPY x from stdin with (on_error ignore, reject_limit 0); +ERROR: REJECT_LIMIT must be greater than zero -- too many columns in column list: should fail COPY x (a, b, c, d, e, d, c) from stdin; ERROR: column "d" specified more than once @@ -751,6 +755,12 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}" COPY check_ign_err FROM STDIN WITH (on_error ignore); ERROR: extra data after last expected column CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc" +--
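Assuming the REJECT_LIMIT patch above, a run that tolerates a bounded amount of dirty data would look roughly like this (a sketch; the table and file names are hypothetical):

```sql
-- Skip up to 5 malformed rows; one more soft error aborts the whole COPY
COPY t1 FROM '/tmp/dirty.csv' WITH (FORMAT csv, ON_ERROR ignore, REJECT_LIMIT 5);
```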
Re: Small fix on COPY ON_ERROR document
On 2024-01-27 00:04, David G. Johnston wrote: On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA wrote: On Fri, 26 Jan 2024 00:00:57 -0700 "David G. Johnston" wrote: I will need to make this tweak and probably a couple others to my own suggestions in 12 hours or so. And here is my v2. Notably I choose to introduce the verbiage "soft error" and then define in the ON_ERROR clause the specific soft error that matters here - "invalid input syntax". I also note the log message behavior when ignore mode is chosen. I haven't confirmed that it is accurate but that is readily tweaked if approved of. David J. Thanks for refining the doc. + Specifies which how to behave when encountering a soft error. To be consistent with other parts in the manual[1][2], should be “soft” error? + An error_action value of + stop means fail the command, while + ignore means discard the input row and continue with the next one. + The default is stop Is "." required at the end of the line? + + The only relevant soft error is "invalid input syntax", which manifests when attempting + to create a column value from the text input. + I think it is not restricted to "invalid input syntax". We can handle out of range error: =# create table t1(i int); CREATE TABLE =# copy t1 from stdin with(ON_ERROR ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 >> \. NOTICE: 1 row was skipped due to data type incompatibility COPY 0 Also, I'm a little concerned that users might wonder what soft error is. Certainly there are already references to "soft" errors in the manual, but they seem to be for developer, such as creating new TYPE for PostgreSQL. It might be better to describe what soft error is like below: -- src/backend/utils/fmgr/README An error reported "softly" must be safe, in the sense that there is no question about our ability to continue normal processing of the transaction. 
[1] https://www.postgresql.org/docs/devel/sql-createtype.html [2] https://www.postgresql.org/docs/devel/functions-info.html -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Add new error_action COPY ON_ERROR "log"
On Fri, Jan 26, 2024 at 10:44 PM jian he wrote: I doubt the following part: If the log value is specified, COPY behaves the same as ignore, except that it logs information that should have resulted in errors to PostgreSQL log at INFO level. I think it does something like: when an error happens, cstate->escontext->error_data->elevel will be ERROR; you manually change the cstate->escontext->error_data->elevel to LOG, then you call ThrowErrorData. But isn't that unrelated to `INFO level`? My log_min_messages is the default, warning. Thanks! Modified them to NOTICE in accordance with the following summary message: NOTICE: x row was skipped due to data type incompatibility On 2024-01-27 00:43, David G. Johnston wrote: On Thu, Jan 25, 2024 at 9:42 AM torikoshia wrote: Hi, As described in 9e2d870119, COPY ON_ERROR is expected to have more "error_action". (Note that the option name was changed by b725b7eec) I'd like to have a new option "log", which skips soft errors and logs information that should have resulted in errors to the PostgreSQL log. Seems like an easy win but largely unhelpful in the typical case. I suppose ETL routines using this feature may be running on their machine under root or "postgres", but in a system where they are not, this very useful information is inaccessible to them. I suppose the DBA could set up an extractor to send these specific log lines elsewhere, but that seems like enough hassle to disfavor this approach and favor one that can place the soft error data and feedback into user-specified tables in the same database. Setting up temporary tables or unlogged tables probably is going to be a more acceptable methodology than trying to get to the log files. David J. I agree that quite a few people would prefer to store error information in tables, and there have already been suggestions[1]. OTOH not everyone thinks saving error information to tables is the best idea[2].
I think it would be desirable for ON_ERROR to be in a form that allows the user to choose where to store error information from among some options, such as table, log and file. "ON_ERROR log" would be useful at least in the case of 'running on their machine under root or "postgres"' as you pointed out. [1] https://www.postgresql.org/message-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A%40mail.gmail.com [2] https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgm...@awork3.anarazel.de -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 5f44cc7525641302842a3d67c14ebb09615bf67b Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 29 Jan 2024 12:02:32 +0900 Subject: [PATCH v2] Add new error_action "log" to ON_ERROR option Currently ON_ERROR option only has "ignore" to skip malformed data and there are no ways to know where and why COPY skipped them. "log" skips malformed data as well as "ignore", but it logs information that should have resulted in errors to PostgreSQL log. --- doc/src/sgml/ref/copy.sgml | 9 +++-- src/backend/commands/copy.c | 4 +++- src/backend/commands/copyfrom.c | 24 src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 18 +- src/test/regress/sql/copy2.sql | 9 + 6 files changed, 53 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..3d949f04a4 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -380,12 +380,17 @@ COPY { table_name [ ( Specifies which error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. + Currently, only stop (default), ignore + and log values are supported. If the stop value is specified, COPY stops operation at the first error. If the ignore value is specified, COPY skips malformed data and continues copying data. 
+ If the log value is specified, + COPY behaves the same as ignore, + except that it logs information that should have resulted in errors to + PostgreSQL log at NOTICE + level. The option is allowed only in COPY FROM. Only stop value is allowed when using binary format. diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cc0786c6f4..812ca63350 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore" or "log" values. */ sval = defGetString(def); if (pg_strcasecmp(sval, "stop") =
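The truncated hunk above extends defGetCopyOnErrorChoice() to accept "log" alongside "stop" and "ignore". As a hedged, standalone sketch of that kind of three-way, case-insensitive option parse (the COPY_ON_ERROR_* names mirror the patch, but the helper and its behavior on unrecognized input are illustrative only, not the patch's actual code):

```c
#include <assert.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/* Illustrative enum mirroring the patch's CopyOnErrorChoice values,
 * plus a sentinel for unrecognized input (the real code raises an
 * error via ereport() instead). */
typedef enum CopyOnErrorChoice
{
    COPY_ON_ERROR_STOP,
    COPY_ON_ERROR_IGNORE,
    COPY_ON_ERROR_LOG,
    COPY_ON_ERROR_INVALID
} CopyOnErrorChoice;

/* Map an option string to a choice, case-insensitively, the way
 * defGetCopyOnErrorChoice() compares values with pg_strcasecmp(). */
static CopyOnErrorChoice
parse_on_error_choice(const char *sval)
{
    if (strcasecmp(sval, "stop") == 0)
        return COPY_ON_ERROR_STOP;
    if (strcasecmp(sval, "ignore") == 0)
        return COPY_ON_ERROR_IGNORE;
    if (strcasecmp(sval, "log") == 0)
        return COPY_ON_ERROR_LOG;
    return COPY_ON_ERROR_INVALID;
}
```
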
Re: RFC: Logging plan of the running query
Hi, Updated the patch to fix typos and move ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction(). BTW, since the thread is getting long, here is a list of some points of the discussion so far:

# Safety concerns

## Catalog access inside CFI
- It seems safe if the CFI call is inside an existing valid transaction/query state[1]
- We did some tests, for example calling ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env but got stuck on James's env, so I modified ProcessLogQueryPlanInterrupt() to exit when the target process is inside lock acquisition code[3]

## Risk of calling EXPLAIN code in CFI
- EXPLAIN is not simple logic, and there is a risk in calling it from CFI. For example, if there is a bug, we may find ourselves in a situation where we can't cancel the query
- It's a trade-off that's worth making for the introspection benefits this patch would provide?[4]

# Design
- Although some suggested it should be in auto_explain, the current patch introduces this feature into core[5]
- When the target query is nested, only the innermost query's plan is explained. In the future, all the nested queries' plans are expected to be explained optionally, like auto_explain.log_nested_statements[6]
- When the target process is a parallel worker, the plan is not shown[6]
- When the target query is nested and its subtransaction is aborted, pg_log_query_plan cannot log the parent query's plan after the abort, even if the parent query is still running[7]
- The output corresponds to EXPLAIN with VERBOSE, COSTS, SETTINGS and FORMAT text. It doesn't do ANALYZE or show the progress of the query execution.
Future work proposed by Rafael Thofehrn Castro may realize this[8]
- To prevent an assertion failure, this patch ensures no page lock is held by checking all the LocalLock entries before running the explain code, but there is a discussion that ginInsertCleanup() should be modified instead[9]

It may not be so difficult to lift some of the restrictions in "Design", but I'd like to limit the scope of the 1st patch to keep it simpler.

[1] https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com
[4] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAAaqYe_1EuoTudAz8mr8-qtN5SoNtYRm4JM2J9CqeverpE3B2A%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CAExHW5sh4ahrJgmMAGfptWVmESt1JLKCNm148XVxTunRr%2B-6gA%40mail.gmail.com
[7] https://www.postgresql.org/message-id/3d121ed5f81cef588bac836b43f5d1f9%40oss.nttdata.com
[8] https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[9] https://www.postgresql.org/message-id/20220201.172757.1480996662235658750.horikyota.ntt%40gmail.com

-- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation

From 65786ad6c2a9b656c3fd36a45118a39a66da0236 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 29 Jan 2024 21:40:04 +0900 Subject: [PATCH v35] Add function to log the plan of the query Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries in production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process.
By default, only superusers are allowed to request to log the plans because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat Co-authored-by: James Coleman --- contrib/auto_explain/auto_explain.c | 23 +- doc/src/sgml/func.sgml | 50 + src/backend/access/transam/xact.c| 17 ++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 208 ++- src/backend/executor/execMain.c | 14 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 4 + src/backend/utils/init/globals.c | 2 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 9 + src/include/mis
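The mechanism the commit message describes, where a request signal sets a flag and the target backend does the actual work at its next CHECK_FOR_INTERRUPTS(), follows the usual signal-flag pattern. Below is a minimal, hedged sketch of that pattern in plain C; the real patch uses ProcSignal and PostgreSQL's interrupt machinery, not these stand-in names:

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for the backend's "log my plan" request flag; in the patch
 * this is set by the ProcSignal handler. */
static volatile sig_atomic_t LogQueryPlanPending = 0;

/* Signal-handler side: only set a flag; never do real work here. */
static void
handle_log_query_plan_signal(int signo)
{
    (void) signo;
    LogQueryPlanPending = 1;
}

/* Executor side: analog of CHECK_FOR_INTERRUPTS(). Returns 1 when a
 * pending plan-logging request was consumed, 0 otherwise. The real
 * code runs the EXPLAIN machinery at LOG_SERVER_ONLY level here. */
static int
check_for_interrupts(void)
{
    if (LogQueryPlanPending)
    {
        LogQueryPlanPending = 0;
        return 1;
    }
    return 0;
}
```

Keeping the handler down to a single flag assignment is what makes it safe to receive the request at any time; all the heavy lifting happens later, at a well-defined interrupt point.
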
Re: Small fix on COPY ON_ERROR document
On 2024-02-01 15:16, Yugo NAGATA wrote: On Mon, 29 Jan 2024 15:47:25 +0900 Yugo NAGATA wrote: On Sun, 28 Jan 2024 19:14:58 -0700 "David G. Johnston" wrote: > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > > COPY FROM raises an error when the number of input column does not match > > to the table schema, but this error is not ignored by ON_ERROR while > > this seems to fall into the category of "invalid input syntax". > > > > It is literally the error text that appears if one were not to ignore it. > It isn’t a category of errors. But I’m open to ideas here. But being > explicit with what on actually sees in the system seemed preferable to > inventing new classification terms not otherwise used. Thank you for the explanation! I understood the words were from the error messages that users actually see. However, as Torikoshi-san said in [1], errors other than invalid input syntax (e.g. range errors) can also be ignored, therefore it would be better to describe the ignored errors more specifically. [1] https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > > > > > So, keeping consistency with the existing description, we can say: > > > > "Specifies which how to behave when encountering an error due to > > column values unacceptable to the input function of each attribute's > > data type." > > > Yeah, I was considering something along those lines as an option as well. > But I’d rather add that wording to the glossary. Although I am still not convinced that we have to introduce the words "soft error" into the documentation, I don't mind if there are no other opposing opinions. Attached is an updated patch v3, which is a version that uses the above wording instead of "soft error".
Seems we should define what user-facing errors are ignored > anywhere in the system and if we aren’t consistently leveraging these in > all areas/commands make the necessary qualifications in those specific > places. > > > I think "left in a deleted state" is also unclear for users because this > > explains the internal state but not how looks from user's view.How about > > leaving the explanation "These rows will not be visible or accessible" in > > the existing statement? > > > > Just visible then, I don’t like an “or” there and as tuples at least they > are accessible to the system, in vacuum especially. But I expected the > user to understand “as if you deleted it” as their operational concept more > readily than visible. I think this will be read by people who haven’t read > MVCC to fully understand what visible means but know enough to run vacuum > to clean up updated and deleted data as a rule. Ok, I agree we can omit "or accessible". How do you like the followings? Still redundant? "If the command fails, these rows are left in a deleted state; these rows will not be visible, but they still occupy disk space. " Also, the above statement is used in the patch. Thanks for updating the patch! I like your description which doesn't use the word soft error. Here are minor comments: + ignore means discard the input row and continue with the next one. + The default is stop Is "." required at the end of the line? An NOTICE level context message containing the ignored row count is Should 'An' be 'A'? Also, I wasn't sure the necessity of 'context'. It might be possible to just say "A NOTICE message containing the ignored row count.." considering below existing descriptions: doc/src/sgml/pltcl.sgml: a NOTICE message each time a supported command is doc/src/sgml/pltcl.sgml- executed: doc/src/sgml/plpgsql.sgml: This example trigger simply raises a NOTICE message doc/src/sgml/plpgsql.sgml- each time a supported command is executed. 
-- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Add new COPY option REJECT_LIMIT
On 2024-01-27 00:20, David G. Johnston wrote: Thanks for your comments! On Fri, Jan 26, 2024 at 2:49 AM torikoshia wrote: Hi, 9e2d870 enabled the COPY command to skip soft errors, and I think we can add another option which specifies the maximum tolerable number of soft errors. I remember this was discussed in [1], and feel it would be useful when loading 'dirty' data but there is a limit to how dirty it can be. Attached a patch for this. What do you think? I'm opposed to adding this particular feature. When implementing this kind of business rule I'd need the option to specify a percentage, not just an absolute value. Yeah, it seems useful for some cases. Actually, Greenplum allows specifying not only the max number of bad rows but also a percentage[1]. I may be wrong, but considering that some data loaders support something like reject_limit (Redshift supports MAXERROR[2], pg_bulkload supports PARSE_ERRORS[3]), specifying the "number" of bad rows might also be useful. I think we can implement reject_limit specified by percentage simply by calculating the ratio of skipped to processed rows at the end of CopyFrom(), like this: if (cstate->opts.reject_limit > 0 && (double) skipped / (processed + skipped) > cstate->opts.reject_limit_percent) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("exceeded the ratio specified by .. I would focus on trying to put the data required to make this kind of determination into a place where applications implementing such business rules and monitoring can readily get at it. The "ERRORS TO" and maybe a corresponding "STATS TO" option where a table can be specified for the system to place the problematic data and stats about the copy itself. It'd be nice to have such informative tables, but I believe the benefit of reject_limit is that it fails the entire load when the threshold is exceeded.
I imagine that if we just had error and stats information tables for COPY, users would have to delete the loaded rows themselves after confirming there were too many errors in those tables. [1] https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-load-topics-g-handling-load-errors.html [2] https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html [3] https://ossc-db.github.io/pg_bulkload/pg_bulkload.html -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
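The percentage-based check sketched in the message above can be written as a small standalone predicate; the field and function names here are illustrative, not the patch's actual identifiers:

```c
#include <assert.h>

/* Return nonzero when the fraction of skipped (malformed) rows exceeds
 * a ratio-style REJECT_LIMIT, mirroring the end-of-CopyFrom() check
 * sketched above. A non-positive limit means "no limit configured". */
static int
reject_limit_exceeded(long skipped, long processed, double limit_ratio)
{
    long        total = skipped + processed;

    if (limit_ratio <= 0.0 || total == 0)
        return 0;
    return (double) skipped / total > limit_ratio;
}
```

Note the ratio uses skipped plus processed as the denominator, so it reflects the fraction of all input rows that were malformed, which is what an absolute REJECT_LIMIT would be converted from.
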
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Hi, On 2024-02-03 15:22, jian he wrote: The idea of on_error is to tolerate errors, I think. if a column has a not null constraint, let it cannot be used with (on_error 'null') + /* +* we can specify on_error 'null', but it can only apply to columns +* don't have not null constraint. + */ + if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), +errmsg("copy on_error 'null' cannot be used with not null constraint column"))); This means we cannot use ON_ERROR 'null' even when there is just one column with a NOT NULL constraint, e.g. a primary key, right? IMHO this is a strong restriction and will decrease the opportunities to use this feature. It might be better to allow error_action 'null' for tables which have NOT NULL constraint columns, and when facing soft errors for those rows, skip that row or stop COPY. Based on this, I've made a patch. based on COPY Synopsis: ON_ERROR 'error_action' on_error 'null', the keyword NULL should be single quoted. As you mentioned, the single quoting seems a little odd... I'm not sure what the best name and syntax for this feature is, but since the current error_action values are verbs ('stop' and 'ignore'), I feel 'null' might not be appropriate. demo:

COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.
\pset null NULL
SELECT * FROM check_ign_err;
  n   |  m  |  k
------+-----+------
 1    | {1} | NULL
 2    | {2} | 1
 3    | {3} | 2
 4    | {4} | NULL
 NULL | {5} | NULL

Since we notice the number of ignored rows when ON_ERROR is 'ignore', users may want to know the number of rows which were changed to NULL when using ON_ERROR 'null'.
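The behavior suggested above, substituting NULL on a soft error unless the column has a NOT NULL constraint and skipping (or failing) the row in that case, can be sketched as a hedged decision helper. All names here are illustrative; this is not the patch's code:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the per-column metadata the real code reads
 * from the tuple descriptor (att->attnotnull). */
typedef struct ColumnSpec
{
    bool        attnotnull;     /* column has a NOT NULL constraint */
} ColumnSpec;

typedef enum RowAction
{
    ROW_KEEP_VALUE,             /* parse succeeded: keep the value */
    ROW_SET_NULL,               /* soft error: store NULL instead */
    ROW_SKIP                    /* soft error on a NOT NULL column */
} RowAction;

/* Decide what to do with one field under a hypothetical
 * ON_ERROR 'null' that tolerates NOT NULL columns by skipping rows. */
static RowAction
on_error_null_action(const ColumnSpec *col, bool parse_ok)
{
    if (parse_ok)
        return ROW_KEEP_VALUE;
    return col->attnotnull ? ROW_SKIP : ROW_SET_NULL;
}
```
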
Re: RFC: Logging plan of the running query
Hi Ashutosh, On 2024-02-06 19:51, Ashutosh Bapat wrote: Thanks for the summary. It is helpful. I think the patch is also getting better. I have a few questions and suggestions. Thanks for your comments. 1. Prologue of GetLockMethodLocalHash() mentions * NOTE: When there are many entries in LockMethodLocalHash, calling this * function and looking into all of them can lead to performance problems. */ How bad could this performance be? Let's assume that a query is taking time and pg_log_query_plan() is invoked to examine the plan of this query. Is it possible that the looping over all the locks itself takes a lot of time, delaying the query execution further? I think it depends on the number of local locks, but I've measured the cpu time for this page lock check by adding the code below and v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which calls ProcessLogQueryPlanInterrupt() in every CFI, on my laptop, just for your information: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5f7d77d567..65b7cb4925 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -44,6 +44,8 @@ +#include "time.h" ... @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void) * we check all the LocalLock entries and when finding even one, give up * logging the plan.
*/ + start = clock(); hash_seq_init(&status, GetLockMethodLocalHash()); while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) { if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) { ereport(LOG_SERVER_ONLY, errmsg("ignored request for logging query plan due to page lock conflicts"), errdetail("You can try again in a moment.")); hash_seq_term(&status); ProcessLogQueryPlanInterruptActive = false; return; } } + end = clock(); + cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC; + + ereport(LOG, + errmsg("all locallock entry search took: %f", cpu_time_used)); + There were about 3 million log lines which recorded the cpu time, and the duration was quite short: =# -- Extracted cpu_time_used from log and loaded it into cpu_time.d. =# select max(d), min(d), avg(d) from cpu_time ;
   max    | min |          avg
----------+-----+-----------------------
 0.000116 |   0 | 4.706274625332238e-07
I'm not certain that this is valid for actual use cases, but these results seem to suggest that it will not take that long. 2. What happens if auto_explain is enabled in the backend and pg_log_query_plan() is called on the same backend? Will they conflict? I think we should add a test for the same. Hmm, I think they don't conflict since they just refer to the QueryDesc and don't modify it, and don't use the same objects for locking. (I imagine 'conflict' here is something like a 'hard conflict' in replication[2].) Actually, using both auto_explain and pg_log_query_plan() outputs each log separately: (pid:62835)=# select pg_sleep(10); (pid:7)=# select pg_log_query_plan(62835); (pid:7)=# \! cat data/log/postgres.log ...
2024-02-06 21:44:17.837 JST [62835:4:0] LOG: 0: query plan running on backend with PID 62835 is: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep('10'::double precision) Query Identifier: 3506829283127886044 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION: ProcessLogQueryPlanInterrupt, explain.c:5336 2024-02-06 21:44:26.974 JST [62835:6:0] LOG: 0: duration: 1.868 ms plan: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) (actual time=1.802..1.804 rows=1 loops=1) Using injection point support we should be able to add tests for testing pg_log_query_plan behaviour when there are page locks held or when auto_explain (with instrumentation) and pg_log_query_plan() work on the same query plan. Use injection point to make the backend running query wait at a suitable point to delay its execution and fire pg_log_query_plan() from other backend. May be the same test could examine the server log file to see if the plan is indeed output to the server log file. Given that the feature will be used when the things have already gone wrong, it should not make things more serious. So more testing and especially automated would help. Thanks for the advice, it seems a good idea. I'm going to try to add tests using injection point. [1] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com [2] https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
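The measurement above boils down to bracketing the hash-table scan with clock() calls and converting ticks to seconds. A minimal standalone version of that technique, with a trivial loop standing in for the LOCKTAG_PAGE scan:

```c
#include <assert.h>
#include <time.h>

/* Time an O(n) scan the same way the diff above does: record clock()
 * before and after, then divide the tick delta by CLOCKS_PER_SEC.
 * The modulo test is only a stand-in for the LOCKTAG_PAGE check. */
static double
time_scan(long n_entries)
{
    clock_t     start = clock();
    volatile long matches = 0;

    for (long i = 0; i < n_entries; i++)
        if (i % 1000 == 0)
            matches++;

    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

One caveat worth noting: clock() measures CPU time consumed by the process, not wall-clock time, which is why it suits measuring the cost the scan adds to the backend itself.
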
Re: RFC: Logging plan of the running query
On 2024-02-07 13:58, Ashutosh Bapat wrote: On Wed, Feb 7, 2024 at 9:38 AM torikoshia wrote: Hi Ashutosh, On 2024-02-06 19:51, Ashutosh Bapat wrote: > Thanks for the summary. It is helpful. I think patch is also getting > better. > > I have a few questions and suggestions Thanks for your comments. > 1. Prologue of GetLockMethodLocalHash() mentions > * NOTE: When there are many entries in LockMethodLocalHash, calling > this > * function and looking into all of them can lead to performance > problems. > */ > How bad this performance could be. Let's assume that a query is taking > time and pg_log_query_plan() is invoked to examine the plan of this > query. Is it possible that the looping over all the locks itself takes > a lot of time delaying the query execution further? I think it depends on the number of local locks, but I've measured cpu time for this page lock check by adding below codes and v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for your information: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5f7d77d567..65b7cb4925 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -44,6 +44,8 @@ +#include "time.h" ... @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void) * we check all the LocalLock entries and when finding even one, give up * logging the plan. 
*/ + start = clock(); hash_seq_init(&status, GetLockMethodLocalHash()); while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) { if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) { ereport(LOG_SERVER_ONLY, errmsg("ignored request for logging query plan due to page lock conflicts"), errdetail("You can try again in a moment.")); hash_seq_term(&status); ProcessLogQueryPlanInterruptActive = false; return; } } + end = clock(); + cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC; + + ereport(LOG, + errmsg("all locallock entry search took: %f", cpu_time_used)); + There were about 3 million log lines which recorded the cpu time, and the duration was quite short: =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d. =# select max(d), min(d), avg(d) from cpu_time ; max| min | avg --+-+--- 0.000116 | 0 | 4.706274625332238e-07 I'm not certain that this is valid for actual use cases, but these results seem to suggest that it will not take that long. What load did you run? I don't think any query in make check would take say thousands of locks. Sorry, I forgot to write it but ran make check as you imagined. The prologue refers to a very populated lock hash table. I think that will happen if thousands of tables are queried in a single query OR a query runs on a partitioned table with thousands of partitions. May be we want to try that scenario. OK, I'll try such cases. > 2. What happens if auto_explain is enabled in the backend and > pg_log_query_plan() is called on the same backend? Will they conflict? > I think we should add a test for the same. Hmm, I think they don't conflict since they just refer QueryDesc and don't modify it and don't use same objects for locking. (I imagine 'conflict' here is something like 'hard conflict' in replication[2].) By conflict, I mean the two features behave weird when used together e.g give wrong results or crash etc. 
Actually using both auto_explain and pg_log_query_plan() output each logs separately: (pid:62835)=# select pg_sleep(10); (pid:7)=# select pg_log_query_plan(62835); (pid:7)=# \! cat data/log/postgres.log ... 2024-02-06 21:44:17.837 JST [62835:4:0] LOG: 0: query plan running on backend with PID 62835 is: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep('10'::double precision) Query Identifier: 3506829283127886044 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION: ProcessLogQueryPlanInterrupt, explain.c:5336 2024-02-06 21:44:26.974 JST [62835:6:0] LOG: 0: duration: 1.868 ms plan: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) (actual time=1.802..1.804 rows=1 loops=1) > Using injection point support we should be able to add tests for > testing pg_log_query_plan behaviour when there are page locks held or > when auto_explain (with in
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-02-06 15:00, Tom Lane wrote: Andres Freund writes: On February 5, 2023 9:12:17 PM PST, Tom Lane wrote: Damir Belyalov writes: InputFunctionCallSafe() is good for detecting errors from input functions but there are such errors from NextCopyFrom() that can not be detected with InputFunctionCallSafe(), e.g. "wrong number of columns in row". If you want to deal with those, then there's more work to be done to make those bits non-error-throwing. But there's a very finite amount of code involved and no obvious reason why it couldn't be done. I'm not even sure it makes sense to avoid that kind of error. An invalid column count or such is something quite different than failing some data type input routine, or failing a constraint. I think it could be reasonable to put COPY's overall-line-format requirements on the same level as datatype input format violations. I agree that trying to trap every kind of error is a bad idea, for largely the same reason that the soft-input-errors patches only trap certain kinds of errors: it's too hard to tell whether an error is an "internal" error that it's scary to continue past. Is it a bad idea to limit the scope of allowing errors to 'soft' errors in InputFunctionCallSafe()? I think it could still be useful for some use cases. diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql +-- tests for IGNORE_DATATYPE_ERRORS option +CREATE TABLE check_ign_err (n int, m int[], k int); +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS; +1 {1} 1 +a {2} 2 +3 {3} 33 +4 {a, 4} 4 + +5 {5} 5 +\.
+SELECT * FROM check_ign_err; diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 090ef6c7a8..08e8056fc1 100644 +-- tests for IGNORE_DATATYPE_ERRORS option +CREATE TABLE check_ign_err (n int, m int[], k int); +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS; +WARNING: invalid input syntax for type integer: "a" +WARNING: value "33" is out of range for type integer +WARNING: invalid input syntax for type integer: "a" +WARNING: invalid input syntax for type integer: "" +SELECT * FROM check_ign_err; + n | m | k +---+-+--- + 1 | {1} | 1 + 5 | {5} | 5 +(2 rows) -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 16877d4cdd64db5f85bed9cd559e618d8211e598 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 27 Feb 2023 12:02:16 +0900 Subject: [PATCH v1] Add COPY option IGNORE_DATATYPE_ERRORS --- src/backend/commands/copy.c | 8 src/backend/commands/copyfrom.c | 11 +++ src/backend/commands/copyfromparse.c | 12 ++-- src/backend/parser/gram.y| 8 +++- src/bin/psql/tab-complete.c | 3 ++- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 2 ++ src/include/parser/kwlist.h | 1 + src/test/regress/expected/copy2.out | 14 ++ src/test/regress/sql/copy2.sql | 12 10 files changed, 68 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index e34f583ea7..2f1cfb3f4d 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate, bool format_specified = false; bool freeze_specified = false; bool header_specified = false; + bool ignore_datatype_errors_specified= false; ListCell *option; /* Support external use for option sanity checking */ @@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate, freeze_specified = true; opts_out->freeze = defGetBoolean(defel); } + else if (strcmp(defel->defname, "ignore_datatype_errors") == 0) + { + if (ignore_datatype_errors_specified) 
+errorConflictingDefElem(defel, pstate); + ignore_datatype_errors_specified= true; + opts_out->ignore_datatype_errors = defGetBoolean(defel); + } else if (strcmp(defel->defname, "delimiter") == 0) { if (opts_out->delim) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index af52faca6d..24eec6a27d 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -959,6 +959,7 @@ CopyFrom(CopyFromState cstate) { TupleTableSlot *myslot; bool skip_tuple; + ErrorSaveContext escontext = {T_ErrorSaveContext}; CHECK_FOR_INTERRUPTS(); @@ -991,10 +992,20 @@ CopyFrom(CopyFromState cstate) ExecClearTuple(myslot); + if (cstate->opts.ignore_datatype_errors) + { + escontext.details_wanted = true; + cstate->escontext = escontext; + } + /* Directly store the values/nulls array in the slot */ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull)) break; + /* Soft error occured, skip this tuple */ + if(cstate->escontext.error_occurred) + continue; + ExecStoreVirtualTuple(myslot)
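The "soft error" contract behind InputFunctionCallSafe(), reporting parse failure through the call's result so the caller can skip the row instead of aborting, can be mimicked in a hedged standalone sketch, with strtol standing in for a datatype input function:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hedged sketch of a "safe" input function: instead of throwing, it
 * returns 0 on bad input (invalid syntax or out-of-range values) so
 * the caller, like COPY FROM with IGNORE_DATATYPE_ERRORS, can skip
 * the offending row. strtol stands in for the real input routine. */
static int
parse_int32_safe(const char *str, int *out)
{
    char       *end;
    long        val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (end == str || *end != '\0')
        return 0;               /* invalid input syntax */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* value out of range for int */
    *out = (int) val;
    return 1;
}
```

This mirrors the regression test above: rows like "a {2} 2" or an out-of-range value fail the parse and get skipped, while well-formed rows are stored.
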
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-03-06 23:03, Daniel Gustafsson wrote: On 28 Feb 2023, at 15:28, Damir Belyalov wrote: Tested the patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. As expected it works. Also added a description to copy.sgml and made a review on the patch. Thanks for your tests and improvements! I added an 'ignored_errors' integer parameter that should be output after the option is finished. All errors were added to the system logfile with full detailed context. Maybe it's better to log only the error message. Certainly. FWIW, Greenplum has a similar construct (but which also logs the errors in the db) where data type errors are skipped as long as the number of errors doesn't exceed a reject limit. If the reject limit is reached then the COPY fails: LOG ERRORS [ SEGMENT REJECT LIMIT [ ROWS | PERCENT ]] IIRC the gist of this was to catch when the user copies the wrong input data or plain has a broken file. Rather than finding out after copying n rows which are likely to be garbage, the process can be restarted. This version of the patch has a compiler error in the error message: copyfrom.c: In function ‘CopyFrom’: copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=] 1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors)); | ^ ~~ | | | uint64 {aka long long unsigned int} On that note though, it seems to me that this error message leaves a bit to be desired with regards to the level of detail. +1. I felt just logging "Error: %ld" would make people wonder about the meaning of the %ld. Logging something like "Error: %ld data type errors were found" might be clearer.
Re: Record queryid when auto_explain.log_verbose is on
On 2023-03-07 08:50, Imseih (AWS), Sami wrote: I am wondering if this patch should be backpatched? The reason being is in auto_explain documentation [1], there is a claim of equivalence of the auto_explain.log_verbose option and EXPLAIN(verbose) ". it's equivalent to the VERBOSE option of EXPLAIN." This can be quite confusing for users of the extension. The documentation should either be updated or a backpatch all the way down to 14, which the version the query identifier was moved to core. I am in favor of the latter. Any thoughts? We discussed a bit whether to backpatch this, but agreed that it would be better not to do so for the following reasons: It's a bit annoying that the info is missing since pg 14, but we probably can't backpatch this as it might break log parser tools. What do you think? -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-03-07 18:09, Daniel Gustafsson wrote: On 7 Mar 2023, at 09:35, Damir Belyalov wrote: I felt just logging "Error: %ld" would make people wonder about the meaning of the %ld. Logging something like "Error: %ld data type errors were found" might be clearer. Thanks. For more clarity, I changed the message to: "Errors were found: %". I'm not convinced that this adds enough clarity to assist the user. We also shouldn't use "error" in a WARNING log since the user has explicitly asked to skip rows on error, so it's not an error per se. +1 How about something like: ereport(WARNING, (errmsg("%ld rows were skipped due to data type incompatibility", cstate->ignored_errors), errhint("Skipped rows can be inspected in the database log for reprocessing."))); Since skipped rows cannot be inspected in the log when log_error_verbosity is set to terse, it might be better without this errhint.