Posting updated version of this patch with comments above addressed. 1) pg_signal_autovacuum -> pg_signal_autovacuum_worker, as there seems to be no objections to that.
2) There are comments on how to write if statement: > In core, do_autovacuum() is only called in a process without > a role specified > It's feeling more natural here to check that we have a superuser-owned > backend first, and then do a lookup of the process type. > I figured since there's no reason to rely on that behavior, we might as > well do a bit of future-proofing in case autovacuum workers are ever not > run as InvalidOid. I have combined them into this: if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) && pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER) This is both future-proofing and natural, I suppose. Downside of this is double checking condition (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)), but i think that is ok for the sake of simplicity. 3) pg_signal_autovacuum_worker Oid changed to random one: 8916 4) > An invalid BackendType is not false, but B_INVALID. fixed, thanks 5) >>>> There is pg_read_all_stats as well, so I don't see a big issue in >>>> requiring to be a member of this role as well for the sake of what's >>>> proposing here. >>> >>> Well, that tells you quite a bit more than just which PIDs correspond to >>> autovacuum workers, but maybe that's good enough for now. >> >> That may be a good initial compromise, for now. >Sounds good to me. I will update the documentation. @Anthony if you feel that documentation update adds much value here, please do. Given that we know autovacuum worker PIDs from pg_stat_progress_vacuum, I don't know how to reflect something about pg_stat_autovac_worker in doc, and if it is worth it. 6) > + INJECTION_POINT("autovacuum-start"); > Perhaps autovacuum-worker-start is more suited here fixed, thanks 7) > +# Copyright (c) 2022-2024, PostgreSQL Global Development Group > [...] > +# Copyright (c) 2024-2024, PostgreSQL Global Development Group > These need to be cleaned up. > +# Makefile for src/test/recovery > +# > +# src/test/recovery/Makefile > This is incorrect, twice. Cleaned up, thanks! 8) > Not sure that there is a huge point in checking after a role that > holds pg_signal_backend. Ok. Removed. Then: > +like($psql_err, qr/ERROR: permission denied to terminate ... > Checking only the ERRROR, and not the DETAIL should be sufficient > here. After removing the pg_signal_backend test case we have only one place where errors check is done. So, I think we should keep DETAIL here to ensure detail is correct (it differs from regular backend case). 9) > +# Test signaling for pg_signal_autovacuum role. > This needs a better documentation: Updated. Hope now the test documentation helps to understand it. 10) > +ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should > succeed with pg_signal_autovacuum role"); > Is that enough for the validation? Added: ok($node->log_contains(qr/FATAL: terminating autovacuum process due to administrator command/, $offset), "Autovacuum terminates when role is granted with pg_signal_autovacuum_worker"); 11) references to `passcheck` extension removed. errors messages rephrased. 12) injection_point_detach added. 13) > + INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000); > A good chunk of the test would be spent on that, but you don't need > that many tuples to trigger an autovacuum worker as the short naptime > is able to do it. I would recommend to reduce that to a minimum. +1 Single tuple works. 14) v3 suffers from segfault: 2024-04-11 11:28:31.116 UTC [147437] 001_signal_autovacuum.pl LOG: statement: SELECT pg_terminate_backend(147427); 2024-04-11 11:28:31.116 UTC [147427] FATAL: terminating autovacuum process due to administrator command 2024-04-11 11:28:31.116 UTC [147410] LOG: server process (PID 147427) was terminated by signal 11: Segmentation fault 2024-04-11 11:28:31.116 UTC [147410] LOG: terminating any other active server processes 2024-04-11 11:28:31.117 UTC [147410] LOG: shutting down because restart_after_crash is off 2024-04-11 11:28:31.121 UTC [147410] LOG: database system is shut down The test doesn't fail because pg_terminate_backend actually meets his point: autovac is killed. But while dying, autovac also receives segfault. Thats because of injections points: (gdb) bt #0 0x000056361c3379ea in tas (lock=0x7fbcb9632224 <error: Cannot access memory at address 0x7fbcb9632224>) at ../../../../src/include/storage/s_lock.h:228 #1 ConditionVariableCancelSleep () at condition_variable.c:238 #2 0x000056361c337e4b in ConditionVariableBroadcast (cv=0x7fbcb66f498c) at condition_variable.c:310 #3 0x000056361c330a40 in CleanupProcSignalState (status=<optimized out>, arg=<optimized out>) at procsignal.c:240 #4 0x000056361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276 #5 0x000056361c3288fc in proc_exit_prepare (code=code@entry=1) at ipc.c:198 #6 0x000056361c3289bf in proc_exit (code=code@entry=1) at ipc.c:111 #7 0x000056361c49ffa8 in errfinish (filename=<optimized out>, lineno=<optimized out>, funcname=0x56361c654370 <__func__.16> "ProcessInterrupts") at elog.c:592 #8 0x000056361bf7191e in ProcessInterrupts () at postgres.c:3264 #9 0x000056361c3378d7 in ConditionVariableTimedSleep (cv=0x7fbcb9632224, timeout=timeout@entry=-1, wait_event_info=117440513) at condition_variable.c:196 #10 0x000056361c337d0b in ConditionVariableTimedSleep (wait_event_info=<optimized out>, timeout=-1, cv=<optimized out>) at condition_variable.c:135 #11 ConditionVariableSleep (cv=<optimized out>, wait_event_info=<optimized out>) at condition_variable.c:98 #12 0x00000000b96347d0 in ?? () #13 0x3a3f1d9baa4f5500 in ?? () #14 0x000056361cc6cbd0 in ?? () #15 0x000056361ccac300 in ?? () #16 0x000056361c62be63 in ?? () #17 0x00007fbcb96347d0 in ?? () at injection_points.c:201 from /home/reshke/postgres/tmp_install/home/reshke/postgres/pgbin/lib/injection_points.so #18 0x00007fffe4122b10 in ?? () #19 0x00007fffe4122b70 in ?? () #20 0x0000000000000000 in ?? () discovered because of # Release injection point. $node->safe_psql('postgres', "SELECT injection_point_detach('autovacuum-worker-start');"); added v4 also suffers from that. i will try to fix that.
v4-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-.patch
Description: Binary data
v4-0002-Add-tap-test-for-pg_signal_autovacuum-role.patch
Description: Binary data