On Sat, 21 Mar 2020 at 09:38, Justin Pryzby <pry...@telsasoft.com> wrote: > > On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote: > > But uh, unfortunately the vacuum delay code just sleeps without setting > > a wait event: > ... > > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT > > class? > > > > Given how frequently we run into trouble with [auto]vacuum throttling > > being a problem, and there not being any way to monitor that currently, > > that seems like it'd be a significant improvement, given the effort? > > I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess > because the overhead is significant for a small number of usecs, and because it > doesn't work for pg_usleep to set a generic event if callers want to be able to > set a more specific wait event. > > Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep > "which seems too short.". Surely it should use pg_sleep, added at 782eefc58 a > few years later ? > > Also, there was a suggestion recently that this should have a separate > vacuum_progress phase: > |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */ > |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L); > > I was planning to look at that eventually ; should it have a wait event instead > or in addition ? > > > It'd probably also be helpful to report the total time [auto]vacuum > > spent being delayed for vacuum verbose/autovacuum logging, but imo > > that'd be a parallel feature to a wait event, not a replacement. > > This requires wider changes than I anticipated. > > 2020-03-20 22:35:51.308 CDT [16534] LOG: automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index scans: 1 > pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen > tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574 > buffer usage: 76 hits, 7 misses, 8 dirtied > avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s > Cost-based delay: 2 msec > system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > > VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will > typically be zero, so I made it conditional, which is supposedly why it's > written like that, even though that's not otherwise being used since 17eaae98. > > Added at https://commitfest.postgresql.org/28/2515/ > > -- > Justin
Thanks Justin for quick patch. I haven't reviewed your full patch but I can see that "make installcheck" is failing with segment fault. *Stack trace;* Core was generated by `postgres: autovacuum worker regression '. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895 895 stats->delay_msec += vacuum_delay_point(); (gdb) bt #0 0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895 #1 0x0000560080cdd0c3 in ginvacuumcleanup (info=0x7ffe0b64b0c0, stats=0x0) at ginvacuum.c:706 #2 0x0000560080d791d4 in index_vacuum_cleanup (info=0x7ffe0b64b0c0, stats=0x0) at indexam.c:711 #3 0x0000560080fa790e in do_analyze_rel (onerel=0x56008259e6e0, params=0x560082206de4, va_cols=0x0, acquirefunc=0x560080fa8a75 <acquire_sample_rows>, relpages=25, inh=false, in_outer_xact=false, elevel=13) at analyze.c:683 #4 0x0000560080fa5f3e in analyze_rel (relid=37789, relation=0x5600822ba1a0, params=0x560082206de4, va_cols=0x0, in_outer_xact=false, bstrategy=0x5600822064e0) at analyze.c:263 #5 0x00005600810d9eb7 in vacuum (relations=0x56008227e5b8, params=0x560082206de4, bstrategy=0x5600822064e0, isTopLevel=true) at vacuum.c:468 #6 0x0000560081357608 in autovacuum_do_vac_analyze (tab=0x560082206de0, bstrategy=0x5600822064e0) at autovacuum.c:3115 #7 0x00005600813557dd in do_autovacuum () at autovacuum.c:2466 #8 0x000056008135373d in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1693 #9 0x0000560081352f75 in StartAutoVacWorker () at autovacuum.c:1487 #10 0x000056008137ed5f in StartAutovacuumWorker () at postmaster.c:5580 #11 0x000056008137e199 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5297 #12 <signal handler called> #13 0x00007f18b778bff7 in __GI___select (nfds=9, readfds=0x7ffe0b64c050, writefds=0x0, exceptfds=0x0, timeout=0x7ffe0b64bfc0) at ../sysdeps/unix/sysv/linux/select.c:41 #14 0x000056008137499a in ServerLoop () at postmaster.c:1691 #15 0x0000560081373e63 in PostmasterMain (argc=3, argv=0x560082189020) at postmaster.c:1400 #16 0x00005600811d37ea in main (argc=3, argv=0x560082189020) at main.c:210 Here, stats is null so it is crashing. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com