May I know the progress on this?

Sometimes a standby only offloads part of the query load and runs with lower CPU and RAM than the primary, so could this be a flexible warning instead of a fatal error?

We set track_commit_timestamp = on, changed max_worker_processes and wal_level = logical, and restarted the primary. This caused the standby to fail and recover from the last restartpoint, which led to a further issue: transactions committed before track_commit_timestamp was turned on have no commit timestamp recorded, so looking up their commit_ts fails.

FATAL:  recovery aborted because of insufficient parameter settings
DETAIL:  max_worker_processes = 10 is a lower setting than on the primary server, where its value was 30.
HINT:  You can restart the server after making the necessary configuration changes.
CONTEXT:  WAL redo at 1F4F/xxxxxxxx for XLOG/PARAMETER_CHANGE: max_connections=xxx max_worker_processes=30 max_wal_sen
LOG:  startup process (PID xxx) exited with exit code 1
LOG:  terminating any other active server processes
LOG:  shutting down due to startup process failure
LOG:  database system is shut down
LOG:  starting PostgreSQL 14.18 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
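In case it is useful to others hitting the same FATAL: because these settings on a hot standby must be greater than or equal to their values on the primary, it seems the change can be rolled out standby-first. A sketch of the postgresql.conf change and ordering (the values here are only illustrative, not from our actual setup):

    # 1. On each standby, raise the settings first, then restart it:
    max_worker_processes = 30      # must end up >= the primary's new value
    wal_level = logical
    track_commit_timestamp = on
    # 2. Only after all standbys are restarted, apply the same change on
    #    the primary and restart it; the standby can then replay the
    #    PARAMETER_CHANGE record without aborting recovery.

This ordering avoids the startup abort, though of course it does not help with the separate commit_ts problem for transactions from before the setting was enabled.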
Thanks,
James

Kirill Reshke <[email protected]> wrote on Thu, Mar 13, 2025 at 2:09 AM:
> On Thu, 29 Aug 2024 at 23:52, Heikki Linnakangas <[email protected]> wrote:
> >
> > Currently, if you configure a hot standby server with a smaller
> > max_connections setting than the primary, the server refuses to start up:
> >
> > LOG:  entering standby mode
> > FATAL:  recovery aborted because of insufficient parameter settings
> > DETAIL:  max_connections = 10 is a lower setting than on the primary
> > server, where its value was 100.
> > HINT:  You can restart the server after making the necessary
> > configuration changes.
> >
> > Or if you change the setting in the primary while the standby is
> > running, replay pauses:
> >
> > WARNING:  hot standby is not possible because of insufficient parameter
> > settings
> > DETAIL:  max_connections = 100 is a lower setting than on the primary
> > server, where its value was 200.
> > CONTEXT:  WAL redo at 2/E10000D8 for XLOG/PARAMETER_CHANGE:
> > max_connections=200 max_worker_processes=8 max_wal_senders=10
> > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical
> > wal_log_hints=off track_commit_timestamp=off
> > LOG:  recovery has paused
> > DETAIL:  If recovery is unpaused, the server will shut down.
> > HINT:  You can then restart the server after making the necessary
> > configuration changes.
> > CONTEXT:  WAL redo at 2/E10000D8 for XLOG/PARAMETER_CHANGE:
> > max_connections=200 max_worker_processes=8 max_wal_senders=10
> > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical
> > wal_log_hints=off track_commit_timestamp=off
> >
> > Both of these are rather unpleasant behavior.
> >
> > I thought I could get rid of that limitation with my CSN snapshot patch
> > [1], because it gets rid of the fixed-size known-assigned XIDs array,
> > but there's a second reason for these limitations.
> > It's also used to
> > ensure that the standby has enough space in the lock manager to hold
> > possible AccessExclusiveLocks taken by transactions in the primary.
> >
> > So firstly, I think that's a bad tradeoff. In vast majority of cases,
> > you would not run out of lock space anyway, if you just started up the
> > system. Secondly, that cross-check of settings doesn't fully prevent
> > the problem. It ensures that the lock tables are large enough to
> > accommodate all the locks you could possibly hold in the primary, but
> > that doesn't take into account any additional locks held by read-only
> > queries in the hot standby. So if you have queries running in the
> > standby that take a lot of locks, this can happen anyway:
> >
> > 2024-08-29 21:44:32.634 EEST [668327] FATAL:  out of shared memory
> > 2024-08-29 21:44:32.634 EEST [668327] HINT:  You might need to increase
> > "max_locks_per_transaction".
> > 2024-08-29 21:44:32.634 EEST [668327] CONTEXT:  WAL redo at 2/FD40FCC8
> > for Standby/LOCK: xid 996 db 5 rel 154045
> > 2024-08-29 21:44:32.634 EEST [668327] WARNING:  you don't own a lock of
> > type AccessExclusiveLock
> > 2024-08-29 21:44:32.634 EEST [668327] LOG:  RecoveryLockHash contains
> > entry for lock no longer recorded by lock manager: xid 996 database 5
> > relation 154045
> > TRAP: failed Assert("false"), File:
> > "../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327
> > postgres: startup recovering 0000000100000002000000FD(ExceptionalCondition+0x6e)[0x556a4588396e]
> > postgres: startup recovering 0000000100000002000000FD(+0x44156e)[0x556a4571356e]
> > postgres: startup recovering 0000000100000002000000FD(StandbyReleaseAllLocks+0x78)[0x556a45712738]
> > postgres: startup recovering 0000000100000002000000FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685]
> > postgres: startup recovering 0000000100000002000000FD(shmem_exit+0x111)[0x556a457062e1]
> > postgres: startup recovering 0000000100000002000000FD(+0x434132)[0x556a45706132]
> > postgres: startup recovering 0000000100000002000000FD(proc_exit+0x59)[0x556a45706079]
> > postgres: startup recovering 0000000100000002000000FD(errfinish+0x278)[0x556a45884708]
> > postgres: startup recovering 0000000100000002000000FD(LockAcquireExtended+0xa46)[0x556a45719386]
> > postgres: startup recovering 0000000100000002000000FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d]
> > postgres: startup recovering 0000000100000002000000FD(standby_redo+0x70)[0x556a45713690]
> > postgres: startup recovering 0000000100000002000000FD(PerformWalRecovery+0x7b3)[0x556a4547d313]
> > postgres: startup recovering 0000000100000002000000FD(StartupXLOG+0xac3)[0x556a4546dae3]
> > postgres: startup recovering 0000000100000002000000FD(StartupProcessMain+0xe8)[0x556a45693558]
> > postgres: startup recovering 0000000100000002000000FD(+0x3ba95d)[0x556a4568c95d]
> > postgres: startup recovering 0000000100000002000000FD(+0x3bce41)[0x556a4568ee41]
> > postgres: startup recovering 0000000100000002000000FD(PostmasterMain+0x116e)[0x556a4568eaae]
> > postgres: startup recovering 0000000100000002000000FD(+0x2f960e)[0x556a455cb60e]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45]
> > postgres: startup recovering 0000000100000002000000FD(_start+0x21)[0x556a453af011]
> > 2024-08-29 21:44:32.641 EEST [668324] LOG:  startup process (PID
> > 668327) was terminated by signal 6: Aborted
> > 2024-08-29 21:44:32.641 EEST [668324] LOG:  terminating any other
> > active server processes
> > 2024-08-29 21:44:32.654 EEST [668324] LOG:  shutting down due to
> > startup process failure
> > 2024-08-29 21:44:32.729 EEST [668324] LOG:  database system is shut down
> >
> > Granted, if you restart the server, it will probably succeed because
> > restarting the server will kill all the other queries that were holding
> > locks. But yuck. With assertions disabled, it looks a little less
> > scary, but not nice anyway.
> >
> > So how to improve this? I see a few options:
> >
> > a) Downgrade the error at startup to a warning, and allow starting the
> > standby with smaller settings in standby. At least with a smaller
> > max_locks_per_transactions. The other settings also affect the size of
> > known-assigned XIDs array, but if the CSN snapshots get committed, that
> > will get fixed. In most cases there is enough lock memory anyway, and
> > it will be fine. Just fix the assertion failure so that the error
> > message is a little nicer.
> >
> > b) If you run out of lock space, kill running queries, and prevent new
> > ones from starting. Track the locks in startup process' private memory
> > until there is enough space in the lock manager, and then re-open for
> > queries. In essence, go from hot standby mode to warm standby, until
> > it's possible to go back to hot standby mode again.
> >
> > Thoughts, better ideas?
> >
> > [1] https://commitfest.postgresql.org/49/4912/
> >
> > --
> > Heikki Linnakangas
> > Neon (https://neon.tech)
> >
> >
>
> Hello! Do you intend to pursue this further?
>
> --
> Best regards,
> Kirill Reshke
>
>
