May I know the progress? Sometimes a standby only offloads some of the
queries and runs with less CPU and RAM, so could this be a flexible
warning instead of a fatal error? We set track_commit_timestamp = on,
changed max_worker_processes, and set wal_level = logical on the
primary, then restarted it. That caused the standby to fail and recover
from the last restart point, which led to another issue: transactions
committed before track_commit_timestamp was turned on had no commit
timestamp recorded, so looking up their commit_ts failed.
FATAL:  recovery aborted because of insufficient parameter settings
DETAIL:  max_worker_processes = 10 is a lower setting than on the primary
server, where its value was 30.
HINT:  You can restart the server after making the necessary configuration
changes.
CONTEXT:  WAL redo at 1F4F/xxxxxxxx for XLOG/PARAMETER_CHANGE:
max_connections=xxx max_worker_processes=30 max_wal_sen
LOG:  startup process (PID xxx) exited with exit code 1
LOG:  terminating any other active server processes
LOG:  shutting down due to startup process failure
LOG:  database system is shut down
LOG:  starting PostgreSQL 14.18 on aarch64-unknown-linux-gnu, compiled by
gcc (GCC) 7.3.1 20180712
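For reference, these are the parameters carried in the XLOG/PARAMETER_CHANGE record shown above; on a hot standby each must be set at least as high as on the primary, or startup/recovery fails with the FATAL above. A minimal postgresql.conf sketch for the standby (the values here are illustrative, matching the log excerpt, not recommendations):

```ini
# postgresql.conf on the standby -- illustrative values only.
# Each of these must be >= the corresponding setting on the primary,
# otherwise recovery aborts with "insufficient parameter settings".
max_connections = 100
max_worker_processes = 30        # primary was raised to 30
max_wal_senders = 10
max_prepared_transactions = 0
max_locks_per_transaction = 64

# Also logged by PARAMETER_CHANGE; note that track_commit_timestamp
# only records timestamps for transactions committed after it is
# enabled, so older XIDs still have no commit_ts to look up.
track_commit_timestamp = on
```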

Thanks,

James

Kirill Reshke <[email protected]> wrote on Thu, Mar 13, 2025 at 2:09 AM:

> On Thu, 29 Aug 2024 at 23:52, Heikki Linnakangas <[email protected]> wrote:
> >
> > Currently, if you configure a hot standby server with a smaller
> > max_connections setting than the primary, the server refuses to start up:
> >
> > LOG:  entering standby mode
> > FATAL:  recovery aborted because of insufficient parameter settings
> > DETAIL:  max_connections = 10 is a lower setting than on the primary
> > server, where its value was 100.
> > HINT:  You can restart the server after making the necessary
> > configuration changes.
> >
> > Or if you change the setting in the primary while the standby is
> > running, replay pauses:
> >
> > WARNING:  hot standby is not possible because of insufficient parameter
> > settings
> > DETAIL:  max_connections = 100 is a lower setting than on the primary
> > server, where its value was 200.
> > CONTEXT:  WAL redo at 2/E10000D8 for XLOG/PARAMETER_CHANGE:
> > max_connections=200 max_worker_processes=8 max_wal_senders=10
> > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical
> > wal_log_hints=off track_commit_timestamp=off
> > LOG:  recovery has paused
> > DETAIL:  If recovery is unpaused, the server will shut down.
> > HINT:  You can then restart the server after making the necessary
> > configuration changes.
> > CONTEXT:  WAL redo at 2/E10000D8 for XLOG/PARAMETER_CHANGE:
> > max_connections=200 max_worker_processes=8 max_wal_senders=10
> > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical
> > wal_log_hints=off track_commit_timestamp=off
> >
> > Both of these are rather unpleasant behavior.
> >
> > I thought I could get rid of that limitation with my CSN snapshot patch
> > [1], because it gets rid of the fixed-size known-assigned XIDs array,
> > but there's a second reason for these limitations. It's also used to
> > ensure that the standby has enough space in the lock manager to hold
> > possible AccessExclusiveLocks taken by transactions in the primary.
> >
> > So firstly, I think that's a bad tradeoff. In vast majority of cases,
> > you would not run out of lock space anyway, if you just started up the
> > system. Secondly, that cross-check of settings doesn't fully prevent the
> > problem. It ensures that the lock tables are large enough to accommodate
> > all the locks you could possibly hold in the primary, but that doesn't
> > take into account any additional locks held by read-only queries in the
> > hot standby. So if you have queries running in the standby that take a
> > lot of locks, this can happen anyway:
> >
> > 2024-08-29 21:44:32.634 EEST [668327] FATAL:  out of shared memory
> > 2024-08-29 21:44:32.634 EEST [668327] HINT:  You might need to increase
> > "max_locks_per_transaction".
> > 2024-08-29 21:44:32.634 EEST [668327] CONTEXT:  WAL redo at 2/FD40FCC8
> > for Standby/LOCK: xid 996 db 5 rel 154045
> > 2024-08-29 21:44:32.634 EEST [668327] WARNING:  you don't own a lock of
> > type AccessExclusiveLock
> > 2024-08-29 21:44:32.634 EEST [668327] LOG:  RecoveryLockHash contains
> > entry for lock no longer recorded by lock manager: xid 996 database 5
> > relation 154045
> > TRAP: failed Assert("false"), File:
> > "../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327
> > postgres: startup recovering
> > 0000000100000002000000FD(ExceptionalCondition+0x6e)[0x556a4588396e]
> > postgres: startup recovering
> > 0000000100000002000000FD(+0x44156e)[0x556a4571356e]
> > postgres: startup recovering
> > 0000000100000002000000FD(StandbyReleaseAllLocks+0x78)[0x556a45712738]
> > postgres: startup recovering
> > 0000000100000002000000FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685]
> > postgres: startup recovering
> > 0000000100000002000000FD(shmem_exit+0x111)[0x556a457062e1]
> > postgres: startup recovering
> > 0000000100000002000000FD(+0x434132)[0x556a45706132]
> > postgres: startup recovering
> > 0000000100000002000000FD(proc_exit+0x59)[0x556a45706079]
> > postgres: startup recovering
> > 0000000100000002000000FD(errfinish+0x278)[0x556a45884708]
> > postgres: startup recovering
> > 0000000100000002000000FD(LockAcquireExtended+0xa46)[0x556a45719386]
> > postgres: startup recovering
> > 0000000100000002000000FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d]
> > postgres: startup recovering
> > 0000000100000002000000FD(standby_redo+0x70)[0x556a45713690]
> > postgres: startup recovering
> > 0000000100000002000000FD(PerformWalRecovery+0x7b3)[0x556a4547d313]
> > postgres: startup recovering
> > 0000000100000002000000FD(StartupXLOG+0xac3)[0x556a4546dae3]
> > postgres: startup recovering
> > 0000000100000002000000FD(StartupProcessMain+0xe8)[0x556a45693558]
> > postgres: startup recovering
> > 0000000100000002000000FD(+0x3ba95d)[0x556a4568c95d]
> > postgres: startup recovering
> > 0000000100000002000000FD(+0x3bce41)[0x556a4568ee41]
> > postgres: startup recovering
> > 0000000100000002000000FD(PostmasterMain+0x116e)[0x556a4568eaae]
> > postgres: startup recovering
> > 0000000100000002000000FD(+0x2f960e)[0x556a455cb60e]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45]
> > postgres: startup recovering
> > 0000000100000002000000FD(_start+0x21)[0x556a453af011]
> > 2024-08-29 21:44:32.641 EEST [668324] LOG:  startup process (PID 668327)
> > was terminated by signal 6: Aborted
> > 2024-08-29 21:44:32.641 EEST [668324] LOG:  terminating any other active
> > server processes
> > 2024-08-29 21:44:32.654 EEST [668324] LOG:  shutting down due to startup
> > process failure
> > 2024-08-29 21:44:32.729 EEST [668324] LOG:  database system is shut down
> >
> > Granted, if you restart the server, it will probably succeed because
> > restarting the server will kill all the other queries that were holding
> > locks. But yuck. With assertions disabled, it looks a little less scary,
> > but not nice anyway.
> >
> > So how to improve this? I see a few options:
> >
> > a) Downgrade the error at startup to a warning, and allow starting the
> > standby with smaller settings in standby. At least with a smaller
> > max_locks_per_transactions. The other settings also affect the size of
> > known-assigned XIDs array, but if the CSN snapshots get committed, that
> > will get fixed. In most cases there is enough lock memory anyway, and it
> > will be fine. Just fix the assertion failure so that the error message
> > is a little nicer.
> >
> > b) If you run out of lock space, kill running queries, and prevent new
> > ones from starting. Track the locks in startup process' private memory
> > until there is enough space in the lock manager, and then re-open for
> > queries. In essence, go from hot standby mode to warm standby, until
> > it's possible to go back to hot standby mode again.
> >
> > Thoughts, better ideas?
> >
> > [1] https://commitfest.postgresql.org/49/4912/
> >
> > --
> > Heikki Linnakangas
> > Neon (https://neon.tech)
> >
> >
>
> Hello! Do you intend to pursue this further?
>
> --
> Best regards,
> Kirill Reshke
>
>
>
