Hello,

We are experiencing a high system CPU issue with our database that we can't
quite explain.

Description of the issue:
The database instance CPU suddenly increases to over 90%, with the majority
of it reported by RDS Enhanced Monitoring as system CPU, as opposed to
coming from Postgres itself. IOPS usage barely changes and has plenty of
headroom.

The issue mostly manifests near the end of the "scanning heap" phase of
vacuuming one of our largest tables, which we'll call table1. RDS
Performance Insights reports that SELECTs on table1 start to wait on CPU,
whereas previously they didn't even show up in the top 25 queries by wait.
It doesn't always happen, but it is more likely when there is a larger than
usual number of SELECTs on table1.
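For reference, the vacuum phase and remaining-to-scan figures below were
observed with a query along these lines (a sketch using
pg_stat_progress_vacuum; the aliases are illustrative):

```sql
-- Progress of any running (auto)vacuum, with the "scanning heap" phase
-- and an estimate of heap bytes left to scan.
SELECT p.pid,
       c.relname,
       p.phase,
       round(100.0 * p.heap_blks_scanned / nullif(p.heap_blks_total, 0), 1)
           AS pct_scanned,
       pg_size_pretty((p.heap_blks_total - p.heap_blks_scanned)
           * current_setting('block_size')::bigint) AS remaining_to_scan
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```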


Example Timeline:
  00:00  vacuum starts on table1 (at least 10 million dead tuples as
         reported by pg_stat_user_tables)
  02:00  database CPU p90 12%; RDS Performance Insights CPU waits 11%; IOPS
         under 100k; table1 not in top 25 waits in RDS Performance
         Insights; vacuum reports 6GB out of 3TB remaining to scan
  02:01  database CPU p90 98%; RDS Performance Insights CPU waits 73%; IOPS
         under 100k; table1 SELECTs now #1, #2, and #6 in RDS Performance
         Insights, making up the majority of the CPU waits
  02:42  vacuum on table1 finishes; database CPU p90 11%; RDS Performance
         Insights CPU waits 12%; IOPS under 100k; table1 not in top 25
         waits in RDS Performance Insights


Installation Details:
  AWS RDS Managed Instance
  Postgres Version: PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc
(GCC) 7.3.1 20180712 (Red Hat 7.3.1-17), 64-bit
  Instance Class: r7i.48xlarge
  vCPU: 192
  RAM: 1536 GB
  Storage Type: Provisioned IOPS SSD (io2), 180000 IOPS
  Application Language/Framework: Ruby on Rails
  Pooler: PgBouncer 1.17.0

Config:
|name                                       |current_setting
|-------------------------------------------|---------------------------------------
|autovacuum_analyze_scale_factor            |0.005
|autovacuum_freeze_max_age                  |500000000
|autovacuum_max_workers                     |24
|autovacuum_naptime                         |15s
|autovacuum_vacuum_cost_delay               |1ms
|autovacuum_vacuum_cost_limit               |2000
|autovacuum_vacuum_scale_factor             |0.01
|autovacuum_work_mem                        |48756174kB
|checkpoint_completion_target               |0.9
|checkpoint_timeout                         |1h
|default_toast_compression                  |lz4
|effective_cache_size                       |780098792kB
|enable_presorted_aggregate                 |on
|fsync                                      |on
|full_page_writes                           |on
|gss_accept_delegation                      |off
|hot_standby                                |off
|huge_pages                                 |on
|icu_validation_level                       |error
|idle_in_transaction_session_timeout        |30s
|ignore_invalid_pages                       |off
|jit                                        |off
|lo_compat_privileges                       |off
|log_autovacuum_min_duration                |0
|log_checkpoints                            |on
|log_connections                            |on
|log_disconnections                         |on
|log_min_duration_statement                 |10ms
|log_rotation_age                           |1h
|log_temp_files                             |0
|log_timezone                               |UTC
|log_truncate_on_rotation                   |off
|logging_collector                          |on
|maintenance_work_mem                       |25577009kB
|max_connections                            |5000
|max_locks_per_transaction                  |64
|max_logical_replication_workers            |10
|max_parallel_apply_workers_per_subscription|2
|max_parallel_workers                       |96
|max_prepared_transactions                  |0
|max_replication_slots                      |10
|max_slot_wal_keep_size                     |400000MB
|max_stack_depth                            |6MB
|max_standby_archive_delay                  |-1
|max_standby_streaming_delay                |2147483647ms
|max_wal_senders                            |25
|max_wal_size                               |200000MB
|max_worker_processes                       |20
|min_wal_size                               |128MB
|random_page_cost                           |1.05
|recovery_init_sync_method                  |syncfs
|recovery_prefetch                          |off
|remove_temp_files_after_crash              |off
|reserved_connections                       |2
|scram_iterations                           |4096
|send_abort_for_crash                       |off
|send_abort_for_kill                        |off
|session_preload_libraries                  |
|shared_buffers                             |390049392kB
|shared_preload_libraries                   |rdsutils,pg_tle,pg_stat_statements
|synchronous_commit                         |on
|track_activity_query_size                  |4kB
|track_functions                            |pl
|track_io_timing                            |on
|track_wal_io_timing                        |on
|vacuum_buffer_usage_limit                  |399410kB
|vacuum_cost_page_miss                      |5
|vacuum_failsafe_age                        |1200000000
|vacuum_multixact_failsafe_age              |1200000000
|wal_compression                            |lz4
|wal_keep_size                              |400000MB
|wal_level                                  |logical
|wal_receiver_create_temp_slot              |off
|wal_receiver_timeout                       |15min
|wal_sender_timeout                         |2min


Large Table Stats:
table1:
    rows: 2,303,386,400
    disk space: 3.5T
    rel size: 3T
    FSM size: 758 MB
    visibility map size: 94 MB

table2:
    rows: 1,919,360,500
    disk space: 1.6T
    rel size: 758G
    FSM size: 189 MB
    visibility map size: 24 MB

table3:
    rows: 2,412,903,000
    disk space: 741G
    rel size: 387G
    FSM size: 96 MB
    visibility map size: 12 MB
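Equivalent figures for the tables above can be pulled with a query along
these lines (a sketch; the literal table names are placeholders for our
actual relations):

```sql
-- Approximate row counts plus main-fork, FSM, and visibility-map sizes.
SELECT c.relname,
       c.reltuples::bigint                             AS approx_rows,
       pg_size_pretty(pg_total_relation_size(c.oid))   AS disk_space,
       pg_size_pretty(pg_relation_size(c.oid, 'main')) AS rel_size,
       pg_size_pretty(pg_relation_size(c.oid, 'fsm'))  AS fsm_size,
       pg_size_pretty(pg_relation_size(c.oid, 'vm'))   AS vm_size
FROM pg_class c
WHERE c.relname IN ('table1', 'table2', 'table3');
```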


Any suggestions or insights greatly appreciated!

Joshua Banton
