All my sar statistics:

sar -r
11:40:02 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit
01:15:01 PM     269108   32608084     99.18     367144  29707240  10289444    27.83
01:20:01 PM     293560   32583632     99.11     367428  29674272  10287136    27.82
01:25:01 PM     417640   32459552     98.73     366148  29563220  10289220    27.83
01:30:01 PM     399448   32477744     98.79     366812  29596520  10298876    27.85
01:35:01 PM     432332   32444860     98.69     367412  29616524  10277484    27.80
sar -d -p
11:40:02 AM  DEV               tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
01:15:01 PM  vg_root-lv_pgsql  73.10  116.59  540.15  8.98  6.98   95.44  1.61  11.78
01:20:01 PM  vg_root-lv_pgsql  71.39  170.21  508.21  9.50  5.44   76.23  1.72  12.31
01:25:01 PM  vg_root-lv_pgsql  54.32  136.21  381.53  9.53  3.58   65.98  1.81   9.85
01:30:01 PM  vg_root-lv_pgsql  81.35  167.98  585.25  9.26  8.15  100.13  1.63  13.25
01:35:01 PM  vg_root-lv_pgsql  66.75  126.02  482.72  9.12  5.59   83.73  1.78  11.90

sar -u ALL
11:40:02 AM  CPU  %usr  %nice  %sys  %iowait  %steal  %irq  %soft  %guest  %idle
01:15:01 PM  all  8.57   0.00  1.52     1.46    0.00  0.00   0.05    0.00  88.40
01:20:01 PM  all  8.50   0.00  1.53     1.61    0.00  0.00   0.05    0.00  88.31
01:25:01 PM  all  9.00   0.00  1.78     1.27    0.00  0.00   0.06    0.00  87.89
01:30:01 PM  all  9.58   0.00  1.63     1.71    0.00  0.00   0.06    0.00  87.01
01:35:01 PM  all  8.75   0.00  1.47     1.57    0.00  0.00   0.06    0.00  88.15

As you suggested, I installed perf and collected statistics with this command:

perf record -g -f -u postgres -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync

But I don't really understand the perf report output and which values I need to look at. Could you give me some advice on how to read a perf report? Which events from perf list should I trace, and what are good and bad values in this report, given my hardware?
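So far I have only been looking at the recorded data roughly like this (just a sketch of the commands I am experimenting with; the sort keys and options are my own guesses, not something I am sure is right):

# overall breakdown: which processes / symbols account for most of the
# traced events (--stdio forces plain-text output; drop it if this perf
# version does not know the option)
perf report -n --sort comm,dso,symbol --stdio

# raw list of the individual events with timestamps, to see when the
# writes/fsyncs and block requests actually happen (this subcommand was
# named "perf trace" in older perf versions)
perf script | less

If I understand correctly, for tracepoint events the percentages only show which processes and call paths generate most of the traced events, but I may be reading it wrong.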
2013/11/7 Merlin Moncure <mmonc...@gmail.com>

> On Sat, Nov 2, 2013 at 1:54 PM, Евгений Селявка <evg.selya...@gmail.com> wrote:
> > Please help with advice!
> >
> > Server
> > HP ProLiant BL460c G1
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 8
> > On-line CPU(s) list: 0-7
> > Thread(s) per core: 1
> > Core(s) per socket: 4
> > CPU socket(s): 2
> > NUMA node(s): 1
> > Vendor ID: GenuineIntel
> > CPU family: 6
> > Model: 23
> > Stepping: 6
> > CPU MHz: 3000.105
> > BogoMIPS: 6000.04
> > Virtualization: VT-x
> > L1d cache: 32K
> > L1i cache: 32K
> > L2 cache: 6144K
> > NUMA node0 CPU(s): 0-7
> >
> > 32GB RAM
> > [root@db3 ~]# numactl --hardware
> > available: 1 nodes (0)
> > node 0 cpus: 0 1 2 3 4 5 6 7
> > node 0 size: 32765 MB
> > node 0 free: 317 MB
> > node distances:
> > node   0
> >   0:  10
> >
> > RAID1 2x146GB 10k rpm
> >
> > CentOS release 6.3 (Final)
> > Linux 2.6.32-279.11.1.el6.x86_64 #1 SMP x86_64 GNU/Linux
> >
> > kernel.msgmnb = 65536
> > kernel.msgmax = 65536
> > kernel.shmmax = 68719476736
> > kernel.shmall = 4294967296
> > vm.swappiness = 30
> > vm.dirty_background_bytes = 67108864
> > vm.dirty_bytes = 536870912
> >
> > PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
> > 20120305 (Red Hat 4.4.6-4), 64-bit
> >
> > listen_addresses = '*'
> > port = 5433
> > max_connections = 350
> > shared_buffers = 8GB
> > temp_buffers = 64MB
> > max_prepared_transactions = 350
> > work_mem = 256MB
> > maintenance_work_mem = 1GB
> > max_stack_depth = 4MB
> > max_files_per_process = 5000
> > effective_io_concurrency = 2
> > wal_level = hot_standby
> > synchronous_commit = off
> > checkpoint_segments = 64
> > checkpoint_timeout = 15min
> > checkpoint_completion_target = 0.75
> > max_wal_senders = 4
> > wal_sender_delay = 100ms
> > wal_keep_segments = 128
> > random_page_cost = 3.0
> > effective_cache_size = 18GB
> > autovacuum = on
> > autovacuum_max_workers = 5
> > autovacuum_vacuum_threshold = 900
> > autovacuum_analyze_threshold = 350
> > autovacuum_vacuum_scale_factor = 0.1
> > autovacuum_analyze_scale_factor = 0.05
> > log_min_duration_statement = 500
> > deadlock_timeout = 1s
> >
> > DB size is about 20GB. There is no high write activity on the DB. But
> > periodically in the postgresql log I see, for example, "select 1" with a
> > duration of about 500-1000 ms.
> >
> > In these periods the response time from the DB is terrible. These periods
> > are not tied to high traffic, there is no other app on the server, and
> > there is no specific cron job running on the server.
> >
> > Our app is written in Java and uses JDBC with internal pooling to connect
> > to the DB. There are about 100 connections to the DB. This is the sar
> > output:
> >
> > 12:00:01 AM  pgpgin/s pgpgout/s  fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s  %vmeff
> > 09:30:01 PM  73.17  302.72  134790.16  0.00  46809.73  0.00  0.00  0.00  0.00
> > 09:35:01 PM  63.42  655.80  131740.74  0.00  46182.74  0.00  0.00  0.00  0.00
> > 09:40:01 PM  76.87  400.62  122375.34  0.00  42096.27  0.00  0.00  0.00  0.00
> > 09:45:01 PM  58.49  198.33  121922.86  0.00  42765.27  0.00  0.00  0.00  0.00
> > 09:50:01 PM  52.21  485.45  136775.65  0.15  49098.65  0.00  0.00  0.00  0.00
> > 09:55:01 PM  49.68  476.75  130159.24  0.00  45192.54  0.00  0.00  0.00  0.00
> > 10:00:01 PM  41.35  295.34  118655.80  0.00  40786.52  0.00  0.00  0.00  0.00
> > 10:05:01 PM  60.84  593.85  129890.83  0.00  44170.92  0.00  0.00  0.00  0.00
> > 10:10:01 PM  52.08  471.36  132773.63  0.00  46019.13  0.00  2.41  2.41  100.00
> > 10:15:01 PM  73.93  196.50  129384.21  0.33  45255.76  65.92  1.19  66.87  99.64
> > 10:20:02 PM  70.35  473.16  121940.38  0.11  44061.52  81.95  37.79  119.42  99.73
> > 10:25:01 PM  57.84  471.69  130583.33  0.01  46093.33  0.00  0.00  0.00  0.00
> > 10:30:01 PM  52.91  321.62  119264.34  0.01  41748.19  0.00  0.00  0.00  0.00
> > 10:35:01 PM  47.13  451.78  114625.62  0.02  40600.98  0.00  0.00  0.00  0.00
> > 10:40:01 PM  48.96  472.41  102352.79  0.00  35402.17  0.00  0.00  0.00  0.00
> > 10:45:01 PM  70.07  321.33  121423.02  0.00  43052.04  0.00  0.00  0.00  0.00
> > 10:50:01 PM  46.78  479.95  128938.09  0.02  37864.07  116.64  48.97  165.07  99.67
> > 10:55:02 PM  104.84  453.55  109189.98  0.00  37583.50  0.00  0.00  0.00  0.00
> > 11:00:01 PM  46.23  248.75  107313.26  0.00  37278.10  0.00  0.00  0.00  0.00
> > 11:05:01 PM  44.28  446.41  115598.61  0.01  40070.61  0.00  0.00  0.00  0.00
> > 11:10:01 PM  38.86  457.32  100240.71  0.00  34407.29  0.00  0.00  0.00  0.00
> > 11:15:01 PM  48.23  275.60  104780.84  0.00  36183.84  0.00  0.00  0.00  0.00
> > 11:20:01 PM  92.74  432.49  114698.74  0.01  40413.14  0.00  0.00  0.00  0.00
> > 11:25:01 PM  42.76  428.50  87769.28  0.00  29379.87  0.00  0.00  0.00  0.00
> > 11:30:01 PM  36.83  260.34  85072.46  0.00  28234.50  0.00  0.00  0.00  0.00
> > 11:35:01 PM  62.52  481.56  93150.67  0.00  31137.13  0.00  0.00  0.00  0.00
> > 11:40:01 PM  43.50  459.10  90407.34  0.00  30241.70  0.00  0.00  0.00  0.00
> >
> > 12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit
> > 09:30:01 PM  531792  32345400  98.38  475504  29583340  10211064  27.62
> > 09:35:01 PM  512096  32365096  98.44  475896  29608660  10200916  27.59
> > 09:40:01 PM  455584  32421608  98.61  476276  29638952  10211652  27.62
> > 09:45:01 PM  425744  32451448  98.71  476604  29662384  10206044  27.60
> > 09:50:01 PM  380960  32496232  98.84  477004  29684296  10243704  27.71
> > 09:55:01 PM  385644  32491548  98.83  477312  29706940  10204776  27.60
> > 10:00:01 PM  348604  32528588  98.94  477672  29725476  10228984  27.67
> > 10:05:01 PM  279216  32597976  99.15  478104  29751016  10281748  27.81
> > 10:10:01 PM  255168  32622024  99.22  478220  29769924  10247404  27.72
> > 10:15:01 PM  321188  32556004  99.02  475124  29721912  10234500  27.68
> > 10:20:02 PM  441660  32435532  98.66  472336  29610476  10246288  27.71
> > 10:25:01 PM  440636  32436556  98.66  472636  29634960  10219940  27.64
> > 10:30:01 PM  469872  32407320  98.57  473016  29651476  10208520  27.61
> > 10:35:01 PM  414540  32462652  98.74  473424  29672728  10223964  27.65
> > 10:40:01 PM  354632  32522560  98.92  473772  29693016  10247752  27.72
> > 10:45:01 PM  333708  32543484  98.98  474092  29720256  10227204  27.66
> > 10:50:01 PM  528004  32349188  98.39  469396  29549832  10219536  27.64
> > 10:55:02 PM  499068  32378124  98.48  469692  29587140  10204836  27.60
> > 11:00:01 PM  462980  32414212  98.59  470032  29606764  10235820  27.68
> > 11:05:01 PM  449540  32427652  98.63  470368  29626136  10209788  27.61
> > 11:10:01 PM  419984  32457208  98.72  470772  29644248  10214480  27.63
> > 11:15:01 PM  429900  32447292  98.69  471104  29664292  10202344  27.59
> > 11:20:01 PM  394852  32482340  98.80  471528  29698052  10207604  27.61
> > 11:25:01 PM  345328  32531864  98.95  471904  29717264  10215632  27.63
> > 11:30:01 PM  368224  32508968  98.88  472236  29733544  10206468  27.61
> > 11:35:01 PM  321800  32555392  99.02  472528  29758548  10211820  27.62
> > 11:40:01 PM  282520  32594672  99.14  472860  29776952  10243516  27.71
> >
> > 12:00:01 AM  DEV  tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > 09:30:01 PM  dev253-5  66.29  146.33  483.33  9.50  6.27  94.53  2.08  13.78
> > 09:35:01 PM  dev253-5  154.80  126.85  1192.96  8.53  28.57  184.59  1.45  22.43
> > 09:40:01 PM  dev253-5  92.21  153.75  686.75  9.11  11.53  125.00  1.87  17.21
> > 09:45:01 PM  dev253-5  39.66  116.99  279.32  9.99  0.42  10.66  2.61  10.36
> > 09:50:01 PM  dev253-5  106.73  95.58  820.70  8.58  16.77  157.12  1.68  17.88
> > 09:55:01 PM  dev253-5  107.90  99.36  831.46  8.63  16.05  148.76  1.71  18.42
> > 10:00:01 PM  dev253-5  62.48  82.70  471.28  8.87  5.91  94.52  2.10  13.11
> > 10:05:01 PM  dev253-5  137.84  121.69  1064.03  8.60  24.48  177.31  1.56  21.52
> > 10:10:01 PM  dev253-5  107.93  104.16  827.83  8.64  16.69  155.04  1.68  18.11
> > 10:15:01 PM  dev253-5  40.55  126.12  277.57  9.96  0.41  10.13  2.57  10.42
> > 10:20:02 PM  dev253-5  104.33  136.77  793.49  8.92  16.97  162.69  1.76  18.35
> > 10:25:01 PM  dev253-5  108.04  115.36  825.26  8.71  16.68  154.36  1.76  19.05
> > 10:30:01 PM  dev253-5  69.72  105.66  523.05  9.02  7.45  106.92  1.90  13.25
> > 10:35:01 PM  dev253-5  101.58  91.59  781.85  8.60  15.00  147.68  1.67  16.97
> > 10:40:01 PM  dev253-5  107.50  97.91  827.17  8.61  17.68  164.49  1.77  19.06
> > 10:45:01 PM  dev253-5  69.98  140.13  519.57  9.43  7.09  101.25  1.96  13.72
> > 10:50:01 PM  dev253-5  104.30  83.31  806.12  8.53  16.18  155.10  1.65  17.16
> > 10:55:02 PM  dev253-5  106.86  209.65  790.27  9.36  15.59  145.08  1.74  18.60
> > 11:00:01 PM  dev253-5  50.42  92.08  371.52  9.19  3.05  62.16  2.28  11.52
> > 11:05:01 PM  dev253-5  101.06  88.31  776.57  8.56  15.12  149.58  1.67  16.90
> > 11:10:01 PM  dev253-5  103.08  77.73  798.23  8.50  17.14  166.25  1.74  17.90
> > 11:15:01 PM  dev253-5  57.74  96.45  428.62  9.09  5.23  90.52  2.13  12.32
> > 11:20:01 PM  dev253-5  97.73  185.18  727.38  9.34  14.64  149.84  1.94  18.92
> > 11:25:01 PM  dev253-5  95.03  85.52  730.31  8.58  14.42  151.79  1.79  16.97
> > 11:30:01 PM  dev253-5  53.76  73.65  404.47  8.89  3.94  73.25  2.17  11.64
> > 11:35:01 PM  dev253-5  110.37  125.05  842.17  8.76  16.96  153.63  1.66  18.30
> > 11:40:01 PM  dev253-5  103.93  87.00  801.59  8.55  16.01  154.00  1.73  18.00
> >
> > As you can see there is no high IO activity in this period of time, but
> > the DB is frozen. My opinion is that I have incorrect kernel settings
> > and/or a mistake in postgresql.conf.
> > After all, there is no high activity on the DB: the load average is about
> > 1, and about 1.15 when traffic is high (this is from the Nagios monitoring
> > system). But sometimes the load is about 4, and those moments match the
> > sar %vmeff = 100% samples and the increase in database response time.
>
> Need to see: iowait, system load.
>
> Also consider installing perf and grabbing a profile while the issue is
> happening.
>
> Probably this problem will go away with 2GB shared buffers, but before
> doing that we'd like to diagnose this if possible.
>
> merlin

--
Best regards,
Селявка Евгений
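P.S. Just to check that I understand the shared_buffers suggestion correctly: the change to try would simply be this in postgresql.conf (a sketch only, I have not applied it yet), followed by a restart:

shared_buffers = 2GB    # down from the current 8GB

And for iowait and system load I will keep collecting sar data, for example:

sar -u 60 10    # CPU usage including %iowait, one sample per minute for ten minutes
sar -q 60 10    # run-queue length and load averages on the same schedule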