Hi all, I've an environment 9.4 + bdr: PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
kernel version: 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux This is consolidation databases, in this machine there are around 250+ wal sender processes. top output revealed high system cpu: %Cpu(s): 1.4 us, 49.7 sy, 0.0 ni, 48.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st profiling cpu with perf: perf top -e cpu-clock Events: 142K cpu-clock 82.37% [kernel] [k] __mutex_lock_common.isra.5 4.49% [kernel] [k] do_raw_spin_lock 2.23% [kernel] [k] mutex_lock 2.16% [kernel] [k] mutex_unlock 2.12% [kernel] [k] arch_local_irq_restore 1.73% postgres [.] ValidXLogRecord 0.87% [kernel] [k] __mutex_unlock_slowpath 0.78% [kernel] [k] arch_local_irq_enable 0.63% [kernel] [k] sys_recvfrom finally get which processes (wal senders) that are using mutexes: perf top -e task-clock -p 55382 Events: 697 task-clock 88.08% [kernel] [k] __mutex_lock_common.isra.5 3.27% [kernel] [k] do_raw_spin_lock 2.34% [kernel] [k] arch_local_irq_restore 2.10% postgres [.] ValidXLogRecord 1.87% [kernel] [k] mutex_unlock 1.87% [kernel] [k] mutex_lock 0.47% [kernel] [k] sys_recvfrom strace output of wal sender process: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 98.30 1.030072 5 213063 201463 read 1.69 0.017686 0 201464 201464 recvfrom 0.01 0.000110 0 806 lseek 0.00 0.000043 0 474 468 rt_sigreturn 0.00 0.000000 0 6 open 0.00 0.000000 0 6 close ------ ----------- ----------- --------- --------- ---------------- 100.00 1.047911 415819 403395 total strace detail, majority read from pipe and recvfrom from socket (but most of them are EAGAIN): read(15, "~\320\5\0\1\0\0\0\0@\235\1\360\16\0\0\334\26\0\0\0\0\0\0\365\27\0\0\0\0\0\0"..., 8192) = 8192 <0.000025> read(6, 0x7fffdd837b3f, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000116> recvfrom(10, 0x7fffdd837b17, 1, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000049> ls -l /proc/62388/fd/15 lr-x------ 1 postgres postgres 64 Oct 1 08:39 /proc/62388/fd/15 -> /data/pg_xlog/0000000100000EF000000061 ls -l /proc/62388/fd/6 lr-x------ 1 postgres postgres 64 Oct 1 08:39 /proc/62388/fd/6 -> pipe:[1090892506] ls -l /proc/62388/fd/10 lrwx------ 1 postgres postgres 64 Oct 1 08:39 /proc/62388/fd/10 -> socket:[1096584060] I wonder, is there kernel version has better handling mutexes? Or is it the expected behavior? Sorry for cross-posting, I have posted the same on pgsql-performance too.... -- regards ujang jaenudin | DBA Consultant (Freelancer) http://ora62.wordpress.com http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab