Hi, while debugging a performance issue I noticed the following very odd bit.
Kernel: 4.1.0-rc7-andres-00049-ge64f638
CPU: 2x E5520
Workload: postgresql (a multi-process SQL server communicating via shared
memory) and its benchmark client (pgbench, multi-threaded), both running
on the same machine and connected via unix sockets. The statements are
relatively simple (~1.5ms on average), forcing frequent back-and-forth
between the server processes and the pgbench threads.

I found that disabling sched_autogroup *significantly* reduces
throughput, even though both server and client are started from the same
terminal and thus should be in the same group!

There's a significant difference in %sys with autogroups enabled vs.
disabled: ~8% vs. ~27%. That sounds like too much.

The difference in postgres performance is significant:

enabled:
progress: 1.0 s, 25857.2 tps, lat 1.171 ms stddev 3.314
progress: 2.0 s, 19582.5 tps, lat 1.586 ms stddev 5.349
progress: 3.0 s, 17796.3 tps, lat 1.804 ms stddev 7.364
progress: 4.0 s, 18736.7 tps, lat 1.696 ms stddev 7.612
progress: 5.0 s, 19269.9 tps, lat 1.640 ms stddev 6.835
progress: 6.0 s, 21389.6 tps, lat 1.467 ms stddev 5.663
progress: 7.0 s, 22153.8 tps, lat 1.417 ms stddev 8.417
progress: 8.0 s, 21550.7 tps, lat 1.487 ms stddev 7.460
progress: 9.0 s, 22602.3 tps, lat 1.401 ms stddev 6.738
progress: 10.0 s, 23475.5 tps, lat 1.327 ms stddev 7.111
tps = 21329.219141 (excluding connections establishing)

disabled:
progress: 1.0 s, 25092.5 tps, lat 1.255 ms stddev 0.511
progress: 2.0 s, 13322.0 tps, lat 2.377 ms stddev 5.435
progress: 3.0 s, 13903.0 tps, lat 2.298 ms stddev 5.075
progress: 4.0 s, 13734.0 tps, lat 2.328 ms stddev 5.146
progress: 5.0 s, 14015.0 tps, lat 2.281 ms stddev 5.099
progress: 6.0 s, 14034.0 tps, lat 2.277 ms stddev 5.002
progress: 7.0 s, 13930.0 tps, lat 2.297 ms stddev 5.136
progress: 8.0 s, 14015.0 tps, lat 2.279 ms stddev 5.161
progress: 9.0 s, 14000.0 tps, lat 2.287 ms stddev 5.117
progress: 10.0 s, 13900.6 tps, lat 2.299 ms stddev 5.142
tps = 15006.317841 (excluding connections establishing)

(Note the odd dip after the first second; that's what I was actually
debugging.)

A perf diff of two -g -a -e cycles:k profiles shows:

# Baseline    Delta  Shared Object       Symbol
# ........  .......  ..................  ...................................
    11.70%   +3.87%  [kernel.vmlinux]    [k] unix_stream_recvmsg
     6.74%   +2.49%  [kernel.vmlinux]    [k] __fget
     6.69%   +4.31%  [kernel.vmlinux]    [k] sys_recvfrom
     6.49%   +1.77%  [kernel.vmlinux]    [k] system_call_after_swapgs
     5.37%   +1.39%  [kernel.vmlinux]    [k] _raw_spin_lock
     4.63%   +1.76%  [kernel.vmlinux]    [k] mutex_lock_interruptible
     4.27%   +1.27%  [kernel.vmlinux]    [k] mutex_unlock
     3.43%   +1.22%  [kernel.vmlinux]    [k] fput
     3.41%   -0.21%  [kernel.vmlinux]    [k] __fget_light
     2.96%   +1.33%  libpthread-2.19.so  [.] __libc_recv
     2.56%   +2.35%  [kernel.vmlinux]    [k] system_call_fastpath
     2.36%   +0.80%  [kernel.vmlinux]    [k] import_single_range
     2.30%   +0.82%  [kernel.vmlinux]    [k] sockfd_lookup_light

Looking a bit into the profile shows an apparent difference in how often
recv goes through the fastpath:

enabled:
-    6.55%  pgbench  [kernel.vmlinux]  [k] sys_recvfrom
   - sys_recvfrom
      + 85.88% system_call_fastpath
      + 14.12% __libc_recv

disabled:
-   10.97%  pgbench  [kernel.vmlinux]  [k] sys_recvfrom
   - sys_recvfrom
      + 74.16% system_call_fastpath
      + 25.84% __libc_recv

Is that actually expected, or is there a bug somewhere? The difference
seems a bit large to me.

If it's interesting, I can try to script the workload in a reproducible
manner.
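In the meantime, a rough stand-in for the communication pattern would
look something like the below. To be clear, this is not the real
workload: the socketpair ping-pong, the one-byte payload and the
iteration count are just placeholders to get frequent blocking
recv()/send() and the corresponding scheduler wakeups between two tasks.
Comparing runs with /proc/sys/kernel/sched_autogroup_enabled set to 1
vs. 0 should exercise the same path:

/*
 * Rough stand-in (not the actual postgres/pgbench setup): two processes
 * ping-pong one-byte messages over a unix stream socketpair, so each
 * iteration is a blocking send()/recv() pair plus a scheduler wakeup,
 * similar to the frequent back-and-forth described above.  ITERATIONS
 * is an arbitrary placeholder.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERATIONS 1000000L

int main(void)
{
	int sv[2];
	char buf = 'x';
	pid_t pid;
	long i;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		return 1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}

	if (pid == 0) {
		/* child: echo every byte back, like the server answering a query */
		close(sv[0]);
		for (i = 0; i < ITERATIONS; i++) {
			if (recv(sv[1], &buf, 1, 0) != 1)
				break;
			if (send(sv[1], &buf, 1, 0) != 1)
				break;
		}
		return 0;
	}

	/* parent: send a "request", block for the reply, like a pgbench thread */
	close(sv[1]);
	for (i = 0; i < ITERATIONS; i++) {
		if (send(sv[0], &buf, 1, 0) != 1)
			break;
		if (recv(sv[0], &buf, 1, 0) != 1)
			break;
	}

	waitpid(pid, NULL, 0);
	return 0;
}

Built with plain gcc and run under perf record/perf diff, that should at
least show whether the recv()-side difference depends on autogroups
alone or needs the full postgres/pgbench setup.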
Regards,

Andres