* Chen, Tim C <[EMAIL PROTECTED]> wrote: > Ingo Molnar wrote: > > > > If you'd like to profile this yourself then the lowest-cost way of > > profiling lock contention on -rt is to use the yum kernel and run the > > attached trace-it-lock-prof.c code on the box while your workload is > > in 'steady state' (and is showing those extended idle times): > > > > ./trace-it-lock-prof > trace.txt > > > > this captures up to 1 second worth of system activity, on the current > > CPU. Then you can construct the histogram via: > > > > grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort | > > uniq -c | sort -n > prof.txt > > > > I did lock profiling on Volanomark as suggested and obtained the > profile that is listed below.
thanks - this is really useful! > 246 > __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start() > 264 rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)() > 334 > __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread() > 437 __sched_text_start()<-schedule()<-do_futex()<-sys_futex() > 467 (-1)()<-(0)()<-(0)()<-(0)() > 495 > __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio() > 497 __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)() > 499 __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data() > 500 tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)() > 503 __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)() > 1160 __sched_text_start()<-schedule()<-ksoftirqd()<-kthread() > 1433 __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)() > 1497 child_rip()<-(-1)()<-(0)()<-(0)() > 1936 > __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock() > > Looks like the idle time I saw was due to lock contention during call > to futex_wake, which requires acquisition of current->mm->mmap_sem. > Many of the java threads share mm and result in concurrent access to > common mm. [...] ah. This explains why i'm not seeing this bad contention in a comparable workload (hackbench.c): because hackbench uses processes not threads. > [...] Looks like under rt case there is no special treatment to read > locking so the read lock accesses are contended under __rt_down_read. > For non rt case, __down_read makes the distinction for read lock > access and the read lockings do not contend. yeah, makes sense. I'll do something about this. > Things are made worse here as this delayed waking up processes locked > by the futex. See also a snippet of the latency_trace below. > > <idle>-0 2D..2 5821us!: thread_return <softirq--31> (150 20) > <idle>-0 2DN.1 6278us : > __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)() > <idle>-0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)() > java-6648 2D..2 6280us+: thread_return <<idle>-0> (20 -4) > java-6648 2D..1 6296us : > try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock() > java-6648 2D..1 6296us : > rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)() > java-6648 2D..2 6297us : effective_prio <<...>-6673> (-4 -4) > java-6648 2D..2 6297us : __activate_task <<...>-6673> (-4 1) > java-6648 2.... 6297us < (-11) > java-6648 2.... 6298us+> sys_futex (0000000000afaf50 0000000000000001 > 0000000000000001) > java-6648 2...1 6315us : > __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock() > java-6648 2...1 6315us : > __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)() > java-6648 2D..2 6316us+: deactivate_task <java-6648> (-4 1) > <idle>-0 2D..2 6318us+: thread_return <java-6648> (-4 20) > <idle>-0 2DN.1 6327us : > __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)() > <idle>-0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)() > java-6629 2D..2 6330us+: thread_return <<idle>-0> (20 -4) > java-6629 2D..1 6347us : > try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock() > java-6629 2D..1 6347us : > rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)() > java-6629 2D..2 6348us : effective_prio <java-6235> (-4 -4) > java-6629 2D..2 6349us : __activate_task <java-6235> (-4 1) > java-6629 2.... 6350us+< (0) > java-6629 2.... 6352us+> sys_futex (0000000000afc1dc 0000000000000001 > 0000000000000001) > java-6629 2...1 6368us : > __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock() > java-6629 2...1 6368us : > __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)() > java-6629 2D..2 6369us+: deactivate_task <java-6629> (-4 1) > <idle>-0 2D..2 6404us!: thread_return <java-6629> (-4 20) > <idle>-0 2DN.1 6584us : > __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)() indeed - basically the mm semaphore is a common resource here. I suspect you'll see somewhat better numbers by using idle=poll or idle=mwait (or are using those options already?). (could you send me the whole trace if you still have it? It would be interesting to see a broader snippet from the life of individual java threads.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/