[i]Hello Eric, lockstat shows the following results:[/i]
Adaptive mutex spin: 37048 events in 10.043 seconds (3689 events/sec)

Count indv cuml rcnt     spin Lock               Caller
-------------------------------------------------------------------------------
[b] 9698  26%  26% 0.00        1 0xffffffff8cb3ce78 cfg_process_cmd_ring+0x36
 6618  18%  44% 0.00        2 0xffffffff8cb3ce78 cfg_nic_xmit_frame+0x139
 4962  13%  57% 0.00        4 0xffffffff8cb3ce78 cfg_nic_xmit_frame+0x353
 4298  12%  69% 0.00        2 0xffffffff8cb3ce78 cfg_process_cmd_ring+0x2e4[/b]
 2630   7%  76% 0.00        1 0xffffffff86662440 tcp_send_data+0x111
 1845   5%  81% 0.00        4 0xffffffff8cb2d2a8 put_tx_handle+0x20
 1618   4%  85% 0.00        1 0xffffffff86662440 ipcl_classify_v4+0x142
 1413   4%  89% 0.00        1 0xffffffff86662440 tcp_wput+0x4b
 1384   4%  93% 0.00        1 0xffffffff8005a380 kmem_cache_free+0x5f

Adaptive mutex block: 8706 events in 10.043 seconds (867 events/sec)

Count indv cuml rcnt     nsec Lock               Caller
-------------------------------------------------------------------------------
[b] 4169  48%  48% 0.00     7381 0xffffffff8cb3ce78 cfg_process_cmd_ring+0x36[/b]
 1384  16%  64% 0.00     7332 0xffffffff8005a380 kmem_cache_free+0x5f
[b]  850  10%  74% 0.00    17150 0xffffffff8cb3ce78 cfg_nic_xmit_frame+0x139[/b]
  707   8%  82% 0.00     7299 0xffffffff86662440 ipcl_classify_v4+0x142
  483   6%  87% 0.00     7267 0xffffffff80186380 kmem_cache_free+0x5f
[b]  436   5%  92% 0.00    17671 0xffffffff8cb3ce78 cfg_nic_xmit_frame+0x353[/b]
  258   3%  95% 0.00    19439 0xffffffff86662440 tcp_wput+0x4b
  144   2%  97% 0.00     7333 0xffffffff815b7f00 squeue_enter_chain+0x40
   52   1%  97% 0.00    19470 0xffffffff86662440 tcp_send_data+0x111

Profiling interrupt: 80120 events in 10.045 seconds (7976 events/sec)

Count indv cuml rcnt     nsec CPU+PIL   Caller
-------------------------------------------------------------------------------
10007  12%  12% 0.00     9618 cpu[2]    cpu_halt
 9991  12%  25% 0.00     6432 cpu[4]    cpu_halt
 9975  12%  37% 0.00     5259 cpu[3]    cpu_halt
 9935  12%  50% 0.00     5389 cpu[7]    cpu_halt
 9904  12%  62% 0.00     6184 cpu[0]    cpu_halt
 9475  12%  86% 0.00     5435 cpu[1]    cpu_halt
 1145   1%  87% 0.00     4647 cpu[6]    kcopy
  649   1%  88% 0.00     9948 cpu[6]+6  ddi_get32
[b]  648   1%  89% 0.00    17172 cpu[6]    cpu_halt[/b]
  548   1%  90% 0.00     4498 cpu[6]    mutex_enter
  373   0%  90% 0.00    12130 cpu[6]+4  mutex_enter
  337   0%  91% 0.00    11240 cpu[6]+6  intr_thread
  322   0%  91% 0.00     4599 cpu[6]    bcopy
  228   0%  91% 0.00     4628 cpu[6]    bzero
  223   0%  92% 0.00     4542 cpu[6]    membar_exit
  221   0%  92% 0.00     4574 cpu[6]    rootnex_dma_bindhdl
  219   0%  92% 0.00     4609 cpu[6]    ddi_dma_addr_bind_handle
-------------------------------------------------------------------------------

[i]Functions starting with cfg_ (e.g. cfg_process_cmd_ring) are my driver's functions. I reduced both the time spent waiting for the lock and the number of times the mutex/lock was acquired, but I did not see any improvement in performance: the CPUs spend the saved time executing idle cycles. The following is a snippet of the lockstat output after I changed the locking/mutex structure in the driver.[/i]

Adaptive mutex spin: 25668 events in 10.079 seconds (2547 events/sec)

Count indv cuml rcnt     spin Lock               Caller
-------------------------------------------------------------------------------
[b] 7985  31%  31% 0.00        1 0xffffffff8c8f3e78 cfg_process_cmd_ring+0x315[/b]
 3534  14%  45% 0.00        1 0xffffffff8149ed40 tcp_send_data+0x111
[b] 2928  11%  56% 0.00        3 0xffffffff8c8f3e78 cfg_nic_xmit_frame+0x139
 2162   8%  65% 0.00        4 0xffffffff8c8f3e78 cfg_nic_xmit_frame+0x353[/b]
 2089   8%  73% 0.00        1 0xffffffff8149ed40 ipcl_classify_v4+0x142
 2002   8%  81% 0.00        1 0xffffffff8149ed40 tcp_wput+0x4b
 1301   5%  86% 0.00        1 0xffffffff8005a380 kmem_cache_free+0x5f
  905   4%  89% 0.00        3 0xffffffff8149ed40 tcp_output+0x4e
  734   3%  92% 0.00        2 0xffffffff8149ed40 squeue_drain+0x207
  471   2%  94% 0.00        1 0xffffffff80186380 kmem_cache_free+0x5f

Adaptive mutex block: 7175 events in 10.079 seconds (712 events/sec)

Count indv cuml rcnt     nsec Lock               Caller
-------------------------------------------------------------------------------
 3080  43%  43% 0.00     7693 0xffffffff8c8f3e78 cfg_process_cmd_ring+0x315
 1301  18%  61% 0.00     7463 0xffffffff8005a380 kmem_cache_free+0x5f
  643   9%  70% 0.00     7384 0xffffffff8149ed40 ipcl_classify_v4+0x142
  579   8%  78% 0.00    19541 0xffffffff8c8f3e78 cfg_nic_xmit_frame+0x139
  471   7%  85% 0.00     7361 0xffffffff80186380 kmem_cache_free+0x5f
  347   5%  89% 0.00    20567 0xffffffff8149ed40 tcp_wput+0x4b
  281   4%  93% 0.00    19837 0xffffffff8c8f3e78 cfg_nic_xmit_frame+0x353
  120   2%  95% 0.00     7388 0xffffffff815b8d80 squeue_enter_chain+0x40

Profiling interrupt:

Count indv cuml rcnt     nsec CPU+PIL   Caller
-------------------------------------------------------------------------------
10021  12%  12% 0.00     5231 cpu[1]    cpu_halt
 9741  12%  25% 0.00    30240 cpu[2]    cpu_halt
 9614  12%  37% 0.00     6954 cpu[0]    cpu_halt
 9487  12%  48% 0.00     5127 cpu[5]    cpu_halt
 9420  12%  60% 0.00     5849 cpu[3]    cpu_halt
 9093  11%  71% 0.00     5782 cpu[7]    cpu_halt
 9016  11%  83% 0.00     7544 cpu[4]    cpu_halt
[b] 2959   4%  86% 0.00    32134 cpu[6]    cpu_halt[/b]
  646   1%  87% 0.00    24924 cpu[6]+4  mutex_enter
  579   1%  88% 0.00    36862 cpu[6]+6  ddi_get32
  334   0%  88% 0.00     7451 cpu[4]    kcopy
  305   0%  89% 0.00     6886 cpu[7]    kcopy
  281   0%  89% 0.00     4619 cpu[6]    kcopy
  275   0%  89% 0.00    39864 cpu[6]+6  intr_thread
  260   0%  90% 0.00    32437 cpu[6]+4  bcopy
  246   0%  90% 0.00    56238 cpu[6]+4  atomic_add_64
  232   0%  90% 0.00    45435 cpu[6]+4  cfg_process_cmd_ring
  210   0%  91% 0.00    16289 cpu[6]+4  rootnex_dma_bindhdl
-------------------------------------------------------------------------------

[i][b]You can see that CPU[6] executes more cpu_halt instructions because of the time saved on locking.[/b] How can I push all these CPUs to do work? Any hints will be appreciated a lot. Thanks for your help.[/i]

Best regards,
Shreyas

PS: The lockstat numbers were taken while running an iperf TCP transmit. Commands used:

lockstat sleep 10
lockstat -kIi997 sleep 10

This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
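To illustrate the kind of restructuring the post describes (shrinking and consolidating the critical sections around the command ring so less time is spent spinning in cfg_process_cmd_ring and cfg_nic_xmit_frame), here is a minimal user-space sketch using pthreads. It is only an assumed pattern, not the actual driver code: all names (cmd_ring_t, ring_post, ring_drain) are hypothetical, and a real Solaris driver would use kmutex_t with mutex_enter/mutex_exit(9F) rather than a pthread mutex.

```c
/* Hedged sketch, not the actual driver: the consumer drains everything
 * available in ONE lock round trip into a local batch, then does the
 * per-entry completion work with the lock dropped.  This is the usual way
 * to cut both hold time and acquisition count on a shared ring mutex. */
#include <pthread.h>

#define RING_SIZE 64

typedef struct cmd_ring {
    pthread_mutex_t lock;       /* stand-in for the driver's kmutex_t */
    int entries[RING_SIZE];     /* completed descriptor ids */
    int head, tail;             /* consumer / producer indexes */
} cmd_ring_t;

/* Producer side (e.g. the transmit path): one short critical section. */
static int ring_post(cmd_ring_t *r, int id)
{
    pthread_mutex_lock(&r->lock);
    if ((r->tail + 1) % RING_SIZE == r->head) {     /* ring full */
        pthread_mutex_unlock(&r->lock);
        return -1;
    }
    r->entries[r->tail] = id;
    r->tail = (r->tail + 1) % RING_SIZE;
    pthread_mutex_unlock(&r->lock);
    return 0;
}

/* Consumer side (e.g. command-ring processing): take the lock once,
 * copy out the whole batch, unlock, then process outside the lock so
 * other CPUs contending on the same mutex do not spin meanwhile. */
static int ring_drain(cmd_ring_t *r, int *batch, int max)
{
    int n = 0;

    pthread_mutex_lock(&r->lock);
    while (r->head != r->tail && n < max) {
        batch[n++] = r->entries[r->head];
        r->head = (r->head + 1) % RING_SIZE;
    }
    pthread_mutex_unlock(&r->lock);
    /* per-entry completion work (frees, upcalls) happens here, unlocked */
    return n;
}
```

As the lockstat runs above show, this kind of change reduces spin and block events, but it only improves throughput if there is other runnable work to absorb the freed cycles; otherwise the saved time simply shows up as cpu_halt.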