[i]Hello Eric,

lockstat shows the following results:
[/i]


Adaptive mutex spin: 37048 events in 10.043 seconds (3689 events/sec)
Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
[b] 9698  26%  26% 0.00        1 0xffffffff8cb3ce78     cfg_process_cmd_ring+0x36
 6618  18%  44% 0.00        2 0xffffffff8cb3ce78     cfg_nic_xmit_frame+0x139
 4962  13%  57% 0.00        4 0xffffffff8cb3ce78     cfg_nic_xmit_frame+0x353
 4298  12%  69% 0.00        2 0xffffffff8cb3ce78     cfg_process_cmd_ring+0x2e4[/b]
 2630   7%  76% 0.00        1 0xffffffff86662440     tcp_send_data+0x111
 1845   5%  81% 0.00        4 0xffffffff8cb2d2a8     put_tx_handle+0x20
 1618   4%  85% 0.00        1 0xffffffff86662440     ipcl_classify_v4+0x142
 1413   4%  89% 0.00        1 0xffffffff86662440     tcp_wput+0x4b
 1384   4%  93% 0.00        1 0xffffffff8005a380     kmem_cache_free+0x5f


Adaptive mutex block: 8706 events in 10.043 seconds (867 events/sec)
Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
[b] 4169  48%  48% 0.00     7381 0xffffffff8cb3ce78     cfg_process_cmd_ring+0x36[/b]
 1384  16%  64% 0.00     7332 0xffffffff8005a380     kmem_cache_free+0x5f
[b]  850  10%  74% 0.00    17150 0xffffffff8cb3ce78     cfg_nic_xmit_frame+0x139[/b]
  707   8%  82% 0.00     7299 0xffffffff86662440     ipcl_classify_v4+0x142
  483   6%  87% 0.00     7267 0xffffffff80186380     kmem_cache_free+0x5f
[b]  436   5%  92% 0.00    17671 0xffffffff8cb3ce78     cfg_nic_xmit_frame+0x353[/b]
  258   3%  95% 0.00    19439 0xffffffff86662440     tcp_wput+0x4b
  144   2%  97% 0.00     7333 0xffffffff815b7f00     squeue_enter_chain+0x40
   52   1%  97% 0.00    19470 0xffffffff86662440     tcp_send_data+0x111


Profiling interrupt: 80120 events in 10.045 seconds (7976 events/sec)

Count indv cuml rcnt     nsec CPU+PIL                Caller
-------------------------------------------------------------------------------
10007  12%  12% 0.00     9618 cpu[2]                 cpu_halt
 9991  12%  25% 0.00     6432 cpu[4]                 cpu_halt
 9975  12%  37% 0.00     5259 cpu[3]                 cpu_halt
 9935  12%  50% 0.00     5389 cpu[7]                 cpu_halt
 9904  12%  62% 0.00     6184 cpu[0]                 cpu_halt
 9475  12%  86% 0.00     5435 cpu[1]                 cpu_halt
 1145   1%  87% 0.00     4647 cpu[6]                 kcopy
  649   1%  88% 0.00     9948 cpu[6]+6               ddi_get32
[b]  648   1%  89% 0.00    17172 cpu[6]                 cpu_halt[/b]
  548   1%  90% 0.00     4498 cpu[6]                 mutex_enter
  373   0%  90% 0.00    12130 cpu[6]+4               mutex_enter
  337   0%  91% 0.00    11240 cpu[6]+6               intr_thread
  322   0%  91% 0.00     4599 cpu[6]                 bcopy
  228   0%  91% 0.00     4628 cpu[6]                 bzero
  223   0%  92% 0.00     4542 cpu[6]                 membar_exit
  221   0%  92% 0.00     4574 cpu[6]                 rootnex_dma_bindhdl
  219   0%  92% 0.00     4609 cpu[6]                 ddi_dma_addr_bind_handle
-------------------------------------------------------------------------------


[i]Functions starting with cfg_ (e.g. cfg_process_cmd_ring) are my driver's
functions.

I reduced both the time the driver spends waiting for the lock and the number
of times the mutex is acquired, but I did not see any improvement in
performance. The CPUs simply execute idle cycles in the time saved.
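
For context, the change was along these lines (a simplified sketch with
placeholder names -- cfg_ring_t, cfg_desc_t, cfg_detach_completed() and
cfg_free_desc_list() are not the real code):

/*
 * Sketch only: keep just the pointer manipulation under the ring
 * mutex and do the expensive frees outside it, so the lock is held
 * for less time and taken once per batch instead of per descriptor.
 */
static void
cfg_process_cmd_ring(cfg_ring_t *rp)
{
	cfg_desc_t *done_list;

	mutex_enter(&rp->ring_lock);
	/* Unlink all completed descriptors; cheap pointer work only. */
	done_list = cfg_detach_completed(rp);
	mutex_exit(&rp->ring_lock);

	/* Buffer freeing (kmem_cache_free etc.) now runs unlocked. */
	cfg_free_desc_list(done_list);
}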

Below is a snippet of the lockstat output after I changed the locking/mutex
structure in the driver:[/i]


Adaptive mutex spin: 25668 events in 10.079 seconds (2547 events/sec)

Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
[b] 7985  31%  31% 0.00        1 0xffffffff8c8f3e78     cfg_process_cmd_ring+0x315[/b]
 3534  14%  45% 0.00        1 0xffffffff8149ed40     tcp_send_data+0x111
[b] 2928  11%  56% 0.00        3 0xffffffff8c8f3e78     cfg_nic_xmit_frame+0x139
 2162   8%  65% 0.00        4 0xffffffff8c8f3e78     cfg_nic_xmit_frame+0x353[/b]
 2089   8%  73% 0.00        1 0xffffffff8149ed40     ipcl_classify_v4+0x142
 2002   8%  81% 0.00        1 0xffffffff8149ed40     tcp_wput+0x4b
 1301   5%  86% 0.00        1 0xffffffff8005a380     kmem_cache_free+0x5f
  905   4%  89% 0.00        3 0xffffffff8149ed40     tcp_output+0x4e
  734   3%  92% 0.00        2 0xffffffff8149ed40     squeue_drain+0x207
  471   2%  94% 0.00        1 0xffffffff80186380     kmem_cache_free+0x5f


Adaptive mutex block: 7175 events in 10.079 seconds (712 events/sec)
Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
 3080  43%  43% 0.00     7693 0xffffffff8c8f3e78     cfg_process_cmd_ring+0x315
 1301  18%  61% 0.00     7463 0xffffffff8005a380     kmem_cache_free+0x5f
  643   9%  70% 0.00     7384 0xffffffff8149ed40     ipcl_classify_v4+0x142
  579   8%  78% 0.00    19541 0xffffffff8c8f3e78     cfg_nic_xmit_frame+0x139
  471   7%  85% 0.00     7361 0xffffffff80186380     kmem_cache_free+0x5f
  347   5%  89% 0.00    20567 0xffffffff8149ed40     tcp_wput+0x4b
  281   4%  93% 0.00    19837 0xffffffff8c8f3e78     cfg_nic_xmit_frame+0x353
  120   2%  95% 0.00     7388 0xffffffff815b8d80     squeue_enter_chain+0x40

 
Count indv cuml rcnt     nsec CPU+PIL                Caller
-------------------------------------------------------------------------------
10021  12%  12% 0.00     5231 cpu[1]                 cpu_halt
 9741  12%  25% 0.00    30240 cpu[2]                 cpu_halt
 9614  12%  37% 0.00     6954 cpu[0]                 cpu_halt
 9487  12%  48% 0.00     5127 cpu[5]                 cpu_halt
 9420  12%  60% 0.00     5849 cpu[3]                 cpu_halt
 9093  11%  71% 0.00     5782 cpu[7]                 cpu_halt
 9016  11%  83% 0.00     7544 cpu[4]                 cpu_halt
[b] 2959   4%  86% 0.00    32134 cpu[6]                 cpu_halt[/b]
  646   1%  87% 0.00    24924 cpu[6]+4               mutex_enter
  579   1%  88% 0.00    36862 cpu[6]+6               ddi_get32
  334   0%  88% 0.00     7451 cpu[4]                 kcopy
  305   0%  89% 0.00     6886 cpu[7]                 kcopy
  281   0%  89% 0.00     4619 cpu[6]                 kcopy
  275   0%  89% 0.00    39864 cpu[6]+6               intr_thread
  260   0%  90% 0.00    32437 cpu[6]+4               bcopy
  246   0%  90% 0.00    56238 cpu[6]+4               atomic_add_64
  232   0%  90% 0.00    45435 cpu[6]+4               cfg_process_cmd_ring
  210   0%  91% 0.00    16289 cpu[6]+4               rootnex_dma_bindhdl
-------------------------------------------------------------------------------

[i]
[b]You can see that CPU[6] executes more cpu_halt instructions because of
the time saved on locking.[/b]
How can I push all of these CPUs to do work?

Any hints will be appreciated a lot.

Thanks for your help.
[/i]

Best regards.
Shreyas

PS: The lockstat numbers were taken while running an iperf TCP transmit test.
The commands used were:

lockstat sleep 10
lockstat -kIi997 sleep 10
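
In case it is useful, I can also capture caller stacks for the hot locks with
lockstat's -s option (see lockstat(1M)), e.g.:

lockstat -s 8 sleep 10
lockstat -kIi997 -s 8 sleep 10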
 
 