Hi All,

Our team is running some microbenchmarks on a multi-CPU machine
and found the following hot locks with lockstat(1M) when using 32 threads.
The benchmark calls ioctl() on /dev/crypto in a while loop.

Adaptive mutex spin: 7123153 events in 30.966 seconds (230029 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
1045435  15%  15% 0.00        9 0x60006cb0100          releasef+0x24
1029413  14%  29% 0.00        9 0x60006cb0100          getf+0x38
...

Adaptive mutex block: 57757 events in 30.966 seconds (1865 events/sec)

  Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
...
   8027  14%  28% 0.00    83275 0x60006cb0100          releasef+0x24
...
   5723  10%  62% 0.00    83270 0x60006cb0100          getf+0x38

Another test, with a simple stub driver and a small test program that uses
32 threads to perform ioctl operations on the device, showed the same problem.

This lock is the uf_lock of the device file descriptor on which the ioctl
is being done. The contention surprises me because getf() seems to be
optimized for scaling.
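
To make the setup concrete, here is a minimal sketch of the kind of test
loop described above. It is not our actual benchmark code: the function
name run_ioctl_bench(), the struct, the request code, and the per_thread_fd
switch are all hypothetical names made up for illustration. With
per_thread_fd set to 0, every thread issues ioctl() on one shared
descriptor, which is the case measured above. With per_thread_fd set to 1,
each thread opens its own descriptor; since uf_lock belongs to a descriptor
entry, that variant might be a useful comparison, but whether it actually
changes the lockstat picture is an assumption that has not been tested.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Per-thread state; all names here are illustrative only. */
struct worker {
    pthread_t tid;
    int fd;        /* descriptor this thread calls ioctl() on */
    int request;   /* placeholder ioctl request code */
    long iters;    /* number of ioctl() calls to issue */
    long calls;    /* number of calls actually issued */
};

/* Thread body: issue ioctl() in a tight loop, ignoring the result. */
static void *ioctl_loop(void *arg)
{
    struct worker *w = arg;
    long i;

    for (i = 0; i < w->iters; i++) {
        (void) ioctl(w->fd, w->request, (void *)0);
        w->calls++;
    }
    return NULL;
}

/*
 * Run nthreads threads, each issuing iters ioctl() calls on path.
 * per_thread_fd == 0: all threads share one descriptor.
 * per_thread_fd != 0: each thread opens its own descriptor.
 * Returns the total number of calls issued, or -1 on setup failure.
 */
long run_ioctl_bench(const char *path, int request, int nthreads,
                     long iters, int per_thread_fd)
{
    struct worker w[MAX_THREADS];
    int shared = -1, started = 0, i;
    long total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (!per_thread_fd && (shared = open(path, O_RDWR)) < 0)
        return -1;

    for (i = 0; i < nthreads; i++) {
        w[i].fd = per_thread_fd ? open(path, O_RDWR) : shared;
        w[i].request = request;
        w[i].iters = iters;
        w[i].calls = 0;
        if (w[i].fd < 0 ||
            pthread_create(&w[i].tid, NULL, ioctl_loop, &w[i]) != 0) {
            if (per_thread_fd && w[i].fd >= 0)
                close(w[i].fd);
            break;
        }
        started++;
    }

    /* Wait for every thread that was started and add up its calls. */
    for (i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
        total += w[i].calls;
        if (per_thread_fd)
            close(w[i].fd);
    }
    if (!per_thread_fd)
        close(shared);

    return (started == nthreads) ? total : -1;
}
```

A run corresponding to the numbers above would be something like
run_ioctl_bench("/dev/crypto", request, 32, iters, 0), with request and
iters standing in for whatever the real test uses, and lockstat(1M)
collecting data while it runs.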

Any ideas on what kind of solution is possible here? We are not able to scale
beyond 5x, and this lock seems to be the main culprit.

Thanks,
-Krishna

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
