Hi All,

Our team is running some microbenchmarks on a multi-CPU machine and found the following hot locks with lockstat(1M) when using 32 threads. The benchmark does an ioctl() on /dev/crypto in a while loop.
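A minimal sketch of a benchmark with this shape might look like the following. It is not the actual test code: the function name run_ioctl_bench(), the choice to have every thread share a single open file descriptor, and the FIONREAD command on /dev/null in the optional driver are all placeholders chosen so the sketch runs anywhere. On the real system the path would be /dev/crypto and the command whatever request the benchmark issues.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#ifdef __sun
#include <sys/filio.h>          /* FIONREAD lives here on Solaris */
#endif

/* Per-thread state; every worker uses the same fd (an assumption). */
struct worker {
    pthread_t     tid;
    int           fd;
    unsigned long cmd;
    long          iters;
    long          calls;        /* ioctl() calls issued by this thread */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    int dummy = 0;

    for (long i = 0; i < w->iters; i++) {
        /* The return value is ignored: the call enters the kernel
         * with this fd whether or not the driver accepts cmd. */
        (void) ioctl(w->fd, w->cmd, &dummy);
        w->calls++;
    }
    return NULL;
}

/* Hypothetical helper: opens path once, runs nthreads workers that each
 * issue iters ioctl() calls on the shared fd, and returns the total
 * number of calls issued, or -1 on setup failure. */
long run_ioctl_bench(const char *path, unsigned long cmd,
                     int nthreads, long iters)
{
    if (nthreads <= 0 || iters < 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct worker *w = calloc((size_t)nthreads, sizeof (*w));
    if (w == NULL) {
        close(fd);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        w[i].fd = fd;
        w[i].cmd = cmd;
        w[i].iters = iters;
        if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0)
            break;
        started++;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
        total += w[i].calls;
    }

    free(w);
    close(fd);
    return total;
}

#ifdef IOCTL_BENCH_MAIN
/* Optional stand-alone driver; build with -DIOCTL_BENCH_MAIN. */
int main(void)
{
    long n = run_ioctl_bench("/dev/null", FIONREAD, 32, 100000);
    printf("%ld ioctl calls issued\n", n);
    return n < 0;
}
#endif
```

Sharing one descriptor across all threads is the case of interest here, since every call then passes through the same per-descriptor state in the kernel. A variant that opens the device separately in each thread would be a useful comparison point.
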
Adaptive mutex spin: 7123153 events in 30.966 seconds (230029 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
1045435  15%  15% 0.00        9 0x60006cb0100          releasef+0x24
1029413  14%  29% 0.00        9 0x60006cb0100          getf+0x38
...

Adaptive mutex block: 57757 events in 30.966 seconds (1865 events/sec)

  Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
...
   8027  14%  28% 0.00    83275 0x60006cb0100          releasef+0x24
...
   5723  10%  62% 0.00    83270 0x60006cb0100          getf+0x38

Another test, using a simple stub driver and a small test program that runs 32 threads performing ioctl operations on the device, showed the same problem.

This lock is the uf_lock of the device file descriptor on which the ioctl is being done. The contention is surprising to me because the routine getf() seems to be optimized for scaling.

Any ideas on what kind of solution is possible here? We are not able to scale beyond 5X, and this lock seems to be the main culprit.

Thanks,
-Krishna
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org