Re: [perf-discuss] perf rsa-sha256 / pkcs11_softtoken[_extra].so / libdkim

2009-06-22 Thread Krishna Yenduri
On 06/21/09 16:28, Jens Elkner wrote: ... http://iws.cs.uni-magdeburg.de/~elkner/dkim/dkimbench.txt First of all, great work collecting the data and doing the comparisons. Couple of things: 1. I assume you are using the OpenSSL libraries in /usr/sfw/lib. There are some known performance pr

Re: [perf-discuss] Thread scheduling behavior

2008-03-28 Thread Krishna Yenduri
Bart Smaalders wrote:
> ...
> Keep in mind the differences between lwps and kernel threads, esp. on
> NUMA (MPO) platforms. Note that lgrp_choose isn't called for kernel
> threads
>
That explains it then. Thanks.
> What are you trying to do?
>
The kernel test models the behavior of

[perf-discuss] Thread scheduling behavior

2008-03-27 Thread Krishna Yenduri
Hi All, I have a user level benchmark that does

    for (i = 0; i < nthreads; i++)
        (void) thr_create(NULL, 0, testaes, (void *)0, THR_NEW_LWP, &tid);

I found that running this benchmark with nthreads == ncpus schedules each thread to a separate CPU. The system is
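
For readers who want to try the same pattern off Solaris, here is a rough sketch of the thread-per-CPU loop. It makes several assumptions: POSIX pthread_create() is swapped in for thr_create(), sysconf(_SC_NPROCESSORS_ONLN) supplies ncpus, and placeholder_worker() is a hypothetical stand-in for testaes(), whose body is not shown in the message. The names run_workers() and online_cpus() are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for testaes(); the real worker is not shown. */
static void *placeholder_worker(void *arg)
{
    int *done = arg;
    volatile unsigned long sink = 0;

    /* Burn a little CPU so the thread is visible to the scheduler. */
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
    *done = 1;
    return NULL;
}

/* Number of CPUs currently online (assumed equivalent of "ncpus"). */
int online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/*
 * Start nthreads workers, wait for all of them, and return how many
 * finished. Returns -1 on a bad argument or if setup fails.
 */
int run_workers(int nthreads)
{
    pthread_t *tids;
    int *done;
    int started = 0, finished = 0;

    if (nthreads <= 0)
        return -1;
    tids = calloc((size_t)nthreads, sizeof(*tids));
    done = calloc((size_t)nthreads, sizeof(*done));
    if (tids == NULL || done == NULL) {
        free(tids);
        free(done);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, placeholder_worker, &done[i]) != 0)
            break;
        started++;
    }
    assert(started <= nthreads);
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        finished += done[i];
    }
    free(tids);
    free(done);
    return started == nthreads ? finished : -1;
}
```

Calling run_workers(online_cpus()) reproduces the nthreads == ncpus case. Where each thread actually lands is up to the scheduler, so this sketch only sets up the workload and does not measure placement.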

Re: [perf-discuss] lock contention in ioctl() and scalability issues

2006-11-08 Thread Krishna Yenduri
Bart Smaalders wrote On 11/08/06 07:47,:
> ...
> A quick check with libmicro indicates Prakash's suggestion scales very nicely indeed, and will not change semantics at all. Just have each thread dup(2) its crypto fd and all the contention in getf will disappear.

Yes. We also noticed the same re
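
A minimal sketch of the dup(2)-per-thread idea follows, under stated assumptions. The thread describes the hot lock as the uf_lock of the descriptor the ioctl is issued on, so giving each thread its own descriptor should spread that contention across descriptors. The crypto ioctl commands are not shown in the thread, so fcntl(fd, F_GETFL) is used here purely as a placeholder system call on the private descriptor. The names run_with_private_fds(), dup_worker() and struct fd_job are hypothetical.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct fd_job {
    int shared_fd;  /* descriptor opened once by the main thread */
    int iters;      /* calls to issue on the private copy */
    int ok;         /* set to 1 if the thread completed its loop */
};

static void *dup_worker(void *arg)
{
    struct fd_job *job = arg;
    int fd = dup(job->shared_fd);   /* private descriptor for this thread */

    if (fd < 0)
        return NULL;
    job->ok = 1;
    for (int i = 0; i < job->iters; i++) {
        /* Placeholder for the real ioctl(); the command is not in the thread. */
        if (fcntl(fd, F_GETFL) < 0) {
            job->ok = 0;
            break;
        }
    }
    close(fd);
    return NULL;
}

/*
 * Open path once, let each of nthreads threads dup() it and loop on its
 * own copy. Returns the number of threads that completed, or -1 if the
 * arguments are bad or setup fails.
 */
int run_with_private_fds(const char *path, int nthreads, int iters)
{
    pthread_t *tids;
    struct fd_job *jobs;
    int shared_fd, started = 0, completed = 0;

    if (nthreads <= 0 || iters < 0)
        return -1;
    shared_fd = open(path, O_RDWR);
    if (shared_fd < 0)
        return -1;
    tids = calloc((size_t)nthreads, sizeof(*tids));
    jobs = calloc((size_t)nthreads, sizeof(*jobs));
    if (tids == NULL || jobs == NULL) {
        free(tids);
        free(jobs);
        close(shared_fd);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        jobs[i].shared_fd = shared_fd;
        jobs[i].iters = iters;
        if (pthread_create(&tids[i], NULL, dup_worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        completed += jobs[i].ok;
    }
    free(tids);
    free(jobs);
    close(shared_fd);
    return started == nthreads ? completed : -1;
}
```

On a system without /dev/crypto the sketch can be exercised with run_with_private_fds("/dev/null", 8, 1000). This only checks the mechanics and says nothing about the scaling reported in the thread.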

Re: [perf-discuss] lock contention in ioctl() and scalability issues

2006-11-07 Thread Krishna Yenduri
Bart Smaalders wrote: Krishna Yenduri wrote: ... This lock is uf_lock of the device file descriptor on which the ioctl is being done. This contention is surprising to me because the routine getf() seems to be optimized for scaling. Any ideas on what kind of solution is possible here? We

[perf-discuss] lock contention in ioctl() and scalability issues

2006-11-06 Thread Krishna Yenduri
Hi All, Our team is doing some micro benchmarks on a multi CPU machine and found the following hot locks from lockstat(1M) when using 32 threads. The benchmark does an ioctl() on /dev/crypto in a while loop.

Adaptive mutex spin: 7123153 events in 30.966 seconds (230029 events/sec)

Count indv cu