From the pstack output, the threads are blocking often in the kernel. It is
possible that the threads block in the kernel more often on this UltraSPARC-IV
system running Solaris 9.

One thing you can try is increasing the adaptive spin count for the mutex locks
to see if it helps. The default count is 1000; try raising it to 10000.

To do that, set the environment variable as below and start the application.

LIBTHREAD_ADAPTIVE_SPIN=10000
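In a Bourne-style shell that could look like the following sketch (the echo
stands in for launching the actual application, whose name isn't given here):

```shell
# Raise libthread's mutex adaptive spin count from the default of 1000.
LIBTHREAD_ADAPTIVE_SPIN=10000
export LIBTHREAD_ADAPTIVE_SPIN

# Normally you would exec the application here so it inherits the setting;
# we just confirm the variable is visible to child processes.
echo "adaptive spin count set to $LIBTHREAD_ADAPTIVE_SPIN"
```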

-Prakash.

[EMAIL PROTECTED] wrote:
Konstantin:

This is a single static RW lock which protects an array of pointers to
data structures. This array slowly grows to a size that depends on the
specific installation, growing by a pretty big increment each time.
Say, on W this RW lock is write-locked once an hour. A lot of threads,
which consume static info from this array (static in the sense
that entries are only added, never modified), read-lock it to ensure that
they don't catch the moment of reallocation.

If I understand you correctly, you're saying that you encounter this
problem even before the size of the data-structure has grown?

Is the size of the array fixed?  How do you grow the array without
removing the previously allocated entries?

False sharing can be an explanation.

The T2000 can be OK because only 8 threads execute at once there,
not 32 as in the problematic case. Or alternatively, maybe the fact
that all the T2000s are Solaris 10 boxes explains why they behave OK.

AFAIU, if it's false sharing, the AMD boxes should also suffer from it?

If it's false sharing, then yes, every box will suffer.  I was simply
observing that the effects might not be as noticeable on a T2000.

However, it's also possible that because of the longer cache-to-cache
transfer times on an UltraSPARC-IV the cache line is bouncing back and
forth between CPUs just as part of normal usage of the lock.

You've said that the grow operation is very infrequent.  If this is the
case, I don't think it makes a lot of sense to use a RW lock in this
situation.  There's inherently more overhead in this approach, since a
RW lock is more complicated than a mutex.  You're requiring almost all
threads that access these structures to pay an additional overhead for a
very infrequent case.  It's generally better to optimize for the common
case.  Unfortunately, I can't really offer any more concrete advice
without knowing more about these data structures and how they're
accessed.
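To make the "optimize for the common case" point concrete: since entries are
only ever added, readers don't strictly need a lock at all.  One sketch (not
the poster's code; the names and the use of C11 atomics are my assumptions) is
to have the infrequent grower build a new copy and atomically publish it,
while readers do a single atomic load:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* An immutable snapshot of the pointer array. */
typedef struct snapshot {
    size_t len;
    void  *ptrs[];              /* flexible array of entry pointers */
} snapshot_t;

static _Atomic(snapshot_t *) current;   /* currently published snapshot */
static pthread_mutex_t grow_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader: no lock, just one acquire load of the published snapshot. */
void *lookup(size_t i) {
    snapshot_t *s = atomic_load_explicit(&current, memory_order_acquire);
    return (s && i < s->len) ? s->ptrs[i] : NULL;
}

/* Writer: serialize the rare growers with a plain mutex, copy the old
 * entries into a bigger snapshot, then publish it with a release store.
 * The old snapshot is leaked here for simplicity; a real version needs
 * some reclamation scheme (e.g. deferring the free until readers drain). */
void append(void *p) {
    pthread_mutex_lock(&grow_lock);
    snapshot_t *old = atomic_load_explicit(&current, memory_order_relaxed);
    size_t n = old ? old->len : 0;
    snapshot_t *s = malloc(sizeof *s + (n + 1) * sizeof(void *));
    if (old)
        memcpy(s->ptrs, old->ptrs, n * sizeof(void *));
    s->ptrs[n] = p;
    s->len = n + 1;
    atomic_store_explicit(&current, s, memory_order_release);
    pthread_mutex_unlock(&grow_lock);
}
```

Whether this is worth the complexity depends on details of the structures and
their access patterns that weren't described in the thread.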

-j
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
