Re: [perf-discuss] libmtmalloc vs libumem

2010-08-13 Thread Steve Sistare
libumem is a userland port of the kernel slab allocator (kmem_cache_alloc and friends), which is described by the following papers: Jeff Bonwick, The Slab Allocator: An Object-Caching Kernel Memory Allocator. Proceedings of the Summer 1994 Usenix Conference. Jeff Bonwick and Jonathan Ada
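
For context, a minimal userland sketch of the object-caching API that libumem exports, mirroring kmem_cache_alloc and friends; the cache name, object size, and alignment below are made up for illustration, and the program assumes <umem.h> and linking with -lumem on Solaris:

    #include <umem.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* Create an object cache for fixed-size 64-byte buffers. */
        umem_cache_t *cp = umem_cache_create("demo_cache", 64, 8,
            NULL, NULL, NULL, NULL, NULL, 0);
        if (cp == NULL) {
            (void) fprintf(stderr, "umem_cache_create failed\n");
            return (1);
        }

        /* Allocate and free through the cache, the userland analogue
         * of kmem_cache_alloc()/kmem_cache_free() in the kernel. */
        void *buf = umem_cache_alloc(cp, UMEM_DEFAULT);
        if (buf != NULL)
            umem_cache_free(cp, buf);

        umem_cache_destroy(cp);
        return (0);
    }

To compare allocators without code changes, the usual approach is simply to preload one (for example LD_PRELOAD=libumem.so.1) and rerun the workload.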

Re: [perf-discuss] large pages and fragmentation on x86

2009-12-10 Thread Steve Sistare
The following CR will reduce fragmentation and improve large page availability: 6535949 "availability of 2M pages degrades over time on Solaris/x64". The fix is in progress, and has a chance of making Solaris 10 U9. As a workaround, you could experiment with the (unsupported) tunable colorequiv.

Re: [perf-discuss] thread_reaper can't keep up with massive creation of threads

2009-10-23 Thread Steve Sistare
Thanks for retesting. In hindsight my analysis has a fatal flaw. I used the reap deficit at the end of the run of 162468 threads to estimate hash table residency, but the residency should instead track the steady state rate of thread creation/destruction of approx 5000 threads (assuming that t
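
As a side note on the reasoning: steady-state residency of a queue is its arrival rate multiplied by the time each item spends in it (Little's law). A hypothetical sketch of that estimate:

    /*
     * Hypothetical illustration only: residency depends on the steady-state
     * creation/destruction rate and on how long a dead thread lingers before
     * being reaped, not on the total number of threads created during a run.
     */
    double
    est_residency(double create_rate_per_sec, double reap_delay_sec)
    {
        return (create_rate_per_sec * reap_delay_sec);
    }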

Re: [perf-discuss] thread_reaper can't keep up with massive creation of threads

2009-10-22 Thread Steve Sistare
Hi Thomas, The thread_reaper() rate of 2500 threads/second is limiting your throughput and sounds low. Why is it expensive? I have a theory. Here is the code, with subroutine calls:

    thread_reaper()
        for (;;)
            cv_wait();
            thread_reap_list()
                for each thread
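
To make the shape of that loop concrete, here is a hedged userland sketch of the same reap-list pattern using pthreads (an illustration of the structure, not the kernel source): the reaper blocks on a condition variable, grabs the accumulated list, and tears down each entry, which is where the per-thread cost goes.

    #include <pthread.h>
    #include <stdlib.h>

    /* Hypothetical dead-thread record; the real kernel structures differ. */
    typedef struct dead {
        struct dead *next;
    } dead_t;

    static dead_t *deathrow;              /* items awaiting reaping */
    static pthread_mutex_t reap_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  reap_cv   = PTHREAD_COND_INITIALIZER;

    /* Analogue of thread_reap_list(): tear down every item on the list. */
    static void
    reap_list(dead_t *list)
    {
        while (list != NULL) {
            dead_t *next = list->next;
            free(list);                   /* per-item teardown cost lives here */
            list = next;
        }
    }

    /* Analogue of thread_reaper(): wait for work, take the list, reap it. */
    static void *
    reaper(void *arg)
    {
        (void) arg;
        for (;;) {
            dead_t *list;

            pthread_mutex_lock(&reap_lock);
            while (deathrow == NULL)
                pthread_cond_wait(&reap_cv, &reap_lock);
            list = deathrow;
            deathrow = NULL;
            pthread_mutex_unlock(&reap_lock);

            reap_list(list);
        }
        return (NULL);
    }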

Re: [perf-discuss] Kernel usage

2009-10-20 Thread Steve Sistare
On 10/20/09 10:49, Matt V. wrote: I've tried your script lockutil and now the output:

    lockutil -n 44 lockstat_kWP.out
    CPU-util  Lock              Caller
    0.191     kstat_chain_lock  kstat_hold
    0.036     kstat_chain_lock  kstat_rele
    0.014     mod_lock          mod_hold_stub
    0.00

Re: [perf-discuss] Kernel usage

2009-10-20 Thread Steve Sistare
See here for a script that post-processes lockstat output and quantifies the overhead from lock spinning: http://blogs.sun.com/sistare/entry/measuring_lock_spin_utilization I just ran it (using a guesstimate for your CPU count), and the culprits clearly are the kstat-related locks. Something on
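
The arithmetic behind such a script is roughly: CPU utilization attributable to spinning on a lock is (spin events x average spin time) divided by (elapsed time x number of CPUs). A hedged sketch of that calculation in C, with made-up inputs (the real lockutil script's exact inputs and formula may differ):

    #include <stdio.h>

    /* Fraction of total CPU time burned spinning on one lock. */
    static double
    spin_util(double events, double avg_spin_ns, double elapsed_sec, int ncpus)
    {
        return ((events * avg_spin_ns) / (elapsed_sec * 1e9 * ncpus));
    }

    int
    main(void)
    {
        /* Illustrative numbers only: 500000 spins averaging 20 usec,
         * over a 30 second sample on a 44-CPU system. */
        (void) printf("util = %.3f\n", spin_util(500000, 20000, 30, 44));
        return (0);
    }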

Re: [perf-discuss] How to lower sys utlization and reduce xcalls?

2009-07-10 Thread Steve Sistare
This has the same signature as CR 6694625, "Performance falls off the cliff with large IO sizes" (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6694625), which was raised in the network forum thread "expensive pullupmsg in kstrgetmsg()": http://www.opensolaris.org/jive/thread.jspa?

Re: [perf-discuss] poor application performance - very high cross-calls

2009-06-30 Thread Steve Sistare
Yes, temporarily moving to UFS would eliminate the xcalls. - Steve On 06/29/09 23:51, Matthew Flanagan wrote: [...] It appears that fwd is mmapping the files. So as a temporary measure would moving the logs to UFS improve things? I'll be applying latest recommended patches but another po

Re: [perf-discuss] poor application performance - very high cross-calls

2009-06-29 Thread Steve Sistare
The high xcall rate consuming high %sys is likely the majority of the problem. You need the fix for 6699438, "zfs induces crosscall storm under heavy mapped sequential read". This is the case that Phil recalled working on. It was recently fixed, and will be in S10U8 RR. If you need a patch earlier

Re: [perf-discuss] Idle time numbers for a T1 strand

2008-10-03 Thread Steve Sistare
On 10/03/08 13:20, Elad Lahav wrote:
>> (In your first posting you said mpstat was used, not vmstat, so
>> I will assume mpstat).
> Oops, of course, I meant mpstat...
>
>> If mpstat shows idle time, then during the idle time, no thread
>> is runnable from a high-level, traditional operating system

Re: [perf-discuss] Idle time numbers for a T1 strand

2008-10-03 Thread Steve Sistare
one on the core, will do much less when
> sharing the core with other threads. Thus, you're left with mpstat to
> tell you whether the thread is saturated. Only I'm not sure whether
> mpstat is doing the right thing.
>
> --Elad
>
> Steve Sistare wrote:
>

Re: [perf-discuss] Idle time numbers for a T1 strand

2008-10-02 Thread Steve Sistare
See Ravi Talashikar's blog for an explanation of CPU vs core utilization on CMT architectures such as the T1000: http://blogs.sun.com/travi/entry/ultrasparc_t1_utilization_explained - Steve On 09/30/08 15:18, Elad Lahav wrote: > I'm looking into the performance of a simple, single-threaded, TC

Re: [perf-discuss] Single-thread performance on Niagara

2008-05-15 Thread Steve Sistare
consumed by useful work performed by another thread. - Steve Sistare Elad Lahav wrote: > I am toying around with a T1000 machine (T1 1GHz processor, 8 cores, > 4 threads per core, 8GB RAM). I was unable to saturate a single Gigabit NIC > with netperf, so I started investigating with th

Re: [perf-discuss] file system cache / segmap tuning

2008-04-30 Thread Steve Sistare
Hi Nick, segmap memory consumption is included in the "Page cache" total of memstat. For UFS on Solaris/SPARC as of Solaris 10 3/05, segmap is no longer used for normal read and write operations. It is only used in a few oddball places relating to metadata, so you should be able to reduc

Re: [perf-discuss] interrupt shielding

2008-04-29 Thread Steve Sistare
"psradm -i" stops a CPU from handling device interrupts, but it may still take low level interrupts, such as those generated from CPU cross calls. For example, the latter are generated to force TLB shootdowns when a page is unmapped. Note that in the third mpstat period, CPU 1 shows 46 xcal's, an

Re: [perf-discuss] trapstat -T on T2 chips

2008-02-15 Thread Steve Sistare
Nicolas Michael wrote:
> Steve Sistare wrote:
>> Nicolas Michael wrote:
>>> trapstat -T is not working properly: It only reports some tsb misses,
>>> but the columns for tlb misses are all 0 (instruction and data).
>> This is the expected result. The T

Re: [perf-discuss] trapstat -T on T2 chips

2008-02-15 Thread Steve Sistare
ntes ITLB_HWTW_miss_L2 and DTLB_HWTW_miss_L2. - Steve Sistare

Re: [perf-discuss] large pages on amd64?

2007-09-06 Thread Steve Sistare
unfortunate side effect of reducing large page availability. One component of the VM2 project will modify page freelist management to preserve contiguous large pages when possible, but I don't have any details on schedule. - Steve Sistare Andrew Gallatin wrote On 09/06/07 09:37,: > Ar

Re: [perf-discuss] TLB miss on x64

2007-08-30 Thread Steve Sistare
and set up by software. See Chapter 12, Memory Management Unit, in the UltraSPARC T2 Programmers Reference Manual: http://opensparc-t2.sunsource.net/specs/UST2-UASuppl-current-draft-HP-EXT.pdf - Steve Sistare Rafael Vanoni wrote On 08/30/07 00:11,: > Hi everyone, > been studying Sola

Re: [perf-discuss] Changing Solaris kernel page size

2006-08-01 Thread Steve Sistare
have units of bytes. - Steve Sistare Raymond wrote On 08/01/06 09:33,: Hi, I'm running Solaris 10 update 2 on an Ultra 10 that has a 440MHz USIIi CPU. When I run 'trapstat -T' I notice that Solaris kernel is mainly using 8k pages and about 9% of the time is spent on handling TLB misse

Re: [perf-discuss] Re: TLB lifespan across context switches

2006-06-29 Thread Steve Sistare
d file needs to be larger than the large page size, and ideally should have a starting address that is large-page aligned. BTW, DISM will not help. It causes the kernel data structures storing translations to be shared amongst processes, but it does not enable processes to share the same translatio
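
As an aside, not taken from the thread: a minimal Solaris C sketch of mapping a file at a large-page-aligned address and asking the kernel to back it with large pages. It assumes a 4 MB large page size and a hypothetical file path, and relies on the Solaris-specific MAP_ALIGN mmap flag plus memcntl(MC_HAT_ADVISE):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define LARGE_PAGE  (4UL * 1024 * 1024)       /* assumed large page size */

    int
    main(void)
    {
        struct memcntl_mha mha;
        size_t len = 8 * LARGE_PAGE;              /* must exceed the page size */
        int fd = open("/tmp/datafile", O_RDWR);   /* hypothetical file */

        if (fd == -1) {
            perror("open");
            return (1);
        }

        /* With MAP_ALIGN, the addr argument gives the requested alignment. */
        void *p = mmap((void *)LARGE_PAGE, len, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ALIGN, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return (1);
        }

        /* Advise a preferred page size for this range. */
        mha.mha_cmd = MHA_MAPSIZE_VA;
        mha.mha_flags = 0;
        mha.mha_pagesize = LARGE_PAGE;
        if (memcntl((caddr_t)p, len, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0) == -1)
            perror("memcntl");

        (void) munmap(p, len);
        (void) close(fd);
        return (0);
    }

The page sizes actually supported on a given system can be listed with getpagesizes(3C) rather than assumed.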