libumem is a userland port of the kernel slab allocator
(kmem_cache_alloc and friends), which is described in the
following papers:

Jeff Bonwick, The Slab Allocator: An Object-Caching Kernel Memory
Allocator. Proceedings of the Summer 1994 USENIX Conference.

Jeff Bonwick and Jonathan Adams, Magazines and Vmem: Extending the
Slab Allocator to Many CPUs and Arbitrary Resources. Proceedings of
the 2001 USENIX Annual Technical Conference.
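The userland API mirrors the kernel one. A minimal sketch of
umem_cache usage (my example, not from the papers; compile with
cc file.c -lumem):

	#include <umem.h>

	typedef struct conn {
		int	c_fd;
		char	c_buf[256];
	} conn_t;

	int
	main(void)
	{
		/* One cache per object type, as with kmem_cache_create(). */
		umem_cache_t *cp = umem_cache_create("conn_cache",
		    sizeof (conn_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
		conn_t *c;

		if (cp == NULL)
			return (1);
		c = umem_cache_alloc(cp, UMEM_DEFAULT); /* cf. kmem_cache_alloc() */
		if (c != NULL) {
			c->c_fd = -1;
			umem_cache_free(cp, c);
		}
		umem_cache_destroy(cp);
		return (0);
	}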
The fix for the following CR will reduce fragmentation and improve
large page availability:
6535949 availability of 2M pages degrades over time on Solaris/x64
The fix is in progress, and has a chance of making Solaris 10 U9.
As a workaround, you could experiment with the (unsupported) tunable
colorequiv.
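For example (a sketch only; the value shown is illustrative, not a
recommendation, and as an unsupported tunable it may change or disappear),
add a line to /etc/system and reboot:

	set colorequiv = 2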
Thanks for retesting. In hindsight my analysis has a fatal flaw. I used the
reap deficit at the end of the run of 162468 threads to estimate hash table
residency, but the residency should instead track the steady state rate of
thread creation/destruction of approx 5000 threads (assuming that [...]
Hi Thomas,
The thread_reaper() rate of 2500 threads/second is limiting your
throughput and sounds low. Why is it expensive? I have a theory.
Here is the code, with subroutine calls:
thread_reaper()
	for (;;)
		cv_wait();			/* wait for deathrow to fill */
		thread_reap_list()
			for each thread
				thread_free(t);	/* free stack + thread struct */
On 10/20/09 10:49, Matt V. wrote:
I've tried your script lockutil; here is the output:
lockutil -n 44 lockstat_kWP.out
CPU-util Lock Caller
0.191 kstat_chain_lock kstat_hold
0.036 kstat_chain_lock kstat_rele
0.014 mod_lock mod_hold_stub
[...]
See here for a script that post-processes lockstat output and quantifies
the overhead from lock spinning:
http://blogs.sun.com/sistare/entry/measuring_lock_spin_utilization
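Roughly, the script charges each lock/caller pair its total spin time (the
event count times the average spin latency from the nsec column) and divides
by NCPU times the elapsed time from the lockstat header; that fraction is the
CPU-util column, and -n supplies NCPU. See the blog entry for the exact
computation.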
I just ran it (using a guesstimate for your CPU count), and the culprits
clearly are the kstat-related locks. Something on [...]
This has the same signature as CR:
6694625 Performance falls off the cliff with large IO sizes
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6694625
which was raised in the network forum thread "expensive pullupmsg in
kstrgetmsg()"
http://www.opensolaris.org/jive/thread.jspa?
Yes, temporarily moving to UFS would eliminate the xcalls.
- Steve
On 06/29/09 23:51, Matthew Flanagan wrote:
[...]
> It appears that fwd is mmapping the files. So as a temporary measure would moving
> the logs to UFS improve things?
> I'll be applying the latest recommended patches, but another [...]
The high xcall rate, which accounts for the high %sys, is likely the majority of the problem.
You need the fix for:
6699438 zfs induces crosscall storm under heavy mapped sequential read
This is the case that Phil recalled working on.
It was recently fixed, and will be in S10U8 RR. If you need a patch
earlier [...]
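For reference, a minimal user-level sketch (mine, not from the CR) of the
access pattern that triggers the storm: sequentially touching a large mapped
file. Run it against a file on ZFS and watch the xcal column in mpstat:

	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(int argc, char **argv)
	{
		struct stat st;
		long pg = sysconf(_SC_PAGESIZE);
		volatile char sink;
		off_t off;
		char *p;
		int fd;

		if (argc != 2 || (fd = open(argv[1], O_RDONLY)) == -1 ||
		    fstat(fd, &st) == -1) {
			perror("open/fstat");
			return (1);
		}
		p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return (1);
		}
		/* Touch every page in order, like a mapped sequential reader. */
		for (off = 0; off < st.st_size; off += pg)
			sink = p[off];
		(void) sink;
		(void) munmap(p, st.st_size);
		return (0);
	}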
On 10/03/08 13:20, Elad Lahav wrote:
>> (In your first posting you said mpstat was used, not vmstat, so
>> I will assume mpstat).
> Oops, of course, I meant mpstat...
>
>> If mpstat shows idle time, then during the idle time, no thread
>> is runnable from a high-level, traditional operating system [...]
> [...] one on the core, will do much less when
> sharing the core with other threads. Thus, you're left with mpstat to
> tell you whether the thread is saturated. Only I'm not sure whether
> mpstat is doing the right thing.
>
> --Elad
>
> Steve Sistare wrote: [...]
See Ravi Talashikar's blog for an explanation of CPU vs core
utilization on CMT architectures such as the T1000:
http://blogs.sun.com/travi/entry/ultrasparc_t1_utilization_explained
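Roughly: a T1 presents 8 cores x 4 strands = 32 CPUs to the OS, and mpstat
reports utilization per strand, so a single compute-bound thread shows up as
only 1/32 = ~3% utilization even though it may be consuming most of its
core's single-issue pipeline. The core can be nearly saturated while mpstat
makes the system look idle.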
- Steve
On 09/30/08 15:18, Elad Lahav wrote:
> I'm looking into the performance of a simple, single-threaded, TCP [...]
[...] consumed by useful work performed by another thread.
- Steve Sistare
Elad Lahav wrote:
> I am toying around with a T1000 machine (T1 1GHz processor, 8 cores,
> 4-threads per core, 8GB RAM). I was unable to saturate a single Gigabit NIC
> with netperf, so I started investigating with the [...]
Hi Nick,
segmap memory consumption is included in the "Page cache" total
of memstat. For UFS on Solaris/SPARC, as of Solaris 10 3/05,
segmap is no longer used for normal read and write operations. It is
only used in a few oddball places relating to metadata, so you should
be able to reduce [...]
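(To see the totals, run

	echo ::memstat | mdb -k

as root; ::memstat walks every page, so it can take a while on a large
machine.)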
"psradm -i" stops a CPU from handling device interrupts, but it may
still take low level interrupts, such as those generated from
CPU cross calls. For example, the latter are generated to force
TLB shootdowns when a page is unmapped. Note that in the third
mpstat period, CPU 1 shows 46 xcals [...]
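For example, after psradm -i 1, psrinfo reports CPU 1 as no-intr, yet the
xcal column in mpstat for that CPU stays nonzero while pages are being
unmapped elsewhere.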
Nicolas Michael wrote:
> Steve Sistare wrote:
>> Nicolas Michael wrote:
>>> trapstat -T is not working properly: It only reports some tsb misses,
>>> but the columns for tlb misses are all 0 (instruction and data).
>> This is the expected result. The T[...]
[...] counters ITLB_HWTW_miss_L2 and DTLB_HWTW_miss_L2.
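The T2 services TLB misses with its hardware tablewalker rather than with
software trap handlers, which is why trapstat, which instruments those
handlers, sees only TSB misses. To count the misses themselves, sample the
hardware counters with something like (the exact event-spec syntax varies by
platform; check cpustat -h):

	cpustat -c pic0=ITLB_HWTW_miss_L2,pic1=DTLB_HWTW_miss_L2 1 10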
- Steve Sistare
[...] an unfortunate side effect of reducing large page availability.
One component of the VM2 project will modify page freelist
management to preserve contiguous large pages when possible, but
I don't have any details on schedule.
- Steve Sistare
On 09/06/07 09:37, Andrew Gallatin wrote:
> [...]
[...] and set up by software. See Chapter
12, Memory Management Unit, in the UltraSPARC T2 Programmers
Reference Manual:
http://opensparc-t2.sunsource.net/specs/UST2-UASuppl-current-draft-HP-EXT.pdf
- Steve Sistare
On 08/30/07 00:11, Rafael Vanoni wrote:
> Hi everyone,
> been studying Solaris [...]
[...] have units of bytes.
- Steve Sistare
On 08/01/06 09:33, Raymond wrote:
Hi, I'm running Solaris 10 update 2 on an Ultra 10 that has a 440MHz USIIi
CPU. When I run 'trapstat -T' I notice that the Solaris kernel is mainly using
8k pages and about 9% of the time is spent on handling TLB misses [...]
[...] the mapped file needs to be larger than the large page size,
and ideally should have a starting address that is large-page aligned.
BTW, DISM will not help. It causes the kernel data structures storing
translations to be shared amongst processes, but it does not enable processes
to share the same translations [...]
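If you control the code doing the mapping, you can arrange both conditions
explicitly. A sketch, not from the original mail (MAP_ALIGN and MC_HAT_ADVISE
are Solaris-specific, and the 4M page size is an assumption; check
pagesize -a for what your CPU supports):

	#include <sys/types.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <stdio.h>

	#define	LPGSZ	(4UL * 1024 * 1024)	/* assumed large page size */

	int
	main(int argc, char **argv)
	{
		struct memcntl_mha mha;
		struct stat st;
		char *p;
		int fd;

		if (argc != 2 || (fd = open(argv[1], O_RDONLY)) == -1 ||
		    fstat(fd, &st) == -1) {
			perror("open/fstat");
			return (1);
		}
		/* MAP_ALIGN: the addr argument gives the required alignment. */
		p = mmap((caddr_t)LPGSZ, st.st_size, PROT_READ,
		    MAP_SHARED | MAP_ALIGN, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return (1);
		}
		mha.mha_cmd = MHA_MAPSIZE_VA;
		mha.mha_flags = 0;
		mha.mha_pagesize = LPGSZ;
		/* Advisory only: the kernel may still use small pages. */
		if (memcntl(p, st.st_size, MC_HAT_ADVISE,
		    (caddr_t)&mha, 0, 0) == -1)
			perror("memcntl");
		return (0);
	}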