On Mon, 11 May 2009, johan...@sun.com wrote:
I'm not disagreeing that the user time between umem and mapmalloc is
very similar. However, page-fault time should be attributed as system
time. Chad Mynhier recently put back some improvements to ptime(1) that
show the microstate accounting data that each process keeps. That data
includes not only usr and sys, but also trap, text and data fault, and a
number of other fields. If you want to get this data, try the
experiments again using ptime(1) with the -m option.
If it was not clear, this is with Solaris 10 at essentially "U6+"
patch levels on a Sun Ultra 40-M2.
Libumem allows you to choose which backend is used for the vmem layer by
setting backend=sbrk or backend=mmap in UMEM_OPTIONS in the environment,
provided that you're not using it in standalone mode. I'm not certain
what the difference is, but I assume that linking with -lumem isn't
considered standalone. I could be wrong about that.
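A minimal sketch of how the two backends could be compared from a script, assuming the program under test is already linked against libumem. The helper names `umem_env` and `timed_run` are hypothetical and only illustrate setting UMEM_OPTIONS in the child's environment and collecting real, user and sys times for it.

```python
import os
import resource
import subprocess
import time


def umem_env(backend, base=None):
    """Return a copy of the environment with UMEM_OPTIONS selecting a backend."""
    if backend not in ("sbrk", "mmap"):
        raise ValueError("backend must be 'sbrk' or 'mmap'")
    env = dict(os.environ if base is None else base)
    env["UMEM_OPTIONS"] = "backend=" + backend
    return env


def timed_run(cmd, env):
    """Run cmd to completion and return (real, user, sys) seconds for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.time()
    subprocess.run(cmd, env=env, check=True)
    real = time.time() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system
```

For example, `timed_run(cmd, umem_env("mmap"))` followed by `timed_run(cmd, umem_env("sbrk"))`, where `cmd` is the argument list of the conversion being measured, gives one pair of timings to compare.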
That is interesting. The documentation says that libumem uses sbrk by
default, so mmap was not the issue. I don't see much difference between
these options:
UMEM_OPTIONS='backend=sbrk':
real 0m21.001s
user 0m17.761s
sys  0m2.206s

UMEM_OPTIONS='backend=mmap':
real 0m20.987s
user 0m17.769s
sys  0m3.137s
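A short sketch that converts the two sets of timings above into seconds and prints the per-field difference. The `parse_time` helper is hypothetical, and the values are copied from this message, which reports a single run per backend.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '0m21.001s' into seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if m is None:
        raise ValueError("unrecognized time value: %r" % text)
    return int(m.group(1)) * 60 + float(m.group(2))


# Timings as reported in the message, one run per backend.
runs = {
    "sbrk": {"real": "0m21.001s", "user": "0m17.761s", "sys": "0m2.206s"},
    "mmap": {"real": "0m20.987s", "user": "0m17.769s", "sys": "0m3.137s"},
}
secs = {b: {k: parse_time(v) for k, v in t.items()} for b, t in runs.items()}

for key in ("real", "user", "sys"):
    delta = secs["mmap"][key] - secs["sbrk"][key]
    print("%-4s mmap - sbrk = %+.3f s" % (key, delta))
```

In these numbers, real and user agree to within roughly 0.01 s, while sys is about 0.9 s higher with the mmap backend.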
Yes. I don't know what libjpeg itself does, but GraphicsMagick should
be performing a similar number of allocations (maybe 1000 small
allocations) regardless of the size of the JPEG file.
There are some known issues with small allocations on Solaris. I think
I included a bug-id in the e-mail pointer I sent you before.
Right. I don't think that the number of small allocations is the issue
at all. To verify, I used a small JPEG file (producing a similar number
of allocations) and the time is 0m0.301s rather than 0m21.001s.
The SPOT profiler does not show that memory allocators or any other
OS-related functionality is a factor. Instead, it shows that memory
access is "slow", as if one allocator provides access to memory which
is "faster" (i.e. "hotter") than the other. This is certainly
possible.
SPOT says that most time is spent executing libjpeg code (primarily
ycc_rgb_convert) and that there is quite a lot of application stall, with
"LD/ST Unit Full" at a whopping 49.7%. When using -lumem, the program
seems to spend 45% of the time waiting. This is definitely not the case
for the rest of GraphicsMagick.
I'm not sure I completely understand. What happens the rest of the time
in GraphicsMagick?
What I mean is that CPU-bound algorithms in GraphicsMagick don't
normally show high stall times. The algorithms which show high stall
times are very fast ones which are barely more than memcpy().
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org