On Mon, 11 May 2009, johan...@sun.com wrote:

I'm not disagreeing that the user time between umem and mapmalloc is
very similar.  However, page-fault time should be attributed as system
time.  Chad Mynhier recently putback some improvements to ptime(1) that
show information about the microstate accounting data that each process
keeps.  That information contains not only usr and sys, but trap, text
and data fault, and a bunch of other information.  If you want to get
this data, try the experiments again using ptime(1) -m.

In case it was not clear, this is with Solaris 10 at essentially "U6+" patch levels on a Sun Ultra 40-M2.

Libumem allows you to choose which backend is used for the vmem layer by
setting backend=sbrk or backend=mmap in UMEM_OPTIONS in the environment,
provided that you're not using it in standalone mode.  I'm not certain
what the difference is, but I assume that linking with -lumem isn't
considered standalone.  I could be wrong about that.

That is interesting. The documentation says that libumem uses sbrk by default, so mmap was not the issue. I don't see much difference between these options:

UMEM_OPTIONS='backend=sbrk':
real    0m21.001s
user    0m17.761s
sys     0m2.206s

UMEM_OPTIONS='backend=mmap':
real    0m20.987s
user    0m17.769s
sys     0m3.137s
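
To put numbers on the comparison, here is a small Python sketch (not part of the original test setup) that converts the time(1) fields above to seconds and prints the mmap-minus-sbrk difference for each field. The helper names parse_time and deltas are made up for illustration; only the timing figures come from the two runs above.

```python
import re


def parse_time(value):
    """Convert a shell `time` field such as '0m21.001s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs above, keyed by vmem backend.
runs = {
    "sbrk": {"real": "0m21.001s", "user": "0m17.761s", "sys": "0m2.206s"},
    "mmap": {"real": "0m20.987s", "user": "0m17.769s", "sys": "0m3.137s"},
}


def deltas(runs, base="sbrk", other="mmap"):
    """Return other-minus-base differences in seconds, rounded to ms."""
    return {
        field: round(
            parse_time(runs[other][field]) - parse_time(runs[base][field]), 3
        )
        for field in runs[base]
    }


if __name__ == "__main__":
    for field, delta in deltas(runs).items():
        print(f"{field}: {delta:+.3f}s (mmap - sbrk)")
```

With these figures, real and user time differ by roughly 0.01s between the two backends, while sys time is 0.931s higher with backend=mmap.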

Yes.  I don't know what libjpeg itself does, but GraphicsMagick should
be performing a similar number of allocations (maybe 1000 small
allocations) regardless of the size of the JPEG file.

There are some known issues with small allocations on Solaris.  I think
I included a bug-id in the e-mail pointer I sent you before.

Right. I don't think that having many small allocations is the issue at all. To verify, I used a small JPEG file (producing a similar number of allocations) and saw that the time is 0m0.301s rather than 0m21.001s. The SPOT profiler does not show that memory allocators or any other OS-related functionality is a factor. Instead, it shows that memory access is "slow", as if one allocator provides access to memory which is "faster" (i.e. "hotter") than the other. This is certainly possible.
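
As a rough check of that reasoning, assuming both figures are wall-clock times from otherwise comparable runs, the ratio can be worked out with a few lines of Python. The function name slowdown is hypothetical; the two times are the ones quoted above.

```python
# Run times quoted above, in seconds.
large_jpeg_seconds = 21.001  # original (large) JPEG file
small_jpeg_seconds = 0.301   # small JPEG, similar number of allocations


def slowdown(large, small):
    """Return how many times longer the large run took than the small one."""
    if small <= 0:
        raise ValueError("small must be positive")
    return large / small


ratio = slowdown(large_jpeg_seconds, small_jpeg_seconds)

if __name__ == "__main__":
    print(f"large/small run time ratio: {ratio:.1f}x")
```

If the allocation count is about the same in both runs, a difference of this size has to come from work that scales with the image data rather than from the number of allocator calls.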

SPOT says that most time is spent executing libjpeg code (primarily
ycc_rgb_convert) and that there is quite a lot of application stall with
"LD/ST Unit Full" at a whopping 49.7%.  When using -lumem, the program
seems to spend 45% of the time waiting.  This is definitely not the case
for the rest of GraphicsMagick.

I'm not sure I completely understand.  What happens the rest of the time
in GraphicsMagick?

What I mean is that CPU-bound algorithms in GraphicsMagick don't normally show high stall times. The algorithms which show high stall times are very fast ones which are barely more than memcpy().

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
