Bob Friesenhahn wrote:

SPOT says that most time is spent executing libjpeg code (primarily
ycc_rgb_convert) and that there is quite a lot of application stall with
"LD/ST Unit Full" at a whopping 49.7%.  When using -lumem, the program
seems to spend 45% of the time waiting.  This is definitely not the case
for the rest of GraphicsMagick.

Hi,

That sounds rather like some kind of thrashing in the caches: one routine is accumulating large amounts of load/store stall time.

I guess this is the source for that routine:
https://tahoe.ca.sandia.gov/public/VTK-doc/jdcolor_8c-source.php#l00120

Cache thrashing is something I'd expect you to see only rarely, so if that is what this is, I'm quite surprised that you have found an example.

I'd echo Johansen's suggestion to dtrace the size and offset of the memory locations to see if they do have an unfortunate pattern.
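
To illustrate what an "unfortunate pattern" might look like once you have the addresses, here is a minimal Python sketch. It assumes a simple set-associative cache indexed directly by address; the line size, number of sets, associativity, and the example buffer addresses are all made-up values for illustration, not the configuration of your machine.

```python
from collections import defaultdict

# Assumed cache geometry (hypothetical values, substitute your machine's).
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 512   # number of sets in the cache
WAYS = 2         # associativity: lines that can share one set


def cache_set(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Return the index of the cache set that a byte address maps to."""
    return (addr // line_size) % num_sets


def oversubscribed_sets(addrs, ways=WAYS, line_size=LINE_SIZE,
                        num_sets=NUM_SETS):
    """Return {set index: distinct line count} for sets holding more
    distinct cache lines than the cache has ways."""
    lines_per_set = defaultdict(set)
    for addr in addrs:
        line = addr // line_size
        lines_per_set[cache_set(addr, line_size, num_sets)].add(line)
    return {s: len(lines) for s, lines in lines_per_set.items()
            if len(lines) > ways}


if __name__ == "__main__":
    # Hypothetical buffer base addresses spaced 32 KB apart, which is
    # LINE_SIZE * NUM_SETS, so they all land in the same set.
    bases = [0x10000, 0x18000, 0x20000]
    print(oversubscribed_sets(bases))
```

If the addresses reported by dtrace show several buffers competing for the same sets like this, that would be consistent with thrashing. An empty result under the correct cache geometry would point away from it.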

The other thing to try is data space profiling/dprofile. Essentially, you run collect from the Performance Analyzer with a + sign before the performance counter names. You will want to read up on this blog before trying it out:
http://blogs.sun.com/nk/

DSP tracks the effective address of memory operations, so it should identify whether particular cachelines are hot. You'll need to figure out the cache configuration of your machine, and enter that into your .er.rc file (yup, it's a bit tricky this analysis).
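
As a rough picture of what "hot cachelines" means here, the following Python sketch bins a list of effective addresses by cacheline and reports the most frequently touched lines. It is only an illustration of the idea, not how the analyzer works internally; the 64-byte line size and the sample addresses are assumptions.

```python
from collections import Counter


def hot_cachelines(effective_addrs, line_size=64, top=3):
    """Group addresses by the cacheline they fall in and return the
    `top` most frequently touched lines as (line base address, count)."""
    counts = Counter((addr // line_size) * line_size
                     for addr in effective_addrs)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical effective addresses: three fall in the line at 0x1000.
    sample = [0x1000, 0x1008, 0x1030, 0x2000]
    for base, count in hot_cachelines(sample):
        print(hex(base), count)
```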

HTH,

Regards,

Darryl.

--
Darryl Gove
Compiler Performance Engineering
Blog : http://blogs.sun.com/d/
Books: http://www.sun.com/books/catalog/solaris_app_programming.xml
       http://my.safaribooksonline.com/0595352510
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
