On Thu, Apr 10, 2008 at 8:58 AM, Zeljko Vrba <[EMAIL PROTECTED]> wrote:
> The benchmark is compiled in 64-bit mode and executed on Solaris 10,
> dual-core AMD64 (1 socket with two cores) and 2GB of RAM. Now the
> results: for M=2^1 .. M=2^11 (2 .. 2048) threads, the running time
> (wall-clock time) is fairly constant around ~10 seconds.  Beyond
> this number, as M doubles, the running time also roughly doubles:
> (2^12 threads, 13s), (2^13 threads, 20s), (2^14 threads, 35s).
>
> Running iostat and vmstat in parallel confirms that no swapping
> occurs. 33% of the time is reported to be spent in system (with 9% of
> CPU time idle?!), with ~150k/sec system calls and ~120k/sec context
> switches.
>
> Can anybody offer some insight on why this sudden degradation in
> performance occurs?

I would guess that you are thrashing the caches or the MMU (TLB).  I'm
far from a hardware guru at this level, but consider the following
graphic and description:

http://www.chip-architect.com/news/Opteron_1600x1200.jpg
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

In the "Opteron's Data Cache & Load/Store Units" notice, for instance,
that "Dual L1 Tags" has 2x1024 (presumably = 2048) entries.  That
seems suspiciously close to the knee in your observed performance.
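
One way to sanity-check that theory without involving the scheduler at
all: walk one cache line in each of N separate page-sized regions
(standing in for per-thread stacks) and watch the cost per touch as N
grows past 2^11.  Something like the rough sketch below -- purely
illustrative, it assumes 4K pages and Solaris gethrtime(), and the
names in it (PAGE, PASSES, ns_per_touch) are mine, not anything from
your benchmark:

/*
 * Rough sketch, not the original benchmark: touch one cache line in
 * each of N page-sized regions per pass and time it.  If the slowdown
 * past ~2^11 threads is L1-tag/TLB thrashing, the cost per touch here
 * should show a similar knee once N exceeds what the hardware tracks.
 * Compile with your usual cc; gethrtime() is in libc, no extra libs.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>        /* gethrtime() */

#define PAGE   4096          /* assumed page size */
#define PASSES 1000

static double ns_per_touch(size_t n)
{
    char *mem;
    size_t i;
    int p;
    hrtime_t t0, t1;
    volatile char sink = 0;

    if ((mem = malloc(n * PAGE)) == NULL) {
        perror("malloc");
        exit(1);
    }

    for (i = 0; i < n; i++)          /* fault pages in before timing */
        mem[i * PAGE] = 1;

    t0 = gethrtime();
    for (p = 0; p < PASSES; p++)
        for (i = 0; i < n; i++)
            sink += mem[i * PAGE];   /* one line per simulated stack */
    t1 = gethrtime();

    free(mem);
    return (double)(t1 - t0) / ((double)PASSES * n);
}

int main(void)
{
    size_t n;

    for (n = 256; n <= 16384; n *= 2)
        printf("%6lu regions: %.2f ns/touch\n",
            (unsigned long)n, ns_per_touch(n));
    return 0;
}

If the ns/touch curve turns up around the same point where your
wall-clock time does, that points at the TLB/L1 tags rather than at
the thread library itself.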

Does x86 have the detailed hardware counters that are available on
sparc?  If so, cputrack(1) may be able to help out.
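
For example (untested, and the data-cache event names below are only a
guess from memory -- check what the Opteron PCBE on your box actually
exports):

  cpustat -h                          # lists the raw event names
  cputrack -c DC_access,DC_miss -p <pid of the benchmark>

The same list should also contain dtlb-miss events, which would be the
more direct test of the thrashing theory above.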

-- 
Mike Gerdts
http://mgerdts.blogspot.com/