I am experimenting with a T1000 machine (T1 1 GHz processor, 8 cores, 4 threads per core, 8 GB RAM). I was unable to saturate a single Gigabit NIC with netperf, so I started investigating with the help of the performance counters. It turns out that even a simple for loop that only increments a counter executes at most 250 million instructions per second (with hardly any cache or TLB misses, as expected). From my understanding of the Niagara architecture, a single thread running alone on a core should be able to fully utilise it (1 billion instructions per second in my case).
What am I missing?

Thanks,
Elad

P.S. I am tracking performance with cputrack -c Instr_cnt,sys

This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org