Re: [perf-discuss] Performance of 32-bit vs 64-bit benchmark

Darryl Gove Fri, 30 Jan 2009 14:15:30 -0800


On 01/30/09 01:40 PM, Elad Lahav wrote:
>> I'm not sure that I follow your argument.  The T1000's architecture
>> favors workloads that have many parallel tasks that involve data
>> throughput.  The Xeon is going to have a better showing for straight
>> number-crunching work.  If your webserver benchmark is trying to measure
>> the throughput for a few clients that perform simple tasks, then I can
>> understand why you might expect the Xeon to do better.  However, if
>> you're trying to measure a workload that has many clients and measures
>> in ops/sec, the T1000 may well do better.
> 
> This is exactly why I created this benchmark. On the T1000 it uses 32 threads
> to encrypt the file in parallel - exactly the situation this machine is
> supposed to shine in. From the performance counters I can tell that the CPU
> is maxed, for a total of close to 8 billion operations per second, so that
> there is almost no internal stalling.
> My point was that even in this situation the UltraSPARC T1 lags considerably
> behind the quad-core Xeon.
>


Hi,

I don't think that follows.

You've put together a compute-intensive benchmark, so the constraint on 
that code is how many instructions per second you can execute.

It doesn't follow that it is analogous to a webserver. A webserver is 
not generally so compute-intensive. If there's lots of memory stall, 
then the T1/T2 will use that stall time to get useful work done on other 
threads, whereas most other processors will have idle pipelines until the 
stall resolves. [A benchmark for this situation would be something like 
running multiple copies of the latency measurements in lmbench.]
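
To make the stall-hiding argument concrete, here is a rough sketch of a toy 
model. It is not a measurement and not a description of any real chip. It 
assumes each thread alternates between a fixed number of compute cycles and a 
fixed number of memory-stall cycles, and that a core can switch to another 
ready thread at no cost. The function name and the cycle counts are made up 
for illustration.

```python
def pipeline_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles in which one core issues an instruction.

    Toy model (an assumption, not a description of any real chip):
    every thread repeats `compute_cycles` of work followed by
    `stall_cycles` waiting on memory, and switching threads is free.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    # One thread keeps the pipeline busy for compute/(compute+stall) of the
    # time; extra threads fill the stall gaps until the pipeline saturates.
    busy_fraction = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction)


# Hypothetical workloads: the cycle counts are illustrative only.
workloads = {
    "compute-bound (no stalls)": (10, 0),
    "memory-bound (heavy stalls)": (10, 30),
}

for name, (compute, stall) in workloads.items():
    one = pipeline_utilization(1, compute, stall)
    four = pipeline_utilization(4, compute, stall)
    print(f"{name}: 1 thread/core = {one:.2f}, 4 threads/core = {four:.2f}")
```

In this model the extra hardware threads buy nothing when there are no stalls 
(which is the situation the encryption benchmark is in), and they take the 
pipeline from 25% busy to fully busy when stall time dominates.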

This is a workload characterisation question. Just because platform A 
underperforms platform B on workload C, it does not follow that it will 
also underperform on workload D.

Looking at the SPECweb2005 results - which are probably the benchmarks 
closest to the situation that you are trying to model - your expectation 
may not be that far out of line (I cannot see any results submitted 
using a Xeon E5440).

http://www.spec.org/web2005/results/web2005.html

Looking through the results, and trying to find stuff that's vaguely 
comparable:
Xeon E3360    => 18495
2x Xeon E5460 => 28127
T2000         => 16407 (T2000 = 2RU version of T1000)
T1000         => 10466
T5220         => 41847 (UltraSPARC T2 processor)

From this I'd expect figures in the same ballpark from the T1 and the 
Xeon for webserving.
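
As a quick sanity check on "same ballpark", the scores listed above can be 
expressed relative to the single Xeon E3360 result. This is only arithmetic 
on the numbers quoted in this mail; the helper name is made up, and the 
choice of the E3360 as the baseline is an assumption for illustration.

```python
# SPECweb2005 scores exactly as quoted in the list above.
scores = {
    "Xeon E3360": 18495,
    "2x Xeon E5460": 28127,
    "T2000": 16407,
    "T1000": 10466,
    "T5220": 41847,
}


def relative_to(baseline, table):
    """Return each score divided by the baseline system's score."""
    base = table[baseline]
    return {name: score / base for name, score in table.items()}


ratios = relative_to("Xeon E3360", scores)

# Print from lowest to highest ratio.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<14} {ratio:.2f}x")
```

On these figures the two T1 systems come out at roughly 0.57x and 0.89x of 
the E3360 result, i.e. within a factor of two. These are whole-system 
results from different configurations, so the ratios are only a rough guide.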

The main point I'd make is that a compute-intensive benchmark is not 
necessarily a good proxy for a webserver.

Regards,

Darryl.


> --Elad
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org

-- 
Darryl Gove
Compiler Performance Engineering
Blog: http://blogs.sun.com/d/
Book: http://www.sun.com/books/catalog/solaris_app_programming.xml