Forgive me if this has been beaten into the ground, but my team and I couldn't 
find any conclusive studies or posts on this issue.  To make a long story short: 
we're seeing Xeons run 50% slower than Opterons, even when the Xeon has 
twice as much cache and a slight clock speed advantage.

The full story: we have an older production server with 2G of RAM and 2.4GHz 
Opterons w/ 1M of cache.  The database is not large, only around 7M or 8M rows 
altogether, 2.5G on disk.  Most queries are reads, probably at a 10:1 ratio 
to writes.  In the process of upgrading this server to a pair of 
DRBD-mirrored (more on this below) servers we discovered that the new servers 
were actually slower than the older one.  The newer servers have 4G of RAM, 
3.0GHz Xeons with 2M of cache.  And not just a little slower, but queries 
(simple, complex, and disgusting recursive stored procedures) routinely run in 
50-100% more time than they did on the older server.  After trying many 
troubleshooting steps (downgrading the kernel to that of the older 
machine, verifying version parity, copying the binary from the older server, 
building a 32bit binary on the new servers, running the entire database out of 
a ramdisk, and of course much tweaking of postgresql.conf) and seeing virtually 
no benefit from any of them, I finally took the leap: just pull the 
disks and throw them in a newer Opteron chassis (2.8GHz, 1M cache).  And 
whaddya know?  It's got a 20% speed edge on the older Opteron, and blows away 
the performance of the newer Xeons.
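
To be precise about what I mean by "50-100% more time", here is a rough sketch 
of how the numbers could be compared.  The function name and the sample 
timings are made up for illustration; they are not measurements from our 
servers.

```python
from statistics import median

def slowdown_pct(baseline_ms, candidate_ms):
    """Percent extra time the candidate takes relative to the baseline.

    Both arguments are lists of wall-clock timings (in ms) for the SAME
    query, run several times on each machine.  Medians are used so that a
    single cold-cache outlier does not skew the comparison.
    """
    base = median(baseline_ms)
    cand = median(candidate_ms)
    return (cand - base) / base * 100.0

# Hypothetical timings, purely illustrative.
old_opteron_ms = [100.0, 102.0, 98.0]
new_xeon_ms = [150.0, 155.0, 149.0]

print(round(slowdown_pct(old_opteron_ms, new_xeon_ms), 1))
```

A positive result means the second machine is slower; a negative one (as with 
the newer Opteron against the older one) means it is faster.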

One of my guys did some testing and it appears that LWLockAcquire and 
LWLockRelease are the culprits, but we're not entirely confident of our 
conclusion.  Any thoughts on why this might be so different between the two 
architectures?  We're a hosting provider so we've got some spare equipment to 
work with and I'm going to request that we keep these two boxes up for a week 
or so.  Are there any other tests that you guys can suggest that would help get 
to the bottom of this?  I figure that not everyone has access to as much 
gear as we do, so it might be a good opportunity to get some A/B testing on a 
production database on identical OS/server installs on different hardware.  I'm 
content to just say "Well, we use Opterons then!", but I imagine that if we 
could help bring equal performance to Xeon users it would be worth the 
effort of volunteering.  To be clear, I have two machines sitting on the 
network ready for tweaking: one is a Xeon, the other is an Opteron, neither is 
in production, and both can be fully mangled in the interest of figuring this 
out.
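
One test we could run on both boxes, sketched below, is to time the same query 
at increasing client counts.  If the lock functions really are the problem, I 
would expect the gap to grow with concurrency, though that is my assumption 
and not something we have confirmed.  Everything here is hypothetical: 
`run_query` stands in for whatever executes one query (a real version would 
need its own database connection per worker), and `fake_query` just sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def time_under_concurrency(run_query, workers, runs_per_worker=20):
    """Median per-call latency (ms) with `workers` threads calling run_query."""
    def worker(_):
        samples = []
        for _ in range(runs_per_worker):
            start = time.perf_counter()
            run_query()
            samples.append((time.perf_counter() - start) * 1000.0)
        return samples

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = pool.map(worker, range(workers))
        all_samples = [s for samples in per_worker for s in samples]
    return median(all_samples)

def fake_query():
    # Placeholder for a real query; sleeps ~1 ms so the harness has work to time.
    time.sleep(0.001)

# Run the same sweep on each machine and compare the two columns of output.
for n in (1, 2, 4, 8):
    print(n, round(time_under_concurrency(fake_query, n), 2))
```

Running the identical sweep on the Xeon and the Opteron would show whether the 
slowdown is flat or gets worse as clients are added.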

Speaking of being a hosting provider, I may as well take a moment to point out 
that we are working with DRBD for mirroring and have found it works beautifully 
with PG (MySQL as well).  Also, while our "Managed Database Service" product is 
geared around MySQL, Oracle, and MSSQL, we're pretty familiar with PG and would 
be happy to talk to anyone about hosting needs they may have.

Thanks for listening, and again please let me know if there is further testing 
we can do to help get to the bottom of this Opteron/Xeon performance 
discrepancy.

Bart Grantham
VP of R&D
Logicworks, Inc.
www.logicworks.net
