Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

matthias platzer Tue, 02 Oct 2007 03:31:10 -0700

hello,

I saw this thread a bit late, but I had (and am still having) exactly the same issues on a dual dual-core Opteron box with a 9550SX (CentOS 5 x86_64).

What I did to work around them was basically to switch to XFS for everything except / (3ware say their cards are fast, but only on XFS) AND to use a very low nr_requests (like 32 or 64) for every blockdev on the 3ware card. That limits the iowait times for the CPUs and makes the 3ware drives respond faster (see for yourself with iostat -x -m 1 while benchmarking).

If you can, you could also try _not_ putting the system disks on the 3ware card, because the 3ware driver/card additionally gives writes priority. People have suggested that the system becomes unresponsive because the CPU hangs in iowait for the writes, and reads of the system binaries won't happen until the writes are done, so the binaries should be on another I/O path.
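As a rough sketch of the nr_requests part of that workaround: the function name set_nr_requests, the SYSFS_ROOT variable and the device names sda/sdb are only assumptions for illustration, and the sysfs path is the one quoted further down in this thread. Writing to the real /sys needs root.

```shell
#!/bin/sh
# Sketch only: lower the block-layer request queue size for some devices.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory; on a real system it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

set_nr_requests() {
    value="$1"
    shift
    for dev in "$@"; do
        knob="$SYSFS_ROOT/block/$dev/queue/nr_requests"
        if [ -w "$knob" ]; then
            echo "$value" > "$knob"
            echo "$dev: nr_requests=$(cat "$knob")"
        else
            echo "$dev: cannot write $knob (not root, or no such device)" >&2
        fi
    done
}

# Example (as root; sda and sdb are assumed device names):
#   set_nr_requests 64 sda sdb
#   iostat -x -m 1    # watch the wait times while the benchmark runs
```

The setting does not survive a reboot, so it would have to be reapplied from something like rc.local if it turns out to help.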

All of this seems to be symptoms of a very complex issue consisting of kernel bugs/bad drivers/... and they seem to be worst on an AMD/3ware combination.

here is another link:
http://bugzilla.kernel.org/show_bug.cgi?id=7372

regards,
matthias

Simon Banton wrote:

Dear list,
I thought I'd just share my experiences with this 3Ware card, and see if anyone might have any suggestions.
System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID 1 plus 2 hot spare config. The array is properly initialized, write cache is on, as is queueing (and supported by the drives). StoreSave set to Protection.
OS is CentOS 4.5 i386, minimal install, default partitioning as suggested by the installer (ext3, small /boot on /dev/sda1, remainder as / on LVM VolGroup with 2GB swap).
Firmware from 3Ware codeset 9.4.1.2 in use, firmware/driver details:
//serv1> /c0 show all
/c0 Driver Version = 2.26.05.007
/c0 Model = 9550SX-8LP
/c0 Memory Installed  = 112MB
/c0 Firmware Version = FE9X 3.08.02.005
/c0 Bios Version = BE9X 3.08.00.002
/c0 Monitor Version = BL9X 3.01.00.006
I initially noticed something odd while installing 4.4: writing the inode tables took longer than I expected (I thought the installer had frozen) and the system overall felt sluggish when doing its first yum update, certainly more sluggish than I'd expect with a comparatively powerful machine and hardware RAID 1.
I tried a few simple benchmarks (bonnie++, iozone, dd) and noticed up to 8 pdflush commands hanging about in uninterruptible sleep when writing to disk, along with kjournald and kswapd from time to time. Load average during writing climbed considerably (to over 12) with 'ls' taking up to 30 seconds to give any output. I've tried CentOS 4.4, 4.5, RHEL AS 4 update 5 (just in case) and openSUSE 10.2 and they all show the same symptoms.
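For anyone wanting to watch the same thing, a minimal sketch of picking out tasks in uninterruptible sleep (state D) from ps output. The helper name d_state_only is made up for illustration, and it reads "STATE PID COMMAND" lines on stdin so it can be tried on canned input.

```shell
# Sketch: keep only tasks whose state starts with D (uninterruptible sleep)
# and print their PID and command name. Command names containing spaces
# are cut at the first space.
d_state_only() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Example: run repeatedly during a write-heavy benchmark
#   ps -eo stat=,pid=,comm= | d_state_only
```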
Googling around makes me think that this may be related to queue depth, nr_requests and possibly VM params (the latter from https://bugzilla.redhat.com/show_bug.cgi?id=121434#c275). These are the default settings:
/sys/block/sda/device/queue_depth = 254
/sys/block/sda/queue/nr_requests = 8192
/proc/sys/vm/dirty_expire_centisecs = 3000
/proc/sys/vm/dirty_ratio = 30
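A small sketch for dumping those four values in one go, for example before and after each tuning attempt. The function name show_io_tunables and the SYSFS_ROOT/PROCFS_ROOT overrides are assumptions for illustration only; the paths themselves are the ones listed above.

```shell
# Sketch: print the four tunables listed above for one block device.
# SYSFS_ROOT and PROCFS_ROOT are hypothetical overrides for trying this
# against a scratch tree; they default to the real /sys and /proc.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
PROCFS_ROOT="${PROCFS_ROOT:-/proc}"

show_io_tunables() {
    dev="$1"
    for f in \
        "$SYSFS_ROOT/block/$dev/device/queue_depth" \
        "$SYSFS_ROOT/block/$dev/queue/nr_requests" \
        "$PROCFS_ROOT/sys/vm/dirty_expire_centisecs" \
        "$PROCFS_ROOT/sys/vm/dirty_ratio"
    do
        if [ -r "$f" ]; then
            echo "$f = $(cat "$f")"
        else
            echo "$f = (not readable)"
        fi
    done
}

# Example: show_io_tunables sda
```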
3Ware mentions elevator=deadline, blockdev --setra 16384 along with nr_requests=512 in their performance tuning doc - these alone seem to make no difference to the latency problem.
Setting dirty_expire_centisecs = 1000 and dirty_ratio = 5 does indeed reduce the number of processes in 'b' state as reported by vmstat 1 during an iozone benchmark (./iozone -s 20480m -r 64 -i 0 -i 1 -t 1 -b filename.xls as per 3Ware's own tuning doc) but the problem is obviously still there, just mitigated somewhat. The comparison graphs are in a PDF here: http://community.novacaster.com/attach.pl/7411/482/iozone_vm_tweaks_xls.pdf
Incidentally, the vmstat 1 output was directed to an NFS-mounted disk to avoid writing it to the array during the actual testing.
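One way to put a single number on the 'b' column is to pull its peak value out of the saved vmstat log. This is only a sketch: the function name peak_blocked and the log path in the example are made up, and it assumes the usual vmstat layout in which 'b' is the second column.

```shell
# Sketch: report the highest value seen in the 'b' (blocked) column of
# saved `vmstat 1` output read from stdin. Header lines are skipped
# because their second field is not a plain number.
peak_blocked() {
    awk '$2 ~ /^[0-9]+$/ { if ($2 + 0 > max) max = $2 + 0 }
         END { print max + 0 }'
}

# Example (log path is hypothetical; sysctl needs root):
#   sysctl -w vm.dirty_expire_centisecs=1000
#   sysctl -w vm.dirty_ratio=5
#   vmstat 1 > /mnt/nfs/vmstat.log &    # keep the log off the array under test
#   ... run the iozone benchmark, then stop vmstat ...
#   peak_blocked < /mnt/nfs/vmstat.log
```

Comparing the peak with and without the two sysctl changes gives a rough before/after figure, though it says nothing about how long processes stayed blocked.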
I've tried eliminating LVM from the equation, going to ext2 rather than ext3 and booting single-processor, all to no useful effect. I've also tried benchmarking with different block sizes from 512B to 1M in powers of 2 and the problem remains - many processes in uninterruptible sleep blocking other IO. I'm about to start downloading CentOS 5 to give it a go, and after that I might have to resort to seeing if WinXP has the same issue.
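A block-size sweep like that could be sketched as below. The helper name block_sizes, the test file path and the use of dd for the run are assumptions for illustration, not necessarily what was used for the tests described here.

```shell
# Sketch: emit block sizes in bytes from 512 B to 1 MiB in powers of 2.
block_sizes() {
    bs=512
    while [ "$bs" -le 1048576 ]; do
        echo "$bs"
        bs=$((bs * 2))
    done
}

# Example: write 1 GiB at each block size (/data/ddtest is a hypothetical
# path on the array under test; this overwrites that file):
#   for bs in $(block_sizes); do
#       echo "bs=$bs"
#       time sh -c "dd if=/dev/zero of=/data/ddtest bs=$bs count=$((1073741824 / bs)) && sync"
#   done
```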
My only real question is "where do I go from here?" I don't have enough specific tuning knowledge to know what else to look at.
Thanks for any pointers.

Simon
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
