Harry M. Greenblatt wrote:
BS"DTo those hardware oriented:We have a compute cluster with 23 nodes (dual socket, dual core Intel servers). Users run simulation jobs on the nodes from the head node. At the end of each simulation, a result file is compressed to 2GB, and copied to the file server for the cluster (not the head node) via NFS. Each node is connected via a Gigabit line to a switch. The file server has a 4-link aggregated Ethernet trunk (4Gb/S) to the switch. The file server also has two sockets, with Dual Core Xeon 2.1GHz CPU's and 4 GB of memory, running RH4. There are two raid arrays (RAID 5), each consisting of 8x500GB SATA II WD server drives, with one file system on each. The raid cards are AMCC 3WARE 9550 and 9650SE (PCI-Express) with 256 MB of cache memory . When several (~10) jobs finish at once, and the nodes start copying the compressed file to the file server, the load on the file server gets very high (~10), and the users whose home directory are on the file server cannot work at their stations. Using nmon to locate the bottleneck, it appears that disk I/O is the problem. But the numbers being reported are a bit strange. It reports a throughput of only about 50MB/s, and claims the "disk" is 100% busy. These raid cards should give throughput in the several hundred MB/s range, especially the 9650 which is rated at 600MB/s RAID 6 write (and we have RAID 5).1) Is there a more friendly system load monitoring tool we can use?2) The users may be able to stagger the output schedule of their jobs, but based on the numbers, we get the feeling the RAID arrays are not performing as they should. Any suggestions?Thanks Harry ------------------------------------------------------------------------- Harry M. Greenblatt Staff ScientistDept of Structural Biology [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>Weizmann Institute of Science Phone: 972-8-934-3625 Rehovot, 76100 Facsimile: 972-8-934-4159Israel
Harry,

to my understanding, the WRITE performance of RAID5 degrades to roughly what a _single_ disk gives once the writes are small or unaligned: each such write turns into a read-modify-write of the parity (read old data, read old parity, write new data, write new parity), so the spindles spend much of their time on parity housekeeping rather than payload. Only large, well-aligned full-stripe writes avoid that penalty, and a mix of several concurrent NFS streams plus home-directory traffic rarely stays that well-behaved. This is different from the READ situation - here RAID5 should give (maybe much) more than a single disk.
Thus I don't find it surprising that your RAID5 write throughput is "only" 50 MB/s. If you need more, you should use RAID0 (no redundancy), or RAID10 (which needs twice the number of disks of a RAID0 with the same capacity).
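A back-of-the-envelope comparison of those RAID levels for an 8-disk array, assuming ~60 MB/s sequential writes per SATA II drive (an assumed figure, not from the thread; measure the actual disks):

    #!/usr/bin/env python
    # Rough sequential-write estimates for an 8-disk array -- a sketch
    # under stated assumptions, not a benchmark.
    N_DISKS = 8
    DISK_MBPS = 60.0  # assumed per-disk sequential write rate (a guess)

    # RAID0: data striped over all disks, no redundancy.
    raid0 = N_DISKS * DISK_MBPS

    # RAID10: mirrored pairs, striped; every byte is written twice,
    # so usable write bandwidth is half of RAID0.
    raid10 = N_DISKS / 2 * DISK_MBPS

    # RAID5 best case: a full-stripe write touches N-1 data disks plus
    # parity, so a good controller can approach (N-1) x one disk.
    raid5_full_stripe = (N_DISKS - 1) * DISK_MBPS

    # RAID5 worst case: small/unaligned writes incur the classic 4-I/O
    # read-modify-write penalty, dropping to a single disk's speed or below.
    raid5_small_write = DISK_MBPS / 4

    for name, mbps in [("RAID0", raid0), ("RAID10", raid10),
                       ("RAID5 full-stripe", raid5_full_stripe),
                       ("RAID5 read-modify-write", raid5_small_write)]:
        print("%-24s ~%5.0f MB/s" % (name, mbps))

With these assumptions the read-modify-write case lands in the same ballpark as the ~50 MB/s nmon reports, while the controller's 600 MB/s rating corresponds to the full-stripe best case.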
HTH,
Kay

--
Kay Diederichs                 http://strucbio.biologie.uni-konstanz.de
email: [EMAIL PROTECTED]       Tel +49 7531 88 4049    Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz