I know that performance has been discussed often here, but I have just gone through some testing in preparation for deploying a large configuration (120 drives is large for me), and I wanted to share my results, both for their own sake and to see if anyone spots anything wrong with my methodology or conclusions.
First, the hardware and requirements. We have an M4000 production server and a T2000 test server. The storage resides in five J4400 arrays, dual-attached to the T2000 (and soon to be connected to the M4000 as well). The drives are all 750 GB SATA disks, 120 drives in total. The data currently resides on other storage and will be migrated to the new storage as soon as we are happy with the configuration. There is about 20 TB of data today, and we need room to grow to at least 40 TB. We also need a small set of drives for testing, so my plan is to use 80 to 100 drives for production and 20 drives for test.

The I/O pattern is a small number of large sequential writes (to load the data) followed by lots of random reads and some random writes (5% sequential writes, 10% random writes, 85% random reads). The files are relatively small, as they are scans (TIFF) of documents; the median size is 23 KB. The data is divided into projects, each of which varies in size from almost nothing up to almost 50 million objects. We currently have multiple zpools (one per department) and multiple datasets in each (one per project). The plan for the new storage is to go with one zpool, a dataset per department, and datasets within each department for each project.

Based on recommendations from our local Sun / Oracle staff, we are planning to use raidz2 over mirroring for recoverability reasons (getting a comparable level of fault tolerance with mirrors would require three-way mirrors, and that does not give us the capacity we need). I have been testing various raidz2 configurations to confirm the data I have found regarding performance vs. number of vdevs and size of each raidz2 vdev. I used the same 40 disks out of the 120 for every configuration (after culling out any that showed unusual asvc_t via iostat). I used filebench for the testing, as it seemed to generate real differences based on zpool configuration (the other tools I tried showed no statistically significant difference between configurations). See https://spreadsheets.google.com/pub?key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&hl=en&output=html for a summary of the results.

The random read numbers agree with what is expected (performance scales linearly with the number of vdevs). The random write numbers also agree with the expected result, except for the 4 vdevs of 10-disk raidz2, which showed higher performance than expected. The sequential write performance was fairly consistent across configurations and even showed a slight improvement with fewer vdevs of more disks.

Based on these results and our capacity needs, I am planning to go with 5-disk raidz2 vdevs. Since we have five J4400s, I am considering using one disk from each of the five arrays per vdev, so that a complete failure of a J4400 does not cause any loss of data (a rough sketch of what I mean is at the end of this message). What is the general opinion of that approach, and does anyone know how to map an MPxIO device name back to a physical drive? Does anyone see any issues with either the results or the tentative plan for the new storage layout?

Thanks, in advance, for your feedback.

P.S. Let me know if you want me to post the filebench workloads I used; they are the defaults with a few small tweaks (the random workload ran 64 threads, for example).
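To make the planned layout concrete, here is a rough, untested sketch of the kind of pool creation I have in mind. The pool name and the cXtYd0 device names below are placeholders (the real commands would use our actual MPxIO names); the point is simply that each 5-disk raidz2 vdev draws one disk from each of the five J4400s, and that departments and projects become nested datasets:

    # one disk from each of the five J4400s per 5-disk raidz2 vdev
    zpool create tank \
        raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
        raidz2 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
        raidz2 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0
    # ... and so on, up to 16-20 such vdevs for the 80-100 production drives

    # one dataset per department, then one per project under it
    zfs create tank/deptA
    zfs create tank/deptA/project0001
    zfs create tank/deptB

With that arrangement any single J4400 only ever holds one member of each raidz2 vdev, which is why losing an entire array should leave every vdev degraded but intact.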
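P.P.S. For anyone who wants a sense of the scale of the filebench tweaks without waiting for the workload files: the main change was raising the thread count on the stock random workload, roughly like the session below. This is a sketch rather than a transcript of my actual runs, and the directory and workload names here are placeholders:

    # interactive filebench session (illustrative only)
    filebench> load randomread
    filebench> set $dir=/testpool/fbtest
    filebench> set $nthreads=64
    filebench> run 60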
--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players