> With my (COTS) LSI 1068 and 1078 based controllers I get consistently
> better performance when I export all disks as jbod (MegaCli
> -CfgEachDskRaid0).
>
>> Is that really 'all disks as JBOD'? or is it 'each disk as a single
>> drive RAID0'?
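A quick way to tell the two apart, in case it helps: with the per-disk
RAID0 export the controller presents one single-drive logical volume per
physical disk, whereas with true JBOD/passthrough there are no logical
volumes at all and the disks are handed straight to the OS. Something
along these lines should show which one you have (assuming adapter 0;
flags and paths vary a bit between MegaCli versions, so treat it as a
sketch):

./MegaCli -LDInfo -Lall -a0    # logical drives: expect one RAID0 per disk
./MegaCli -PDList -a0          # the physical disks behind the adapter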
single disk raid0: ./MegaCli -CfgEachDskRaid0 Direct -a0

>> It may not sound different on the surface, but I asked in another thread
>> and others confirmed that if your RAID card has a battery-backed cache,
>> giving ZFS many single-drive RAID0's is much better than JBOD (using the
>> 'nocacheflush' option may even improve it more).
>> My understanding is that it's kind of like the best of both worlds. You
>> get the higher number of spindles and vdevs for ZFS to manage, ZFS gets
>> to do the redundancy, and the HW RAID cache gives virtually instant
>> acknowledgement of writes, so that ZFS can be on its way.
>> So I think many RAID0's is not always the same as JBOD. That's not to
>> say that even true JBOD doesn't still have an advantage over HW RAID. I
>> don't know that for sure.

I have tried mixing hardware and ZFS RAID, but from a performance and
redundancy standpoint it just doesn't make sense to add those layers of
complexity. In this case I'm building nearline storage, so there isn't even
a battery attached and I have disabled any caching on the controller. I
have a Sun SAS HBA on the way, which is what I would ultimately use for my
JBOD attachment.

>> But I think there is a use for HW RAID in ZFS configs, which wasn't
>> always the theory I've heard.

> I have really learned not to do it this way with raidz and raidz2:
>
> # zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0
>   c3t13d0 c3t14d0 c3t15d0

>> Why? I know creating raidz's with more than 9-12 devices is discouraged,
>> but that doesn't cross that threshold.
>> Is there a reason you'd split 8 disks up into 2 groups of 4? What
>> experience led you to this?
>> (Just so I don't have to repeat it. ;) )

I don't know why, but with most setups I have tested (8- and 16-drive
configs), splitting into raidz vdevs of 4 disks (and 5 disks per vdev for
raidz2) performs better. Take a look at my simple dd test below (filebench
results as soon as I can figure out how to get it working properly with
SOL10).
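For reference on the 'nocacheflush' option mentioned above: as far as I
know this is the zfs_nocacheflush tunable, set in /etc/system on recent
Solaris 10 / Nevada builds, and it is only safe when every device in the
pool sits behind non-volatile, battery-backed cache - so nothing I would
turn on for this BBU-less nearline box. A minimal sketch (takes effect on
reboot):

* /etc/system - stop ZFS from issuing cache-flush commands to its devices
set zfs:zfs_nocacheflush = 1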
=====
8 SATA 500gb disk system with LSI 1068 (megaRAID 8888ELP) - no BBU
---------
bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.16:38:13 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   117K  3.10T  42.6K  /pool0-raidz

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072;time sync
131072+0 records in
131072+0 records out

real    0m1.768s
user    0m0.080s
sys     0m1.688s

real    0m3.495s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m6.994s
user    0m0.097s
sys     0m2.827s

real    0m1.043s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360;time sync
655360+0 records in
655360+0 records out

real    0m24.064s
user    0m0.402s
sys     0m8.974s

real    0m1.629s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/pool0-raidz/rw-test.lo1 bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m40.542s
user    0m0.476s
sys     0m16.077s

real    0m0.617s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/dev/null bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m3.443s
user    0m0.084s
sys     0m1.327s

real    0m0.013s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/dev/null bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m15.972s
user    0m0.413s
sys     0m6.589s

real    0m0.013s
user    0m0.001s
sys     0m0.012s

-----------------------
bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.17:02:16 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
2008-02-11.17:02:51 zpool add pool0-raidz raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   110K  2.67T  36.7K  /pool0-raidz

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072;time sync
131072+0 records in
131072+0 records out

real    0m1.835s
user    0m0.079s
sys     0m1.687s

real    0m2.521s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m2.376s
user    0m0.084s
sys     0m2.291s

real    0m2.578s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360;time sync
655360+0 records in
655360+0 records out

real    0m19.531s
user    0m0.404s
sys     0m8.731s

real    0m2.255s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/pool0-raidz/rw-test.lo1 bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m34.698s
user    0m0.484s
sys     0m13.868s

real    0m0.741s
user    0m0.001s
sys     0m0.016s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/dev/null bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m3.372s
user    0m0.088s
sys     0m1.209s

real    0m0.015s
user    0m0.001s
sys     0m0.012s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/dev/null bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m15.863s
user    0m0.431s
sys     0m6.077s

real    0m0.013s
user    0m0.001s
sys     0m0.013s
===

The latter is my 4 disk split and as you can see, it performs pretty good.
Maybe someone can help us understand why it appears this way?

-Andy
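P.S. Rough throughput worked out from the big (5 GiB) dd runs above, with
the trailing sync counted in the elapsed time. Single-stream dd of
/dev/zero is a crude test, so treat these only as a relative comparison:

655360 records * 8192 bytes = 5120 MiB per run

                      1 x 8-disk raidz             2 x 4-disk raidz
write (zero fill)     5120/(24.064+1.629) ~ 199    5120/(19.531+2.255) ~ 235 MiB/s
copy within pool      5120/(40.542+0.617) ~ 124    5120/(34.698+0.741) ~ 144 MiB/s
read to /dev/null     5120/15.972         ~ 321    5120/15.863         ~ 323 MiB/s

(The copy figures count one direction only; each of those runs reads and
writes 5 GiB.) So the split pool comes out ahead on writes and copies and
ties on pure reads, at the cost of one more disk going to parity - hence
the 2.67T vs 3.10T available above.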