> With my (COTS) LSI 1068 and 1078 based controllers I get consistently
> better performance when I export all disks as JBOD
> (MegaCli -CfgEachDskRaid0).
>
>> Is that really 'all disks as JBOD'? Or is it 'each disk as a single
>> drive RAID0'?

single disk raid0:
./MegaCli -CfgEachDskRaid0 Direct -a0
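
To double-check what that actually builds, something like the following
should list one single-drive RAID0 logical drive per physical disk
(flags from memory, so check them against your MegaCli version):

./MegaCli -LDInfo -Lall -a0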


>> It may not sound different on the surface, but I asked in another
>> thread and others confirmed that if your RAID card has a battery
>> backed cache, giving ZFS many single drive RAID0's is much better
>> than JBOD (using the 'nocacheflush' option may even improve it more).

>> My understanding is that it's kind of like the best of both worlds.
>> You get the higher number of spindles and vdevs for ZFS to manage,
>> ZFS gets to do the redundancy, and the HW RAID cache gives virtually
>> instant acknowledgement of writes, so that ZFS can be on its way.

>> So I think many RAID0's is not always the same as JBOD. That's not to
>> say that even true JBOD doesn't still have an advantage over HW RAID.
>> I don't know that for sure.
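
(For what it's worth, the 'nocacheflush' option mentioned above is the
zfs_nocacheflush tunable on recent Solaris builds; a minimal /etc/system
entry, assuming your build has it and the write cache really is battery
backed:

* only disable cache flushes if the controller cache is battery backed
set zfs:zfs_nocacheflush = 1

It takes a reboot to pick up, and it doesn't apply here since I'm running
without a BBU anyway.)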

I have tried mixing hardware and ZFS RAID, but from a performance or
redundancy standpoint it just doesn't make sense to add those layers of
complexity.  In this case I'm building nearline storage, so there isn't
even a battery attached, and I have disabled all caching on the
controller.  I have a SUN SAS HBA on the way, which is what I would
ultimately use for JBOD attachment.


>> But I think there is a use for HW RAID in ZFS configs, which wasn't
>> always the theory I've heard.
> I have really learned not to do it this way with raidz and raidz2:
>
> #zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0  
> c3t13d0 c3t14d0 c3t15d0
>   
>> Why? I know creating raidz's with more than 9-12 devices is
>> discouraged, but that doesn't cross that threshold.
>> Is there a reason you'd split 8 disks up into 2 groups of 4? What
>> experience led you to this?
>> (Just so I don't have to repeat it. ;) )

I don't know why, but with most setups I have tested (8 and 16 drive
configs), dividing the pool into raidz vdevs of 4 disks (or 5 disks per
vdev for raidz2) performs better.  Take a look at my simple dd test
(filebench results as soon as I can figure out how to get it working
properly with Solaris 10).

=====

8 x 500GB SATA disk system with LSI 1068 (MegaRAID 8888ELP) - no BBU


---------
bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.16:38:13 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   117K  3.10T  42.6K  /pool0-raidz


bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072;time sync
131072+0 records in
131072+0 records out

real    0m1.768s
user    0m0.080s
sys     0m1.688s

real    0m3.495s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m6.994s
user    0m0.097s
sys     0m2.827s

real    0m1.043s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360;time sync
655360+0 records in
655360+0 records out

real    0m24.064s
user    0m0.402s
sys     0m8.974s

real    0m1.629s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/pool0-raidz/rw-test.lo1 bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m40.542s
user    0m0.476s
sys     0m16.077s

real    0m0.617s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/dev/null bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m3.443s
user    0m0.084s
sys     0m1.327s

real    0m0.013s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/dev/null bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m15.972s
user    0m0.413s
sys     0m6.589s

real    0m0.013s
user    0m0.001s
sys     0m0.012s
-----------------------

bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.17:02:16 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
2008-02-11.17:02:51 zpool add pool0-raidz raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   110K  2.67T  36.7K  /pool0-raidz

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072;time sync
131072+0 records in
131072+0 records out

real    0m1.835s
user    0m0.079s
sys     0m1.687s

real    0m2.521s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m2.376s
user    0m0.084s
sys     0m2.291s

real    0m2.578s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360;time sync
655360+0 records in
655360+0 records out

real    0m19.531s
user    0m0.404s
sys     0m8.731s

real    0m2.255s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/pool0-raidz/rw-test.lo1 bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m34.698s
user    0m0.484s
sys     0m13.868s

real    0m0.741s
user    0m0.001s
sys     0m0.016s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/dev/null bs=8192; time sync
131072+0 records in
131072+0 records out

real    0m3.372s
user    0m0.088s
sys     0m1.209s

real    0m0.015s
user    0m0.001s
sys     0m0.012s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/dev/null bs=8192; time sync
655360+0 records in
655360+0 records out

real    0m15.863s
user    0m0.431s
sys     0m6.077s

real    0m0.013s
user    0m0.001s
sys     0m0.013s

===

The latter is my 4-disk split and, as you can see, it performs pretty
well.  Maybe someone can help us understand why it comes out this way?
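
Back of the envelope, using the 5 GiB runs (655360 x 8 KiB = 5120 MiB)
and counting dd's 'real' time plus the sync that follows:

                     write        copy (r+w)   read
8-disk raidz         ~199 MiB/s   ~124 MiB/s   ~321 MiB/s
2 x 4-disk raidz     ~235 MiB/s   ~144 MiB/s   ~323 MiB/s

So the split layout comes out roughly 15-20% ahead on writes and copies
here, while reads are a wash.  The trade-off shows up in 'zfs list':
3.10T usable for the single raidz versus 2.67T for the 2 x 4 split,
since the latter spends two disks on parity instead of one.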


-Andy