Re: [zfs-discuss] rethinking RaidZ and Record size [SEC=UNCLASSIFIED]

2010-01-07 Thread Richard Elling
On Jan 6, 2010, at 11:09 PM, Wilkinson, Alex wrote: 0n Wed, Jan 06, 2010 at 11:00:49PM -0800, Richard Elling wrote: On Jan 6, 2010, at 10:39 PM, Wilkinson, Alex wrote: 0n Wed, Jan 06, 2010 at 02:22:19PM -0800, Richard Elling wrote: Rather, ZFS works very nicely with "hardware RAID" sys

Re: [zfs-discuss] rethinking RaidZ and Record size [SEC=UNCLASSIFIED]

2010-01-06 Thread Wilkinson, Alex
0n Wed, Jan 06, 2010 at 11:00:49PM -0800, Richard Elling wrote: >On Jan 6, 2010, at 10:39 PM, Wilkinson, Alex wrote: >> >>0n Wed, Jan 06, 2010 at 02:22:19PM -0800, Richard Elling wrote: >> >>> Rather, ZFS works very nicely with "hardware RAID" systems or JBODs >>>

Re: [zfs-discuss] rethinking RaidZ and Record size [SEC=UNCLASSIFIED]

2010-01-06 Thread Richard Elling
On Jan 6, 2010, at 10:39 PM, Wilkinson, Alex wrote: 0n Wed, Jan 06, 2010 at 02:22:19PM -0800, Richard Elling wrote: Rather, ZFS works very nicely with "hardware RAID" systems or JBODs iSCSI, et al. You can happily add the I'm not sure how ZFS works very nicely with, say, an EM

Re: [zfs-discuss] rethinking RaidZ and Record size [SEC=UNCLASSIFIED]

2010-01-06 Thread Wilkinson, Alex
0n Wed, Jan 06, 2010 at 02:22:19PM -0800, Richard Elling wrote: > Rather, ZFS works very nicely with "hardware RAID" systems or JBODs iSCSI, et al. You can happily add the I'm not sure how ZFS works very nicely with, say, an EMC Cx310 array? -Alex

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-06 Thread Ross Walker
On Wed, Jan 6, 2010 at 4:30 PM, Wes Felter wrote: > Michael Herf wrote: >> I agree that RAID-DP is much more scalable for reads than RAIDZx, and this basically turns into a cost concern at scale. >> The raw cost/GB for ZFS is much lower, so even a 3-way mirror could be used instead of n

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-06 Thread Richard Elling
On Jan 6, 2010, at 1:30 PM, Wes Felter wrote: Michael Herf wrote: I agree that RAID-DP is much more scalable for reads than RAIDZx, and this basically turns into a cost concern at scale. The raw cost/GB for ZFS is much lower, so even a 3-way mirror could be used instead of netapp. But this

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-06 Thread Wes Felter
Michael Herf wrote: I agree that RAID-DP is much more scalable for reads than RAIDZx, and this basically turns into a cost concern at scale. The raw cost/GB for ZFS is much lower, so even a 3-way mirror could be used instead of netapp. But this certainly reduces the cost advantage significantly

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread David Magda
On Jan 5, 2010, at 16:06, Bob Friesenhahn wrote: Perhaps innovative designers like Suncast will figure out how to build reliable SSDs based on parts which are more likely to wear out and forget. At which point we'll probably start seeing the memristor make an appearance in various

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 23:31, Michael Herf wrote: The raw cost/GB for ZFS is much lower, so even a 3-way mirror could be used instead of netapp. But this certainly reduces the cost advantage significantly. This is true to some extent. I didn't want to bring it up as I wanted to focus only on techni
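To put rough numbers on the cost argument (3-way mirror versus a double-parity layout such as RAID-DP), here is a minimal sketch of raw capacity needed per unit of usable capacity. It assumes an n-way mirror keeps 1/n of raw space and a double-parity group of `group_size` disks keeps (group_size - 2)/group_size; the function names, the 70 TB requirement and the 14+2 group are hypothetical, and spares, metadata and filesystem overhead are ignored.

```python
from fractions import Fraction


def usable_fraction(layout: str, group_size: int) -> Fraction:
    """Fraction of raw capacity left for data (simplified: no spares or metadata)."""
    if layout == "mirror":
        # An n-way mirror stores every block n times.
        return Fraction(1, group_size)
    if layout == "double_parity":
        # Two disks' worth of each group hold parity (RAID-DP / RAID-Z2 style).
        return Fraction(group_size - 2, group_size)
    raise ValueError(f"unknown layout: {layout}")


def raw_needed(usable: int, layout: str, group_size: int) -> Fraction:
    """Raw capacity required to provide `usable` capacity (same units)."""
    return usable / usable_fraction(layout, group_size)


if __name__ == "__main__":
    usable_tb = 70  # hypothetical requirement
    mirror = raw_needed(usable_tb, "mirror", 3)
    dp = raw_needed(usable_tb, "double_parity", 16)  # hypothetical 14+2 group
    print(f"3-way mirror: {float(mirror):.0f} TB raw")
    print(f"14+2 double parity: {float(dp):.0f} TB raw")
    print(f"ratio: {float(mirror / dp):.2f}x")
```

In this example the mirror needs 210 TB of raw disk against 80 TB for the double-parity groups, roughly 2.6 times as much, which is the "reduces the cost advantage significantly" point in numbers.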

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Michael Herf
Many large-scale photo hosts start with netapp as the default "good enough" way to handle multiple-TB storage. With a 1-5% cache on top, the workload is truly random-read over many TBs. But these workloads almost assume a frontend cache to take care of hot traffic, so L2ARC is just a nice implement

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 20:19, Richard Elling wrote: [...] Fortunately, most workloads are not of that size and scope. Forgot to mention it in my last email - yes, I agree. The environment I'm talking about is rather unusual and in most other cases where RAID-5/6 was considered the performance of RAID

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 20:19, Richard Elling wrote: On Jan 5, 2010, at 11:30 AM, Robert Milkowski wrote: On 05/01/2010 18:49, Richard Elling wrote: On Jan 5, 2010, at 8:49 AM, Robert Milkowski wrote: The problem is that while RAID-Z is really good for some workloads it is really bad for others. Somet

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Tristan Ball
On 6/01/2010 7:19 AM, Richard Elling wrote: If you are doing small, random reads on dozens of TB of data, then you've got a much bigger problem on your hands... kinda like counting grains of sand on the beach during low tide :-). Hopefully, you do not have to randomly update that data becaus

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Bob Friesenhahn
On Tue, 5 Jan 2010, Richard Elling wrote: Since there are already 1 TB SSDs on the market, the only thing keeping the HDD market alive is the low $/TB. Moore's Law predicts that cost advantage will pass. SSDs are already the low $/IOPS winners. SSD vendors are still working to stabilize thei

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Richard Elling
On Jan 5, 2010, at 11:56 AM, Tristan Ball wrote: On 6/01/2010 3:00 AM, Roch wrote: That said, I truly am for an evolution for random read workloads. Raid-Z on 4K sectors is quite appealing. It means that small objects become nearly mirrored with good random read performance while large objects ar

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Richard Elling
On Jan 5, 2010, at 11:30 AM, Robert Milkowski wrote: On 05/01/2010 18:49, Richard Elling wrote: On Jan 5, 2010, at 8:49 AM, Robert Milkowski wrote: The problem is that while RAID-Z is really good for some workloads it is really bad for others. Sometimes having L2ARC might effectively mitigat

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Tristan Ball
On 6/01/2010 3:00 AM, Roch wrote: Richard Elling writes: > On Jan 3, 2010, at 11:27 PM, matthew patton wrote: > > I find it baffling that RaidZ(2,3) was designed to split a record-size block into N (N=# of member devices) pieces and send the uselessly tiny requests to

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 18:49, Richard Elling wrote: On Jan 5, 2010, at 8:49 AM, Robert Milkowski wrote: The problem is that while RAID-Z is really good for some workloads it is really bad for others. Sometimes having L2ARC might effectively mitigate the problem but for some workloads it won't (due to
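The "really bad for others" case is usually explained with a rule of thumb: for small random reads, a RAID-Z vdev performs roughly like a single disk, because each block is spread across all data columns and reading it keeps every data disk busy, while a mirror or a classic RAID-5 can serve independent small reads from different disks in parallel. The sketch below only encodes that rule of thumb; the function names, `disk_iops` and the pool shapes are hypothetical, and ARC/L2ARC caching, queue depth and block size are ignored.

```python
def vdev_random_read_iops(layout: str, disks: int, disk_iops: int) -> int:
    """Rule-of-thumb small random read IOPS for one vdev or RAID group."""
    if layout == "raidz":
        # Each block is striped over all data columns, so one logical read
        # occupies every data disk: roughly one disk's worth of IOPS.
        return disk_iops
    if layout in ("mirror", "raid5"):
        # A mirror can read from any side; RAID-5 with a fixed stripe unit
        # serves a small aligned read from one disk, and distributed parity
        # lets every disk take reads.
        return disks * disk_iops
    raise ValueError(f"unknown layout: {layout}")


def pool_random_read_iops(layout: str, vdevs: int, disks_per_vdev: int, disk_iops: int) -> int:
    """ZFS stripes across top-level vdevs, so per-vdev IOPS roughly add up."""
    return vdevs * vdev_random_read_iops(layout, disks_per_vdev, disk_iops)


if __name__ == "__main__":
    # 24 hypothetical disks at ~150 random IOPS each, arranged three ways.
    for layout, vdevs, width in [("raidz", 4, 6), ("mirror", 12, 2), ("raid5", 4, 6)]:
        print(layout, vdevs, "x", width, "->", pool_random_read_iops(layout, vdevs, width, 150))
```

With the same 24 disks, the RAID-Z pool comes out at about a sixth of the random-read IOPS of the mirrored or RAID-5 arrangements, which is why a large L2ARC is often suggested as the mitigation.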

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 18:37, Roch Bourbonnais wrote: Writes are not the problem and we have log device to offload them. It's really about maintaining integrity of raid-5 type layout in the presence of bit-rot even if such bit-rot occur within free space. How is it addressed in RAID-DP? -- Robert

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Richard Elling
On Jan 5, 2010, at 8:49 AM, Robert Milkowski wrote: On 05/01/2010 16:00, Roch wrote: That said, I truly am for an evolution for random read workloads. Raid-Z on 4K sectors is quite appealing. It means that small objects become nearly mirrored with good random read performance while large objects

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Roch Bourbonnais
On 5 Jan 2010, at 17:49, Robert Milkowski wrote: On 05/01/2010 16:00, Roch wrote: That said, I truly am for an evolution for random read workloads. Raid-Z on 4K sectors is quite appealing. It means that small objects become nearly mirrored with good random read performance while large objects

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread A Darren Dunham
On Tue, Jan 05, 2010 at 04:49:00PM +, Robert Milkowski wrote: > A possible *workaround* is to use SVM to set up RAID-5 and create a zfs pool on top of it. > How does SVM handle R5 write hole? IIRC SVM doesn't offer RAID-6. As far as I know, it does not address it. It's possible that adding

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Robert Milkowski
On 05/01/2010 16:00, Roch wrote: That said, I truly am for an evolution for random read workloads. Raid-Z on 4K sectors is quite appealing. It means that small objects become nearly mirrored with good random read performance while large objects are stored efficiently. Have you got any bench
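The "nearly mirrored" remark can be checked with sector arithmetic. This is a minimal sketch that approximates how RAID-Z sizes one block's allocation: round the block up to whole sectors, add nparity parity sectors per row of data columns, then pad the total to a multiple of nparity + 1. The padding rule reflects my understanding of the allocator and should be treated as an assumption; the function name and the 5-disk RAID-Z1 vdev are hypothetical.

```python
def raidz_alloc_sectors(psize: int, ndisks: int, nparity: int, ashift: int):
    """Return (data, parity, total) sectors for one RAID-Z block (approximation)."""
    sector = 1 << ashift                      # 512 B for ashift=9, 4 KiB for ashift=12
    data = (psize - 1) // sector + 1          # round the block up to whole sectors
    data_cols = ndisks - nparity
    rows = -(-data // data_cols)              # ceiling division: stripe rows used
    parity = nparity * rows                   # each row carries nparity parity sectors
    total = data + parity
    total = -(-total // (nparity + 1)) * (nparity + 1)  # assumed padding rule
    return data, parity, total


if __name__ == "__main__":
    # A 4 KiB block on a hypothetical 5-disk RAID-Z1 vdev.
    for ashift in (9, 12):
        d, p, t = raidz_alloc_sectors(4096, 5, 1, ashift)
        print(f"ashift={ashift}: data={d} parity={p} total={t} overhead={t / d:.2f}x")
```

With 512-byte sectors the 4 KiB block spans all four data columns (8 data + 2 parity sectors, 1.25x). With 4 KiB sectors it becomes one data sector plus one parity sector: the same 2x footprint as a two-way mirror, and reading it back touches a single disk, while large records still pay only the usual parity fraction.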

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-05 Thread Roch
Richard Elling writes: > On Jan 3, 2010, at 11:27 PM, matthew patton wrote: > > I find it baffling that RaidZ(2,3) was designed to split a record-size block into N (N=# of member devices) pieces and send the uselessly tiny requests to spinning rust when we know the massive

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-04 Thread Richard Elling
On Jan 3, 2010, at 11:27 PM, matthew patton wrote: I find it baffling that RaidZ(2,3) was designed to split a record-size block into N (N=# of member devices) pieces and send the uselessly tiny requests to spinning rust when we know the massive delays entailed in head seeks and rotational d

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-04 Thread matthew patton
Chris Siebenmann wrote: > People have already mentioned the RAID-[56] write hole, but it's more than that; in a never-overwrite system with multiple blocks in one RAID stripe, how do you handle updates to some of the blocks? > > See: > http://utcc.utoronto.ca/~cks/space/blog/solar
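Updating only some blocks of a RAID stripe is the classic read-modify-write problem. With XOR parity, changing one data block in place means deriving new parity from the old parity, the old data and the new data; if the system crashes after the data write but before the parity write, the stripe is silently inconsistent, which is the RAID-5 write hole. The toy sketch below uses three tiny in-memory "disks" (all values hypothetical) to show the update rule and what a lost parity write does to a later rebuild. RAID-Z avoids the in-place case by making every block its own variable-width stripe, written in full to a new location under copy-on-write.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update: P' = P xor D_old xor D_new."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
    parity = xor_blocks(d0, d1, d2)

    new_d1 = b"\x11\x22"
    new_parity = update_parity(parity, d1, new_d1)
    # Correct update: the incremental parity matches a full recomputation.
    print(new_parity == xor_blocks(d0, new_d1, d2))  # True

    # Write hole: new_d1 reached disk but new_parity did not. Rebuilding a
    # failed d0 from the stale parity now returns wrong data.
    rebuilt_d0 = xor_blocks(parity, new_d1, d2)
    print(rebuilt_d0 == d0)  # False
```

The incremental update needs two reads (old data, old parity) before two writes, which is also why small in-place writes are expensive on RAID-5 even when nothing crashes.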

Re: [zfs-discuss] rethinking RaidZ and Record size

2010-01-04 Thread Ross Walker
On Mon, Jan 4, 2010 at 2:27 AM, matthew patton wrote: > I find it baffling that RaidZ(2,3) was designed to split a record-size block into N (N=# of member devices) pieces and send the uselessly tiny requests to spinning rust when we know the massive delays entailed in head seeks and rotat

[zfs-discuss] rethinking RaidZ and Record size

2010-01-03 Thread matthew patton
I find it baffling that RaidZ(2,3) was designed to split a record-size block into N (N=# of member devices) pieces and send the uselessly tiny requests to spinning rust when we know the massive delays entailed in head seeks and rotational delay. The ZFS-mirror and load-balanced configuration do
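The arithmetic behind "split a record-size block into N pieces" is simple. Ignoring sector rounding, parity placement and padding, each data column of a RAID-Z vdev receives roughly recordsize / (disks - parity) bytes of every record, so one logical read becomes that many small physical reads. A minimal sketch, assuming the default 128 KiB ZFS recordsize and a few hypothetical vdev widths:

```python
KIB = 1024


def raidz_bytes_per_data_disk(recordsize: int, ndisks: int, nparity: int) -> int:
    """Approximate share of one record landing on each data disk (ceiling)."""
    data_disks = ndisks - nparity
    return -(-recordsize // data_disks)  # ceiling division


if __name__ == "__main__":
    recordsize = 128 * KIB  # ZFS default recordsize
    for ndisks, nparity in [(5, 1), (6, 2), (10, 2), (11, 3)]:
        chunk = raidz_bytes_per_data_disk(recordsize, ndisks, nparity)
        print(f"{ndisks}-disk RAID-Z{nparity}: ~{chunk / KIB:.1f} KiB per data disk per record")
```

Wider vdevs shrink each piece (32 KiB at four data disks, 16 KiB at eight), so every seek and rotational delay is paid for fewer useful bytes, which is the cost this post is pointing at.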