Re: [zfs-discuss] Re: Re[2]: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton Rang
On Jan 4, 2007, at 10:26 AM, Roch - PAE wrote: All filesystems will incur a read-modify-write when an application is updating a portion of a block. For most Solaris file systems it is the page size, rather than the block size, that affects read-modify-write; hence 8K (SPARC) or 4K (x86
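
A minimal sketch of that arithmetic (the page size, offset, and length below are illustrative, not figures from the thread): a write that does not cover whole pages forces the containing pages to be read before they can be rewritten.

    /* Illustrative sketch (not Solaris VM code): a write that does not
     * cover a whole page forces the containing page(s) to be read first,
     * modified in memory, then written back. */
    #include <stdio.h>

    int main(void)
    {
        const long pagesz = 8192;   /* 8K on SPARC, 4K on x86 */
        const long off    = 6000;   /* start of the application's write */
        const long len    = 3000;   /* bytes the application writes */

        long first_page = (off / pagesz) * pagesz;
        long last_page  = ((off + len - 1) / pagesz) * pagesz;
        long io_bytes   = last_page - first_page + pagesz;

        printf("app writes %ld bytes; fs reads and rewrites %ld bytes "
               "(pages %ld..%ld)\n", len, io_bytes, first_page, last_page);
        return 0;
    }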

Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton Rang
On Jan 4, 2007, at 3:25 AM, [EMAIL PROTECTED] wrote: Is there some reason why a small read on a raidz2 is not statistically very likely to require I/O on only one device? Assuming a non-degraded pool of course. ZFS stores its checksums for RAIDZ/RAIDZ2 in such a way that all disks must b
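
A counting sketch of the point being made, using a hypothetical 6-disk raidz2 (the real RAIDZ layout is more involved; this only shows why the block checksum forces every data column to be read):

    /* Simplified model of why a small RAIDZ/RAIDZ2 read touches every
     * data column: the checksum covers the whole logical block, so the
     * whole block must be read back to verify it. */
    #include <stdio.h>

    int main(void)
    {
        int  ndisks  = 6;           /* devices in the raidz2 vdev */
        int  nparity = 2;           /* raidz2 */
        int  ndata   = ndisks - nparity;
        long blk     = 128 * 1024;  /* logical block (recordsize) */
        long readreq = 8 * 1024;    /* the application's small read */

        long per_col = blk / ndata;

        printf("app asks for %ld bytes; checksum verification needs "
               "%d column reads of ~%ld bytes each\n",
               readreq, ndata, per_col);
        printf("a mirror could have satisfied this with 1 device read\n");
        return 0;
    }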

Re: [zfs-discuss] Re: ZFS and SE 3511

2006-12-19 Thread Anton Rang
On Dec 19, 2006, at 7:14 AM, Mike Seda wrote: Anton B. Rang wrote: I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non

Re: [zfs-discuss] Re: Self-tuning recordsize

2006-10-17 Thread Anton Rang
On Oct 17, 2006, at 12:43 PM, Matthew Ahrens wrote: Jeremy Teo wrote: Heya Anton, On 10/17/06, Anton B. Rang <[EMAIL PROTECTED]> wrote: No, the reason to try to match recordsize to the write size is so that a small write does not turn into a large read + a large write. In configurations wh
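
A back-of-the-envelope comparison of the mismatch being described; the dataset name and sizes are hypothetical.

    /* Cost of an 8K application write landing in a 128K record versus a
     * record matched to the write size.  (The record size itself would be
     * set with something like: zfs set recordsize=8k tank/db --
     * dataset name hypothetical.) */
    #include <stdio.h>

    int main(void)
    {
        long appwrite = 8 * 1024;     /* the database's write size */
        long big_rs   = 128 * 1024;   /* default/mismatched recordsize */

        /* partial-record update: the old record is read, then rewritten */
        long mismatched_io = big_rs + big_rs;

        /* full-record update: nothing to read, just write the record */
        long matched_io = appwrite;

        printf("recordsize=128k: ~%ldK of I/O for an 8K update\n",
               mismatched_io / 1024);
        printf("recordsize=8k:   ~%ldK of I/O for an 8K update\n",
               matched_io / 1024);
        return 0;
    }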

Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320

2006-09-09 Thread Anton Rang
On Sep 9, 2006, at 1:32 AM, Frank Cusack wrote: On September 7, 2006 12:25:47 PM -0700 "Anton B. Rang" <[EMAIL PROTECTED]> wrote: The bigger problem with system utilization for software RAID is the cache, not the CPU cycles proper. Simply preparing to write 1 MB of data will flush half of a
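
A toy illustration of the cache effect being described: computing parity for 1 MB of data pulls every byte through the CPU caches (column count and sizes are made up).

    /* Toy parity computation to show the memory traffic of software RAID:
     * XOR-ing 1 MB of data streams every byte through the CPU caches,
     * which is the cache-pollution effect under discussion. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define STRIPE (1024 * 1024)   /* 1 MB of application data */
    #define COLS   4               /* data columns in the stripe */

    int main(void)
    {
        size_t colsz = STRIPE / COLS;
        unsigned char *col[COLS], *parity = calloc(1, colsz);

        for (int i = 0; i < COLS; i++) {
            col[i] = malloc(colsz);
            memset(col[i], i + 1, colsz);    /* stand-in for real data */
        }

        /* every data byte is pulled through the cache to compute parity */
        for (int i = 0; i < COLS; i++)
            for (size_t j = 0; j < colsz; j++)
                parity[j] ^= col[i][j];

        printf("touched %d MB of data to produce %zu KB of parity\n",
               STRIPE / (1024 * 1024), colsz / 1024);
        return 0;
    }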

Re: [zfs-discuss] Re: Lots of seeks?

2006-08-11 Thread Anton Rang
On Aug 11, 2006, at 12:38 PM, Jonathan Adams wrote: The problem is that you don't know the actual *contents* of the parent block until *all* of its children have been written to their final locations. (This is because the block pointer's value depends on the final location) But I know whe
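
A much-simplified sketch of the dependency (this is not the real ZFS blkptr_t): the parent embeds each child's final address and checksum, so the parent's bytes cannot be known until the children have landed.

    /* The parent indirect block must be written last, bottom-up, because
     * its contents depend on where its children ended up and on their
     * checksums. */
    #include <stdint.h>

    struct toy_blkptr {
        uint64_t child_disk_offset;   /* known only after allocation      */
        uint64_t child_size;
        uint64_t child_checksum;      /* known only after child is final  */
    };

    struct toy_indirect_block {
        struct toy_blkptr children[128];
    };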

Re: [zfs-discuss] Re: Lots of seeks?

2006-08-11 Thread Anton Rang
On Aug 9, 2006, at 8:18 AM, Roch wrote: So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and
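
A sketch of what "two I/O operations" could look like at the syscall level, with made-up offsets and a plain block device standing in for the pool: one gathered, contiguous write for the data plus metadata, then one small write to commit the uberblock.

    /* Two-I/O commit sketch: gather ~500K of data and metadata into one
     * contiguous write with writev(2), then issue one small write for the
     * uberblock.  Offsets and the device descriptor are illustrative. */
    #include <sys/uio.h>
    #include <unistd.h>

    int commit_txg(int devfd, off_t alloc_off, off_t uber_off,
                   struct iovec *bufs, int nbufs,
                   const void *uberblock, size_t ubersz)
    {
        /* I/O #1: data + metadata, laid out contiguously at alloc_off */
        if (lseek(devfd, alloc_off, SEEK_SET) < 0 ||
            writev(devfd, bufs, nbufs) < 0)
            return -1;

        /* (a write-cache flush or barrier would go here on real hardware) */

        /* I/O #2: the uberblock, making the new tree visible */
        if (pwrite(devfd, uberblock, ubersz, uber_off) < 0)
            return -1;
        return 0;
    }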

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-31 Thread Anton Rang
On May 31, 2006, at 10:21 AM, Bill Sommerfeld wrote: Hunh. Gigabit ethernet devices typically implement some form of interrupt blanking or coalescing so that the host cpu can batch I/O completion handling. That doesn't exist in FC controllers? Not in quite the same way, AFAIK. Usually there

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-31 Thread Anton Rang
On May 31, 2006, at 8:56 AM, Roch Bourbonnais - Performance Engineering wrote: I'm not taking a stance on this, but if I keep a controller full of 128K I/Os and assuming they are targeting contiguous physical blocks, how different is that from issuing a very large I/O? There are d
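
A sketch of the two cases being compared, using POSIX AIO to keep sixteen 128K reads queued at contiguous offsets versus one 2 MB read; the device name and offsets are illustrative.

    #include <aio.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK   (128 * 1024)
    #define NCHUNKS 16

    int main(void)
    {
        int fd = open("/dev/rdsk/hypothetical", O_RDONLY);
        struct aiocb cb[NCHUNKS];
        char *buf = malloc((size_t)CHUNK * NCHUNKS);

        /* case 1: keep the controller full of 128K commands */
        for (int i = 0; i < NCHUNKS; i++) {
            memset(&cb[i], 0, sizeof (cb[i]));
            cb[i].aio_fildes = fd;
            cb[i].aio_buf    = buf + (size_t)i * CHUNK;
            cb[i].aio_nbytes = CHUNK;
            cb[i].aio_offset = (off_t)i * CHUNK;   /* contiguous on disk */
            aio_read(&cb[i]);
        }
        for (int i = 0; i < NCHUNKS; i++) {
            const struct aiocb *list[1] = { &cb[i] };
            aio_suspend(list, 1, NULL);            /* 16 completions */
        }

        /* case 2: one large command for the same bytes */
        pread(fd, buf, (size_t)CHUNK * NCHUNKS, 0);

        close(fd);
        return 0;
    }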

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-30 Thread Anton Rang
a lot of disks on FC probably isn't too bad, though on parallel SCSI the negotiation overhead and lack of fairness were awful, but I haven't tested this.) On Tue, 2006-05-30 at 11:43 -0500, Anton Rang wrote: Sure, the block size may be 128KB, but ZFS can bundle more than one per-file/tran

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-30 Thread Anton Rang
On May 30, 2006, at 12:23 PM, Nicolas Williams wrote: Another way is to have lots of pre-allocated next uberblock locations, so that seek-to-one-uberblock times are always small. Each uberblock can point to its predecessor and its copies and list the pre-allocated possible locations of i
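
A sketch of the structure being proposed here (not how ZFS actually stores its uberblocks): each uberblock names its predecessor and a few pre-allocated candidate locations for its successor, so the next commit never requires a long seek to a fixed location.

    #include <stdint.h>

    #define NCANDIDATES 4

    struct proposed_uberblock {
        uint64_t txg;                          /* transaction group number  */
        uint64_t predecessor_offset;           /* where the previous one is */
        uint64_t successor_candidates[NCANDIDATES]; /* pre-allocated slots  */
        uint64_t root_blkptr_offset;           /* top of the block tree     */
        uint64_t checksum;
    };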

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-30 Thread Anton Rang
On May 30, 2006, at 11:25 AM, Nicolas Williams wrote: On Tue, May 30, 2006 at 08:13:56AM -0700, Anton B. Rang wrote: Well, I don't know about his particular case, but many QFS clients have found the separation of data and metadata to be invaluable. The primary reason is that it avoids disk seek

Re: [zfs-discuss] Re: [osol-discuss] Re: I wish Sun would open-source"QFS"... / was:Re: Re: Distributed File System for Solaris

2006-05-30 Thread Anton Rang
On May 30, 2006, at 10:36 AM, [EMAIL PROTECTED] wrote: That does not answer the question I asked; since ZFS is a copy-on-write filesystem, there's no fixed inode location and streaming writes should always be possible. The überblock still must be updated, however. This may not be an issu

Re: [zfs-discuss] Re: Re[5]: Re: Re: Due to 128KB limit in ZFS it can'tsaturate disks

2006-05-16 Thread Anton Rang
OK, so let's consider your 2MB read. You have the option of setting it in one contiguous place on the disk or splitting it into 16 x 128K chunks, somewhat spread all over. Now you issue a read to that 2MB of data. As you noted, you either have to wait for the head to find the 2MB block and stream i
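
Rough service-time arithmetic for this example, using illustrative seek and transfer figures rather than measurements:

    #include <stdio.h>

    int main(void)
    {
        double seek_ms  = 8.0;    /* average seek + rotational delay */
        double mb_per_s = 60.0;   /* sustained media rate */
        double total_mb = 2.0;
        int    chunks   = 16;

        double contiguous = seek_ms + total_mb / mb_per_s * 1000.0;
        double scattered  = chunks * (seek_ms +
                            (total_mb / chunks) / mb_per_s * 1000.0);

        printf("contiguous 2 MB : ~%.0f ms\n", contiguous);
        printf("16 x 128K chunks: ~%.0f ms\n", scattered);
        return 0;
    }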

Re: [zfs-discuss] Re: ZFS and databases

2006-05-12 Thread Anton Rang
On May 12, 2006, at 11:59 AM, Richard Elling wrote: CPU cycles and memory bandwidth (which both can be in short supply on a database server). We can throw hardware at that :-) Imagine a machine with lots of extra CPU cycles [ ... ] Yes, I've heard this story before, and I won't believe it t

Re: [zfs-discuss] Re: ZFS and databases

2006-05-12 Thread Anton Rang
We might want an interface for the app to know what the natural block size of the file is, so it can read at proper file offsets. Seems that stat(2) could be used for this ... long st_blksize; /* Preferred I/O block size */ This isn't particularly useful for databases if they already
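
The field quoted above already exists; a minimal consumer looks like the following (file name hypothetical). Whether ZFS should report the dataset recordsize in st_blksize is the open question here.

    /* Read at offsets aligned to the preferred block size reported by
     * fstat(2), so a record never straddles two natural blocks. */
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        int fd = open("/tank/db/datafile", O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0) {
            perror("open/fstat");
            return 1;
        }

        long bs = st.st_blksize;        /* "Preferred I/O block size" */
        char *buf = malloc(bs);

        long n = 7;                     /* read record n, aligned */
        pread(fd, buf, bs, n * bs);

        printf("preferred block size: %ld bytes\n", bs);
        close(fd);
        return 0;
    }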

Re: [zfs-discuss] Re: ZFS and databases

2006-05-12 Thread Anton Rang
Now could we detect the pattern that causes holding on to the cached block to be suboptimal, and do a quick freebehind after the copyout? Something like random access + very large file + poor cache hit ratio? We might detect it ... or we could let the application give us the hint, via the directio ioct
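
The application-side hint referred to is directio(3C) on Solaris (an ioctl underneath, which UFS honors); a minimal sketch of an application asking for it, with a hypothetical file name. Whether ZFS should map such a hint onto freebehind is what's being discussed.

    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tank/db/datafile", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* tell the filesystem not to bother caching our data */
        if (directio(fd, DIRECTIO_ON) != 0)
            perror("directio");  /* e.g. ENOTTY where unsupported */

        /* ... random reads/writes with a poor cache-hit ratio ... */

        close(fd);
        return 0;
    }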