Anton B. Rang writes:
 > > When you have a striped storage device under a
 > > file system, then the database or file system's view
 > > of contiguous data is not contiguous on the media.
 >
 > Right. That's a good reason to use fairly large stripes. (The
 > primary limiting factor for stripe size is efficient parallel access;
 > using a 100 MB stripe size means that an average 100 MB file gets less
 > than two disks' worth of throughput.)
 >
 > ZFS, of course, doesn't have this problem, since it's handling the
 > layout on the media; it can store things as contiguously as it wants.
 >
It can do what it wants. But currently what it does is keep files that are subject to small random writes contiguous only down to the level of the ZFS recordsize. After a significant run of random writes, such files end up with a scattered on-disk layout.

This should work well for the transactional part of the workload. But the implication of using a small recordsize is that large sequential scans of those files will keep the disk heads very busy fetching or prefetching recordsize-sized chunks. With more spindles and a good prefetch algorithm you can reach whatever throughput you need; the problem is that the scanning ops create heavy competition at the spindle level and so hurt transactional response time (once every spindle is doing 150 IOPS just prefetching data for your full table scans, the OLTP side will suffer).

Yes, we do need data to characterise this, but the physics are fairly clear. The best-practice recommendation of a small recordsize needs to be updated: we have to strike a balance between random writes and sequential reads, which does imply using records larger than the 8K/16K DB blocks.

-r
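
PS: to put rough numbers on the spindle argument, here is a small back-of-the-envelope sketch. The 150 IOPS per spindle, the record sizes, and the 200 MB/s scan target are assumptions for illustration, not measurements of any particular box:

    # Rough model (assumed numbers, not measurements): when a file's blocks
    # are scattered, a sequential scan degenerates into one random read per
    # record, so per-spindle scan throughput is roughly IOPS * recordsize.

    def scan_mb_per_sec(recordsize_kb, iops_per_spindle=150):
        """Approximate per-spindle scan rate when every record costs a seek."""
        return iops_per_spindle * recordsize_kb / 1024.0

    for rs in (8, 16, 128):
        print("recordsize %4dK -> ~%5.1f MB/s per spindle" % (rs, scan_mb_per_sec(rs)))

    # With 8K records a spindle doing 150 random IOPS delivers only ~1.2 MB/s
    # of table-scan data, so reaching ~200 MB/s of scan throughput keeps
    # roughly 170 spindles saturated with prefetch reads and leaves little
    # headroom for OLTP.  With 128K records the same spindles deliver ~16x
    # the scan rate for the same seek load.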