On Sun, 22 Jun 2008, Ralf Bertling wrote:
>
> Now let's see if this really has to be this way (this implies no, doesn't it ;-)
> When reading small blocks of data (as opposed to the streams discussed earlier)
> the requested data resides on a single disk, and thus reading it does not
> require sending read commands to all disks in the vdev. Without detailed
> knowledge of the ZFS code, I suspect the problem is that the logical block
> size of any ZFS operation always uses the full stripe. If true, I think this
> is a design error.
> Without that, random reads to a raid-z would be almost as fast as mirrored data.
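A back-of-the-envelope model of the read side may help frame this. The Python sketch below is a simplified illustration, not ZFS code: it assumes 512-byte sectors, assumes a block is spread evenly across the raid-z data disks, and ignores caching, prefetch, and parity reads. All function names are hypothetical. It shows why a single checksum per block means every random read on raid-z has to touch all the data disks holding that block, and how many checksums per-segment verification would need instead.

```python
import math

SECTOR = 512  # assumed sector size in bytes (simplification)


def data_disks_touched(block_size: int, data_disks: int) -> int:
    """Data disks that must be read to fetch one block (simplified raid-z model).

    The block is cut into sectors spread across the data disks. Because the
    checksum covers the whole block, every piece must be read to verify it.
    """
    sectors = math.ceil(block_size / SECTOR)
    return min(sectors, data_disks)


def random_read_iops(layout: str, disks: int, disk_iops: float,
                     block_size: int, parity: int = 1) -> float:
    """Rough random-read IOPS for one vdev; ignores caching and prefetch."""
    if layout == "mirror":
        # Every side holds a complete, checksummable copy, so each read
        # needs only one disk and the sides can serve different reads.
        return disks * disk_iops
    if layout == "raidz":
        data = disks - parity
        touched = data_disks_touched(block_size, data)
        # The data disks work in parallel, but every block read
        # occupies `touched` of them at once.
        return data * disk_iops / touched
    raise ValueError(f"unknown layout: {layout}")


def checksums_per_block(block_size: int, data_disks: int, per_segment: bool) -> int:
    """Checksums stored per block: one per block, or one per stripe segment."""
    return data_disks_touched(block_size, data_disks) if per_segment else 1


if __name__ == "__main__":
    bs = 128 * 1024
    print(random_read_iops("mirror", 4, 100.0, bs))      # 400.0
    print(random_read_iops("raidz", 4, 100.0, bs))       # 100.0
    print(checksums_per_block(bs, 3, per_segment=True))  # 3
```

With four disks and 128K blocks, the model gives a 4-way mirror about four times the random-read rate of a 4-disk raid-z1, in line with the usual rule of thumb that a raid-z vdev delivers roughly the random-read IOPS of a single disk. Verifying each segment separately would close that gap, but only by storing several checksums per block.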
Keep in mind that ZFS checksums all data and that the checksum is stored in a different block than the data. If ZFS were to checksum at the stripe-segment level, many more checksums would need to be stored. All these extra checksums would require more data access, more checksum computations, and more stress on the free block allocator, since ZFS uses copy-on-write in all cases.

Perhaps the solution is to install more RAM in the system so that the stripe is fully cached and ZFS does not need to go back to disk before writing an update. The need to read before writing is clearly what kills ZFS update performance. That is why using 8K blocks helps database performance: when the block size matches the size of the database's updates, each update replaces a whole block and nothing needs to be read first.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss