Ralf Ramge wrote:
> Brandorr wrote:
>> Is ZFS efficient at handling huge populations of tiny-to-small files -
>> for example, 20 million TIFF images in a collection, each between 5
>> and 500k in size?
>>
>> I am asking because I could have sworn that I read somewhere that it
>> isn't, but I can't find the reference.
>
> If you're worried about the I/O throughput, you should avoid RAIDZ1/2
> configurations. Random read performance will be disastrous if you do.
A raid-z group can do one random read per I/O latency. So 8 disks (each
capable of 200 IOPS) in a zpool split into 2 raid-z groups should be able
to serve about 400 files per second. If you need to serve more files, you
need more disks or you need to use mirroring. With mirroring, I'd expect
to serve about 1600 files per second (8 * 200).

This model only applies to random reads, not to sequential access, nor to
any type of write load. For small file creation ZFS can be extremely
efficient in that it can create more than one file per I/O, and it should
also approach disk streaming performance for write loads. (A rough
back-of-the-envelope sketch of this model is appended at the bottom of
this mail, for anyone who wants to plug in their own numbers.)

> I've seen random read rates of less than 1 MB/s on an X4500 with 40
> dedicated disks for data storage.

It would be nice to see whether the above model matches your data. If you
have all 40 disks in a single raid-z group (an anti best practice), I'd
expect fewer than 200 files served per second, and if the files averaged
5K in size, that works out to roughly 1 MB/s.

> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
>
> If you don't have to worry about disk space, use mirrors.

Right on!

> I got my best results during my extensive X4500 benchmarking sessions
> when I mirrored single slices instead of complete disks (resulting in
> 40 two-way mirrors on 40 physical disks, mirroring c0t0d0s0 -> c0t1d0s1
> and c0t1d0s0 -> c0t0d0s1, and so on). If you're worried about disk
> space, you should consider striping several instances of RAIDZ1 arrays,
> each one consisting of three disks or slices. Sequential access will go
> down the cliff, but random reads will be boosted.

Writes should be good, if not great, no matter what the workload is. I'm
interested in data that shows otherwise.

> You should also adjust the recordsize.

For small files I certainly would not. Small files are stored as a single
record when they are smaller than the recordsize, and a single record is
good in my book. I'm not sure when one would want otherwise for small
files.

> Try to measure the average I/O transaction size. There's a good chance
> that your I/O performance will be best if you set your recordsize to a
> smaller value. For instance, if your average file size is 12 KB, try
> using an 8K or even 4K recordsize, and stay away from 16K or higher.

Tuning the recordsize is currently only recommended for databases (large
files with fixed-size record access). Again, it would be interesting
input if tuning the recordsize helped another type of workload.

-r

> --
> Ralf Ramge
> Senior Solaris Administrator, SCNA, SCSA
>
> Tel. +49-721-91374-3963
> [EMAIL PROTECTED] - http://web.de/
>
> 1&1 Internet AG
> Brauerstraße 48
> 76135 Karlsruhe
>
> Amtsgericht Montabaur HRB 6484
>
> Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas
> Gauger, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
> Aufsichtsratsvorsitzender: Michael Scheeren

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
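Appendix: a rough back-of-the-envelope sketch of the random-read model
described above, in plain Python, so you can plug in your own disk counts.
The figures baked in here are just the assumptions used in this mail
(about 200 random IOPS per disk, one I/O per small file, cold cache); the
function names are mine and nothing here is measured from ZFS itself.

#!/usr/bin/env python
# Sketch of the random-read throughput model discussed above.
# Assumptions: ~200 random IOPS per disk, one I/O per small file read
# (file fits in a single record), cold cache, no prefetch.

DISK_IOPS = 200  # assumed random read IOPS per disk


def raidz_read_iops(groups, disks_per_group, disk_iops=DISK_IOPS):
    """Each raid-z group serves roughly one random read per disk I/O
    latency, so it behaves like a single disk for random reads;
    disks_per_group adds space and bandwidth, not random-read IOPS."""
    return groups * disk_iops


def mirror_read_iops(disks, disk_iops=DISK_IOPS):
    """With mirrors, every disk can serve random reads independently."""
    return disks * disk_iops


def throughput_mb_s(files_per_sec, avg_file_kb):
    """Approximate delivered MB/s when each read returns one small file."""
    return files_per_sec * avg_file_kb / 1024.0


if __name__ == "__main__":
    # 8 disks split into 2 raid-z groups: ~400 small files/sec
    print("2 x 4-disk raid-z :",
          raidz_read_iops(groups=2, disks_per_group=4), "files/sec")

    # the same 8 disks as 2-way mirrors: ~1600 small files/sec
    print("4 x 2-way mirrors :", mirror_read_iops(disks=8), "files/sec")

    # 40 disks in a single raid-z group (the anti best practice case):
    fps = raidz_read_iops(groups=1, disks_per_group=40)
    print("1 x 40-disk raid-z:", fps, "files/sec, about",
          round(throughput_mb_s(fps, avg_file_kb=5), 2),
          "MB/s at a 5K average file size")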