On Wed, Jun 17 at 13:49, Alan Hargreaves wrote:
> Another question worth asking here is: is a find over the entire filesystem something that they would expect to execute with sufficient regularity that the execution time would have a business impact?
Exactly. That's such an odd business workload on 250,000,000 files that there isn't likely to be much of a shortcut beyond throwing tons of spindles (or SSDs) at the problem, and/or having tons of memory.

If the finds are by name only, that's easy for the system to cache. But if you're expecting to run something against the output of find with -exec to parse/process 250M files on a regular basis, you'll likely be severely IO bound -- almost to the point of arguing for something like Hadoop or another form of distributed map/reduce across a lot of nodes, instead of a single storage server.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
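To illustrate the distinction being drawn above (paths and filenames here are made up for the example, not from the original thread): a name-only find walks directory metadata, which the filesystem can serve from cache, while find with -exec has to read the contents of every matched file.

```shell
# Build a tiny sample tree -- a stand-in for the 250M-file dataset.
mkdir -p /tmp/findemo && cd /tmp/findemo
printf 'ERROR boot failed\n' > a.log
printf 'all ok\n' > b.log

# Name-only search: touches only directory metadata, which a cache
# (e.g. the ZFS ARC) can satisfy without reading file data.
find . -name '*.log'

# -exec variant: forks grep over the matches and reads every file's
# contents -- on 250M files, this is where you become IO bound.
find . -name '*.log' -exec grep -l ERROR {} +
```

The `{} +` form at least batches many filenames per grep invocation rather than forking once per file, but it doesn't change the fundamental cost: every file's data still has to come off disk.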