Greetings coreutils folks,

There are a number of interesting filesystems (glusterfs, lustre? ... NFS) which could benefit from userspace utilities doing certain operations in parallel. (I have a very slow glusterfs installation that makes me think that some things can be done better.)
For example, copying a number of files is currently done in series:

    cp a b c d e f g h dest/

but, on certain filesystems, it could be roughly twice as fast if implemented in two parallel threads, something like:

    cp a c e g dest/ & cp b d f h dest/

since the source and destination files can be stored on multiple physical volumes.

Similarly, "ls -l ." will readdir(), and then stat() each file in the directory. On a filesystem with high latency, it would be faster to issue the stat() calls asynchronously and in parallel, and then collect the results for display. (This could improve performance for NFS, in proportion to the latency and the number of threads.)

Question: Is there already a set of "improved" utilities that implement this kind of technique? If not, would this kind of performance enhancement be considered useful? (It would mean introducing threading into programs which are currently single-threaded.)

To the user, it could look very much the same:

    export GNU_COREUTILS_THREADS=8
    cp    # manipulate multiple files simultaneously
    mv    # manipulate multiple files simultaneously
    ls    # stat() multiple files simultaneously

One could also optimise the text utilities like cat by doing the open() and stat() operations in parallel and in the background -- userspace read-ahead caching. All of the utilities which process multiple files could get small speed boosts from this -- rm, cat, chown, chmod ... even tail, head, wc -- but probably only on network filesystems. To make the idea concrete, I've appended a few rough sketches below.

&:-)
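Sketch 1: parallel copy. This is a toy standalone program, not a patch against cp; the NTHREADS constant just stands in for the proposed GNU_COREUTILS_THREADS setting, source names are assumed to be plain file names (no slashes), and error handling is minimal.

/* parcp.c -- toy parallel copy: parcp FILE... DEST_DIR
   Build with: gcc -pthread -o parcp parcp.c  */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 2   /* stand-in for GNU_COREUTILS_THREADS */

static char **files;       /* source file names */
static int nfiles;         /* how many of them */
static const char *dest;   /* destination directory */

static void
copy_one (const char *src)
{
  char out[4096], buf[65536];
  ssize_t n;
  snprintf (out, sizeof out, "%s/%s", dest, src);
  int ifd = open (src, O_RDONLY);
  int ofd = open (out, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (ifd < 0 || ofd < 0)
    {
      perror (src);
      return;
    }
  while ((n = read (ifd, buf, sizeof buf)) > 0)
    write (ofd, buf, n);
  close (ifd);
  close (ofd);
}

/* Worker I handles files I, I+NTHREADS, I+2*NTHREADS, ...
   so the effect is like "cp a c e g dest/ & cp b d f h dest/".  */
static void *
worker (void *arg)
{
  for (int i = (int) (long) arg; i < nfiles; i += NTHREADS)
    copy_one (files[i]);
  return NULL;
}

int
main (int argc, char **argv)
{
  if (argc < 3)
    {
      fprintf (stderr, "usage: %s FILE... DEST_DIR\n", argv[0]);
      return 1;
    }
  files = argv + 1;
  nfiles = argc - 2;
  dest = argv[argc - 1];

  pthread_t t[NTHREADS];
  for (long i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, worker, (void *) i);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (t[i], NULL);
  return 0;
}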
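Sketch 2: parallel stat(), the "ls -l" case. One thread per name is the simplest possible scheme (a real implementation would cap the thread count at the configured limit), and the output is just size and name, not real ls -l output.

/* parstat.c -- stat() all arguments concurrently, print size and name.
   Build with: gcc -pthread -o parstat parstat.c  */
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

struct job
{
  const char *name;
  struct stat st;
  int err;
};

static void *
do_stat (void *arg)
{
  struct job *j = arg;
  j->err = stat (j->name, &j->st);
  return NULL;
}

int
main (int argc, char **argv)
{
  int n = argc - 1;
  if (n < 1)
    return 0;

  struct job jobs[n];   /* VLAs -- fine for a sketch */
  pthread_t tids[n];

  /* Issue every stat() at once; on a high-latency filesystem the
     total wait is roughly one round trip instead of n of them.  */
  for (int i = 0; i < n; i++)
    {
      jobs[i].name = argv[i + 1];
      pthread_create (&tids[i], NULL, do_stat, &jobs[i]);
    }

  /* Collect and display the results in the original order.  */
  for (int i = 0; i < n; i++)
    {
      pthread_join (tids[i], NULL);
      if (jobs[i].err)
        perror (jobs[i].name);
      else
        printf ("%8lld %s\n", (long long) jobs[i].st.st_size,
                jobs[i].name);
    }
  return 0;
}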
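Sketch 3: userspace read-ahead for cat. While one file is being written to stdout, a helper thread open()s the next one, so the open latency overlaps with the data transfer. (The stat() half of the idea is left out to keep it short.)

/* racat.c -- cat with one file of open() read-ahead.
   Build with: gcc -pthread -o racat racat.c  */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Open the named file; smuggle the fd back through the void *.  */
static void *
open_next (void *arg)
{
  return (void *) (long) open ((const char *) arg, O_RDONLY);
}

int
main (int argc, char **argv)
{
  int fd = argc > 1 ? open (argv[1], O_RDONLY) : -1;

  for (int i = 1; i < argc; i++)
    {
      pthread_t t;
      int have_next = i + 1 < argc;
      if (have_next)   /* start opening the next file now */
        pthread_create (&t, NULL, open_next, argv[i + 1]);

      if (fd < 0)
        perror (argv[i]);
      else
        {
          char buf[65536];
          ssize_t n;
          while ((n = read (fd, buf, sizeof buf)) > 0)
            write (STDOUT_FILENO, buf, n);
          close (fd);
        }

      if (have_next)   /* collect the fd the helper opened */
        {
          void *ret;
          pthread_join (t, &ret);
          fd = (int) (long) ret;
        }
    }
  return 0;
}

All three sketches build with gcc -pthread. They are only meant to show the shape of the change, not how it would actually land in coreutils -- whether the extra complexity is worth it is exactly the question above.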