> Since you were reporting 2 min, was wondering what your platform is and
> whether there might be something else involved eating the 2 min realtime?

Shouldn't any modern operating system do enough caching of inodes and files (like the file with the "cp" executable) that the only difference should be the CPU time for "cp" to initialize and parse its command line?

Does it make a difference if you run "cp" and have some large directories in $PATH before /bin compared to running "/bin/cp"? Shells like bash hash paths to commands, but that wouldn't help if each "cp" runs from a fresh shell from "xargs".

Does it make a difference if the source or destination files are absolute or relative paths?

Does it make a difference if one of the path components is a network mount that can't be cached and requires sending requests to a remote server?

What is the locale? On my Fedora 40 laptop, strace shows that "cp" with LANG=en_US.UTF-8 opens /usr/lib/locale/locale-archive, which is 229,754,784 bytes, although it then uses mmap and probably reads just a few bytes.

On my Fedora 40 i7-12800H laptop, "cd /tmp && touch abc && time /bin/cp abc def" shows "real 0m0.004s". 2500 copies would scale to 10s.

Does the person with the problem have a file system that gets slow when several thousand files are in a directory?

________________________________
From: coreutils-bounces+williambader=hotmail....@gnu.org <coreutils-bounces+williambader=hotmail....@gnu.org> on behalf of Glenn Golden <g...@zplane.com>
Sent: Sunday, August 25, 2024 7:34 PM
To: Yair Lenga <yair.le...@gmail.com>
Cc: Pádraig Brady <p...@draigbrady.com>; Coreutils <coreutils@gnu.org>
Subject: Re: Pair-wise file operation (copy, link)

Yair Lenga <yair.le...@gmail.com> [1970-01-01 00:00:00 +0000]:
>
> In my case, I have to bulk-move about 2500 files. This is part of a
> recurring sync job that has to mirror an existing hierarchy into a new
> hierarchy with different naming rules.
>
> It takes no time to create the mapping (even in bash script, case
> statement). When I "pipe" the mapping into "ln" (with xargs) it takes >2
> min to create the symlinks.
> Practically, all the time is spent on launching "ln". With a custom perl
> script - it's 3 seconds.
>

2c observation:

Years ago I had a similar weekly need at work, except for an even larger number of files (10k - 20k or so iirc), and always used a one-liner xargs script to do the copy. My recollection is that it would complete in "a few" seconds (maybe 10s or so).

I couldn't find that script, but I just tried it now manually: Created 2500 randomly named files, each comprising 4kB random data, and then copied them to new names like this

    $ cat fmap | xargs -L1 cp

where fmap is the name-mapping file, comprising 2500 lines like

    oldname0 newname0
    oldname1 newname1
    oldname2 newname2
       .        .
       .        .
       .        .

It took under 4 seconds, plus another 1-2 seconds for the sync.

This was on a commodity x86_64 laptop. The target filesystem was the same as original. Device is a slow 20-year old HDD.

Since you were reporting 2 min, was wondering what your platform is and whether there might be something else involved eating the 2 min realtime?

Glenn
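
For anyone who wants to repeat the measurement described above on their own platform, here is a minimal sketch of that test. It is not the original script (which could not be found); the file names old$i/new$i, the scratch directory, and the N variable are assumptions made for illustration. It creates N files of 4 kB random data, writes an fmap mapping file, and times one "cp" process per mapping line.

```shell
#!/bin/sh
# Sketch only: time N pairwise copies, one cp process per mapping line.
# N, old$i/new$i and the scratch directory are illustrative assumptions.
N=${N:-2500}                      # 2500 matches the count in the thread
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create N source files of 4 kB random data plus the "old new" mapping file.
i=0
: > fmap
while [ "$i" -lt "$N" ]; do
    head -c 4096 /dev/urandom > "old$i"
    echo "old$i new$i" >> fmap
    i=$((i + 1))
done

# Run the copy: xargs -L1 starts a separate cp for every line of fmap.
start=$(date +%s)
xargs -L1 cp < fmap
end=$(date +%s)

# Count the copies that were created, then report and clean up.
copied=$(ls | grep -c '^new')
echo "copied $copied of $N files in $((end - start)) s"
cd / && rm -rf "$workdir"
```

The timer has one-second resolution, which is enough to tell "a few seconds" from "2 min". Replacing "cp" with "/bin/cp", changing LANG, or pointing the scratch directory at a different file system would be ways to probe the questions raised earlier in this message.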