On Fri, 15 May 20, 12:38:12, Albretch Mueller wrote:
> On 5/14/20, Nicolas George <geo...@nsup.org> wrote:
> >
> > The question was not how to find the files, the formulation of the
> > question indicates that Albretch has that covered.
>
> Yeah, my problem is not finding the files per se. I have them or
> could have them easily listed.

If your filenames contain "strange" characters, you can avoid a lot of
headaches by using 'find -exec <whatever> {} +' instead of using xargs
directly. The man page claims that '-exec {} +' is similar to xargs.
Since you have that many files, you could test it ;)

Using 'xargs' directly (or combined with 'find -print0' to avoid issues
with strange filenames) allows for some additional tuning.

> The thing is that when you work on corpora research you have to get
> fairly complicated answers from millions of texts "as fast as possible"
> and you have to make sure that your baseline hasn't been changed.
>
> I will have to play (again) with the options that you have given me
> and by the way I said sha256sum as an example; in the typical case you
> would run "file" and two hashes on each file and that would take
> forever on a user's machine.

Are you sure the bottleneck is in execution? With so many files it could
be many other things (storage, RAM, etc.).

Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser
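
P.S. For illustration, a minimal sketch of the two approaches mentioned
above, assuming GNU findutils and coreutils are installed. The function
names and the default batch size of 64 are made up for this example, and
sha256sum only stands in for whatever command you actually run per file.

```shell
#!/bin/sh
# Sketch only: hash every regular file under a directory in two ways.
# hash_with_find and hash_with_xargs are hypothetical helper names.

# Variant 1: let find batch the file names itself ('{} +').
# File names never pass through a pipe, so odd characters are harmless.
hash_with_find() {
    find "$1" -type f -exec sha256sum {} +
}

# Variant 2: NUL-separated names piped into xargs.
# $2 caps how many files go to each sha256sum invocation (default 64).
# GNU xargs also accepts -P N to run N invocations in parallel, but the
# output of parallel jobs may then interleave.
hash_with_xargs() {
    find "$1" -type f -print0 | xargs -0 -n "${2:-64}" sha256sum
}
```

Timing both variants on the real corpus (for example with 'time') should
show whether the per-file command or something else, such as storage, is
the limiting factor.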