On 2/11/24 03:13, Thomas Schmitt wrote:
> Hi,
>
> David Christensen wrote:
>> Concurrency:
>> threads throughput
>> 8       205+198+180+195+205+184+184+189=1,540 MB/s
>
> There remains the question of how to join these streams without losing
> speed in order to produce a single checksum. (Or one would have to
> divide the target into 8 areas which get checked separately.)


I had similar thoughts. A FIFO should be able to join the streams, but dividing the device into one area per virtual core and putting a thread on each makes more sense, as sketched below. Either approach, done right, should saturate the drive's I/O capacity.
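A minimal sketch of the per-area approach, assuming a POSIX shell, GNU coreutils, and a placeholder device /dev/sdX (none of this is a tested tool):

#!/bin/sh
# Sketch only: verify a device in N areas, one reader per virtual core.
# DEV, N, and the choice of sha256sum are assumptions for illustration.
DEV=/dev/sdX
N=8
SIZE=$(blockdev --getsize64 "$DEV")   # device size in bytes (needs root)
MIB=$(( SIZE / 1048576 ))             # size in 1 MiB blocks
AREA=$(( MIB / N ))                   # whole 1 MiB blocks per area

i=0
while [ "$i" -lt "$N" ]; do
    if [ "$i" -eq $(( N - 1 )) ]; then
        # last area runs to the end of the device, catching any remainder
        dd if="$DEV" bs=1M skip=$(( i * AREA )) 2>/dev/null | sha256sum &
    else
        dd if="$DEV" bs=1M skip=$(( i * AREA )) count="$AREA" \
           2>/dev/null | sha256sum &
    fi
    i=$(( i + 1 ))
done
wait   # prints one checksum per area; output order is arbitrary

The same split has to be applied when generating the expected checksums, since the areas are verified independently.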


> Does this 8-thread generator cause any problems with the usability of
> the rest of the system? Sluggish program behavior or so?


The CPU graph shows all eight virtual cores at 100%, so everything else on the system would be sluggish (unless you use nice(1)).
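For example, the generator could be run at the lowest scheduling priority (priorities run from -20, highest, to 19, lowest):

$ nice -n 19 dd if=/dev/urandom of=/dev/null bs=1M count=10K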


Here are results from a processor with Intel Secure Key on an otherwise unloaded system:

2024-02-11 11:48:21 dpchrist@taz ~
$ lscpu | grep 'Model name'
Model name:                         Intel(R) Xeon(R) E-2174G CPU @ 3.80GHz

2024-02-11 11:59:55 dpchrist@taz ~
$ cat /etc/debian_version ; uname -a
11.8
Linux taz 5.10.0-27-amd64 #1 SMP Debian 5.10.205-2 (2023-12-31) x86_64 GNU/Linux

2024-02-11 12:02:52 dpchrist@taz ~
$ dd if=/dev/urandom of=/dev/null bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 20.0469 s, 536 MB/s

threads throughput
1       536 MB/s
2       512+512 = 1,024 MB/s
3       502+503+503 = 1,508 MB/s
4       492+491+492+492 = 1,967 MB/s
5       492+384+491+385+491 = 2,243 MB/s
6       379+491+492+379+379+379 = 2,499 MB/s
7       352+491+356+388+352+357+388 = 2,684 MB/s
8       355+354+344+348+344+354+353+349 = 2,801 MB/s
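Each row comes from running N concurrent copies of the dd command above; a minimal way to reproduce a row, assuming a POSIX shell with job control:

N=8
for i in $(seq "$N"); do
    dd if=/dev/urandom of=/dev/null bs=1M count=10K &
done
wait   # each dd reports its own MB/s on stderr; the row sums them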


> I have to correct my previous measurement on the 4 GHz Xeon, which was
> made with a debuggable version of the program that produced the stream.
> The production binary, which is compiled with -O2, can write 2500 MB/s
> into a pipe to a pacifier program which counts the data:
>
>    $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>        2s62gss463ar46492bni | \
>        $(scdbackup -where bin)/raedchen -step 100m -no_output -print_count
>        100.0g bytes
>
>    real    0m39.884s
>    user    0m30.629s
>    sys     0m41.013s

> (One would have to install scdbackup to reproduce this and to see raedchen
>  count the bytes while spinning the classic SunOS boot wheel: |/-\|/-\|/-\
>    http://scdbackup.webframe.org/main_eng.html
>    http://scdbackup.webframe.org/examples.html
>  Oh, nostalgia ...
> )

> Totally useless, but yielding nearly 4000 MB/s:
>
>    $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>        2s62gss463ar46492bni >/dev/null
>
>    real    0m27.064s
>    user    0m23.433s
>    sys     0m3.646s
>
> The main bottleneck in my proposal would be the checksummer:
>
>    $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>        2s62gss463ar46492bni | md5sum
>    5a6ba41c2c18423fa33355005445c183  -
>
>    real    2m8.160s
>    user    2m25.599s
>    sys     0m22.663s
>
> That's almost exactly 800 MiB/s ~= 6.7 Gbps.
> Still good enough for vanilla USB-3 with a fast SSD, I'd say.


Yes -- more than enough throughput.


Before I knew of fdupes(1) and jdupes(1), I wrote a Perl script, finddups, to find duplicate files. It uses the Digest module and supports any algorithm that Digest supports. Here are some runs against a local ext4 filesystem on LUKS (with AES-NI) on an Intel SSD 520 Series 60 GB, checksumming whole files (a rough shell equivalent appears after these runs):

2024-02-11 13:32:47 dpchrist@taz ~
$ time finddups --filter w --digest MD4 .thunderbird/ >/dev/null

real    0m0.878s
user    0m0.741s
sys     0m0.137s

2024-02-11 13:33:14 dpchrist@taz ~
$ time finddups --filter w --digest MD5 .thunderbird/ >/dev/null

real    0m1.110s
user    0m0.977s
sys     0m0.132s

2024-02-11 13:33:19 dpchrist@taz ~
$ time finddups --filter w --digest SHA-1 .thunderbird/ >/dev/null

real    0m1.306s
user    0m1.151s
sys     0m0.156s

2024-02-11 13:36:40 dpchrist@taz ~
$ time finddups --filter w --digest SHA-256 .thunderbird/ >/dev/null

real    0m2.545s
user    0m2.424s
sys     0m0.121s

2024-02-11 13:36:51 dpchrist@taz ~
$ time finddups --filter w --digest SHA-384 .thunderbird/ >/dev/null

real    0m1.808s
user    0m1.652s
sys     0m0.157s

2024-02-11 13:37:00 dpchrist@taz ~
$ time finddups --filter w --digest SHA-512 .thunderbird/ >/dev/null

real    0m1.814s
user    0m1.673s
sys     0m0.141s
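As promised above, a crude shell approximation of the duplicate-finding idea (not finddups itself; it assumes GNU findutils/coreutils and fixes the digest to MD5):

$ find .thunderbird/ -type f -print0 | xargs -0 md5sum \
      | sort | uniq -w32 --all-repeated=separate

uniq -w32 compares only the 32-character digest field, so files with identical content are printed in groups.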


It is curious that SHA-384 and SHA-512 are faster than SHA-256. A plausible reason: SHA-384 and SHA-512 work on 64-bit words and 128-byte blocks, so on a 64-bit CPU without the SHA extensions they need fewer rounds per byte than the 32-bit SHA-256. I can confirm similar results on:

2024-02-11 13:39:58 dpchrist@laalaa ~
$ lscpu | grep 'Model name'
Model name: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz
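That ordering can be sanity-checked outside Perl with OpenSSL's built-in benchmark (assuming openssl(1) is installed):

$ openssl speed sha256 sha512

On 64-bit CPUs without the SHA extensions it typically reports higher throughput for SHA-512 at large block sizes, matching the observation above.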


David
