On 2/11/24 03:13, Thomas Schmitt wrote:
> Hi,
> David Christensen wrote:
> > Concurrency:
> > threads throughput
> > 8 205+198+180+195+205+184+184+189 = 1,540 MB/s
> There remains the question how to join these streams without losing speed
> in order to produce a single checksum. (Or one would have to divide the
> target into 8 areas which get checked separately.)
I had similar thoughts. A FIFO should be able to join the streams. But
dividing the device by the number of virtual cores and putting a thread
on each region makes more sense. Either approach, done right, should
saturate the drive's I/O capacity.
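The per-region approach can be sketched with standard tools (a minimal sketch, not anything from this thread; DEV, N, and the choice of sha256sum are hypothetical example values, and the byte-exact skip/count flags are GNU dd extensions):

```shell
#!/bin/sh
# Split a device/image into N equal regions and checksum each region
# in a concurrent process.  DEV and N are hypothetical example values.
DEV=disk.img
N=4

SIZE=$(stat -c %s "$DEV")            # total size in bytes
REGION=$(( (SIZE + N - 1) / N ))     # region size, rounded up

i=0
while [ "$i" -lt "$N" ]; do
    # GNU dd: skip_bytes/count_bytes give byte-exact regions at bs=1M speed.
    # Output order is nondeterministic because the jobs run concurrently.
    dd if="$DEV" bs=1M iflag=skip_bytes,count_bytes \
       skip=$(( i * REGION )) count="$REGION" 2>/dev/null \
        | sha256sum | sed "s|-\$|region $i|" &
    i=$(( i + 1 ))
done
wait
```

To verify the whole device later, the per-region sums would be compared region by region, which avoids having to merge the streams into one checksum.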
> Does this 8 thread generator cause any problems with the usability of
> the rest of the system ? Sluggish program behavior or so ?
CPU Graph shows all eight virtual cores at 100%, so everything else on
the system would be sluggish (unless you use nice(1)).
Here is a processor with Intel Secure Key and otherwise unloaded:
2024-02-11 11:48:21 dpchrist@taz ~
$ lscpu | grep 'Model name'
Model name: Intel(R) Xeon(R) E-2174G CPU @ 3.80GHz
2024-02-11 11:59:55 dpchrist@taz ~
$ cat /etc/debian_version ; uname -a
11.8
Linux taz 5.10.0-27-amd64 #1 SMP Debian 5.10.205-2 (2023-12-31) x86_64
GNU/Linux
2024-02-11 12:02:52 dpchrist@taz ~
$ dd if=/dev/urandom of=/dev/null bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 20.0469 s, 536 MB/s
threads throughput
1 536 MB/s
2 512+512 = 1,024 MB/s
3 502+503+503 = 1,508 MB/s
4 492+491+492+492 = 1,967 MB/s
5 492+384+491+385+491 = 2,243 MB/s
6 379+491+492+379+379+379 = 2,499 MB/s
7 352+491+356+388+352+357+388 = 2,684 MB/s
8 355+354+344+348+344+354+353+349 = 2,801 MB/s
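Runs like the ones in the table can be reproduced with a simple loop (a sketch; N and count are example parameters, and summing the per-process rates is done by hand as above):

```shell
#!/bin/sh
# Launch N concurrent /dev/urandom readers; each prints its own rate.
# N and count are hypothetical example values.
N=4
for i in $(seq "$N"); do
    # LC_ALL=C keeps dd's transfer-rate line in a predictable format.
    LC_ALL=C dd if=/dev/urandom of=/dev/null bs=1M count=1024 2>&1 \
        | grep -o '[0-9.]* [kMG]B/s' &
done
wait
```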
> I have to correct my previous measurement on the 4 GHz Xeon, which was
> made with a debuggable version of the program that produced the stream.
> The production binary which is compiled with -O2 can write 2500 MB/s into
> a pipe with a pacifier program which counts the data:
>
>   $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>       2s62gss463ar46492bni \
>     | $(scdbackup -where bin)/raedchen -step 100m -no_output -print_count
>   100.0g bytes
>
>   real  0m39.884s
>   user  0m30.629s
>   sys   0m41.013s
>
> (One would have to install scdbackup to reproduce this and to see raedchen
> count the bytes while spinning the classic SunOS boot wheel: |/-\|/-\|/-\
>   http://scdbackup.webframe.org/main_eng.html
>   http://scdbackup.webframe.org/examples.html
> Oh nostalgia ...
> )
>
> Totally useless but yielding nearly 4000 MB/s:
>
>   $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>       2s62gss463ar46492bni >/dev/null
>
>   real  0m27.064s
>   user  0m23.433s
>   sys   0m3.646s
>
> The main bottleneck in my proposal would be the checksummer:
>
>   $ time $(scdbackup -where bin)/cd_backup_planer -write_random - 100g \
>       2s62gss463ar46492bni | md5sum
>   5a6ba41c2c18423fa33355005445c183  -
>
>   real  2m8.160s
>   user  2m25.599s
>   sys   0m22.663s
>
> That's quite exactly 800 MiB/s ~= 6.7 Gbps.
> Still good enough for vanilla USB-3 with a fast SSD, I'd say.
Yes -- more than enough throughput.
Before I knew of fdupes(1) and jdupes(1), I wrote a Perl script to find
duplicate files. It uses the Digest module and supports any algorithm
that module supports. Here are some runs against a local ext4 filesystem
on LUKS (with AES-NI) on an Intel SSD 520 Series 60 GB, checksumming
whole files:
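The core idea can be sketched with coreutils alone (a sketch, not the finddups script itself; it hashes every file and prints groups whose full-file MD5 collides, and the -w32 / --all-repeated options are GNU extensions):

```shell
#!/bin/sh
# Report groups of duplicate files under DIR by full-file MD5.
# DIR is a hypothetical example argument.
DIR=${1:-.}

find "$DIR" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate   # compare only the 32 hex digits
```

A real duplicate finder would first group by size and hash only same-size candidates, which is what makes tools like jdupes fast.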
2024-02-11 13:32:47 dpchrist@taz ~
$ time finddups --filter w --digest MD4 .thunderbird/ >/dev/null
real 0m0.878s
user 0m0.741s
sys 0m0.137s
2024-02-11 13:33:14 dpchrist@taz ~
$ time finddups --filter w --digest MD5 .thunderbird/ >/dev/null
real 0m1.110s
user 0m0.977s
sys 0m0.132s
2024-02-11 13:33:19 dpchrist@taz ~
$ time finddups --filter w --digest SHA-1 .thunderbird/ >/dev/null
real 0m1.306s
user 0m1.151s
sys 0m0.156s
2024-02-11 13:36:40 dpchrist@taz ~
$ time finddups --filter w --digest SHA-256 .thunderbird/ >/dev/null
real 0m2.545s
user 0m2.424s
sys 0m0.121s
2024-02-11 13:36:51 dpchrist@taz ~
$ time finddups --filter w --digest SHA-384 .thunderbird/ >/dev/null
real 0m1.808s
user 0m1.652s
sys 0m0.157s
2024-02-11 13:37:00 dpchrist@taz ~
$ time finddups --filter w --digest SHA-512 .thunderbird/ >/dev/null
real 0m1.814s
user 0m1.673s
sys 0m0.141s
Curiously, SHA-384 and SHA-512 are faster than SHA-256. This is actually
expected on 64-bit CPUs without SHA extensions: the SHA-512 family works
on 64-bit words and 128-byte blocks, so it processes roughly twice as
many bytes per compression round as SHA-256. I can confirm similar
results on:
2024-02-11 13:39:58 dpchrist@laalaa ~
$ lscpu | grep 'Model name'
Model name: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz
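The effect is easy to check with coreutils directly (a sketch; the 256 MiB size is an arbitrary choice, and on CPUs with SHA-NI the ordering may reverse in sha256sum's favor):

```shell
#!/bin/sh
# Compare SHA-256 vs SHA-512 throughput on identical input.
# Reading /dev/zero keeps the input cost negligible.
for sum in sha256sum sha512sum; do
    start=$(date +%s%N)                      # GNU date: nanoseconds
    head -c 256M /dev/zero | "$sum" > /dev/null
    end=$(date +%s%N)
    echo "$sum: $(( (end - start) / 1000000 )) ms"
done
```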
David