On Wed, 8 Jan 2025 at 22:58, Andres Freund <and...@anarazel.de> wrote:
> master: ~18 GB/s
> patch, buffered: ~20 GB/s
> patch, direct, worker: ~28 GB/s
> patch, direct, uring: ~35 GB/s
>
>
> This was with io_workers=32, io_max_concurrency=128,
> effective_io_concurrency=1000 (doesn't need to be that high, but it's what I
> still have the numbers for).
>
>
> This was without data checksums enabled as otherwise the checksum code becomes
> a *huge* bottleneck.

I'm curious about this because the checksum code should be fast enough to easily handle that throughput. I remember checksum overhead being negligible even when pulling in pages from page cache. Is it just that the calculation is slow, or is it the fact that checksumming needs to bring the page into the CPU cache? Did you notice any hints as to which might be the case? I don't really have a machine at hand that can do anywhere close to this amount of I/O.

I'm asking because if it's the calculation that is slow, then it seems like it's time to compile different ISA extension variants of the checksum code and select the best one at runtime.

-- 
Ants Aasma
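
For illustration, here is a rough sketch of what "compile several ISA variants and select the best one at runtime" could look like, assuming GCC or Clang. The mixing loop is a simplified stand-in, not the actual PostgreSQL checksum code, and the names page_checksum, checksum_generic, checksum_avx2 and choose_checksum_impl are made up for this example. It also assumes the buffer length is a multiple of 128 bytes (true for an 8 kB page); any tail is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SUMS    32            /* independent lanes, so the loop can vectorize */
#define FNV_PRIME 16777619u

typedef uint32_t (*checksum_fn)(const uint8_t *data, size_t len);

/*
 * Shared loop body. It is force-inlined into each variant below so that the
 * compiler generates it once per target ISA. Only whole 128-byte rows are
 * processed (assumption: len is a multiple of 128).
 */
static inline __attribute__((always_inline)) uint32_t
checksum_body(const uint8_t *data, size_t len)
{
    uint32_t sums[N_SUMS];
    uint32_t result = 0;
    size_t   rows = len / (N_SUMS * sizeof(uint32_t));

    for (int j = 0; j < N_SUMS; j++)
        sums[j] = 0x9E3779B9u + (uint32_t) j;   /* arbitrary per-lane seeds */

    for (size_t i = 0; i < rows; i++)
    {
        uint32_t row[N_SUMS];

        memcpy(row, data + i * sizeof(row), sizeof(row));
        for (int j = 0; j < N_SUMS; j++)
        {
            uint32_t tmp = sums[j] ^ row[j];

            sums[j] = (tmp * FNV_PRIME) ^ (tmp >> 17);
        }
    }

    for (int j = 0; j < N_SUMS; j++)
        result ^= sums[j];
    return result;
}

/* Baseline variant, built with the default compiler flags. */
static uint32_t
checksum_generic(const uint8_t *data, size_t len)
{
    return checksum_body(data, len);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but this function is compiled with AVX2 code generation on. */
__attribute__((target("avx2")))
static uint32_t
checksum_avx2(const uint8_t *data, size_t len)
{
    return checksum_body(data, len);
}
#endif

/* Pick the best variant that the running CPU supports. */
static checksum_fn
choose_checksum_impl(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return checksum_avx2;
#endif
    return checksum_generic;
}

static uint32_t checksum_resolve(const uint8_t *data, size_t len);

/* Starts out pointing at the resolver, which replaces itself on first use. */
static checksum_fn checksum_impl = checksum_resolve;

static uint32_t
checksum_resolve(const uint8_t *data, size_t len)
{
    checksum_impl = choose_checksum_impl();
    return checksum_impl(data, len);
}

/* Public entry point: one indirect call per page after the first call. */
uint32_t
page_checksum(const uint8_t *data, size_t len)
{
    return checksum_impl(data, len);
}
```

Since the arithmetic is all unsigned 32-bit integer math, every variant should return the same value for the same input; only the generated instructions differ. A wider variant (for example AVX-512) could presumably be added the same way, with one more target-attributed wrapper and one more check in choose_checksum_impl.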