On Sun, 1 Sept 2024 at 18:28, Andres Freund <and...@anarazel.de> wrote:
>                            0 workers  1 worker  2 workers  4 workers
> master:                    65.753     33.246    21.095     12.918
> aio v2.0, worker:          21.519     12.636    10.450     10.004
> aio v2.0, uring*:          31.446     17.745    12.889     10.395
> aio v2.0, uring**          23.497     13.824    10.881     10.589
> aio v2.0, direct, worker:  22.377     11.989    09.915     09.772
> aio v2.0, direct, uring*:  24.502     12.603    10.058     09.759

I took this for a test drive on an AMD 3990x machine with a 1TB
Samsung 980 Pro SSD on PCIe 4. I only tried io_method = io_uring, but
I did try with and without direct IO.

This machine has 64GB RAM and I was using ClickBench Q2 [1], which is
"SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;"
(for some reason they use 0-based query IDs). This table is 64GB
without indexes.

I'm seeing direct IO slower than buffered IO with smaller worker
counts. That's counter to what I would have expected, as I'd have
expected the memcpys from kernel space to be quite an overhead in
the buffered IO case. With larger worker counts the bottleneck is
certainly disk. The part that surprised me was that the bottleneck is
reached more quickly with buffered IO. I was seeing iotop going up to
5.54GB/s at higher worker counts.

times in milliseconds

workers  buffered  direct  cmp
      0     58880  102852   57%
      1     33622   53538   63%
      2     24573   40436   61%
      4     18557   27359   68%
      8     14844   17330   86%
     16     12491   12754   98%
     32     11802   11956   99%
     64     11895   11941  100%

Is there some other information I can provide to help this make sense?
(Or maybe it does already to you.)

David

[1] https://github.com/ClickHouse/ClickBench/blob/main/postgresql-tuned/queries.sql
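
As a rough sanity check on the numbers above, the sketch below recomputes the cmp column (buffered time as a percentage of direct time) and the effective scan rate implied by each timing. It assumes the whole 64GB table is read exactly once per run and treats 64GB as 64 * 10**9 bytes; neither is stated above, and with 64GB of RAM the buffered runs may be served partly from the page cache. The names TIMINGS_MS, cmp_percent and throughput_gb_per_s are made up for illustration.

```python
# Timings in milliseconds copied from the table above: workers -> (buffered, direct)
TIMINGS_MS = {
    0: (58880, 102852),
    1: (33622, 53538),
    2: (24573, 40436),
    4: (18557, 27359),
    8: (14844, 17330),
    16: (12491, 12754),
    32: (11802, 11956),
    64: (11895, 11941),
}

# Assumption: table size in decimal gigabytes, read once in full per query.
TABLE_GB = 64


def cmp_percent(buffered_ms, direct_ms):
    """Buffered time as a rounded percentage of direct time."""
    return round(100 * buffered_ms / direct_ms)


def throughput_gb_per_s(elapsed_ms, table_gb=TABLE_GB):
    """Effective scan rate if the whole table is read in elapsed_ms."""
    return table_gb / (elapsed_ms / 1000)


if __name__ == "__main__":
    print("workers  cmp   buffered GB/s  direct GB/s")
    for workers, (buffered, direct) in TIMINGS_MS.items():
        print(
            f"{workers:7d}  {cmp_percent(buffered, direct):3d}%"
            f"  {throughput_gb_per_s(buffered):13.2f}"
            f"  {throughput_gb_per_s(direct):11.2f}"
        )
```

Under those assumptions the 32-worker buffered run works out to roughly 5.4 GB/s, which is in the same range as the 5.54GB/s peak seen in iotop and fits with the disk being the bottleneck at higher worker counts.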