Hi,

On 10/31/2017 04:48 PM, Greg Stark wrote:
> On 31 October 2017 at 07:05, Chris Travers <chris.trav...@adjust.com> wrote:
>> Hi;
>>
>> After Andres's excellent talk at PGConf we tried benchmarking
>> effective_io_concurrency on some of our servers and found that those
>> which have a number of NVME storage volumes could not fill the I/O
>> queue even at the maximum setting (1000).
>
> And was the system still I/O bound? If the CPU was 100% busy then
> perhaps Postgres just can't keep up with the I/O system. It would
> depend on the workload, though; if you start many very large
> sequential scans you may be able to push the I/O system harder.
>
> Keep in mind effective_io_concurrency only really affects bitmap
> index scans (and to a small degree index scans). It works by issuing
> posix_fadvise() calls for upcoming buffers one by one. That gets
> multiple spindles active, but it's not really going to scale to many
> thousands of prefetches (and effective_io_concurrency of 1000
> actually means 7485 prefetches). At some point those I/Os are going
> to start completing before Postgres even has a chance to start
> processing the data.

Yeah, initiating the prefetches is not expensive, but it's not free
either. So there's a trade-off between time spent on prefetching and
time spent processing the data.
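(For anyone wondering where Greg's 7485 figure comes from: if memory
serves, the server scales the GUC up by a harmonic series when turning
it into a prefetch distance - roughly n * H(n), which is the expected
number of prefetch requests needed to keep n devices busy at once. A
standalone sketch of that computation, modeled on ComputeIoConcurrency()
in bufmgr.c rather than copied from it, looks about like this:

/*
 * Sketch of how effective_io_concurrency is translated into a prefetch
 * distance.  The setting n is scaled up to roughly n * H(n), where H(n)
 * is the n-th harmonic number.
 */
#include <stdio.h>

static double
prefetch_target(int io_concurrency)
{
    double  target = 0.0;

    for (int i = 1; i <= io_concurrency; i++)
        target += (double) io_concurrency / (double) i;

    return target;
}

int
main(void)
{
    int     settings[] = {1, 8, 16, 64, 1000};

    for (int i = 0; i < 5; i++)
        printf("effective_io_concurrency = %4d  ->  ~%.0f prefetched pages\n",
               settings[i], prefetch_target(settings[i]));
    return 0;
}

So even a moderate setting already translates into a fairly deep
prefetch window.)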
I believe this can actually be illustrated using Amdahl's law - the I/O
is the parallel part, and processing the data is the serial part. And no
matter what you do, the device only has so much bandwidth, which defines
the maximum possible speedup (compared to the "no prefetch" case).

Furthermore, the device does not wait for all the I/O requests to be
submitted - it won't wait for 1000 requests and then go "OMG! There's a
lot of work to do!" It starts processing the requests as they arrive,
and some of them will complete before you're done submitting the rest,
so you'll never see all the requests in the queue at once. And of
course, iostat and other tools only give you the "average queue length",
which is mostly determined by the average throughput.

In my experience (on all types of storage, including SSDs and NVMe), the
performance improves quickly and significantly once you start increasing
the value (say, to 8 or 16, maybe 64). Beyond that the gains become much
more modest - not because the device could not handle more, but because
the prefetch/processing ratio has reached its optimal value.

But all this is per-process. If you run multiple backends (particularly
ones doing bitmap index scans), I'm sure you'll see the queues fill up
more.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
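PS: To put some (entirely made-up) numbers on the Amdahl's law argument
above - treating the I/O wait as the part that prefetching can overlap
and the data processing as the serial part - the speedup curve looks
roughly like this:

/*
 * Back-of-the-envelope Amdahl's law illustration.  'p' is the fraction
 * of the un-prefetched runtime spent waiting for I/O (the part that
 * prefetching can overlap), 's' is the effective I/O parallelism.  The
 * 0.75 is a made-up number; the point is the shape of the curve - big
 * gains up to a fairly small queue depth, then it flattens out towards
 * 1/(1-p) long before the device runs out of bandwidth.
 */
#include <stdio.h>

int
main(void)
{
    double  p = 0.75;           /* hypothetical I/O-bound fraction */
    int     depths[] = {1, 2, 4, 8, 16, 64, 256, 1000};

    for (int i = 0; i < 8; i++)
    {
        double  s = (double) depths[i];

        printf("queue depth %4d  ->  speedup %.2fx (limit %.2fx)\n",
               depths[i], 1.0 / ((1.0 - p) + p / s), 1.0 / (1.0 - p));
    }
    return 0;
}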