On 2 Sep 2015 14:54, "Andres Freund" <and...@anarazel.de> wrote:
> > +     /*----------
> > +      * The user-visible GUC parameter is the number of drives (spindles),
> > +      * which we need to translate to a number-of-pages-to-prefetch target.
> > +      * The target value is stashed in *extra and then assigned to the actual
> > +      * variable by assign_effective_io_concurrency.
> > +      *
> > +      * The expected number of prefetch pages needed to keep N drives busy is:
> > +      *
> > +      *  drives |   I/O requests
> > +      * -------+----------------
> > +      *       1 |   1
> > +      *       2 |   2/1 + 2/2 = 3
> > +      *       3 |   3/1 + 3/2 + 3/3 = 5 1/2
> > +      *       4 |   4/1 + 4/2 + 4/3 + 4/4 = 8 1/3
> > +      *       n |   n * H(n)
>
> I know you just moved this code. But: I don't buy this formula. Like at
> all. Doesn't queuing and reordering entirely invalidate the logic here?
I can take the blame for this formula. It's the "Coupon Collector's
Problem": if you get a random coupon from a set of n possible coupons, how
many random coupons would you have to collect before you expect to have at
least one of each?

This computation model assumes we have no information about which spindle
each block will hit. That's basically true for bitmap heap scans in most
cases, because the idea of a bitmap heap scan is to pick a sparse set of
blocks, and there's no reason the blocks being read will have any
regularity that causes them all to fall on the same spindles. If in fact
you're reading a fairly dense set, then a bitmap heap scan is probably a
waste of time and simply reading sequentially would be exactly as fast or
even faster.

We talked about this quite a bit back then, and there was no dispute that
the aim is to provide GUCs that mean something meaningful to the DBA and
that the DBA can actually measure. They know how many spindles they have.
They do not know what the optimal prefetch depth is, and the only way to
determine it would be to experiment with Postgres.

Worse, I think the above formula works for essentially random I/O, but for
more predictable I/O it might be possible to use a different formula. If we
made the GUC something low-level like "how many blocks to prefetch", we
would be left in the dark about how to handle that different access
pattern.

I did speak to a dm developer, and he suggested that the kernel could help
out with an API, something of the form "how many blocks do I have to read
before the end of the current device?". I wasn't sure exactly what we would
do with something like that, but it would be better than just guessing how
many I/O operations we need to issue to keep all the spindles busy.
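For concreteness, here is a minimal standalone sketch of the arithmetic in
the quoted comment: it computes the n * H(n) prefetch target for a given
spindle count and runs a quick Monte Carlo check of the coupon-collector
expectation. This is only an illustration of the formula, not the actual
guc.c code (the real translation happens where the quoted comment says,
around assign_effective_io_concurrency):

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Expected number of concurrent I/O requests needed to keep n drives
     * busy, per the quoted table: n * H(n), where H(n) is the n-th
     * harmonic number.
     */
    static double
    prefetch_target(int n_drives)
    {
        double  pages = 0.0;
        int     i;

        for (i = 1; i <= n_drives; i++)
            pages += (double) n_drives / (double) i;
        return pages;
    }

    /*
     * Monte Carlo check of the coupon-collector expectation: draw
     * uniformly random "spindles" until all n have been hit at least
     * once, and average the number of draws over many trials.
     */
    static double
    simulate_collector(int n, int trials)
    {
        long    total = 0;
        int     t;

        for (t = 0; t < trials; t++)
        {
            int    *seen = calloc(n, sizeof(int));
            int     distinct = 0;

            while (distinct < n)
            {
                int     s = rand() % n;

                if (!seen[s])
                {
                    seen[s] = 1;
                    distinct++;
                }
                total++;
            }
            free(seen);
        }
        return (double) total / trials;
    }

    int
    main(void)
    {
        int     n;

        srand(12345);
        for (n = 1; n <= 4; n++)
            printf("drives=%d  n*H(n)=%.3f  simulated=%.3f\n",
                   n, prefetch_target(n), simulate_collector(n, 100000));
        return 0;
    }

For three drives both columns converge on 5.5, matching the table above,
and the other rows agree as well. The open question Andres raises is a
different one: whether the model's assumption of blind, unordered requests
still holds once the kernel queues and reorders the I/O.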