On Mar 22, 2011, at 11:46 AM, Cédric Villemain wrote: > 2011/3/22 Greg Stark <gsst...@mit.edu>: >> On Mon, Mar 21, 2011 at 6:08 PM, Jim Nasby <j...@nasby.net> wrote: >>> Has anyone looked at the overhead of measuring how long IO requests to the >>> kernel take? If we did that not only could we get an idea of what our IO >>> workload looked like, we could also figure out whether a block came out of >>> cache or not. That information could potentially be useful to the planner, >>> but even if the database couldn't use that knowledge itself it would be a >>> damn useful statistic to have... IMHO, far more useful than our current hit >>> rate statistics. >>> >> >> I've done this -- actually better, I used mincore to actually check >> whether the block was in cache before issuing the read -- but it turns >> out you can't get what you're looking for this way. > > The linux fincore() syscall never get in the kernel, maybe something > to revive...
Is there an equivalent in other OSes? Could we use time measurement as an alternative if not? >> >> It turns out when you do this you see one block being read from disk >> followed by n blocks that all appear to be cache hits. Because they've >> been prefetched by the kernel. > > I did the same, I now believe that it is not very important to have > the very exact numbers. > Prefetech blocks *are* in memory when we request them, the first read > access read more than one block because the cost is the same. Yeah... there's places in the planner where we make guesses as to the likelyhood of something being in-cache. If we could actually track complete hit rate over time (PG buffers + FS cache), then we wouldn't have to guess at things anymore. And having this info in pg_stats would be extremely valuable. >> What you end up with is actually something like the number of iops >> which is also an interesting measure but not really what you were >> looking for. >> >> My getrusage patch, which I should still dig out though it's rather >> too late to be committing now unless someone tells me otherwise, would >> tell you how much i/o a plan node actually did. But you won't know >> which blocks did the i/o since I was only tracking totals for the plan >> node. That's probably what you're looking for here. > > Please show us the patch :) -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers