On Mon, Dec 9, 2013 at 6:47 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 12/09/2013 11:35 PM, Jim Nasby wrote:
>>
>> On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
>>>
>>> On 12/08/2013 08:14 PM, Greg Stark wrote:
>>>>
>>>> The whole accounts table is 1.2GB and contains 10 million rows. As
>>>> expected with rows_per_block set to 1 it reads 240MB of that
>>>> containing nearly 2 million rows (and takes nearly 20s -- doing a full
>>>> table scan for select count(*) only takes about 5s):
>>>
>>>
>>> One simple thing we could do, without or in addition to changing the
>>> algorithm, is to issue posix_fadvise() calls for the blocks we're
>>> going to read. It should at least be possible to match the speed of a
>>> plain sequential scan that way.
>>
>>
>> Hrm... maybe it wouldn't be very hard to use async IO here either? I'm
>> thinking it wouldn't be very hard to do the stage 2 work in the callback
>> routine...
>
>
> Yeah, other than the fact we have no infrastructure to do asynchronous I/O
> anywhere in the backend. If we had that, then we could easily use it here. I
> doubt it would be much better than posix_fadvising the blocks, though.

Without patches to the kernel, async I/O is much better.

posix_fadvise interferes with read-ahead, so posix_fadvise on, say,
bitmap heap scans (or similarly sorted analyze block samples) runs at
1 I/O per block, i.e. horribly. AIO, on the other hand, can do read
coalescing and read-ahead when the kernel thinks it will be profitable,
significantly increasing IOPS. I've seen everything from a 2x to a 10x
difference.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
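
To make the two terms in this exchange concrete, here is a minimal sketch of what "posix_fadvising the blocks" and "read coalescing" mean for a sorted block sample. It is written in Python as a stand-in for backend C code and is not taken from any patch in this thread. The names coalesce_blocks, prefetch_blocks and read_blocks are hypothetical, and the 8192-byte block size is assumed to match PostgreSQL's default. Nothing here measures read-ahead behaviour, and whether advising a merged range avoids the interference described above is not something the thread establishes.

```python
import os

BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size


def coalesce_blocks(block_numbers):
    """Merge block numbers into sorted (first_block, block_count) runs.

    Adjacent blocks collapse into one run, which is the idea behind
    read coalescing: one larger request instead of many small ones.
    """
    runs = []
    for blk in sorted(set(block_numbers)):
        if runs and blk == runs[-1][0] + runs[-1][1]:
            # blk directly follows the current run, so extend it
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((blk, 1))
    return runs


def prefetch_blocks(fd, block_numbers, block_size=BLOCK_SIZE):
    """Advise the kernel of upcoming reads, one call per coalesced run.

    Returns the number of advice calls issued, or 0 on platforms
    where os.posix_fadvise is not available.
    """
    if not hasattr(os, "posix_fadvise"):
        return 0
    runs = coalesce_blocks(block_numbers)
    for first, count in runs:
        os.posix_fadvise(fd, first * block_size, count * block_size,
                         os.POSIX_FADV_WILLNEED)
    return len(runs)


def read_blocks(fd, block_numbers, block_size=BLOCK_SIZE):
    """Read the requested blocks in ascending block order."""
    return [os.pread(fd, block_size, blk * block_size)
            for blk in sorted(set(block_numbers))]
```

For a sample such as blocks 3, 4, 5, 9, 10 and 42, coalesce_blocks yields three runs, so prefetch_blocks issues three advice calls rather than six. Dropping the coalescing step and advising each block on its own gives the plain per-block variant discussed above.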