On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund <and...@anarazel.de> wrote:
> > While benchmarking on hydra
> > (cf. http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> > which has quite slow IO, I was once more annoyed by how incredibly
> > long the vacuum at the end of a pgbench -i takes.
> >
> > The issue is that, even for an entirely shared_buffers resident
> > scale, essentially no data is cached in shared buffers. The COPY
> > to load data uses a 16MB ringbuffer. Then vacuum uses a 256KB
> > ringbuffer. Which means that COPY immediately writes and evicts
> > all data. Then vacuum reads & writes the data in small chunks,
> > again evicting nearly all buffers. Then the creation of the
> > primary key has to read that data *again*.
> >
> > That's fairly idiotic.
> >
> > While it's not easy to fix this in the general case (we introduced
> > those ringbuffers for a reason, after all), I think we should at
> > least add a special case for loads where shared_buffers isn't
> > fully used yet. Why not skip using buffers from the ringbuffer if
> > there are buffers on the freelist? If we add buffers gathered from
> > there to the ringbuffer, we should have few cases that regress.
>
> That does not seem like a good idea from here. One of the ideas I
> still want to explore at some point is having a background process
> identify the buffers that are just about to be evicted and stick
> them on the freelist, so that the backends don't have to run the
> clock sweep themselves over a potentially huge number of buffers, at
> perhaps substantial CPU cost. Amit's last attempt at this didn't
> really pan out, but I'm not convinced that the approach is without
> merit.
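(For concreteness: I assume what you're describing would look very
roughly like the sketch below in bgwriter's main loop. The helpers
marked "hypothetical" don't exist; locking, pin checks and
usage_count handling are all elided.)

    static void
    BgWriterFeedFreelist(int target)
    {
        /*
         * Push buffers the clock sweep is about to reach onto the
         * freelist once they're known clean, so backends can allocate
         * without running the sweep themselves.
         */
        while (FreelistShorterThan(target))              /* hypothetical */
        {
            int         buf_id = ClockSweepPeekVictim(); /* hypothetical */
            BufferDesc *buf = GetBufferDescriptor(buf_id);

            if (BufferIsCleanAndUnpinned(buf))           /* hypothetical */
                StrategyFreeBuffer(buf);   /* this one exists, freelist.c */
        }
    }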
FWIW, I've posted an implementation of this in the checkpoint
flushing thread; I saw quite substantial gains with it. It was just
entirely unrealistic to push that into 9.6.

> And, on the other hand, if we don't do something like that, it will
> be quite an exceptional case to find anything on the free list.
> Doing it just to speed up developer benchmarking runs seems like the
> wrong idea.

I don't think it's just developer benchmarks. I've seen a number of
customer systems where significant portions of shared buffers were
unused due to this. Unless you have an OLTP system, you can right now
easily end up in a situation where, after a restart, you'll never
fill shared_buffers, simply because sequential scans for OLAP queries
and COPY use ringbuffers. Addressing the problem only while there's
free space in s_b sure isn't perfect, but it's better than just
continuing to leave significant portions of s_b unused.

> > Additionally, maybe we ought to increase the ringbuffer sizes
> > again one of these days? 256KB for VACUUM is pretty damn low.
>
> But all that does is force the backend to write to the operating
> system, which is where the real buffering happens.

Relying on that has imo proven to be a pretty horrible idea.

> The bottom line here, IMHO, is not that there's anything wrong with
> our ring buffer implementation, but that if you run PostgreSQL on a
> system where the I/O is hitting a 5.25" floppy (not to say 8") the
> performance may be less than ideal. I really appreciate IBM donating
> hydra - it's been invaluable over the years for improving PostgreSQL
> performance - but I sure wish they had donated a better I/O
> subsystem.

It's really not just hydra. I've seen the same problem on 24-disk
RAID-0 type installations: the small ringbuffer leads to reads and
writes being constantly interspersed, apparently defeating readahead.

Greetings,

Andres Freund
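PS: For those not staring at the code right now, the ring sizes under
discussion are hardcoded in GetAccessStrategy() in
src/backend/storage/buffer/freelist.c. Abridged from memory (so check
the actual sources before quoting me on it), it looks roughly like:

    BufferAccessStrategy
    GetAccessStrategy(BufferAccessStrategyType btype)
    {
        int         ring_size;

        switch (btype)
        {
            case BAS_NORMAL:       /* no ring, use all of shared_buffers */
                return NULL;
            case BAS_BULKREAD:     /* seqscans of large tables */
                ring_size = 256 * 1024 / BLCKSZ;
                break;
            case BAS_BULKWRITE:    /* COPY, CREATE TABLE AS, ... */
                ring_size = 16 * 1024 * 1024 / BLCKSZ;
                break;
            case BAS_VACUUM:       /* the 256KB complained about above */
                ring_size = 256 * 1024 / BLCKSZ;
                break;
            /* error case elided */
        }

        ring_size = Min(NBuffers / 8, ring_size);
        /* ... allocate and return the strategy ... */
    }

With the default 8KB BLCKSZ that's all of 32 buffers for VACUUM,
which is what makes it interleave reads and writes so aggressively.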