On 2013-05-28 10:12:05 -0500, Jon Nelson wrote:
> On Tue, May 28, 2013 at 9:19 AM, Robert Haas <robertmh...@gmail.com> wrote:
> > On Tue, May 28, 2013 at 10:15 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> >> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
> >>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson <jnelson+pg...@jamponi.net> wrote:
> >>> >> The biggest thing missing from this submission is information
> >>> >> about what performance testing you did.  Ideally performance
> >>> >> patches are submitted with enough information for a reviewer to
> >>> >> duplicate the same test the author did, as well as hard
> >>> >> before/after performance numbers from your test system.  It
> >>> >> often turns tricky to duplicate a performance gain, and being
> >>> >> able to run the same test used for initial development
> >>> >> eliminates a lot of the problems.
> >>> >
> >>> > This has been a bit of a struggle. While it's true that WAL file
> >>> > creation doesn't happen with great frequency, and while it's also true
> >>> > that - with strace and other tests - it can be proven that
> >>> > fallocate(16MB) is much quicker than writing it zeroes by hand,
> >>> > proving that in the larger context of a running install has been
> >>> > challenging.
> >>>
> >>> It's nice to be able to test things in the context of a running
> >>> install, but sometimes a microbenchmark is just as good.  I mean, if
> >>> posix_fallocate() is faster, then it's just faster, right?
> >>
> >> Well, it's a bit more complex than that. Fallocate doesn't actually
> >> initialize the disk space in most filesystems, just marks it as
> >> allocated and zeroed which is one of the reasons it can be noticeably
> >> faster. But that can make the runtime overhead of writing to those pages
> >> higher.
> >
> > Maybe it would be good to measure that impact.  Something like this:
> >
> > 1. Write 16MB of zeroes to an empty file in the same size chunks we're
> > currently using (8kB?).  Time that.  Rewrite the file with real data.
> > Time that.
> > 2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
> > the file with real data.  Time that.
> >
> > Personally, I have trouble believing that writing 16MB of zeroes by
> > hand is "better" than telling the OS to do it for us.  If that's so,
> > the OS is just stupid, because it ought to be able to optimize the
> > crap out of that compared to anything we can do.  Of course, it is
> > more than possible that the OS is in fact stupid.  But I'd like to
> > hope not.
>
> I wrote a little C program to do something very similar to that (which
> I'll hope to post later today).
> It opens a new file, fallocates 16MB, calls fdatasync.  Then it loops
> 10 times:  seek to the start of the file, writes 16MB of ones, calls
> fdatasync.

You need to call fsync() not fdatasync() the first time round. fdatasync
doesn't guarantee metadata is synced.

> Then it closes and removes the file, re-opens it, and this time writes
> out 16MB of zeroes, calls fdatasync, and then does the same loop as
> above. The program times the process from file open to file unlink,
> inclusive.
>
> The results - for me - are pretty consistent: using fallocate is
> 12-13% quicker than writing out zeroes.

Cool!

> I used fdatasync twice to (attempt) to mimic what the WAL writer does.

Not sure what you mean by that though?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
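
For reference, the microbenchmark described in this thread could look
roughly like the sketch below. It is not Jon's program (which was not
posted here), just one reading of the description: allocate a 16MB file
either with posix_fallocate() or by writing zeroes, sync it, rewrite it
10 times with an fdatasync() after each pass, and time everything from
open to unlink. The names run_trial and fill_file, the 8kB chunk size
(the thread only says "8kB?"), and the choice of byte value 1 for "ones"
are all assumptions. The first sync uses fsync(), per the remark above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* posix_fallocate, fdatasync, clock_gettime */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024) /* 16MB, the file size discussed above */
#define CHUNK    8192               /* assumed write size ("8kB?" in the thread) */
#define REWRITES 10                 /* rewrite passes, as in Jon's description */

/* Seek to the start and write SEG_SIZE bytes of `byte` in CHUNK-sized writes. */
static int fill_file(int fd, unsigned char byte)
{
    static unsigned char buf[CHUNK];

    assert(SEG_SIZE % CHUNK == 0);
    memset(buf, byte, sizeof buf);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    for (off_t done = 0; done < SEG_SIZE; done += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK)
            return -1;
    return 0;
}

/*
 * Run one trial and return the elapsed seconds from open to unlink,
 * inclusive, or -1.0 on error. use_fallocate selects posix_fallocate()
 * (nonzero) or hand-written zeroes (zero) for the initial allocation.
 */
double run_trial(const char *path, int use_fallocate)
{
    struct timespec t0, t1;
    int fd, ok;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1.0;

    if (use_fallocate)
        ok = posix_fallocate(fd, 0, SEG_SIZE) == 0;
    else
        ok = fill_file(fd, 0) == 0;

    /* First sync is fsync(), not fdatasync(), as suggested in the thread. */
    ok = ok && fsync(fd) == 0;

    /* Rewrite the whole file with non-zero data, syncing after each pass. */
    for (int i = 0; ok && i < REWRITES; i++)
        ok = fill_file(fd, 1) == 0 && fdatasync(fd) == 0;

    close(fd);
    unlink(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!ok)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A driver would call run_trial(path, 1) and run_trial(path, 0) on the
filesystem of interest, ideally several times each, and compare the
returned times. Separating the allocation time from the rewrite time, as
Robert proposed, would need additional timestamps inside run_trial.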