On Fri, 2012-07-13 at 15:21 -0400, Tom Lane wrote:
> No, that's incorrect: the fadvise is there, inside pg_flush_data() which
> is done during the writing phase.

Yeah, Andres pointed that out, also.

> It seems to me that if we think
> sync_file_range is a win, we ought to be using it in pg_flush_data just
> as much as in initdb.

It was pretty clearly a win for initdb, but I wasn't convinced it was a
good idea for other use cases.

The mechanism is outlined in the email you linked below. To paraphrase,
fadvise tries to put the data in the io scheduler queue (still in the
kernel before going to the device), and gives up if there is no room;
sync_file_range waits for there to be room if none is available.

For the case of initdb, the data needing to be fsync'd is effectively
constant, and it's a lot of small files. If the requests don't make it
to the io scheduler queue before fsync, the kernel doesn't have an
opportunity to schedule them properly.

But for larger amounts of data copying (like ALTER DATABASE SET
TABLESPACE), it seemed like there was more risk that sync_file_range
would starve out other processes by continuously filling up the io
scheduler queue (I'm not sure if there are protections against that or
not). Also, if the files are larger, a single fsync represents more
data, and the kernel may be able to schedule it well enough anyway.

I'm not an authority in this area though, so if you are comfortable
extending sync_file_range to copydir() as well, that's fine with me.

> However, I'm a bit confused because in
> http://archives.postgresql.org/pgsql-hackers/2012-03/msg01098.php
> you said
>
> > So, it looks like fadvise is the "right" thing to do, but I expect we'll
>
> Was that backwards? If not, why are we bothering with taking any
> portability risks by adding use of sync_file_range?

That part of the email was less than conclusive, and I can't remember
exactly what point I was trying to make. sync_file_range is a big
practical win for the reasons I mentioned above (and also avoids some
unpleasant noises coming from the disk drive).

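To make the difference between the two calls concrete, here is a rough
sketch. It is not the actual initdb or pg_flush_data() code; the helper
names hint_writeback() and write_then_hint() are made up for
illustration, and the fallback on ENOSYS is an assumption about how one
might handle kernels without the syscall. The blocking behavior noted in
the comments is as described above, not something the sketch measures.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing back fd's dirty
 * pages ahead of a later fsync().  Returns 0 on success, -1 on failure.
 */
int
hint_writeback(int fd, int wait_for_queue_room)
{
    int rc;

    assert(fd >= 0);
    (void) wait_for_queue_room; /* unused where sync_file_range is absent */

#ifdef SYNC_FILE_RANGE_WRITE
    if (wait_for_queue_room)
    {
        /* offset 0, nbytes 0 means "the whole file"; may block for queue room */
        if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) == 0)
            return 0;
        if (errno != ENOSYS)
            return -1;
        /* syscall not available: fall through to the fadvise hint */
    }
#endif

    /* advisory only; posix_fadvise returns an error number, not -1 */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
    {
        errno = rc;
        return -1;
    }
    return 0;
}

/*
 * Hypothetical demo: write a small file, hint writeback, then fsync.
 * initdb would hint many files in a first pass and fsync them in a second.
 */
int
write_then_hint(const char *path, const char *data, size_t len,
                int wait_for_queue_room)
{
    int rc;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }
    rc = hint_writeback(fd, wait_for_queue_room);
    if (rc == 0)
        rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling write_then_hint(path, buf, len, 1) takes the sync_file_range
path where it exists, and passing 0 takes the fadvise path. The durable
result after fsync is the same either way; only the point at which the
requests reach the io scheduler queue differs.
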
Regards,
	Jeff Davis