On Sat, Feb 5, 2011 at 4:19 PM, Cédric Villemain <cedric.villemain.deb...@gmail.com> wrote:
> just reading the patch...
> I understand the idea of the 5% flush.
> *maybe* it make sense to use effective_io_concurrency GUC here to
> improve the ratio, but it might be perceived as a bad usage ..
> currently effective_io_concurrency is for planning purpose.

effective_io_concurrency is supposed to be set based on how many spindles your RAID array has. There's no reason to think that the correct flush percentage is in any way related to that value.

The reasons we might not want backends to write out too many dirty-only-for-hint-bits buffers during a large sequential scan are that (a) the actual write() system calls take time to copy the buffers into kernel space, slowing the scan, and (b) flushing too many buffers this way could lead to I/O spikes. Increasing the flush percentage slows down the first few scans, but takes fewer scans to reach optimal performance (all hint bits set on disk). Decreasing the flush percentage speeds up the first few scans, but is overall less efficient.

We could make this a tunable, but I'm not clear that there is much point. If writing 100% of the pages that have only hint-bit updates slows the scan by 80% and writing 5% of the pages slows the scan by 25%, then dropping below 5% doesn't seem likely to buy much further improvement. You could argue for raising the flush percentage above 5%, but if you go too much higher then it's not clear that you're gaining anything over just flushing them all. I don't think we necessarily have enough experience to know whether this is a good idea at all, so worrying about whether different people need different percentages seems a bit premature.

Another point here is that no matter how many times you sequential-scan the table, you never get performance as good as what you would get if you vacuumed it, even if the table contains no dead tuples. I believe this is because VACUUM will not only set the HEAP_XMIN_COMMITTED hint bits; it'll also set PD_ALL_VISIBLE on the page.

I wonder if we shouldn't be autovacuuming even tables that are insert-only for precisely this reason, as well as to prevent the case where someone inserts small batches of records for a long time and then finally deletes some stuff.
There are no visibility map bits set, so, boom, you get this huge, expensive vacuum. This will, of course, be even more of an issue when we get index-only scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
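
As a rough illustration of the flush-percentage trade-off described above, the sketch below counts how many sequential scans it takes before every page has its hint bits on disk. It is a toy model, not the patch's logic. It assumes each scan may write out at most a fixed percentage of the table's pages, and that hint-bit updates on pages that are not written are lost when the buffer is evicted. The function name and parameters are hypothetical.

```python
def scans_until_all_hinted(total_pages: int, flush_percent: int) -> int:
    """Count scans until all pages have hint bits on disk (toy model).

    Assumption: each scan may flush at most flush_percent of total_pages;
    hint bits on unflushed pages are lost and must be recomputed next scan.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not 1 <= flush_percent <= 100:
        raise ValueError("flush_percent must be between 1 and 100")

    # Pages a single scan is allowed to write out (at least one).
    per_scan_budget = max(1, total_pages * flush_percent // 100)

    unhinted = total_pages  # pages whose on-disk copy still lacks hint bits
    scans = 0
    while unhinted > 0:
        unhinted -= min(unhinted, per_scan_budget)
        scans += 1
    return scans


if __name__ == "__main__":
    for pct in (5, 25, 100):
        print(pct, scans_until_all_hinted(1000, pct))
```

Under these assumptions a 5% limit needs 20 scans of a 1000-page table and a 100% limit needs 1. That is the shape of the trade-off: a higher percentage makes each early scan pay for more writes, but the table converges in fewer scans.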