On Wed 15-01-14 14:38:44, Hannu Krosing wrote:
> On 01/15/2014 02:01 PM, Jan Kara wrote:
> > On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
> >> On 01/14/2014 06:12 PM, Robert Haas wrote:
> >>> This would be pretty similar to copy-on-write, except
> >>> without the copying. It would just be
> >>> forget-from-the-buffer-pool-on-write.
> >> +1
> >>
> >> A version of this could probably already be implemented using
> >> MADV_DONTNEED and MADV_WILLNEED.
> >>
> >> That is, just after reading the page in, use MADV_DONTNEED on it.
> >> When evicting a clean page, check that it is still in cache and,
> >> if it is, then MADV_WILLNEED it.
> >>
> >> Another nice thing to do would be dynamically adjusting kernel
> >> dirty_background_ratio and other related knobs in real time based
> >> on how many buffers are dirty inside postgresql.
> >> Maybe in background writer.
> >>
> >> Question to LKM folks - will kernel react well to frequent changes to
> >> /proc/sys/vm/dirty_* ?
> >> How frequent can they be (every few seconds? every second? 100Hz ?)
> >   So the question is what do you mean by 'react'. We check whether we
> > should start background writeback every dirty_writeback_centisecs (5s). We
> > will also check whether we have exceeded the background dirty limit (and
> > wake the writeback thread) when dirtying pages. However, this check happens
> > once per several dirtied MB (unless we are close to dirty_bytes).
> >
> > When writeback is running we check roughly once per second (the logic is
> > more complex there but I don't think explaining details would be useful
> > here) whether we are below dirty_background_bytes and stop writeback in
> > that case.
> >
> > So changing dirty_background_bytes every few seconds should work
> > reasonably, once a second is pushing it and 100 Hz - no way. But I'd also
> > note that you have conflicting requirements on the kernel writeback.
> > On one hand you want checkpoint data to steadily trickle to disk (well,
> > trickle isn't exactly the proper word since if you need to checkpoint
> > 16 GB every 5 minutes then you need a steady throughput of ~50 MB/s just
> > for checkpointing) so you want to set dirty_background_bytes low, on the
> > other hand you don't want temporary files to get to disk so you want to
> > set dirty_background_bytes high.
> Is it possible to have more fine-grained control over writeback, like
> configuring dirty_background_bytes per file system / device (or even
> a file or a group of files) ?
  Currently it isn't possible to tune dirty_background_bytes per device
directly. However, see below.
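As a quick sanity check of the checkpoint throughput figure quoted above, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and it assumes "16 GB" means 16 GiB and that the checkpoint is spread evenly over the whole interval.

```python
GIB = 1024 ** 3  # assumption: "16 GB" is read as 16 GiB


def checkpoint_throughput_mb_per_s(checkpoint_bytes: int, interval_s: float) -> float:
    """Average write rate (in MB/s, 1 MB = 10**6 bytes) needed to flush
    checkpoint_bytes evenly within interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return checkpoint_bytes / interval_s / 1e6


# 16 GiB every 5 minutes: the same order of magnitude as the ~50 MB/s above.
rate = checkpoint_throughput_mb_per_s(16 * GIB, 5 * 60)
print(f"{rate:.1f} MB/s")
```

So the background limit has to be low enough that writeback sustains a rate of this order continuously, which is what conflicts with keeping temporary files in memory.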
> If not, then how hard would it be to provide this ?
  We do track the amount of dirty pages per device, and the thread doing the
flushing is also per device. The thing is that currently we compute the
per-device background limit as dirty_background_bytes * p, where p is the
proportion of writeback happening on this device to total writeback in the
system (computed as a floating average with exponential time-based backoff).
BTW, the maximum per-device dirty limit is similarly derived from the global
dirty_bytes. And you can also set bounds on the proportion 'p' in
/sys/block/sda/bdi/{min,max}_ratio, so in theory you should be able to set a
fixed background limit for a device by setting matching min and max
proportions.

								Honza
-- 
Jan Kara <j...@suse.cz>
SUSE Labs, CR

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
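To illustrate the per-device limit described above, here is a rough sketch that models only the formula given in this mail (dirty_background_bytes * p, with p bounded by min_ratio and max_ratio). It is not the kernel's actual code. The function names are hypothetical, and it assumes the two sysfs ratios are whole percentages.

```python
def per_device_background_limit(dirty_background_bytes: int, p: float,
                                min_ratio: int = 0, max_ratio: int = 100) -> int:
    """Simplified model: background limit = dirty_background_bytes * p,
    with p clamped to [min_ratio, max_ratio] (both given in percent)."""
    lo, hi = min_ratio / 100, max_ratio / 100
    if not 0 <= lo <= hi <= 1:
        raise ValueError("need 0 <= min_ratio <= max_ratio <= 100")
    clamped = min(max(p, lo), hi)
    return int(dirty_background_bytes * clamped)


def ratio_for_fixed_limit(dirty_background_bytes: int, target_bytes: int) -> int:
    """Percentage to use for BOTH min_ratio and max_ratio so that the
    modelled per-device limit is pinned near target_bytes."""
    if not 0 < target_bytes <= dirty_background_bytes:
        raise ValueError("target must be in (0, dirty_background_bytes]")
    return round(100 * target_bytes / dirty_background_bytes)


MIB = 1024 ** 2
global_limit = 1024 * MIB                                  # example value only
ratio = ratio_for_fixed_limit(global_limit, 256 * MIB)     # 25 (percent)

# With min_ratio == max_ratio the modelled limit no longer depends on p.
for p in (0.0, 0.6, 1.0):
    print(p, per_device_background_limit(global_limit, p, ratio, ratio))
```

In this model, writing the same percentage to both min_ratio and max_ratio for a device removes the dependence on the measured proportion p, which is the "matching min and max proportions" idea. Whether a real kernel behaves exactly like this would need to be checked against the kernel version in use.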