On Mon, Jan 20, 2014 at 5:37 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Agreed; that was the original plan, but implementation delays
> prevented the whole vision/discussion/implementation. Requirements
> from various areas include WAL rate limiting for replication, I/O rate
> limiting, hard CPU and I/O limits for security and mixed workload
> coexistence.
>
> I'd still like to get something on this in 9.4 that alleviates the
> replication issues, leaving wider changes for later releases.

My first reaction was that we should just have generic I/O resource
throttling. I was only convinced this was a reasonable idea by the
replication use case. It would help me to understand the specific
situations where replication breaks down due to WAL bandwidth
starvation. Heroku has had some problems with slaves falling behind,
though the immediate problem that causes is the slave filling up its
disk, which we could solve more directly by switching to archive mode
rather than slowing down the master.

But I would suggest you focus on a specific use case that's
problematic, so we can better judge whether the implementation is
really fixing it.

> The vacuum_* parameters don't allow any control over WAL production,
> which is often the limiting factor. I could, for example, introduce a
> new parameter for vacuum_cost_delay that provides a weighting for each
> new BLCKSZ chunk of WAL, then rename all parameters to a more general
> form. Or I could forget that and just press ahead with the patch as
> is, providing a cleaner interface in next release.
>
>> It's also interesting to wonder about the relationship to
>> CHECK_FOR_INTERRUPTS --- although I think that currently, we assume
>> that that's *cheap* (1 test and branch) as long as nothing is pending.
>> I don't want to see a bunch of arithmetic added to it.
>
> Good point.

I think it should be possible to actually merge it into
CHECK_FOR_INTERRUPTS. Have a single global flag,
io_done_since_check_for_interrupts, which is set to 0 after each
CHECK_FOR_INTERRUPTS and set to 1 whenever any WAL is written.
Then CHECK_FOR_INTERRUPTS turns into two tests and branches instead of
one in the normal case.

In fact, you could do all the arithmetic when you do the WAL write:
only set the flag if the bandwidth consumed is above the budget. Then
the flag should only ever be set when you're about to sleep.

I would dearly love to see generic I/O bandwidth limits, so it would be
nice to see a general pattern here that could be extended even if we
only target WAL this release.

I'm going to read the existing patch now. Do you think it's ready to
go, or did you want to do more work based on the feedback?
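A rough sketch of the flag idea, in C, may make the cost argument concrete. This is not taken from the patch or from the PostgreSQL sources. Every name in it (wal_throttle_pending, note_wal_write, wal_throttle_sleep, CHECK_FOR_INTERRUPTS_SKETCH, simulate_writes, and the 16-block budget) is made up for illustration. The budget is also never replenished over time, which a real rate limiter would need. The point is only that the arithmetic lives in the WAL write path, so the check itself stays at two tests and branches.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative only: all names here are hypothetical. The macro below is a
 * stand-in for CHECK_FOR_INTERRUPTS, not the PostgreSQL definition.
 */
static volatile bool interrupt_pending = false;    /* ordinary pending flag */
static volatile bool wal_throttle_pending = false; /* proposed second flag */

static size_t wal_budget_bytes = 16 * 8192; /* assumed budget: 16 blocks */
static size_t wal_bytes_used = 0;           /* WAL written since last sleep */
static int    throttle_sleeps = 0;          /* counts simulated sleeps */

/* Stand-in for normal interrupt processing. */
static void process_interrupts(void)
{
    interrupt_pending = false;
}

/* A real implementation would sleep here; the sketch only counts. */
static void wal_throttle_sleep(void)
{
    throttle_sleeps++;
    wal_bytes_used = 0;
    wal_throttle_pending = false;
}

/*
 * Called from the WAL write path. All arithmetic happens here, and the
 * flag is set only once the budget has been exceeded.
 */
static void note_wal_write(size_t nbytes)
{
    wal_bytes_used += nbytes;
    if (wal_bytes_used > wal_budget_bytes)
        wal_throttle_pending = true;
}

/* Two tests and branches when nothing is pending; no arithmetic. */
#define CHECK_FOR_INTERRUPTS_SKETCH() \
    do { \
        if (interrupt_pending) \
            process_interrupts(); \
        if (wal_throttle_pending) \
            wal_throttle_sleep(); \
    } while (0)

/* Write nwrites records of nbytes each, checking after every write. */
static int simulate_writes(int nwrites, size_t nbytes)
{
    wal_bytes_used = 0;
    throttle_sleeps = 0;
    wal_throttle_pending = false;

    for (int i = 0; i < nwrites; i++)
    {
        note_wal_write(nbytes);
        CHECK_FOR_INTERRUPTS_SKETCH();
    }
    return throttle_sleeps;
}
```

With the assumed budget, simulate_writes(100, 8192) sleeps 5 times: the flag is first set on the 17th write, when the running total exceeds 16 blocks, and the counter restarts after each sleep.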