On Mon, Nov 4, 2019 at 1:03 PM Darafei "Komяpa" Praliaskouski
<m...@komzpa.net> wrote:
>>
>> This is somewhat similar to a memory usage problem with a
>> parallel query where each worker is allowed to use up to work_mem of
>> memory.  We can say that the users using parallel operation can expect
>> more system resources to be used as they want to get the operation
>> done faster, so we are fine with this.  However, I am not sure if that
>> is the right thing, so we should try to come up with some solution for
>> it and if the solution is too complex, then probably we can think of
>> documenting such behavior.
>
> In cloud environments (Amazon + gp2) there is a budget on input/output
> operations.  If you cross it for a long time, everything starts looking
> like you are working with a floppy disk.
>
> For ease of configuration, I would need a "max_vacuum_disk_iops" that
> would limit the number of input/output operations by all of the vacuums
> in the system.  If I set it to less than the value of the budget refill,
> I can be sure that no vacuum runs fast enough to impact any sibling
> query.
>
> There is also value in a non-throttled VACUUM for smaller tables.  On
> gp2 such things will be consumed out of the surge budget, and its size
> is known to the sysadmin.  Let's call it "max_vacuum_disk_surge_iops":
> if a relation has fewer blocks than this value and the situation is
> blocking in any way (antiwraparound, interactive console, ...), go on
> and run without throttling.
>

I think the need for these things can be addressed by the existing
cost-based vacuum parameters.  See the docs [1].  For example, if you set
vacuum_cost_delay to zero, the operation will be performed without
throttling (a short illustration is at the end of this mail).

> For how to balance the cost: if we know the number of vacuum processes
> that were running in the previous second, we can just divide the slot
> for this iteration by that previous number.
>
> To correct for overshoots, we can subtract the previous second's
> overshoot from the next one's.  That would also allow accounting for
> surge budget usage and let it refill, pausing all autovacuum for some
> time after a manual one.
>
> Accounting for and limiting the count of operations more often than
> once a second isn't beneficial for this use case.
>

I think it is better if we find a way to rebalance the cost when a worker
exits rather than every second, as it won't change unless a worker exits
anyway.

[1] - https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
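
PS: For the first point, a minimal illustration with the existing
parameters might look like this (the table names are made up):

    -- Run a one-off manual VACUUM without cost-based throttling:
    SET vacuum_cost_delay = 0;
    VACUUM big_table;

    -- Or exempt a particular table from autovacuum throttling:
    ALTER TABLE small_hot_table SET (autovacuum_vacuum_cost_delay = 0);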