On 14.01.2015 at 12:18, Paolo Bonzini wrote:
>
>
> On 14/01/2015 11:20, Kevin Wolf wrote:
> >> > The same problem applies to coroutine stacks, and those cannot be
> >> > throttled down as easily. But I guess if you limit the number of
> >> > threads, the guest gets slowed down and doesn't create as many
> >> > coroutines.
> > Shouldn't we rather try and decrease the stack sizes a bit? 1 MB per
> > coroutine is really a lot, and as I understand it, threads take even
> > more by default.
>
> Yup, 2 MB. Last time I proposed this, I think Markus was strongly in
> the "better safe than sorry" camp. :)
>
> But thread pool workers definitely don't need a big stack.

Right, I think we need to consider what kind of thread it is. For the
moment, I'm talking about the block layer only.

> >> > It would be nice to have a way to measure coroutine stack usage, similar
> >> > to what the kernel does.
> > The information which pages are mapped should be there somewhere...
>
> Yes, there is mincore(2). The complicated part is doing it fast, but
> perhaps it doesn't need to be fast.

Well, what do you want to use it for? I thought it would only be for a
one-time check of where we usually end up rather than something that
would be enabled in production, but maybe I misunderstood.

> I tried gathering warning from GCC's -Wstack-usage=1023 option and the
> block layer does not seem to have functions with huge stacks in the I/O
> path.
>
> So, assuming a maximum stack depth of 50 (already pretty generous since
> there shouldn't be any recursive calls) a 100K stack should be pretty
> much okay for coroutines and thread-pool threads.

The potential problem in the block layer is long backing file chains.
Perhaps we need to do something to solve that iteratively instead of
recursively.

> That said there are some offenders in the device models. Other
> QemuThreads, especially VCPU threads, had better stay with a big stack.

Yes, that's not exactly surprising.

Kevin
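
For reference, the one-time mincore(2) check discussed above could look
roughly like the sketch below. This is only an illustration, not existing
QEMU code: resident_pages() and demo_stack_usage() are hypothetical names,
the "stack" is a plain anonymous mmap() rather than a real coroutine stack,
and it assumes Linux, where untouched pages of an anonymous mapping are
reported as not resident. Swapped-out pages would be undercounted and
transparent huge pages could inflate the result, so treat the number as an
estimate.

```c
#define _DEFAULT_SOURCE   /* mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [base, base + size) that are resident in memory.
 * base must be page-aligned and size non-zero. Returns -1 on error. */
static long resident_pages(void *base, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t npages = (size + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (!vec) {
        return -1;
    }

    long count = -1;
    if (mincore(base, size, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++) {
            count += vec[i] & 1;    /* bit 0 set: page is resident */
        }
    }
    free(vec);
    return count;
}

/* Hypothetical demo: allocate a stack-sized anonymous region, dirty the
 * top touch_bytes of it (stacks grow downwards on the usual hosts) and
 * report how many pages ended up resident. Returns -1 on error. */
static long demo_stack_usage(size_t stack_size, size_t touch_bytes)
{
    assert(touch_bytes <= stack_size);

    unsigned char *stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED) {
        return -1;
    }

    memset(stack + stack_size - touch_bytes, 0xaa, touch_bytes);

    long used = resident_pages(stack, stack_size);
    munmap(stack, stack_size);
    return used;
}
```

Calling demo_stack_usage(1 << 20, 3 * page_size) on a 1 MB region should
report at least three resident pages, while an untouched region should
report none. Since this walks one byte per page it is cheap enough for a
one-time measurement, but probably not something to run per request.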