On 8 Oct 2013, at 20:41, Hans de Goede wrote:

>>>
>>> Wasn't it 1 ms until the offending commit (note 250 us does
>>> sound better to me).
>>
>> I believe you've made it 1 nanosecond not 1 millisecond;
>
> Correct, the 1 ms I referred to was before your commit which changed
> things from ms to ns.

OK I was looking at the patch as it would apply to master now.

> The purpose of the 1 ns timeout is to cause os_host_main_loop_wait
> to unlock the iothread, as $subject says the problem I'm seeing seems
> to be lock starvation not cpu starvation.
>
> Note as I already indicated I'm in no way an expert in this, if you
> and or Paolo suspect cpu starvation may happen too, then bumping
> the timeout to 250 us is fine with me too.
>
> If we go with 250 us that thus pose the question though if we should
> always keep a minimum timeout of 250 us when not non-blocking, or only
> bump it to 250 us when main_loop_tlg has already expired events and
> thus is causing a timeout of 0.

I am by no means an expert in the iothread bit, so let's pool our
ignorance ... :-)

Somewhere within that patch series (7b595f35 I think) I fixed up the
spin counter bit, which made it slightly less yucky and work with
milliseconds. I hope I didn't break it, but there seems to be something
slightly odd about the use case here.

If you are getting the spin error, this implies something is pretty
much constantly polling os_host_main_loop_wait with a zero timeout. As
you point out, this is going to be main_loop_wait, and almost certainly
main_loop_wait called with nonblocking set to 1.

The comment at line 208 suggests that "the I/O thread is very busy or
we are incorrectly busy waiting in the I/O thread". Do we know which is
happening?

Perhaps rather than give up the io_thread mutex on every call (which is
in practice what a 1 nanosecond timeout does) we should give it up if
we have not released it for X nanoseconds (maybe X=250us), or on every
Y calls. I think someone other than me should consider the effect of
dropping and reacquiring a mutex so frequently under heavy I/O load,
but I'm not sure it's a great idea.
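
To make the "every X nanoseconds or every Y calls" idea concrete, here
is a rough sketch of just the decision logic, written in Python rather
than QEMU's C. Nothing in it is QEMU code: YieldPolicy, should_release,
count_releases and the value chosen for Y are all made up for
illustration, and only X = 250 us comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class YieldPolicy:
    """Hypothetical helper: decides when a polling loop should drop its lock."""

    max_hold_ns: int = 250_000    # X = 250 us, the figure floated above
    max_calls: int = 1000         # Y; an arbitrary placeholder, not a tuned value
    last_release_ns: int = 0      # timestamp of the most recent release
    calls_since_release: int = 0  # zero-timeout polls since that release

    def should_release(self, now_ns: int) -> bool:
        """Return True if the lock should be dropped on this iteration."""
        self.calls_since_release += 1
        held_too_long = now_ns - self.last_release_ns >= self.max_hold_ns
        too_many_calls = self.calls_since_release >= self.max_calls
        if held_too_long or too_many_calls:
            # Record the release so both budgets start afresh.
            self.last_release_ns = now_ns
            self.calls_since_release = 0
            return True
        return False


def count_releases(policy: YieldPolicy, timestamps_ns) -> int:
    """Feed a sequence of poll timestamps through the policy and count releases."""
    return sum(policy.should_release(t) for t in timestamps_ns)
```

With a zero-timeout poll once per microsecond for one millisecond, this
policy would drop the lock 4 times, where releasing on every call would
drop it 1000 times. Whether that is frequent enough to cure the lock
starvation is exactly the open question.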

So on reflection you might be more right with 1 nanosecond than 250us
as a timeout, but I wonder whether a strategy of just dropping the lock
occasionally (and still using a zero timeout) might be better.

-- 
Alex Bligh