On Wednesday, April 3, 2024 10:42 PM, Peter Xu wrote:
> On Wed, Apr 03, 2024 at 04:35:35PM +0800, Wang, Lei wrote:
> > We should change the following line from
> >
> > while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
> >
> > to
> >
> > while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
>
> Stupid me.. :( Thanks for figuring this out.
>
> > After that fix, test passed and no segfault.
> >
> > Given that the test shows a yield to the main loop won't introduce
> > much overhead (<1ms), how about first yield unconditionally, then we
> > enter the while loop to wait for several ms and yield periodically?
>
> Shouldn't the expectation be that this should return immediately without a
> wait? We're already processing LISTEN command, and on the source as you
> said it was much after the connect(). It won't guarantee the ordering but
> IIUC the majority should still have a direct hit?
>
> What we can do though is reducing the 100ms timeout if you see that's
> perhaps a risk of having too large a downtime when by accident. We can even
> do it in a tight loop here considering downtime is important, but to provide
> an intermediate ground: how about 100ms -> 1ms poll?

Would it be better to use a busy wait here, instead of blocking for even 1ms?
It's likely that the preempt channel is waiting for the main thread to
dispatch accept(), but we are calling qemu_sem_timedwait() here, which blocks
the main thread for 1 more ms.

> If you agree (and also to Wei; please review this and comment if there's
> any!), would you write up the commit log, fully test it in whatever way you
> could, and resend as a formal patch (please do this before Friday if
> possible)? You can keep a "Suggested-by:" for me. I want to queue it for
> rc3 if it can catch it. It seems important if Wei can always reproduce it.

Not sure if Lei will be able to get online, as the next two days are a
holiday in China. If not, I could take over and send the patch late tomorrow.
Let's see.
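
To make the trade-off above concrete, here is a minimal standalone sketch of the loop shape under discussion. It is not QEMU code: a POSIX sem_t stands in for the QemuSemaphore, and sem_timedwait_ms(), wait_with_yield(), demo_yield() and struct demo_state are all hypothetical names. It assumes the wait helper returns 0 when the semaphore was acquired and non-zero on timeout, which is what the corrected loop condition relies on. The yield callback models handing control back to the main loop so that it can dispatch accept() and post the semaphore.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-in for a timed semaphore wait:
 * returns 0 if the semaphore was acquired, -1 if `ms` elapsed first. */
static int sem_timedwait_ms(sem_t *sem, int ms)
{
    struct timespec ts;
    int rc;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    do {
        rc = sem_timedwait(sem, &ts);
    } while (rc == -1 && errno == EINTR);   /* retry if interrupted */
    return rc == 0 ? 0 : -1;
}

/* Hypothetical hook: give the event loop a chance to run. */
typedef void (*yield_fn)(void *opaque);

/* Loop WHILE the wait fails (non-zero), yielding between polls.
 * Returns the number of timed-out polls before success, or -1 if
 * max_polls polls all timed out. */
int wait_with_yield(sem_t *sem, int poll_ms, int max_polls,
                    yield_fn yield, void *opaque)
{
    int polls = 0;

    while (sem_timedwait_ms(sem, poll_ms)) {
        if (++polls >= max_polls) {
            return -1;
        }
        yield(opaque);
    }
    return polls;
}

/* Demo only: pretend the event loop posts the semaphore on the
 * post_on_yield-th yield (0 means it never posts). */
struct demo_state {
    sem_t *sem;
    int yields;
    int post_on_yield;
};

void demo_yield(void *opaque)
{
    struct demo_state *s = opaque;

    if (++s->yields == s->post_on_yield) {
        sem_post(s->sem);
    }
}
```

With the inverted condition (`!` in front of the wait), the loop body would run only when the semaphore had already been acquired, and a timeout would fall straight through, which matches the failure that the fix addressed. In this sketch a shorter poll_ms bounds how long each blocking wait can delay the yield. A pure busy wait would correspond to replacing the timed wait with a non-blocking sem_trywait(), at the cost of spinning the CPU.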