On Mon, Oct 4, 2010 at 4:37 PM, Anthony Liguori <anth...@codemonkey.ws> wrote: > On 10/04/2010 05:40 AM, Arun R Bharadwaj wrote: >> >> Hi, >> >> I am working on introducing threading model into Qemu. This introduces >> the Threadlets infrastructure which allows subsystems to offload possibly >> blocking work to a queue to be processed by a pool of threads >> asynchrnonously. >> Threadlets are useful when there are operations that can be performed >> outside the context of the VCPU/IO threads inorder to free these latter >> to service any other guest requests. As of now we have converted a few >> v9fs calls like v9fs_read, v9fs_write etc to this model to test the >> working nature of this model. >> >> I observed that performance is degrading in the threading model for the >> following reason: >> >> Taking the example of v9fs_read call: We submit the blocking work in >> v9fs_read to a queue and do a qemu_cond_signal(). the work will be picked >> up by a worker thread which is waiting on the condition variable to go >> true. I measured the time taken to execute the v9fs_read call; in the >> case without the threading model, it takes around 15microseconds to >> complete, while in the threading model, it takes around 30microsends >> to complete. Most of this extra time (around 22microsends) is spent in >> completing the qemu_cond_signal() call. I suspect this is the reason why >> I am seeing performance hit with the threading model, because this >> time is much more than the time needed to complete the entire >> v9fs_read call in the non threading model case. >> >> I need advice on how to proceed from this situation. Pasting relevant >> code snippets below. >> >> thanks >> arun. >> --- >> >> /* Code to sumbit work to the queue */ >> void submit_threadlet_to_queue(ThreadletQueue *queue, ThreadletWork >> *work) >> { >> qemu_mutex_lock(&(queue->lock)); >> if (queue->idle_threads == 0&& queue->cur_threads< >> queue->max_threads) { >> spawn_threadlet(queue); >> } >> QTAILQ_INSERT_TAIL(&(queue->request_list), work, node); >> qemu_cond_signal(&(queue->cond)); >> qemu_mutex_unlock(&(queue->lock)); >> } >> > > Try moving qemu_cond_signal past qemu_mutex_unlock().
There are race conditions here: 1. When a new threadlet is started because there are no idle threads, qemu_cond_signal() may fire a blank because the threadlet isn't inside qemu_cond_timedwait() yet. The result, the work item is deadlocked until another thread grabs more work off the queue. If I'm reading the code correctly this bug is currently present! 2. Moving qemu_cond_signal() outside queue->lock is dangerous for the same reason: you need to be careful not to qemu_cond_signal() when the thread isn't inside qemu_cond_timedwait(). Stefan Stefan