On 26.01.2011 18:12, Anthony Liguori wrote:
> On 01/26/2011 10:38 AM, Avi Kivity wrote:
>> On 01/26/2011 06:28 PM, Anthony Liguori wrote:
>>> On 01/26/2011 10:13 AM, Avi Kivity wrote:
>>>> Serializing against a global mutex has the advantage that it can be
>>>> treated as a global lock that is decomposed into fine-grained locks.
>>>>
>>>> For example, we can start the code conversion from an explicit async
>>>> model to a threaded sync model by converting the mutex into a
>>>> shared/exclusive lock. Operations like read and write take the lock
>>>> for shared access (and take a fine-grained mutex on the metadata
>>>> cache entry), while operations like creating a snapshot take the lock
>>>> for exclusive access. That doesn't work with freeze/thaw.
>>>
>>> The trouble with this is that you increase the amount of re-entrance,
>>> whereas freeze/thaw doesn't.
>>>
>>> The code from the beginning of the request to where the mutex is
>>> acquired will be executed for every single request, even while
>>> requests are blocked at the mutex acquisition.
>>
>> It's just a few instructions.
>>
>>> With freeze/thaw, you freeze the queue and prevent any request from
>>> starting until you thaw. You only thaw and return control to allow
>>> another request to execute when you begin executing an asynchronous
>>> I/O callback.
>>
>> What do you actually save? The longjmp() to the coroutine code,
>> linking in to the mutex wait queue, and another switch back to the
>> main coroutine? Given that we don't expect to block often, it seems
>> hardly a cost worth optimizing.
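To make the shared/exclusive decomposition described above concrete, here is
a minimal sketch of the threaded sync model Avi mentions, using POSIX
rwlocks. The function names, the locking granularity and the metadata-cache
mutex are invented for illustration; this is not the actual qcow2 code.

/*
 * Sketch: read/write requests take a global lock shared, snapshot creation
 * takes it exclusive, and the metadata cache gets its own fine-grained
 * mutex. Names are hypothetical.
 */
#include <pthread.h>

static pthread_rwlock_t bs_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t meta_cache_lock = PTHREAD_MUTEX_INITIALIZER;

static void qcow2_write_request(void)
{
    pthread_rwlock_rdlock(&bs_lock);        /* shared: many requests in flight */

    pthread_mutex_lock(&meta_cache_lock);   /* fine-grained: one cache entry */
    /* ... look up / update the L2 table cache entry ... */
    pthread_mutex_unlock(&meta_cache_lock);

    /* ... issue the data write ... */

    pthread_rwlock_unlock(&bs_lock);
}

static void qcow2_create_snapshot(void)
{
    pthread_rwlock_wrlock(&bs_lock);        /* exclusive: drains all requests */
    /* ... copy the L1 table, write the snapshot header ... */
    pthread_rwlock_unlock(&bs_lock);
}

int main(void)
{
    qcow2_write_request();
    qcow2_create_snapshot();
    return 0;
}

In this model a read/write request only ever blocks others on the short
metadata-cache critical section, while a snapshot waits until every
in-flight request has dropped the shared lock.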
And if you cared, you could probably optimize it in the mutex
implementation.

> It's a matter of correctness, not optimization.
>
> Consider the following example:
>
> coroutine {
>     l2 = find_l2(offset);
>     // allocates but does not increment max cluster offset
>     l2[l2_offset(offset)] = alloc_cluster();
>
>     co_mutex_lock(&lock);
>     write_l2(l2);
>     co_mutex_unlock(&lock);
>
>     l1[l1_offset(offset)] = l2;
>
>     co_mutex_lock(&lock);
>     write_l1(l1);
>     co_mutex_unlock(&lock);
>
>     commit_cluster(l2[l2_offset(offset)]);
> }
>
> This code is incorrect. The code before the first co_mutex_lock() may
> be called a dozen times before anyone finishes this routine. That
> means the same cluster is reused multiple times.

But this is not a new problem. In qcow2 we have this today:

cluster_offset = alloc_cluster();
bdrv_aio_write()
...
update_l2()

There is already code to handle this case, even though it could
probably be simplified for coroutines. I don't think we want to make
the code worse during the conversion by basically serializing
everything.

> This code was correct before we introduced coroutines because we
> effectively had one big lock around the whole thing.

So what's the lesson to learn? That you need to take care when you
convert a big lock into smaller ones?

>>> I think my previous example was wrong, you really want to do:
>>>
>>> qcow2_aio_writev() {
>>>     coroutine {
>>>         freeze();
>>>         sync_io(); // existing qcow2 code
>>>         thaw();
>>>         // existing non I/O code
>>>         bdrv_aio_writev(callback); // no explicit freeze/thaw needed
>>>     }
>>> }
>>>
>>> This is equivalent to our existing code because no new re-entrance is
>>> introduced. The only re-entrancy points are in the
>>> bdrv_aio_{readv,writev} calls.
>>
>> This requires you to know which code is sync, and which code is
>> async. My conversion allows you to wrap the code blindly with a
>> mutex and have it do the right thing automatically. This is most
>> useful where the code can be sync or async depending on data (which
>> is the case for qcow2).

Well, but in the case of qcow2, you don't want to have a big mutex
around everything. We know perfectly well which parts are asynchronous
and which are synchronous, so we'd want to do it in a finer-grained way
from the beginning.

Kevin
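For reference, here is a rough sketch of the kind of in-flight allocation
tracking that the alloc_cluster()/bdrv_aio_write()/update_l2() sequence
above relies on. All structure and function names are invented for
illustration and do not correspond to the actual qcow2 data structures.

/*
 * Sketch: keep track of cluster allocations that are still in flight so
 * that a second request touching an overlapping range waits for the first
 * one instead of allocating the same clusters again. Names are hypothetical.
 */
#include <stdint.h>
#include <stddef.h>
#include <sys/queue.h>

#define CLUSTER_SIZE 65536

struct in_flight_alloc {
    uint64_t offset;                    /* guest offset of the allocating write */
    uint64_t nb_clusters;               /* clusters being allocated */
    LIST_ENTRY(in_flight_alloc) next;
};

static LIST_HEAD(, in_flight_alloc) allocs_in_flight =
    LIST_HEAD_INITIALIZER(allocs_in_flight);

/* Return the conflicting in-flight allocation, or NULL if there is none. */
static struct in_flight_alloc *find_overlapping_alloc(uint64_t offset,
                                                      uint64_t nb_clusters)
{
    struct in_flight_alloc *m;

    LIST_FOREACH(m, &allocs_in_flight, next) {
        uint64_t start = m->offset;
        uint64_t end = start + m->nb_clusters * CLUSTER_SIZE;
        if (offset < end && offset + nb_clusters * CLUSTER_SIZE > start) {
            return m;
        }
    }
    return NULL;
}

int main(void)
{
    struct in_flight_alloc first = { .offset = 0, .nb_clusters = 4 };

    LIST_INSERT_HEAD(&allocs_in_flight, &first, next);
    /* A write to cluster 2 overlaps the in-flight allocation and must wait. */
    return find_overlapping_alloc(2 * CLUSTER_SIZE, 1) != NULL ? 0 : 1;
}

An allocating write would add itself to this list before issuing the data
write and remove itself (restarting any request queued behind it) once the
L2 table has been updated; that bookkeeping is the part that could probably
be expressed more simply with coroutines.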