On 30.05.2016 12:12, Pavel Butsykin wrote:
On 27.05.2016 20:33, Stefan Hajnoczi wrote:
On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset = \
+        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9,
+                                              out_len);
The backslash isn't necessary for wrapping lines in C. This kind of
thing is only necessary in languages like Python where the grammar is
whitespace sensitive.
The C compiler is happy with an arbitrary amount of whitespace
(newlines) in the middle of a statement. The backslash in C is handled
by the preprocessor: it joins the line. That's useful for macro
definitions where you need to tell the preprocessor that several lines
belong to one macro definition. But it's not needed for normal C code.
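To illustrate the difference, here is a minimal standalone example (not part of the patch): the backslashes are required so the preprocessor joins the macro definition into one logical line, while the wrapped statement in main() needs no continuation character at all.

#include <stdio.h>

/* The trailing backslashes tell the preprocessor that these lines form
 * a single macro definition: */
#define PRINT_SUM(a, b)            \
    do {                           \
        printf("%d\n", (a) + (b)); \
    } while (0)

int main(void)
{
    /* Ordinary C code can wrap across lines freely; the compiler
     * ignores the extra whitespace: */
    int total = 1 +
                2 +
                3;

    PRINT_SUM(total, 4);
    return 0;
}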
Thanks for the explanation, but the backslash is meant more as a visual
marker of the line break for the reader. The current coding style doesn't
cover this point, but I can remove the backslash, since I don't think it's
that important :)
+    if (!cluster_offset) {
+        qemu_co_mutex_unlock(&s->lock);
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;
-    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
-    ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
-    if (ret < 0) {
-        goto fail;
-    }
+    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
+    qemu_co_mutex_unlock(&s->lock);
+    if (ret < 0) {
+        goto fail;
+    }
+    iov = (struct iovec) {
+        .iov_base = out_buf,
+        .iov_len = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+
+    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
+    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
There is a race condition here:
If the newly allocated cluster is only partially filled by compressed
data then qcow2_alloc_compressed_cluster_offset() remembers that more
bytes are still available in the cluster. The
qcow2_alloc_compressed_cluster_offset() caller will continue filling the
same cluster.
Imagine two compressed writes running at the same time. Write A
allocates just a few bytes, so write B shares a sector with the first
write:
Sorry, but it seems this will never happen, because the second write
will not pass this check:
uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                               uint64_t offset,
                                               int compressed_size)
{
    ...
    /* Compression can't overwrite anything. Fail if the cluster was already
     * allocated. */
    cluster_offset = be64_to_cpu(l2_table[l2_index]);
    if (cluster_offset & L2E_OFFSET_MASK) {
        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
        return 0;
    }
    ...
As you can see, we can't do a compressed write into an already allocated
cluster.
Sector 1
|AAABBBBBBBBB|
The race condition is that bdrv_co_pwritev() uses read-modify-write (a
bounce buffer). If both requests call bdrv_co_pwritev() around the same
time then the following could happen:
Sector 1
|000BBBBBBBBB|
or:
Sector 1
|AAA000000000|
It's necessary to hold s->lock around the compressed data write to avoid
this race condition.
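A rough sketch of what that might look like, reusing the variables from the
quoted hunk (the exact placement is my assumption, not the final patch): the
unlock simply moves to after bdrv_co_pwritev() returns, so the whole
allocate-check-write sequence runs under s->lock.

    /* Sketch only: keep s->lock held across the data write so that two
     * concurrent compressed writes cannot interleave their
     * read-modify-write cycles on a shared sector. */
    qemu_co_mutex_lock(&s->lock);
    cluster_offset =
        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
    if (!cluster_offset) {
        qemu_co_mutex_unlock(&s->lock);
        ret = -EIO;
        goto fail;
    }
    cluster_offset &= s->cluster_offset_mask;

    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
    if (ret < 0) {
        qemu_co_mutex_unlock(&s->lock);
        goto fail;
    }

    iov = (struct iovec) {
        .iov_base = out_buf,
        .iov_len  = out_len,
    };
    qemu_iovec_init_external(&hd_qiov, &iov, 1);

    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
    qemu_co_mutex_unlock(&s->lock);   /* unlock only after the data write */
    if (ret < 0) {
        goto fail;
    }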
I agree, there really is a race. Thank you, this is a very good point!