[Qemu-devel] [PATCH 3/3] block/parallels: fix access to not initialized memory in catalog_bitmap

2014-10-08 Thread Denis V. Lunev
found by valgrind.

Command: ./qemu-img convert -f parallels -O qcow2 1.hds 1.img
Invalid read of size 4
   at 0x17D0EF: parallels_co_read (parallels.c:357)
   by 0x11FEE4: bdrv_aio_rw_vector (block.c:4640)
   by 0x11FFBF: bdrv_aio_readv_em (block.c:4652)
   by 0x11F55F: bdrv_co_readv_em (block.c:4862)
   by 0x123428: bdrv_aligned_preadv (block.c:3056)
   by 0x1239FA: bdrv_co_do_preadv (block.c:3162)
   by 0x125424: bdrv_rw_co_entry (block.c:2706)
   by 0x155DD9: coroutine_trampoline (coroutine-ucontext.c:118)
   by 0x6975B6F: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)

The problem is that s->catalog_bitmap is allocated/filled as
gmalloc(s->catalog_size) thus index validity check must be
inclusive, i.e. index >= s->catalog_size is invalid.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2a814f3..4f9cd8d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -155,7 +155,7 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 offset = sector_num % s->tracks;
 
 /* not allocated */
-if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
+if ((index >= s->catalog_size) || (s->catalog_bitmap[index] == 0))
 return -1;
 return
 ((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;
-- 
1.9.1




[Qemu-devel] [PATCH 0/3] parallels: additional iotests and a minor bugfix

2014-10-08 Thread Denis V. Lunev
Pls find here test authentic test material, i.e. parallels images
with "WithoutFreeSpace" and "WithouFreSpacExt" signatures created
in authentic way + a minor bug fix for access to non-initialized
memory found by valgrind.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 



[Qemu-devel] [PATCH 2/3] iotests: add v2 parallels sample image and simple test for it

2014-10-08 Thread Denis V. Lunev
This is simple test image for the following commit made by me.

commit d25d59802021a747812472780d80a0e792078f40
Author: Denis V. Lunev 
Date:   Mon Jul 28 20:23:55 2014 +0400
parallels: 2TB+ parallels images support

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   5 +
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-v2.bz2 | Bin 0 -> 150 bytes
 3 files changed, 9 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v2.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index a300ee2..98d67b3 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -70,6 +70,11 @@ _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image =="
+_use_sample_img parallels-v2.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index fd26f3c..32ade08 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -15,4 +15,8 @@ no file open, try 'help open'
 == Zero sectors per track ==
 qemu-io: can't open device TEST_DIR/parallels-v1: Invalid image: Zero sectors 
per track
 no file open, try 'help open'
+
+== Read from a valid v2 image ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-v2.bz2 
b/tests/qemu-iotests/sample_images/parallels-v2.bz2
new file mode 100644
index 
..fd8614d061172faae50a993ac7acd3173de9aa98
GIT binary patch
literal 150
zcmV;H0BQe1T4*^jL0KkKS<`z1ga8U8f6)C%U;t162ml8F2!JYJ)<8f2004jpFaWp=
zU;vl^35GBLOaKJHsVaJ=X*QsGnW)758mCWPt$+@EvggMtTP~K8Aeq(bOlAS_fGVI1
zR{#%`0aSoc2hsqlKqv!50aXBg@8a8e-{1Q8zBk4(ssXG4tO2Y6qyhde

[Qemu-devel] [PATCH 1/3] iotests: replace fake parallels image with authentic one

2014-10-08 Thread Denis V. Lunev
The image was generated using http://openvz.org/Ploop utility and properly
filled with the same content as original one.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076  |  10 +-
 tests/qemu-iotests/076.out  |   8 
 tests/qemu-iotests/sample_images/fake.parallels.bz2 | Bin 141 -> 0 bytes
 tests/qemu-iotests/sample_images/parallels-v1.bz2   | Bin 0 -> 147 bytes
 4 files changed, 9 insertions(+), 9 deletions(-)
 delete mode 100644 tests/qemu-iotests/sample_images/fake.parallels.bz2
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v1.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index b614a7d..a300ee2 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -47,26 +47,26 @@ catalog_entries_offset=$((0x20))
 nb_sectors_offset=$((0x24))
 
 echo
-echo "== Read from a valid (enough) image =="
-_use_sample_img fake.parallels.bz2
+echo "== Read from a valid v1 image =="
+_use_sample_img parallels-v1.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Negative catalog size =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\xff\xff\xff\xff"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Overflow in catalog allocation =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$nb_sectors_offset" "\xff\xff\xff\xff"
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\x01\x00\x00\x40"
 { $QEMU_IO -c "read 64M 64M" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Zero sectors per track =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index f7745d8..fd26f3c 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -1,18 +1,18 @@
 QA output created by 076
 
-== Read from a valid (enough) image ==
+== Read from a valid v1 image ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 == Negative catalog size ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Catalog too large
+qemu-io: can't open device TEST_DIR/parallels-v1: Catalog too large
 no file open, try 'help open'
 
 == Overflow in catalog allocation ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Catalog too large
+qemu-io: can't open device TEST_DIR/parallels-v1: Catalog too large
 no file open, try 'help open'
 
 == Zero sectors per track ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Invalid image: Zero 
sectors per track
+qemu-io: can't open device TEST_DIR/parallels-v1: Invalid image: Zero sectors 
per track
 no file open, try 'help open'
 *** done
diff --git a/tests/qemu-iotests/sample_images/fake.parallels.bz2 
b/tests/qemu-iotests/sample_images/fake.parallels.bz2
deleted file mode 100644
index 
ffb5f13bac31bc9ab6e1ea5c0cfa26786f2c4cc6..
GIT binary patch
literal 0
HcmV?d1

literal 141
zcmV;80CN9AT4*^jL0KkKS*i&LJ^%_Hf6(xNVE_;S2ml2D2!JYJ)&M{N00969FaWp;
z000b`1pojBOn|7QnnOSv)YEF7cgIVO0ByGSdk7e?fW`f$x`2Bi3t$bd06owJs09G{
vKo+1B1LXi)0CVe)J@eC^zBuEJbFFJA24D=p8Gt*$AL8yvrwS4kK_LggA5<|C

diff --git a/tests/qemu-iotests/sample_images/parallels-v1.bz2 
b/tests/qemu-iotests/sample_images/parallels-v1.bz2
new file mode 100644
index 
..d1ef14205401a8e010d1be68bffeee92ce137d39
GIT binary patch
literal 147
zcmZ>Y%CIzaj8qGbbEBnV0~X
zZfy+=3|x~mCkJ!L1skp^{Ftks!;qfyS}R^xuR}6WLEksaZ~=P@V<pD?kL6rKj5mT56#x^qEyDl+

literal 0
HcmV?d1

-- 
1.9.1




Re: [Qemu-devel] [PATCH 0/3] parallels: additional iotests and a minor bugfix

2014-10-20 Thread Denis V. Lunev

On 08/10/14 13:13, Denis V. Lunev wrote:

Pls find here test authentic test material, i.e. parallels images
with "WithoutFreeSpace" and "WithouFreSpacExt" signatures created
in authentic way + a minor bug fix for access to non-initialized
memory found by valgrind.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 

ping



[Qemu-devel] [PATCH 7/9] block/parallels: support padded Parallels images

2014-10-27 Thread Denis V. Lunev
Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

For the time being Parallels was created an optimization for such OSes
in its desktop product. Desktop users are not qualified enough to create
properly aligned installations. Thus Parallels makes a blind guess
on a customer behalf and creates so-called "padded" images if guest
OS type is specified as WinXP, Win2k and Win2k3.

"Padding" is a value which should be added to guest LBA to obtain
sector number inside the image. This results in a shifted images.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

There share of such images could be evaluated as 6-8% according to the
statistics in my hands.

This patch obtains proper value from XML and applies it on reading.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 201c0f1..d07e4f7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -66,6 +66,7 @@ typedef struct BDRVParallelsState {
 unsigned int tracks;
 
 unsigned int off_multiplier;
+unsigned int padding;
 } BDRVParallelsState;
 
 static int parallels_open_image(BlockDriverState *bs, Error **errp);
@@ -144,6 +145,7 @@ static int parallels_open_xml(BlockDriverState *bs, int 
flags, Error **errp)
 const char *data;
 char image_path[PATH_MAX];
 Error *local_err = NULL;
+BDRVParallelsState *s = bs->opaque;
 
 ret = size = bdrv_getlength(bs->file);
 if (ret < 0) {
@@ -173,6 +175,19 @@ static int parallels_open_xml(BlockDriverState *bs, int 
flags, Error **errp)
 if (root == NULL) {
 goto fail;
 }
+
+data = xml_get_text(root, "/Disk_Parameters/Padding");
+if (data != NULL) {
+char *endptr;
+unsigned long pad;
+
+pad = strtoul(data, &endptr, 0);
+if ((endptr != NULL && *endptr != '\0') || pad > UINT_MAX) {
+goto fail;
+}
+s->padding = (uint32_t)pad;
+}
+
 image = xml_seek(root, "/StorageData/Storage/Image");
 data = ""; /* make gcc happy */
 for (size = 0; image != NULL; image = image->next) {
@@ -385,6 +400,10 @@ static int64_t seek_to_sector(BlockDriverState *bs, 
int64_t sector_num)
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
 uint8_t *buf, int nb_sectors)
 {
+BDRVParallelsState *s = bs->opaque;
+
+sector_num += s->padding;
+
 while (nb_sectors > 0) {
 int64_t position = seek_to_sector(bs, sector_num);
 if (position >= 0) {
-- 
1.9.1




[Qemu-devel] [PATCH 1/9] iotests: replace fake parallels image with authentic one

2014-10-27 Thread Denis V. Lunev
The image was generated using http://openvz.org/Ploop utility and properly
filled with the same content as original one.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Paolo Bonzini 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076  |  10 +-
 tests/qemu-iotests/076.out  |   8 
 tests/qemu-iotests/sample_images/fake.parallels.bz2 | Bin 141 -> 0 bytes
 tests/qemu-iotests/sample_images/parallels-v1.bz2   | Bin 0 -> 147 bytes
 4 files changed, 9 insertions(+), 9 deletions(-)
 delete mode 100644 tests/qemu-iotests/sample_images/fake.parallels.bz2
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v1.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index b614a7d..a300ee2 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -47,26 +47,26 @@ catalog_entries_offset=$((0x20))
 nb_sectors_offset=$((0x24))
 
 echo
-echo "== Read from a valid (enough) image =="
-_use_sample_img fake.parallels.bz2
+echo "== Read from a valid v1 image =="
+_use_sample_img parallels-v1.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Negative catalog size =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\xff\xff\xff\xff"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Overflow in catalog allocation =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$nb_sectors_offset" "\xff\xff\xff\xff"
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\x01\x00\x00\x40"
 { $QEMU_IO -c "read 64M 64M" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Zero sectors per track =="
-_use_sample_img fake.parallels.bz2
+_use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index f7745d8..fd26f3c 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -1,18 +1,18 @@
 QA output created by 076
 
-== Read from a valid (enough) image ==
+== Read from a valid v1 image ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 == Negative catalog size ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Catalog too large
+qemu-io: can't open device TEST_DIR/parallels-v1: Catalog too large
 no file open, try 'help open'
 
 == Overflow in catalog allocation ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Catalog too large
+qemu-io: can't open device TEST_DIR/parallels-v1: Catalog too large
 no file open, try 'help open'
 
 == Zero sectors per track ==
-qemu-io: can't open device TEST_DIR/fake.parallels: Invalid image: Zero 
sectors per track
+qemu-io: can't open device TEST_DIR/parallels-v1: Invalid image: Zero sectors 
per track
 no file open, try 'help open'
 *** done
diff --git a/tests/qemu-iotests/sample_images/fake.parallels.bz2 
b/tests/qemu-iotests/sample_images/fake.parallels.bz2
deleted file mode 100644
index 
ffb5f13bac31bc9ab6e1ea5c0cfa26786f2c4cc6..
GIT binary patch
literal 0
HcmV?d1

literal 141
zcmV;80CN9AT4*^jL0KkKS*i&LJ^%_Hf6(xNVE_;S2ml2D2!JYJ)&M{N00969FaWp;
z000b`1pojBOn|7QnnOSv)YEF7cgIVO0ByGSdk7e?fW`f$x`2Bi3t$bd06owJs09G{
vKo+1B1LXi)0CVe)J@eC^zBuEJbFFJA24D=p8Gt*$AL8yvrwS4kK_LggA5<|C

diff --git a/tests/qemu-iotests/sample_images/parallels-v1.bz2 
b/tests/qemu-iotests/sample_images/parallels-v1.bz2
new file mode 100644
index 
..d1ef14205401a8e010d1be68bffeee92ce137d39
GIT binary patch
literal 147
zcmZ>Y%CIzaj8qGbbEBnV0~X
zZfy+=3|x~mCkJ!L1skp^{Ftks!;qfyS}R^xuR}6WLEksaZ~=P@V<pD?kL6rKj5mT56#x^qEyDl+

literal 0
HcmV?d1

-- 
1.9.1




[Qemu-devel] [PATCH 4/9] configure: add dependency from libxml2

2014-10-27 Thread Denis V. Lunev
This dependency is required for adequate Parallels images support.
Typically the disk consists of several images which are glued by
XML disk descriptor. Also XML hides inside several important parameters
which are not available in the image header.

The patch also adds clause to checkpatch.pl to understand libxml2 types.
In the other case there is the following false positive:
ERROR: need consistent spacing around '*' (ctx:WxB)
#210: FILE: block/parallels.c:232:
+   !xmlStrcmp(root->name, (const xmlChar *)"Parallels_disk_image"))

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Paolo Bonzini 
---
 configure | 29 +
 scripts/checkpatch.pl |  1 +
 2 files changed, 30 insertions(+)

diff --git a/configure b/configure
index a9e4d49..5426752 100755
--- a/configure
+++ b/configure
@@ -335,6 +335,7 @@ libssh2=""
 vhdx=""
 quorum=""
 numa=""
+libxml2=""
 
 # parse CC options first
 for opt do
@@ -1129,6 +1130,10 @@ for opt do
   ;;
   --enable-numa) numa="yes"
   ;;
+  --disable-libxml2) libxml2="no"
+  ;;
+  --enable-libxml2) libxml2="yes"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1400,6 +1405,8 @@ Advanced options (experts only):
   --enable-quorum  enable quorum block filter support
   --disable-numa   disable libnuma support
   --enable-numaenable libnuma support
+  --disable-libxml2disable libxml2 (for Parallels image format)
+  --enable-libxml2 enable libxml2 (for Parallels image format)
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -4097,6 +4104,23 @@ if test -z "$zero_malloc" ; then
 fi
 fi
 
+# check for libxml2
+if test "$libxml2" != "no" ; then
+if $pkg_config --exists libxml-2.0; then
+libxml2="yes"
+libxml2_cflags=$($pkg_config --cflags libxml-2.0)
+libxml2_libs=$($pkg_config --libs libxml-2.0)
+QEMU_CFLAGS="$QEMU_CFLAGS $libxml2_cflags"
+libs_softmmu="$libs_softmmu $libxml2_libs"
+libs_tools="$libs_tools $libxml2_libs"
+else
+if test "$libxml2" = "yes"; then
+feature_not_found "libxml2" "Install libxml2 devel"
+fi
+libxml2="no"
+fi
+fi
+
 # Now we've finished running tests it's OK to add -Werror to the compiler flags
 if test "$werror" = "yes"; then
 QEMU_CFLAGS="-Werror $QEMU_CFLAGS"
@@ -4340,6 +4364,7 @@ echo "Quorum$quorum"
 echo "lzo support   $lzo"
 echo "snappy support$snappy"
 echo "NUMA host support $numa"
+echo "libxml2   $libxml2"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -4846,6 +4871,10 @@ if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
 
+if test "$libxml2" = "yes" ; then
+  echo "CONFIG_LIBXML2=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 053e432..3ddb097 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -242,6 +242,7 @@ our @typeList = (
qr{${Ident}_t},
qr{${Ident}_handler},
qr{${Ident}_handler_fn},
+   qr{xml${Ident}},
 );
 our @modifierList = (
qr{fastcall},
-- 
1.9.1




[Qemu-devel] [PATCH 6/9] iotests: simple parallels XML disk descriptor file test added

2014-10-27 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   6 ++
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-simple.xml.bz2 | Bin 0 -> 374 bytes
 3 files changed, 10 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-simple.xml.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index 98d67b3..8474fdd 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -75,6 +75,12 @@ echo "== Read from a valid v2 image =="
 _use_sample_img parallels-v2.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image opened through xml =="
+_use_sample_img parallels-v2.bz2
+_use_sample_img parallels-simple.xml.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 32ade08..628d9bf 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -19,4 +19,8 @@ no file open, try 'help open'
 == Read from a valid v2 image ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Read from a valid v2 image opened through xml ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-simple.xml.bz2 
b/tests/qemu-iotests/sample_images/parallels-simple.xml.bz2
new file mode 100644
index 
..13760bc9f30246ba01f43cdca3d3d90780430f61
GIT binary patch
literal 374
zcmV-+0g3)XT4*^jL0KkKSx`RPoB#ls-+)w5Py_$xpWsdazwh08Faddp1^@s613&-(
z00UFVg-_KD8fY4NVup`X5ukcZBTOVwCXAT_OhRY?0}wJ9sXCPa+IAr*0!sLV#k$+r
zFRIp6TJ>A%(u7Qkn2MJtX_(TWtce9REA{g&qLoa&l#2??q)aKM7Qx|d6glC9JSZw~
z-{Dy+9yLxWKg_2Xm02klNIFQqySl|-EJ?WkLWzOBm`Xui7_);kgut`D^IdcIOOiAVM)WoC{=C8L4k%Ybsug63k@+f`frkh@_Ij10W;xr->@K!oLkGw3Lv$
zS&-sh3*_e*@-+(JnDfSD#7GS;nh3VYlxtRDgM

[Qemu-devel] [PATCH 2/9] iotests: add v2 parallels sample image and simple test for it

2014-10-27 Thread Denis V. Lunev
This is simple test image for the following commit made by me.

commit d25d59802021a747812472780d80a0e792078f40
Author: Denis V. Lunev 
Date:   Mon Jul 28 20:23:55 2014 +0400
parallels: 2TB+ parallels images support

Signed-off-by: Denis V. Lunev 
Reviewed-by: Paolo Bonzini 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   5 +
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-v2.bz2 | Bin 0 -> 150 bytes
 3 files changed, 9 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v2.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index a300ee2..98d67b3 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -70,6 +70,11 @@ _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
 { $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image =="
+_use_sample_img parallels-v2.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index fd26f3c..32ade08 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -15,4 +15,8 @@ no file open, try 'help open'
 == Zero sectors per track ==
 qemu-io: can't open device TEST_DIR/parallels-v1: Invalid image: Zero sectors 
per track
 no file open, try 'help open'
+
+== Read from a valid v2 image ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-v2.bz2 
b/tests/qemu-iotests/sample_images/parallels-v2.bz2
new file mode 100644
index 
..fd8614d061172faae50a993ac7acd3173de9aa98
GIT binary patch
literal 150
zcmV;H0BQe1T4*^jL0KkKS<`z1ga8U8f6)C%U;t162ml8F2!JYJ)<8f2004jpFaWp=
zU;vl^35GBLOaKJHsVaJ=X*QsGnW)758mCWPt$+@EvggMtTP~K8Aeq(bOlAS_fGVI1
zR{#%`0aSoc2hsqlKqv!50aXBg@8a8e-{1Q8zBk4(ssXG4tO2Y6qyhde

[Qemu-devel] [PATCH 5/9] block/parallels: allow to specify DiskDescriptor.xml instead of image file

2014-10-27 Thread Denis V. Lunev
Typically Parallels disk bundle consists of several images which are
glued by XML disk descriptor. Also XML hides inside several important
parameters which are not available in the image header.

This patch allows to specify this XML file as a filename for an image
to open. It is allowed only to open Compressed images with the only
snapshot inside. No additional options are parsed at the moment.

The code itself is dumb enough for a while. If XML file is specified,
the file is parsed and the image is reopened as bs->file to keep the
rest of the driver untouched. This would be changed later with more
features added.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 231 --
 1 file changed, 226 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 4f9cd8d..201c0f1 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -27,6 +27,11 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 
+#if CONFIG_LIBXML2
+#include 
+#include 
+#endif
+
 /**/
 
 #define HEADER_MAGIC "WithoutFreeSpace"
@@ -34,6 +39,10 @@
 #define HEADER_VERSION 2
 #define HEADER_SIZE 64
 
+#define PARALLELS_XML   100
+#define PARALLELS_IMAGE 101
+
+
 // always little-endian
 struct parallels_header {
 char magic[16]; // "WithoutFreeSpace"
@@ -59,6 +68,194 @@ typedef struct BDRVParallelsState {
 unsigned int off_multiplier;
 } BDRVParallelsState;
 
+static int parallels_open_image(BlockDriverState *bs, Error **errp);
+
+#if CONFIG_LIBXML2
+static xmlNodePtr xml_find(xmlNode *node, const char *elem)
+{
+xmlNode *child;
+
+for (child = node->xmlChildrenNode; child != NULL; child = child->next) {
+if (!xmlStrcmp(child->name, (const xmlChar *)elem) &&
+child->type == XML_ELEMENT_NODE) {
+return child;
+}
+}
+return NULL;
+}
+
+static xmlNodePtr xml_seek(xmlNode *root, const char *elem)
+{
+xmlNode *child = root;
+const char *path;
+char nodename[128];
+int last = 0;
+
+path = elem;
+if (path[0] == '/') {
+path++;
+}
+if (path[0] == 0) {
+return NULL;
+}
+while (!last) {
+const char *p = strchr(path, '/');
+int length;
+if (p == NULL) {
+length = strlen(path);
+last = 1;
+} else {
+length = p - path;
+}
+memcpy(nodename, path, length);
+nodename[length] = 0;
+child = xml_find(child, nodename);
+if (child == NULL) {
+return NULL;
+}
+path = ++p;
+}
+return child;
+}
+
+static const char *xml_get_text(xmlNode *node, const char *name)
+{
+xmlNode *child;
+
+node = xml_seek(node, name);
+if (node == NULL) {
+return NULL;
+}
+
+for (child = node->xmlChildrenNode; child; child = child->next) {
+if (child->type == XML_TEXT_NODE) {
+return (const char *)child->content;
+}
+}
+return NULL;
+}
+
+static int parallels_open_xml(BlockDriverState *bs, int flags, Error **errp)
+{
+int size, ret;
+xmlDoc *doc = NULL;
+xmlNode *root, *image;
+char *xml = NULL;
+const char *data;
+char image_path[PATH_MAX];
+Error *local_err = NULL;
+
+ret = size = bdrv_getlength(bs->file);
+if (ret < 0) {
+goto fail;
+}
+/* XML file size should be resonable */
+ret = -EFBIG;
+if (size > 65536) {
+goto fail;
+}
+
+xml = g_malloc(size + 1);
+
+ret = bdrv_pread(bs->file, 0, xml, size);
+if (ret != size) {
+goto fail;
+}
+xml[size] = 0;
+
+ret = -EINVAL;
+doc = xmlReadMemory(xml, size, NULL, NULL,
+XML_PARSE_NOERROR | XML_PARSE_NOWARNING);
+if (doc == NULL) {
+goto fail;
+}
+root = xmlDocGetRootElement(doc);
+if (root == NULL) {
+goto fail;
+}
+image = xml_seek(root, "/StorageData/Storage/Image");
+data = ""; /* make gcc happy */
+for (size = 0; image != NULL; image = image->next) {
+if (image->type != XML_ELEMENT_NODE) {
+continue;
+}
+
+size++;
+data = xml_get_text(image, "Type");
+if (data != NULL && strcmp(data, "Compressed")) {
+error_setg(errp, "Only compressed Parallels images are supported");
+goto done;
+}
+
+data = xml_get_text(image, "File");
+if (data == NULL) {
+goto fail;
+}
+}
+/* Images with more than 1 snapshots are not supported at the moment */
+if (size != 1) {
+error_setg(errp, "Parallels images with sn

[Qemu-devel] [PATCH 0/9] parallels format support improvements

2014-10-27 Thread Denis V. Lunev
This patchset contains my previous patchset sent originally 08.10.2014
with minor Valgrind fix and iotests update as a base.

The patchset implements additional compatibility bits for Parallels
format:
- initial support of parsing of Parallels DiskDeskriptor.xml
  Typically Parallels disk bundle consists of several images which are
  glued by XML disk descriptor. Also XML hides inside several important
  parameters which are not available in the image header.
- support for padded Parallels images.
  For the time being Parallels was created an optimization for such OSes
  in its desktop product. Desktop users are not qualified enough to create
  properly aligned installations. Thus Parallels makes a blind guess
  on a customer behalf and creates so-called "padded" images if guest
  OS type is specified as WinXP, Win2k and Win2k3.

The code uses approach from VMDK support, either image or XML descriptor
could be used. Though there is temporary hack in the opening code:
BlockDriverState->file is being reopened inside parallels_open. I prefer
to keep this code in this state till proper Parallels snapshots support
in order to minimize current changes.

CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Roman Kagan 




[Qemu-devel] [PATCH 3/9] block/parallels: fix access to not initialized memory in catalog_bitmap

2014-10-27 Thread Denis V. Lunev
found by valgrind.

Command: ./qemu-img convert -f parallels -O qcow2 1.hds 1.img
Invalid read of size 4
   at 0x17D0EF: parallels_co_read (parallels.c:357)
   by 0x11FEE4: bdrv_aio_rw_vector (block.c:4640)
   by 0x11FFBF: bdrv_aio_readv_em (block.c:4652)
   by 0x11F55F: bdrv_co_readv_em (block.c:4862)
   by 0x123428: bdrv_aligned_preadv (block.c:3056)
   by 0x1239FA: bdrv_co_do_preadv (block.c:3162)
   by 0x125424: bdrv_rw_co_entry (block.c:2706)
   by 0x155DD9: coroutine_trampoline (coroutine-ucontext.c:118)
   by 0x6975B6F: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)

The problem is that s->catalog_bitmap is allocated/filled as
gmalloc(s->catalog_size) thus index validity check must be
inclusive, i.e. index >= s->catalog_size is invalid.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Paolo Bonzini 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2a814f3..4f9cd8d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -155,7 +155,7 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 offset = sector_num % s->tracks;
 
 /* not allocated */
-if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
+if ((index >= s->catalog_size) || (s->catalog_bitmap[index] == 0))
 return -1;
 return
 ((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;
-- 
1.9.1




[Qemu-devel] [PATCH 9/9] parallels: change copyright information in the image header

2014-10-27 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index d07e4f7..1491211 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -2,8 +2,10 @@
  * Block driver for Parallels disk image format
  *
  * Copyright (c) 2007 Alex Beregszaszi
+ * Copyright (c) 2014 Denis V. Lunev 
  *
- * This code is based on comparing different disk images created by Parallels.
+ * This code was originally based on comparing different disk images created
+ * by Parallels.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
-- 
1.9.1




[Qemu-devel] [PATCH 8/9] iotests: padded parallels image test

2014-10-27 Thread Denis V. Lunev
Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

"Padding" is a value which should be added to guest LBA to obtain
sector number inside the image. This results in a shifted images.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

This patch contains very simple image with padding and corresponding
XML disk descriptor created in authentic way.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   6 ++
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 | Bin 0 -> 377 bytes
 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2  | Bin 0 -> 139 bytes
 4 files changed, 10 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index 8474fdd..fb2b41d 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -81,6 +81,12 @@ _use_sample_img parallels-v2.bz2
 _use_sample_img parallels-simple.xml.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image opened through xml with padding =="
+_use_sample_img parallels-v2-padded.bz2
+_use_sample_img parallels-padded.xml.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 628d9bf..46680d8 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -23,4 +23,8 @@ read 65536/65536 bytes at offset 0
 == Read from a valid v2 image opened through xml ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Read from a valid v2 image opened through xml with padding ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 
b/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
new file mode 100644
index 
..90e481b88e1f4d65256d0511a8509114c6358052
GIT binary patch
literal 377
zcmV-<0fzoUT4*^jL0KkKS+JP`&Hw4nhyVtdfq>MRN`P%Vs7e5mydg1cvP<`s
zYo%7Ss&>_C5i%-bDqP)#HmEBiP+TU)8WyoiurHMw!l_ZF1>8_3m3N^^QwEd8=8hkN
zLO3;VtVoT0C3B518KN#oIV4@)7DfOSbkchYBQVO$9`K;#(G?m(FfikRxR&q(!eB`z
zPfQaAWzYbTAj%3PZH#S=y0Cm|7Ih97TD6vNg%Klm)F8;i7;OxS)eka
zB8A?w-LbwRX=_?uTcBnnP&Yyj78F)siB

literal 0
HcmV?d1

-- 
1.9.1




[Qemu-devel] [PATCH 3/6] iotests: simple parallels XML disk descriptor file test added

2014-10-29 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   6 ++
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-simple.xml.bz2 | Bin 0 -> 374 bytes
 3 files changed, 10 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-simple.xml.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index ed2be35..242d0c3 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -75,6 +75,12 @@ echo "== Read from a valid v2 image =="
 _use_sample_img parallels-v2.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image opened through xml =="
+_use_sample_img parallels-v2.bz2
+_use_sample_img parallels-simple.xml.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 32ade08..628d9bf 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -19,4 +19,8 @@ no file open, try 'help open'
 == Read from a valid v2 image ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Read from a valid v2 image opened through xml ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-simple.xml.bz2 
b/tests/qemu-iotests/sample_images/parallels-simple.xml.bz2
new file mode 100644
index 
..13760bc9f30246ba01f43cdca3d3d90780430f61
GIT binary patch
literal 374
zcmV-+0g3)XT4*^jL0KkKSx`RPoB#ls-+)w5Py_$xpWsdazwh08Faddp1^@s613&-(
z00UFVg-_KD8fY4NVup`X5ukcZBTOVwCXAT_OhRY?0}wJ9sXCPa+IAr*0!sLV#k$+r
zFRIp6TJ>A%(u7Qkn2MJtX_(TWtce9REA{g&qLoa&l#2??q)aKM7Qx|d6glC9JSZw~
z-{Dy+9yLxWKg_2Xm02klNIFQqySl|-EJ?WkLWzOBm`Xui7_);kgut`D^IdcIOOiAVM)WoC{=C8L4k%Ybsug63k@+f`frkh@_Ij10W;xr->@K!oLkGw3Lv$
zS&-sh3*_e*@-+(JnDfSD#7GS;nh3VYlxtRDgM

[Qemu-devel] [PATCH 1/6] configure: add dependency from libxml2

2014-10-29 Thread Denis V. Lunev
This dependency is required for adequate Parallels images support.
Typically the disk consists of several images which are glued by
XML disk descriptor. Also XML hides inside several important parameters
which are not available in the image header.

The patch also adds clause to checkpatch.pl to understand libxml2 types.
In the other case there is the following false positive:
ERROR: need consistent spacing around '*' (ctx:WxB)
#210: FILE: block/parallels.c:232:
+   !xmlStrcmp(root->name, (const xmlChar *)"Parallels_disk_image"))

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Paolo Bonzini 
---
 configure | 29 +
 scripts/checkpatch.pl |  1 +
 2 files changed, 30 insertions(+)

diff --git a/configure b/configure
index a9e4d49..5426752 100755
--- a/configure
+++ b/configure
@@ -335,6 +335,7 @@ libssh2=""
 vhdx=""
 quorum=""
 numa=""
+libxml2=""
 
 # parse CC options first
 for opt do
@@ -1129,6 +1130,10 @@ for opt do
   ;;
   --enable-numa) numa="yes"
   ;;
+  --disable-libxml2) libxml2="no"
+  ;;
+  --enable-libxml2) libxml2="yes"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1400,6 +1405,8 @@ Advanced options (experts only):
   --enable-quorum  enable quorum block filter support
   --disable-numa   disable libnuma support
   --enable-numaenable libnuma support
+  --disable-libxml2disable libxml2 (for Parallels image format)
+  --enable-libxml2 enable libxml2 (for Parallels image format)
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -4097,6 +4104,23 @@ if test -z "$zero_malloc" ; then
 fi
 fi
 
+# check for libxml2
+if test "$libxml2" != "no" ; then
+if $pkg_config --exists libxml-2.0; then
+libxml2="yes"
+libxml2_cflags=$($pkg_config --cflags libxml-2.0)
+libxml2_libs=$($pkg_config --libs libxml-2.0)
+QEMU_CFLAGS="$QEMU_CFLAGS $libxml2_cflags"
+libs_softmmu="$libs_softmmu $libxml2_libs"
+libs_tools="$libs_tools $libxml2_libs"
+else
+if test "$libxml2" = "yes"; then
+feature_not_found "libxml2" "Install libxml2 devel"
+fi
+libxml2="no"
+fi
+fi
+
 # Now we've finished running tests it's OK to add -Werror to the compiler flags
 if test "$werror" = "yes"; then
 QEMU_CFLAGS="-Werror $QEMU_CFLAGS"
@@ -4340,6 +4364,7 @@ echo "Quorum$quorum"
 echo "lzo support   $lzo"
 echo "snappy support$snappy"
 echo "NUMA host support $numa"
+echo "libxml2   $libxml2"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -4846,6 +4871,10 @@ if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
 
+if test "$libxml2" = "yes" ; then
+  echo "CONFIG_LIBXML2=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 053e432..3ddb097 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -242,6 +242,7 @@ our @typeList = (
qr{${Ident}_t},
qr{${Ident}_handler},
qr{${Ident}_handler_fn},
+   qr{xml${Ident}},
 );
 our @modifierList = (
qr{fastcall},
-- 
1.9.1




[Qemu-devel] [PATCH 2/6] block/parallels: allow to specify DiskDescriptor.xml instead of image file

2014-10-29 Thread Denis V. Lunev
Typically Parallels disk bundle consists of several images which are
glued by XML disk descriptor. Also XML hides inside several important
parameters which are not available in the image header.

This patch allows to specify this XML file as a filename for an image
to open. It is allowed only to open Compressed images with the only
snapshot inside. No additional options are parsed at the moment.

The code itself is dumb enough for a while. If XML file is specified,
the file is parsed and the image is reopened as bs->file to keep the
rest of the driver untouched. This would be changed later with more
features added.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 231 --
 1 file changed, 226 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 4f9cd8d..201c0f1 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -27,6 +27,11 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 
+#if CONFIG_LIBXML2
+#include 
+#include 
+#endif
+
 /**/
 
 #define HEADER_MAGIC "WithoutFreeSpace"
@@ -34,6 +39,10 @@
 #define HEADER_VERSION 2
 #define HEADER_SIZE 64
 
+#define PARALLELS_XML   100
+#define PARALLELS_IMAGE 101
+
+
 // always little-endian
 struct parallels_header {
 char magic[16]; // "WithoutFreeSpace"
@@ -59,6 +68,194 @@ typedef struct BDRVParallelsState {
 unsigned int off_multiplier;
 } BDRVParallelsState;
 
+static int parallels_open_image(BlockDriverState *bs, Error **errp);
+
+#if CONFIG_LIBXML2
+static xmlNodePtr xml_find(xmlNode *node, const char *elem)
+{
+xmlNode *child;
+
+for (child = node->xmlChildrenNode; child != NULL; child = child->next) {
+if (!xmlStrcmp(child->name, (const xmlChar *)elem) &&
+child->type == XML_ELEMENT_NODE) {
+return child;
+}
+}
+return NULL;
+}
+
+static xmlNodePtr xml_seek(xmlNode *root, const char *elem)
+{
+xmlNode *child = root;
+const char *path;
+char nodename[128];
+int last = 0;
+
+path = elem;
+if (path[0] == '/') {
+path++;
+}
+if (path[0] == 0) {
+return NULL;
+}
+while (!last) {
+const char *p = strchr(path, '/');
+int length;
+if (p == NULL) {
+length = strlen(path);
+last = 1;
+} else {
+length = p - path;
+}
+memcpy(nodename, path, length);
+nodename[length] = 0;
+child = xml_find(child, nodename);
+if (child == NULL) {
+return NULL;
+}
+path = ++p;
+}
+return child;
+}
+
+static const char *xml_get_text(xmlNode *node, const char *name)
+{
+xmlNode *child;
+
+node = xml_seek(node, name);
+if (node == NULL) {
+return NULL;
+}
+
+for (child = node->xmlChildrenNode; child; child = child->next) {
+if (child->type == XML_TEXT_NODE) {
+return (const char *)child->content;
+}
+}
+return NULL;
+}
+
+static int parallels_open_xml(BlockDriverState *bs, int flags, Error **errp)
+{
+int size, ret;
+xmlDoc *doc = NULL;
+xmlNode *root, *image;
+char *xml = NULL;
+const char *data;
+char image_path[PATH_MAX];
+Error *local_err = NULL;
+
+ret = size = bdrv_getlength(bs->file);
+if (ret < 0) {
+goto fail;
+}
+/* XML file size should be resonable */
+ret = -EFBIG;
+if (size > 65536) {
+goto fail;
+}
+
+xml = g_malloc(size + 1);
+
+ret = bdrv_pread(bs->file, 0, xml, size);
+if (ret != size) {
+goto fail;
+}
+xml[size] = 0;
+
+ret = -EINVAL;
+doc = xmlReadMemory(xml, size, NULL, NULL,
+XML_PARSE_NOERROR | XML_PARSE_NOWARNING);
+if (doc == NULL) {
+goto fail;
+}
+root = xmlDocGetRootElement(doc);
+if (root == NULL) {
+goto fail;
+}
+image = xml_seek(root, "/StorageData/Storage/Image");
+data = ""; /* make gcc happy */
+for (size = 0; image != NULL; image = image->next) {
+if (image->type != XML_ELEMENT_NODE) {
+continue;
+}
+
+size++;
+data = xml_get_text(image, "Type");
+if (data != NULL && strcmp(data, "Compressed")) {
+error_setg(errp, "Only compressed Parallels images are supported");
+goto done;
+}
+
+data = xml_get_text(image, "File");
+if (data == NULL) {
+goto fail;
+}
+}
+/* Images with more than 1 snapshots are not supported at the moment */
+if (size != 1) {
+error_setg(errp, "Parallels images with sn

[Qemu-devel] [PATCH 6/6] parallels: change copyright information in the image header

2014-10-29 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index d07e4f7..1491211 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -2,8 +2,10 @@
  * Block driver for Parallels disk image format
  *
  * Copyright (c) 2007 Alex Beregszaszi
+ * Copyright (c) 2014 Denis V. Lunev 
  *
- * This code is based on comparing different disk images created by Parallels.
+ * This code was originally based on comparing different disk images created
+ * by Parallels.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
-- 
1.9.1




[Qemu-devel] [PATCH 4/6] block/parallels: support padded Parallels images

2014-10-29 Thread Denis V. Lunev
Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

For the time being Parallels was created an optimization for such OSes
in its desktop product. Desktop users are not qualified enough to create
properly aligned installations. Thus Parallels makes a blind guess
on a customer behalf and creates so-called "padded" images if guest
OS type is specified as WinXP, Win2k and Win2k3.

"Padding" is a value which should be added to guest LBA to obtain
sector number inside the image. This results in a shifted images.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

There share of such images could be evaluated as 6-8% according to the
statistics in my hands.

This patch obtains proper value from XML and applies it on reading.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 201c0f1..d07e4f7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -66,6 +66,7 @@ typedef struct BDRVParallelsState {
 unsigned int tracks;
 
 unsigned int off_multiplier;
+unsigned int padding;
 } BDRVParallelsState;
 
 static int parallels_open_image(BlockDriverState *bs, Error **errp);
@@ -144,6 +145,7 @@ static int parallels_open_xml(BlockDriverState *bs, int 
flags, Error **errp)
 const char *data;
 char image_path[PATH_MAX];
 Error *local_err = NULL;
+BDRVParallelsState *s = bs->opaque;
 
 ret = size = bdrv_getlength(bs->file);
 if (ret < 0) {
@@ -173,6 +175,19 @@ static int parallels_open_xml(BlockDriverState *bs, int 
flags, Error **errp)
 if (root == NULL) {
 goto fail;
 }
+
+data = xml_get_text(root, "/Disk_Parameters/Padding");
+if (data != NULL) {
+char *endptr;
+unsigned long pad;
+
+pad = strtoul(data, &endptr, 0);
+if ((endptr != NULL && *endptr != '\0') || pad > UINT_MAX) {
+goto fail;
+}
+s->padding = (uint32_t)pad;
+}
+
 image = xml_seek(root, "/StorageData/Storage/Image");
 data = ""; /* make gcc happy */
 for (size = 0; image != NULL; image = image->next) {
@@ -385,6 +400,10 @@ static int64_t seek_to_sector(BlockDriverState *bs, 
int64_t sector_num)
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
 uint8_t *buf, int nb_sectors)
 {
+BDRVParallelsState *s = bs->opaque;
+
+sector_num += s->padding;
+
 while (nb_sectors > 0) {
 int64_t position = seek_to_sector(bs, sector_num);
 if (position >= 0) {
-- 
1.9.1




[Qemu-devel] [PATCH v2 0/6] parallels format support improvements

2014-10-29 Thread Denis V. Lunev
The patchset implements additional compatibility bits for Parallels
format:
- initial support of parsing of Parallels DiskDeskriptor.xml
  Typically Parallels disk bundle consists of several images which are
  glued by XML disk descriptor. Also XML hides inside several important
  parameters which are not available in the image header.
- support for padded Parallels images.
  For the time being Parallels was created an optimization for such OSes
  in its desktop product. Desktop users are not qualified enough to create
  properly aligned installations. Thus Parallels makes a blind guess
  on a customer behalf and creates so-called "padded" images if guest
  OS type is specified as WinXP, Win2k and Win2k3.

The code uses approach from VMDK support, either image or XML descriptor
could be used. Though there is temporary hack in the opening code:
BlockDriverState->file is being reopened inside parallels_open. I prefer
to keep this code in this state till proper Parallels snapshots support
in order to minimize current changes.

Changes from v1:
- dropped already merged part (original patches 1-3)

CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Roman Kagan 




[Qemu-devel] [PATCH 5/6] iotests: padded parallels image test

2014-10-29 Thread Denis V. Lunev
Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

"Padding" is a value which should be added to guest LBA to obtain
sector number inside the image. This results in a shifted images.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

This patch contains very simple image with padding and corresponding
XML disk descriptor created in authentic way.

Signed-off-by: Denis V. Lunev 
Acked-by: Roman Kagan 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 tests/qemu-iotests/076|   6 ++
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 | Bin 0 -> 377 bytes
 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2  | Bin 0 -> 139 bytes
 4 files changed, 10 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index 242d0c3..c83b953 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -81,6 +81,12 @@ _use_sample_img parallels-v2.bz2
 _use_sample_img parallels-simple.xml.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "== Read from a valid v2 image opened through xml with padding =="
+_use_sample_img parallels-v2-padded.bz2
+_use_sample_img parallels-padded.xml.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 628d9bf..46680d8 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -23,4 +23,8 @@ read 65536/65536 bytes at offset 0
 == Read from a valid v2 image opened through xml ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Read from a valid v2 image opened through xml with padding ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 
b/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
new file mode 100644
index 
..90e481b88e1f4d65256d0511a8509114c6358052
GIT binary patch
literal 377
zcmV-<0fzoUT4*^jL0KkKS+JP`&Hw4nhyVtdfq>MRN`P%Vs7e5mydg1cvP<`s
zYo%7Ss&>_C5i%-bDqP)#HmEBiP+TU)8WyoiurHMw!l_ZF1>8_3m3N^^QwEd8=8hkN
zLO3;VtVoT0C3B518KN#oIV4@)7DfOSbkchYBQVO$9`K;#(G?m(FfikRxR&q(!eB`z
zPfQaAWzYbTAj%3PZH#S=y0Cm|7Ih97TD6vNg%Klm)F8;i7;OxS)eka
zB8A?w-LbwRX=_?uTcBnnP&Yyj78F)siB

literal 0
HcmV?d1

-- 
1.9.1




[Qemu-devel] [PATCH 3/4] block/parallels: split check for parallels format in parallels_open

2014-07-22 Thread Denis V. Lunev
and rework error path a bit. There is no difference at the moment, but
the code will be definitely shorter when additional processing will
be required for WithouFreSpacExt

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8f9ec8a..02739cf 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -85,12 +85,10 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
-if (memcmp(ph.magic, HEADER_MAGIC, 16) ||
-(le32_to_cpu(ph.version) != HEADER_VERSION)) {
-error_setg(errp, "Image not in Parallels format");
-ret = -EINVAL;
-goto fail;
-}
+if (le32_to_cpu(ph.version) != HEADER_VERSION)
+goto fail_format;
+if (memcmp(ph.magic, HEADER_MAGIC, 16))
+goto fail_format;
 
 bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);
 
@@ -120,6 +118,9 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_co_mutex_init(&s->lock);
 return 0;
 
+fail_format:
+error_setg(errp, "Image not in Parallels format");
+ret = -EINVAL;
 fail:
 g_free(s->catalog_bitmap);
 return ret;
-- 
1.9.1




[Qemu-devel] [PATCH 0/4] block/parallels: 2TB+ parallels images support

2014-07-22 Thread Denis V. Lunev
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case to code the virtual disk size for such images nb_sectors
field is extended to 64 bits. The reader of older images with signature
WithoutFreeSpace must manually zero most valuable bits of nb_sectors
on open.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 




[Qemu-devel] [PATCH 2/4] block/parallels: replace tabs with spaces in block/parallels.c

2014-07-22 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index c44df87..8f9ec8a 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -61,11 +61,11 @@ static int parallels_probe(const uint8_t *buf, int 
buf_size, const char *filenam
 const struct parallels_header *ph = (const void *)buf;
 
 if (buf_size < HEADER_SIZE)
-   return 0;
+return 0;
 
 if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
-   (le32_to_cpu(ph->version) == HEADER_VERSION))
-   return 100;
+(le32_to_cpu(ph->version) == HEADER_VERSION))
+return 100;
 
 return 0;
 }
@@ -115,7 +115,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 for (i = 0; i < s->catalog_size; i++)
-   le32_to_cpus(&s->catalog_bitmap[i]);
+le32_to_cpus(&s->catalog_bitmap[i]);
 
 qemu_co_mutex_init(&s->lock);
 return 0;
@@ -135,7 +135,7 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 
 /* not allocated */
 if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
-   return -1;
+return -1;
 return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 1/4] block/parallels: extend parallels format header with actual data values

2014-07-22 Thread Denis V. Lunev
Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bit wide. Upper 32bits are not used for
  images with signature "WithoutFreeSpace" and must be explicitely
  zeroed according to Parallels. They will be used for images with
  signature "WithouFreSpacExt"
- inuse is magic which means that the image is currently opened for
  read/write or was not closed correctly, the magic is 0x746f6e59
- data_off is the location of the first data block. It can be zero
  and in this case

This patch adds these values to struct parallels_header and adds
proper handling of nb_sectors for currently supported WithoutFreeSpace
images.

WithouFreSpacExt will be covered in the next patch.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 1a5bd35..c44df87 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -41,8 +41,10 @@ struct parallels_header {
 uint32_t cylinders;
 uint32_t tracks;
 uint32_t catalog_entries;
-uint32_t nb_sectors;
-char padding[24];
+uint64_t nb_sectors;
+uint32_t inuse;
+uint32_t data_off;
+char padding[12];
 } QEMU_PACKED;
 
 typedef struct BDRVParallelsState {
@@ -90,7 +92,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
-bs->total_sectors = le32_to_cpu(ph.nb_sectors);
+bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);
 
 s->tracks = le32_to_cpu(ph.tracks);
 if (s->tracks == 0) {
-- 
1.9.1




[Qemu-devel] [PATCH 4/4] block/parallels: 2TB+ parallels images support

2014-07-22 Thread Denis V. Lunev
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 02739cf..d9cb04f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
 /**/
 
 #define HEADER_MAGIC "WithoutFreeSpace"
+#define HEADER_MAGIC2 "WithouFreSpacExt"
 #define HEADER_VERSION 2
 #define HEADER_SIZE 64
 
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {
 unsigned int catalog_size;
 
 unsigned int tracks;
+
+unsigned int off_multiplier;
 } BDRVParallelsState;
 
 static int parallels_probe(const uint8_t *buf, int buf_size, const char 
*filename)
@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
 if (buf_size < HEADER_SIZE)
 return 0;
 
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
 (le32_to_cpu(ph->version) == HEADER_VERSION))
 return 100;
 
@@ -85,13 +89,18 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);
+
 if (le32_to_cpu(ph.version) != HEADER_VERSION)
 goto fail_format;
-if (memcmp(ph.magic, HEADER_MAGIC, 16))
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = (uint32_t)bs->total_sectors;
+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16))
+s->off_multiplier = le32_to_cpu(ph.tracks);
+else
 goto fail_format;
 
-bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);
-
 s->tracks = le32_to_cpu(ph.tracks);
 if (s->tracks == 0) {
 error_setg(errp, "Invalid image: Zero sectors per track");
@@ -137,7 +146,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 /* not allocated */
 if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
 return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;
 }
 
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
-- 
1.9.1




Re: [Qemu-devel] [PATCH 1/4] block/parallels: extend parallels format header with actual data values

2014-07-25 Thread Denis V. Lunev

On 24/07/14 22:34, Jeff Cody wrote:

On Tue, Jul 22, 2014 at 05:19:34PM +0400, Denis V. Lunev wrote:

Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bit wide. Upper 32bits are not used for
   images with signature "WithoutFreeSpace" and must be explicitely

s/explicitely/explicitly


   zeroed according to Parallels. They will be used for images with
   signature "WithouFreSpacExt"
- inuse is magic which means that the image is currently opened for
   read/write or was not closed correctly, the magic is 0x746f6e59
- data_off is the location of the first data block. It can be zero
   and in this case

I think you may have forgotten to finish this sentence :)

ok



This patch adds these values to struct parallels_header and adds
proper handling of nb_sectors for currently supported WithoutFreeSpace
images.

WithouFreSpacExt will be covered in the next patch.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 1a5bd35..c44df87 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -41,8 +41,10 @@ struct parallels_header {
  uint32_t cylinders;
  uint32_t tracks;
  uint32_t catalog_entries;
-uint32_t nb_sectors;
-char padding[24];
+uint64_t nb_sectors;
+uint32_t inuse;
+uint32_t data_off;
+char padding[12];
  } QEMU_PACKED;
  
  typedef struct BDRVParallelsState {

@@ -90,7 +92,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  goto fail;
  }
  
-bs->total_sectors = le32_to_cpu(ph.nb_sectors);

+bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);

I think an explicit bit mask on the upper 32 bits would fit better
here than a cast, especially since neither 'bs->total_sectors' nor
'ph.nb_sectors' is a uint32_t. E.g.:

 bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);


ok, will do



Re: [Qemu-devel] [PATCH 4/4] block/parallels: 2TB+ parallels images support

2014-07-25 Thread Denis V. Lunev

On 24/07/14 23:25, Jeff Cody wrote:

On Tue, Jul 22, 2014 at 05:19:37PM +0400, Denis V. Lunev wrote:

Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 02739cf..d9cb04f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
  /**/
  
  #define HEADER_MAGIC "WithoutFreeSpace"

+#define HEADER_MAGIC2 "WithouFreSpacExt"
  #define HEADER_VERSION 2
  #define HEADER_SIZE 64
  
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {

  unsigned int catalog_size;
  
  unsigned int tracks;

+
+unsigned int off_multiplier;
  } BDRVParallelsState;
  
  static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)

@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
  if (buf_size < HEADER_SIZE)
  return 0;
  
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&

+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
  (le32_to_cpu(ph->version) == HEADER_VERSION))
  return 100;
  
@@ -85,13 +89,18 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,

  goto fail;
  }
  
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);

+

bs->total_sectors is a int64_t, and ph.nb_sectors is a uint64_t.
Should we do a sanity check here on the max number of sectors?

in reality such image will not be usable as we will later have to multiply
this count with 512 to calculate offset in the file

  if (le32_to_cpu(ph.version) != HEADER_VERSION)
  goto fail_format;
-if (memcmp(ph.magic, HEADER_MAGIC, 16))
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = (uint32_t)bs->total_sectors;

(same comment as in patch 1, w.r.t. cast vs. bitmask)

ok

+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16))
+s->off_multiplier = le32_to_cpu(ph.tracks);

Is there a maximum size in the specification for ph.tracks?

no, there is no limitation. Though in reality I have seen the
following images only:
  252 sectors, 63 sectors, 256 sectors, 2048 sectors
and only 2048 was used with this new header.

This is just an observation.


+else
  goto fail_format;

(same comment as the last patch, w.r.t. braces)

  
-bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);

-
  s->tracks = le32_to_cpu(ph.tracks);
  if (s->tracks == 0) {
  error_setg(errp, "Invalid image: Zero sectors per track");
@@ -137,7 +146,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
  /* not allocated */
  if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
  return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;

Do we need to worry about overflow here, depending on the value of
off_multiplier?

Also, (and this existed prior to this patch), - we are casting to
uint64_t, although the function returns int64_t.

good catch. I'll change the declaration to uint64_t and will return 0
from previous if aka

if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))

It is not possible to obtain 0 as a real offset of any sector as
this is an offset of the header.

Regarding overflow check. Do you think that we should return
error to the upper layer or we could silently fill result data
with zeroes as was done originally for the following case
'index > s->catalog_size' ?

  }
  
  static int parallels_read(BlockDriverState *bs, int64_t sector_num,

--
1.9.1







Re: [Qemu-devel] [PATCH 3/4] block/parallels: split check for parallels format in parallels_open

2014-07-25 Thread Denis V. Lunev

On 24/07/14 22:50, Jeff Cody wrote:

On Tue, Jul 22, 2014 at 05:19:36PM +0400, Denis V. Lunev wrote:

and rework error path a bit. There is no difference at the moment, but
the code will be definitely shorter when additional processing will
be required for WithouFreSpacExt

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8f9ec8a..02739cf 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -85,12 +85,10 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  goto fail;
  }
  
-if (memcmp(ph.magic, HEADER_MAGIC, 16) ||

-(le32_to_cpu(ph.version) != HEADER_VERSION)) {
-error_setg(errp, "Image not in Parallels format");
-ret = -EINVAL;
-goto fail;
-}
+if (le32_to_cpu(ph.version) != HEADER_VERSION)
+goto fail_format;
+if (memcmp(ph.magic, HEADER_MAGIC, 16))
+goto fail_format;

QEMU coding style dictates these statements have curly braces, even
though they are just one liners.  (If you run your patches against
scripts/checkpatch.pl, it should catch most style issues).

ok, I just used a kernel convention. Will redo this here and in the
next path. This is not a problem :) Thank you for pointing this
out.


I think this patch could also just be squashed into patch 4, if
desired.

I'd prefer not to do this to have behavior changing and behavior
non-changing stuff separated.

  
  bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);
  
@@ -120,6 +118,9 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,

  qemu_co_mutex_init(&s->lock);
  return 0;
  
+fail_format:

+error_setg(errp, "Image not in Parallels format");
+ret = -EINVAL;
  fail:
  g_free(s->catalog_bitmap);
  return ret;
--
1.9.1







Re: [Qemu-devel] [PATCH 4/4] block/parallels: 2TB+ parallels images support

2014-07-25 Thread Denis V. Lunev

On 25/07/14 17:08, Jeff Cody wrote:

On Fri, Jul 25, 2014 at 07:51:47AM +0400, Denis V. Lunev wrote:

On 24/07/14 23:25, Jeff Cody wrote:

On Tue, Jul 22, 2014 at 05:19:37PM +0400, Denis V. Lunev wrote:

Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 02739cf..d9cb04f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
  /**/
  #define HEADER_MAGIC "WithoutFreeSpace"
+#define HEADER_MAGIC2 "WithouFreSpacExt"
  #define HEADER_VERSION 2
  #define HEADER_SIZE 64
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {
  unsigned int catalog_size;
  unsigned int tracks;
+
+unsigned int off_multiplier;
  } BDRVParallelsState;
  static int parallels_probe(const uint8_t *buf, int buf_size, const char 
*filename)
@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
  if (buf_size < HEADER_SIZE)
  return 0;
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
  (le32_to_cpu(ph->version) == HEADER_VERSION))
  return 100;
@@ -85,13 +89,18 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  goto fail;
  }
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);
+

bs->total_sectors is a int64_t, and ph.nb_sectors is a uint64_t.
Should we do a sanity check here on the max number of sectors?

in reality such image will not be usable as we will later have to multiply
this count with 512 to calculate offset in the file

  if (le32_to_cpu(ph.version) != HEADER_VERSION)
  goto fail_format;
-if (memcmp(ph.magic, HEADER_MAGIC, 16))
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = (uint32_t)bs->total_sectors;

(same comment as in patch 1, w.r.t. cast vs. bitmask)

ok

+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16))
+s->off_multiplier = le32_to_cpu(ph.tracks);

Is there a maximum size in the specification for ph.tracks?

no, there is no limitation. Though in reality I have seen the
following images only:
   252 sectors, 63 sectors, 256 sectors, 2048 sectors
and only 2048 was used with this new header.

This is just an observation.


+else
  goto fail_format;

(same comment as the last patch, w.r.t. braces)


-bs->total_sectors = (uint32_t)le64_to_cpu(ph.nb_sectors);
-
  s->tracks = le32_to_cpu(ph.tracks);
  if (s->tracks == 0) {
  error_setg(errp, "Invalid image: Zero sectors per track");
@@ -137,7 +146,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
  /* not allocated */
  if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
  return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;

Do we need to worry about overflow here, depending on the value of
off_multiplier?

Also, (and this existed prior to this patch), - we are casting to
uint64_t, although the function returns int64_t.

good catch. I'll change the declaration to uint64_t and will return 0
from previous if aka


Thanks.

I'm not sure that is the best option though - the only place the
return value from this function is used is in parallels_read(), which
passes the output into bdrv_pread() as the offset.  The offset
parameter for bdrv_pread/pwrite is an int64_t, not a uint64_t.



if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))

It is not possible to obtain 0 as a real offset of any sector as
this is an offset of the header.

Regarding overflow check. Do you think that we should return
error to the upper layer or we could silently fill result data
with zeroes as was done originally for the following case
'index > s->catalog_size' ?

I think it would be best to return error (anything < 0 will also cause
bdrv_pread to return -EIO, and it is also checked for in
parallels_read).

I also don't mean to over-complicate things here for you, which is why
I asked about the max value of ph.tracks.  If that has a reasonable
bounds check, t

[Qemu-devel] [PATCH v2 1/4] parallels: extend parallels format header with actual data values

2014-07-28 Thread Denis V. Lunev
Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bit wide. Upper 32bits are not used for
  images with signature "WithoutFreeSpace" and must be explicitly
  zeroed according to Parallels. They will be used for images with
  signature "WithouFreSpacExt"
- inuse is magic which means that the image is currently opened for
  read/write or was not closed correctly, the magic is 0x746f6e59
- data_off is the location of the first data block. It can be zero
  and in this case data starts just beyond the header aligned to
  512 bytes. Though this field does not matter for read-only driver

This patch adds these values to struct parallels_header and adds
proper handling of nb_sectors for currently supported WithoutFreeSpace
images.

WithouFreSpacExt will be covered in next patches.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Jeff Cody 
---
 block/parallels.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 1a5bd35..e39 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -41,8 +41,10 @@ struct parallels_header {
 uint32_t cylinders;
 uint32_t tracks;
 uint32_t catalog_entries;
-uint32_t nb_sectors;
-char padding[24];
+uint64_t nb_sectors;
+uint32_t inuse;
+uint32_t data_off;
+char padding[12];
 } QEMU_PACKED;
 
 typedef struct BDRVParallelsState {
@@ -90,7 +92,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
-bs->total_sectors = le32_to_cpu(ph.nb_sectors);
+bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);
 
 s->tracks = le32_to_cpu(ph.tracks);
 if (s->tracks == 0) {
-- 
1.9.1




[Qemu-devel] [PATCH v2 2/4] parallels: replace tabs with spaces in block/parallels.c

2014-07-28 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Reviewed-by: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index e39..16d14ad 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -61,11 +61,11 @@ static int parallels_probe(const uint8_t *buf, int 
buf_size, const char *filenam
 const struct parallels_header *ph = (const void *)buf;
 
 if (buf_size < HEADER_SIZE)
-   return 0;
+return 0;
 
 if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
-   (le32_to_cpu(ph->version) == HEADER_VERSION))
-   return 100;
+(le32_to_cpu(ph->version) == HEADER_VERSION))
+return 100;
 
 return 0;
 }
@@ -115,7 +115,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 for (i = 0; i < s->catalog_size; i++)
-   le32_to_cpus(&s->catalog_bitmap[i]);
+le32_to_cpus(&s->catalog_bitmap[i]);
 
 qemu_co_mutex_init(&s->lock);
 return 0;
@@ -135,7 +135,7 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 
 /* not allocated */
 if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
-   return -1;
+return -1;
 return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH v2 3/4] parallels: split check for parallels format in parallels_open

2014-07-28 Thread Denis V. Lunev
and rework error path a bit. There is no difference at the moment, but
the code will be definitely shorter when additional processing will
be required for WithouFreSpacExt

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 16d14ad..466705e 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -85,11 +85,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
-if (memcmp(ph.magic, HEADER_MAGIC, 16) ||
-(le32_to_cpu(ph.version) != HEADER_VERSION)) {
-error_setg(errp, "Image not in Parallels format");
-ret = -EINVAL;
-goto fail;
+if (le32_to_cpu(ph.version) != HEADER_VERSION) {
+goto fail_format;
+}
+if (memcmp(ph.magic, HEADER_MAGIC, 16)) {
+goto fail_format;
 }
 
 bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);
@@ -120,6 +120,9 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_co_mutex_init(&s->lock);
 return 0;
 
+fail_format:
+error_setg(errp, "Image not in Parallels format");
+ret = -EINVAL;
 fail:
 g_free(s->catalog_bitmap);
 return ret;
-- 
1.9.1




[Qemu-devel] [PATCH v2 0/4] block/parallels: 2TB+ parallels images support

2014-07-28 Thread Denis V. Lunev
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case to code the virtual disk size for such images nb_sectors
field is extended to 64 bits. The reader of older images with signature
WithoutFreeSpace must manually zero most valuable bits of nb_sectors
on open.

Changes from v1:
- fixed message in patch 1
- added braces to conform qemu coding style in patches 3 & 4
- added check for ph.tracks in patch 4 to avoid offset overflow as suggested
  by Jeff

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Jeff Cody 



[Qemu-devel] [PATCH v2 4/4] parallels: 2TB+ parallels images support

2014-07-28 Thread Denis V. Lunev
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img and also adds specific
check for an incorrect image. Images with block size greater than
INT_MAX/513 are not supported. The biggest available Parallels image
cluster size in the field is 1 Mb. Thus this limit will not hurt
anyone.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 466705e..4414a9d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
 /**/
 
 #define HEADER_MAGIC "WithoutFreeSpace"
+#define HEADER_MAGIC2 "WithouFreSpacExt"
 #define HEADER_VERSION 2
 #define HEADER_SIZE 64
 
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {
 unsigned int catalog_size;
 
 unsigned int tracks;
+
+unsigned int off_multiplier;
 } BDRVParallelsState;
 
 static int parallels_probe(const uint8_t *buf, int buf_size, const char 
*filename)
@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
 if (buf_size < HEADER_SIZE)
 return 0;
 
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
 (le32_to_cpu(ph->version) == HEADER_VERSION))
 return 100;
 
@@ -85,21 +89,31 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);
+
 if (le32_to_cpu(ph.version) != HEADER_VERSION) {
 goto fail_format;
 }
-if (memcmp(ph.magic, HEADER_MAGIC, 16)) {
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = 0x & bs->total_sectors;
+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) {
+s->off_multiplier = le32_to_cpu(ph.tracks);
+} else {
 goto fail_format;
 }
 
-bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);
-
 s->tracks = le32_to_cpu(ph.tracks);
 if (s->tracks == 0) {
 error_setg(errp, "Invalid image: Zero sectors per track");
 ret = -EINVAL;
 goto fail;
 }
+if (s->tracks > INT32_MAX/513) {
+error_setg(errp, "Invalid image: Too big cluster");
+ret = -EFBIG;
+goto fail;
+}
 
 s->catalog_size = le32_to_cpu(ph.catalog_entries);
 if (s->catalog_size > INT_MAX / 4) {
@@ -139,7 +153,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 /* not allocated */
 if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
 return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;
 }
 
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
-- 
1.9.1




Re: [Qemu-devel] [PATCH v2 0/4] block/parallels: 2TB+ parallels images support

2014-08-07 Thread Denis V. Lunev

On 28/07/14 20:23, Denis V. Lunev wrote:

Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case to code the virtual disk size for such images nb_sectors
field is extended to 64 bits. The reader of older images with signature
WithoutFreeSpace must manually zero most valuable bits of nb_sectors
on open.

Changes from v1:
- fixed message in patch 1
- added braces to conform qemu coding style in patches 3 & 4
- added check for ph.tracks in patch 4 to avoid offset overflow as suggested
   by Jeff

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Jeff Cody 

can you pls look/commit. I have some other changes on top of this



Re: [Qemu-devel] [PATCH v2 4/4] parallels: 2TB+ parallels images support

2014-08-07 Thread Denis V. Lunev

On 07/08/14 18:39, Jeff Cody wrote:

On Mon, Jul 28, 2014 at 08:23:55PM +0400, Denis V. Lunev wrote:

Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img and also adds specific
check for an incorrect image. Images with block size greater than
INT_MAX/513 are not supported. The biggest available Parallels image
cluster size in the field is 1 Mb. Thus this limit will not hurt
anyone.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 466705e..4414a9d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
  /**/
  
  #define HEADER_MAGIC "WithoutFreeSpace"

+#define HEADER_MAGIC2 "WithouFreSpacExt"
  #define HEADER_VERSION 2
  #define HEADER_SIZE 64
  
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {

  unsigned int catalog_size;
  
  unsigned int tracks;

+
+unsigned int off_multiplier;
  } BDRVParallelsState;
  
  static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)

@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
  if (buf_size < HEADER_SIZE)
  return 0;
  
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&

+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
  (le32_to_cpu(ph->version) == HEADER_VERSION))
  return 100;
  
@@ -85,21 +89,31 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,

  goto fail;
  }
  
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);

+
  if (le32_to_cpu(ph.version) != HEADER_VERSION) {
  goto fail_format;
  }
-if (memcmp(ph.magic, HEADER_MAGIC, 16)) {
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = 0x & bs->total_sectors;
+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) {
+s->off_multiplier = le32_to_cpu(ph.tracks);
+} else {
  goto fail_format;
  }
  
-bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);

-
  s->tracks = le32_to_cpu(ph.tracks);
  if (s->tracks == 0) {
  error_setg(errp, "Invalid image: Zero sectors per track");
  ret = -EINVAL;
  goto fail;
  }
+if (s->tracks > INT32_MAX/513) {
+error_setg(errp, "Invalid image: Too big cluster");
+ret = -EFBIG;
+goto fail;
+}
  
  s->catalog_size = le32_to_cpu(ph.catalog_entries);

  if (s->catalog_size > INT_MAX / 4) {
@@ -139,7 +153,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
  /* not allocated */
  if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
  return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;

This still does a cast to uint_64_t, instead of int64_t; not sure it
really matters in practice, as we should be safe now from exceeding an
int64_t value in the end result.


this is safe due to above check for s->tracks > INT32_MAX/513
Actually, original code has exactly the same cast and the situation
is exactly the same before the patch (uint32_t value * 1) and after
the patch (uint32_t * (something < INT32_MAX/513))

Though I can change the cast to int64_t, I do not see much difference.
Should I do this?


  }
  
  static int parallels_read(BlockDriverState *bs, int64_t sector_num,

--
1.9.1







Re: [Qemu-devel] [PATCH v2 4/4] parallels: 2TB+ parallels images support

2014-08-07 Thread Denis V. Lunev

On 07/08/14 19:14, Jeff Cody wrote:

On Thu, Aug 07, 2014 at 07:03:12PM +0400, Denis V. Lunev wrote:

On 07/08/14 18:39, Jeff Cody wrote:

On Mon, Jul 28, 2014 at 08:23:55PM +0400, Denis V. Lunev wrote:

Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img and also adds specific
check for an incorrect image. Images with block size greater than
INT_MAX/513 are not supported. The biggest available Parallels image
cluster size in the field is 1 Mb. Thus this limit will not hurt
anyone.

Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block/parallels.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 466705e..4414a9d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
  /**/
  #define HEADER_MAGIC "WithoutFreeSpace"
+#define HEADER_MAGIC2 "WithouFreSpacExt"
  #define HEADER_VERSION 2
  #define HEADER_SIZE 64
@@ -54,6 +55,8 @@ typedef struct BDRVParallelsState {
  unsigned int catalog_size;
  unsigned int tracks;
+
+unsigned int off_multiplier;
  } BDRVParallelsState;
  static int parallels_probe(const uint8_t *buf, int buf_size, const char 
*filename)
@@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, 
const char *filenam
  if (buf_size < HEADER_SIZE)
  return 0;
-if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
+if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
+!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
  (le32_to_cpu(ph->version) == HEADER_VERSION))
  return 100;
@@ -85,21 +89,31 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  goto fail;
  }
+bs->total_sectors = le64_to_cpu(ph.nb_sectors);
+
  if (le32_to_cpu(ph.version) != HEADER_VERSION) {
  goto fail_format;
  }
-if (memcmp(ph.magic, HEADER_MAGIC, 16)) {
+if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+s->off_multiplier = 1;
+bs->total_sectors = 0x & bs->total_sectors;
+} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) {
+s->off_multiplier = le32_to_cpu(ph.tracks);
+} else {
  goto fail_format;
  }
-bs->total_sectors = 0x & le64_to_cpu(ph.nb_sectors);
-
  s->tracks = le32_to_cpu(ph.tracks);
  if (s->tracks == 0) {
  error_setg(errp, "Invalid image: Zero sectors per track");
  ret = -EINVAL;
  goto fail;
  }
+if (s->tracks > INT32_MAX/513) {
+error_setg(errp, "Invalid image: Too big cluster");
+ret = -EFBIG;
+goto fail;
+}
  s->catalog_size = le32_to_cpu(ph.catalog_entries);
  if (s->catalog_size > INT_MAX / 4) {
@@ -139,7 +153,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
  /* not allocated */
  if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
  return -1;
-return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
+return
+((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 
512;

This still does a cast to uint_64_t, instead of int64_t; not sure it
really matters in practice, as we should be safe now from exceeding an
int64_t value in the end result.

this is safe due to above check for s->tracks > INT32_MAX/513
Actually, original code has exactly the same cast and the situation
is exactly the same before the patch (uint32_t value * 1) and after
the patch (uint32_t * (something < INT32_MAX/513))

Though I can change the cast to int64_t, I do not see much difference.
Should I do this?


Right, as I said in practice it should now be safe from exceeding an
int64_t (due to the bounds check on s->tracks). I think it is worth
changing if someone else requests a respin for another reason, but
probably not to do a respin for this on its own.

Another question - do you have any sample images?  If they compress
well (bzip2 does a good job with most blank images) it would be nice
to have a couple of parallels images (e.g. one "WithouFreSpacExt" and
one "WithoutFreeSpace") in the tests sample_images directory.  If you
can provide both types of images, I'll amend test 076 to include them.

I can go ahead and give my R-b for this patch:
no prob, I have access to them. I'll provide a collection within next 
series.


There is some other incompatible stuff add...


Reviewed-by: Jeff Cody 


Thank you :)






  }
  static int parallels_read(BlockDriverState *bs, int64_t sector_num,
--
1.9.1









Re: [Qemu-devel] [PATCH v2 4/4] parallels: 2TB+ parallels images support

2014-08-07 Thread Denis V. Lunev

On 07/08/14 19:22, Denis V. Lunev wrote:

Another question - do you have any sample images?  If they compress

well (bzip2 does a good job with most blank images) it would be nice
to have a couple of parallels images (e.g. one "WithouFreSpacExt" and
one "WithoutFreeSpace") in the tests sample_images directory. If you
can provide both types of images, I'll amend test 076 to include them.

I can go ahead and give my R-b for this patch:
no prob, I have access to them. I'll provide a collection within next 
series.


There is some other incompatible stuff add...


/stuff add.../stuff to add.../



Re: [Qemu-devel] Wierd hack to sound/pci/intel8x0.c

2011-11-06 Thread Denis V. Lunev

On 11/6/11 6:51 PM, Avi Kivity wrote:

The recently merged 228cf79376f1 ("ALSA: intel8x0: Improve performance
in virtual environment") is hacky and somewhat wrong.

First, the detection code

+   if (inside_vm<  0) {
+   /* detect KVM and Parallels virtual environments */
+   inside_vm = kvm_para_available();
+#if defined(__i386__) || defined(__x86_64__)
+   inside_vm = inside_vm ||
boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#endif
+   }
+

is incorrect.  It detects that you're running in a guest, but that
doesn't imply that the device you're accessing is emulated.  It may be a
host device assigned to the guest; presumably the optimization you apply
doesn't work for real devices.

Second, the optimization itself looks fishy:

 spin_lock(&chip->reg_lock);
 do {
 civ = igetbyte(chip, ichdev->reg_offset + ICH_REG_OFF_CIV);
 ptr1 = igetword(chip, ichdev->reg_offset +
ichdev->roff_picb);
 position = ichdev->position;
 if (ptr1 == 0) {
 udelay(10);
 continue;
 }
-   if (civ == igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV)&&
-   ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
+   if (civ != igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV))
+   continue;
+   if (chip->inside_vm)
+   break;
+   if (ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
 break;
 } while (timeout--);


Why is the emulated device timing out?  Can't the emulation be fixed to
behave like real hardware?

Last, please copy k...@vger.kernel.org on such issues.


The problem is that emulation can not be fixed.

How this is working for real hardware? You get data from real sound card 
register.

The scheduling is off at the moment thus you can not be re-scheduled.

In the virtual environment the situation is different. Any IO emulation 
is expensive.
The processor is switched from guest to hypervisor and further to 
emulation process
takes a lot of time.  This time is enough to obtain different value on 
next register read.
That's why this code is really timed out. Please also note that host 
scheduler also

plays his games and could schedule out VCPU thread.

The problem could be potentially fixed reducing precision of PICB emulation,
but this results in lower sound quality.

This kludge has been written this way in order not to break legacy card 
for which we
do not have an access. The code reading PICB/CIV registers second time 
was added
to fix issues on unknown for now platform and it looks not possible how 
to find/test
against this platform now. We have checked Windows drivers written by 
Intel/AMD
(32/64 bit) and MacOS ones. There is no second reading of CIV/PICB 
inside. We

hope that this is relay needed only for some rare hadware devices.

The only thing we can is to improve detection code. Suggestions are welcome.

Regards,
Den



Re: [Qemu-devel] Wierd hack to sound/pci/intel8x0.c

2011-11-06 Thread Denis V. Lunev

On 11/6/11 8:31 PM, Avi Kivity wrote:

On 11/06/2011 06:15 PM, Denis V. Lunev wrote:

On 11/6/11 6:51 PM, Avi Kivity wrote:

The recently merged 228cf79376f1 ("ALSA: intel8x0: Improve performance
in virtual environment") is hacky and somewhat wrong.

First, the detection code

+   if (inside_vm<   0) {
+   /* detect KVM and Parallels virtual environments */
+   inside_vm = kvm_para_available();
+#if defined(__i386__) || defined(__x86_64__)
+   inside_vm = inside_vm ||
boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#endif
+   }
+

is incorrect.  It detects that you're running in a guest, but that
doesn't imply that the device you're accessing is emulated.  It may be a
host device assigned to the guest; presumably the optimization you apply
doesn't work for real devices.

Second, the optimization itself looks fishy:

  spin_lock(&chip->reg_lock);
  do {
  civ = igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV);
  ptr1 = igetword(chip, ichdev->reg_offset +
ichdev->roff_picb);
  position = ichdev->position;
  if (ptr1 == 0) {
  udelay(10);
  continue;
  }
-   if (civ == igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV)&&
-   ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
+   if (civ != igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV))
+   continue;
+   if (chip->inside_vm)
+   break;
+   if (ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
  break;
  } while (timeout--);


Why is the emulated device timing out?  Can't the emulation be fixed to
behave like real hardware?

Last, please copy k...@vger.kernel.org on such issues.


The problem is that emulation can not be fixed.

How this is working for real hardware? You get data from real sound
card register.
The scheduling is off at the moment thus you can not be re-scheduled.

In the virtual environment the situation is different. Any IO
emulation is expensive.
The processor is switched from guest to hypervisor and further to
emulation process
takes a lot of time.  This time is enough to obtain different value on
next register read.
That's why this code is really timed out. Please also note that host
scheduler also
plays his games and could schedule out VCPU thread.


Note on kvm this is rare, since the guest thread and the emulator thread
are the same.


It is the same for Parallels too, but it can be scheduled out anyway.
The most important part here is context switches cost, which is actually 
high and enough to result

in different PICB value.


The problem could be potentially fixed reducing precision of PICB
emulation,
but this results in lower sound quality.

This kludge has been written this way in order not to break legacy
card for which we
do not have an access. The code reading PICB/CIV registers second time
was added
to fix issues on unknown for now platform and it looks not possible
how to find/test
against this platform now. We have checked Windows drivers written by
Intel/AMD
(32/64 bit) and MacOS ones. There is no second reading of CIV/PICB
inside. We
hope that this is relay needed only for some rare hadware devices.


Ok, so if I understand correctly, this loop is a hack for broken
hardware, and this patch basically unhacks it back, assuming that the
emulated (or assigned) hardware is not broken.


The only thing we can is to improve detection code. Suggestions are
welcome.

I think it's fine to assume that you're not assigning a 2004 era sound
card to a guest.  So I think the code is fine as it is, and can only
suggest to add a comment explaining the mess.

Thanks for explaining.


no prob

Regard,
Den



Re: [Qemu-devel] Wierd hack to sound/pci/intel8x0.c

2011-11-06 Thread Denis V. Lunev

On 11/6/11 8:47 PM, Takashi Iwai wrote:

At Sun, 06 Nov 2011 18:31:42 +0200,
Avi Kivity wrote:

On 11/06/2011 06:15 PM, Denis V. Lunev wrote:

On 11/6/11 6:51 PM, Avi Kivity wrote:

The recently merged 228cf79376f1 ("ALSA: intel8x0: Improve performance
in virtual environment") is hacky and somewhat wrong.

First, the detection code

+   if (inside_vm<   0) {
+   /* detect KVM and Parallels virtual environments */
+   inside_vm = kvm_para_available();
+#if defined(__i386__) || defined(__x86_64__)
+   inside_vm = inside_vm ||
boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#endif
+   }
+

is incorrect.  It detects that you're running in a guest, but that
doesn't imply that the device you're accessing is emulated.  It may be a
host device assigned to the guest; presumably the optimization you apply
doesn't work for real devices.

Second, the optimization itself looks fishy:

  spin_lock(&chip->reg_lock);
  do {
  civ = igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV);
  ptr1 = igetword(chip, ichdev->reg_offset +
ichdev->roff_picb);
  position = ichdev->position;
  if (ptr1 == 0) {
  udelay(10);
  continue;
  }
-   if (civ == igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV)&&
-   ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
+   if (civ != igetbyte(chip, ichdev->reg_offset +
ICH_REG_OFF_CIV))
+   continue;
+   if (chip->inside_vm)
+   break;
+   if (ptr1 == igetword(chip, ichdev->reg_offset +
ichdev->roff_picb))
  break;
  } while (timeout--);


Why is the emulated device timing out?  Can't the emulation be fixed to
behave like real hardware?

Last, please copy k...@vger.kernel.org on such issues.


The problem is that emulation can not be fixed.

How this is working for real hardware? You get data from real sound
card register.
The scheduling is off at the moment thus you can not be re-scheduled.

In the virtual environment the situation is different. Any IO
emulation is expensive.
The processor is switched from guest to hypervisor and further to
emulation process
takes a lot of time.  This time is enough to obtain different value on
next register read.
That's why this code is really timed out. Please also note that host
scheduler also
plays his games and could schedule out VCPU thread.


Note on kvm this is rare, since the guest thread and the emulator thread
are the same.


The problem could be potentially fixed reducing precision of PICB
emulation,
but this results in lower sound quality.

This kludge has been written this way in order not to break legacy
card for which we
do not have an access. The code reading PICB/CIV registers second time
was added
to fix issues on unknown for now platform and it looks not possible
how to find/test
against this platform now. We have checked Windows drivers written by
Intel/AMD
(32/64 bit) and MacOS ones. There is no second reading of CIV/PICB
inside. We
hope that this is relay needed only for some rare hadware devices.


Ok, so if I understand correctly, this loop is a hack for broken
hardware, and this patch basically unhacks it back, assuming that the
emulated (or assigned) hardware is not broken.

Exactly.  The loop itself is still needed, but the check can be
less restrictive.


The only thing we can is to improve detection code. Suggestions are
welcome.

I think it's fine to assume that you're not assigning a 2004 era sound
card to a guest.  So I think the code is fine as it is, and can only
suggest to add a comment explaining the mess.

True.  Denis, care to submit a patch?

sure! I'll do that tomorrow



Re: [Qemu-devel] [PULL 05/42] block/raw-posix: refactor handle_aiocb_write_zeroes a bit

2015-02-11 Thread Denis V. Lunev

On 12/02/15 05:29, Peter Maydell wrote:

On 6 February 2015 at 16:40, Kevin Wolf  wrote:

From: "Denis V. Lunev" 

move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+BDRVRawState *s = aiocb->bs->opaque;
+
+if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+return handle_aiocb_write_zeroes_block(aiocb);
+}
+
+#ifdef CONFIG_XFS
+if (s->is_xfs) {
+return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
+return -ENOTSUP;
+}

Hi. This patch has introduced a new compiler warning on OSX:
block/raw-posix.c:947:19: warning: unused variable 's' [-Wunused-variable]
 BDRVRawState *s = aiocb->bs->opaque;
   ^

and indeed on any host which doesn't define any of CONFIG_XFS,
CONFIG_FALLOCATE, CONFIG_FALLOCATE_PUNCH_HOLE or
CONFIG_FALLOCATE_ZERO_RANGE.

What's your preferred fix? Surrounding the variable declaration
with #if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
would work but I dunno if some less ugly approach is possible.

[I don't yet build OSX with -Werror because we haven't fixed
some of the deprecation warnings about the audio APIs, but
I would like to eventually...]

thanks
-- PMM

we can just add
  (void)s;
immediately after the declaration.

On the other hand, CONFIG_FALLOCATE_PUNCH_HOLE and 
CONFIG_FALLOCATE_ZERO_RANGE

are completely impossible without CONFIG_FALLOCATE thus

#if defined(CONFIG_FALLOCATE) or defined(CONFIG_XFS)
would be enough.

Regards,
Den



[Qemu-devel] [PATCH 1/1] block/raw-posix: fix compilation warning on OSX

2015-02-11 Thread Denis V. Lunev
block/raw-posix.c:947:19: warning: unused variable 's' [-Wunused-variable]
BDRVRawState *s = aiocb->bs->opaque;

This variable is used only when on of the following macros are defined
CONFIG_XFS, CONFIG_FALLOCATE, CONFIG_FALLOCATE_PUNCH_HOLE or
CONFIG_FALLOCATE_ZERO_RANGE. Fortunately, CONFIG_FALLOCATE_PUNCH_HOLE
and CONFIG_FALLOCATE_ZERO_RANGE could be defined only along with
CONFIG_FALLOCATE. Therefore checking for CONFIG_XFS or CONFIG_FALLOCATE
would be enough.

Signed-off-by: Denis V. Lunev 
CC: Peter Maydell 
CC: Kevin Wolf 
---
 block/raw-posix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e474c17..bf7d6f7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -944,7 +944,9 @@ static ssize_t 
handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
+#if defined(CONFIG_FALLOCATE) || defined(CONFIG_XFS)
 BDRVRawState *s = aiocb->bs->opaque;
+#endif
 
 if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 return handle_aiocb_write_zeroes_block(aiocb);
-- 
1.9.1




Re: [Qemu-devel] [PATCH 0/3] qemu guest agent: support guest-file-* command for Windows

2015-02-12 Thread Denis V. Lunev

On 06/02/15 20:59, Denis V. Lunev wrote:

This was a part of patchset implemented guest-exec command.
It was suggested to submit it separately by Michael.

The set contains small refactoring to fix mingw 4.9.1 compilation
and safe part of the rework of posix file interface plus main
Windows file commands implementation.

Changes from previous version:
- compilation is fixed by gropping strtok_r at all
- error_setg_win32 is used for error decoding

Signed-off-by: Olga Krishtal 
Signed-off-by: Denis V. Lunev 
CC: Michael Roth 


ping



Re: [Qemu-devel] [PATCH v4 0/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-02-12 Thread Denis V. Lunev

On 06/02/15 20:37, Denis V. Lunev wrote:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i < 10; i++)
 write(fd, buf, 4096);
iperforms 5% better if buf is aligned to 4096 bytes rather then to
512 bytes.

I have used the following program to test
#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 void *buf;
 int i = 0, align = atoi(argv[2]);

 do {
 buf = memalign(align, 4096);
 if (align >= 4096)
 break;
 if ((unsigned long)buf & 4095)
 break;
 i++;
 } while (1);
 printf("%d %p\n", i, buf);

 memset(buf, 0x11, 4096);

 for (i = 0; i < 10; i++) {
 lseek(fd, SEEK_CUR, 4096);
 write(fd, buf, 4096);
 }

 close(fd);
 return 0;
}
for in in `seq 1 30` ; do a.out aa ; done

The file was placed into 8 GB partition on HDD below to avoid speed
change due to different offset on disk. Results are reliable:
- 189 vs 180 seconds on Linux 3.16

The following setups have been tested:
1) ext4 with block size equals to 1024 over 512/512 physical/logical
sector size SSD disk
2) ext4 with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk
3) ext4 with block size equals to 4096 over 512/4096 physical/logical
sector size rotational disk (WDC WD20EZRX)
4) xfs with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk

The difference is quite reliable and the same 5%.
   qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
for image in qcow2 format is 1% faster.

Changes from v3:
- portable way to calculate system page size used
- 512/4096 values are replaced with proper macros/values

Changes from v2:
- opt_mem_alignment is split to opt_mem_alignment for bounce buffering
   and min_mem_alignment to check buffers coming from guest.

Changes from v1:
- enforces 4096 alignment in qemu_(try_)blockalign, avoid touching of
   bdrv_qiov_is_aligned path not to enforce additional bounce buffering
   as suggested by Paolo
- reduces 10% to 5% in patch description to better fit 180 vs 189
   difference

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 


ping



Re: [Qemu-devel] [PATCH 1/2] block, raw-posix: replace 512/4096 constants with proper macros/values

2015-02-16 Thread Denis V. Lunev

On 16/02/15 13:32, Kevin Wolf wrote:

Am 06.02.2015 um 18:37 hat Denis V. Lunev geschrieben:

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
---
  block.c   | 10 +-
  block/raw-posix.c | 16 
  2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..e98d651 100644
--- a/block.c
+++ b/block.c
@@ -225,8 +225,8 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
  size_t bdrv_opt_mem_align(BlockDriverState *bs)
  {
  if (!bs || !bs->drv) {
-/* 4k should be on the safe side */
-return 4096;
+/* page size should be on the safe side */
+return getpagesize();

Can we make this MAX(4096, getpagesize())? The 4k aren't chosen because
of buffer alignment in RAM, but because of a disk sector size of 4k is
the highest common value.


  }
  
  return bs->bl.opt_mem_alignment;

@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
  bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
  bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
  } else {
-bs->bl.opt_mem_alignment = 512;
+bs->bl.opt_mem_alignment = BDRV_SECTOR_SIZE;

I wouldn't make this conversion. The value happens to be the same, but
BDRV_SECTOR_SIZE is just the arbitrary unit that the block layer uses
internally for things like sector_num/nb_sectors.

The 512 in this place, however, is the minimum alignment that hardware
could require, and logically independent of BDRV_SECTOR_SIZE.

Similar considerations apply to the other conversions of 512 in this
patch. They would all be better left as literal 512 (unless you want to
introduce new constants for their specific purpose).


  }
  
  if (bs->backing_hd) {

@@ -965,8 +965,8 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
  }
  
  bs->open_flags = flags;

-bs->guest_block_size = 512;
-bs->request_alignment = 512;
+bs->guest_block_size = BDRV_SECTOR_SIZE;
+bs->request_alignment = BDRV_SECTOR_SIZE;
  bs->zero_beyond_eof = true;
  open_flags = bdrv_open_flags(bs, flags);
  bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 853ffa6..9848f83 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -122,7 +122,6 @@
 reopen it to see if the disk has been changed */
  #define FD_OPEN_TIMEOUT (10)
  
-#define MAX_BLOCKSIZE	4096
  
  typedef struct BDRVRawState {

  int fd;
@@ -222,6 +221,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  BDRVRawState *s = bs->opaque;
  char *buf;
  unsigned int sector_size;
+size_t page_size = getpagesize();
  
  /* For /dev/sg devices the alignment is not really used.

 With buffered I/O, we don't have any restrictions. */
@@ -264,9 +264,9 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  /* If we could not get the sizes so far, we can only guess them */
  if (!s->buf_align) {
  size_t align;
-buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
-if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
+buf = qemu_memalign(page_size, 2 * page_size);
+for (align = BDRV_SECTOR_SIZE; align <= page_size; align <<= 1) {
+if (pread(fd, buf + align, page_size, 0) >= 0) {
  s->buf_align = align;
  break;
  }

Here, I'd prefer MAX(getpagesize(), MAX_BLOCKSIZE) as well.

Kevin

this makes sense. OK



Re: [Qemu-devel] [PATCH 2/2] block: align bounce buffers to page

2015-02-16 Thread Denis V. Lunev

On 16/02/15 13:59, Kevin Wolf wrote:

Am 06.02.2015 um 18:37 hat Denis V. Lunev geschrieben:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i < 10; i++)
 write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes.

The patch introduces new concept: minimal memory alignment for bounce
buffers. Original so called "optimal" value is actually minimal required
value for aligment. Optimal should be set to page size by default.
There is no driver which should change this default at the moment.

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 

I don't think the subject lines describes the patch correctly. IIUC,
patch 1 introduces a hard enforcement of page size alignment, and this
patch only loosens it again.

Perhaps the order of patches should be changed so that min_mem_alignment
is introduced first (without changes in the behaviour), and only then
opt_mem_alignment is increased.

Kevin

ok :)



Re: [Qemu-devel] [PATCH 03/10] guest agent: guest-file-open: refactoring

2015-02-17 Thread Denis V. Lunev

On 17/02/15 18:27, Eric Blake wrote:

On 02/16/2015 08:14 PM, Michael Roth wrote:

From: Simon Zolin 

Moved the code that sets non-blocking flag on fd into a separate function.

Signed-off-by: Simon Zolin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Michael Roth 
CC: Eric Blake 
Signed-off-by: Michael Roth 
---
  qga/commands-posix.c | 31 +++
  1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 57409d0..ed527a3 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -376,13 +376,33 @@ safe_open_or_create(const char *path, const char *mode, 
Error **errp)
  return NULL;
  }
  
+static int guest_file_toggle_flags(int fd, int flags, bool set, Error **err)

+{

Why are you reinventing qemu_set_nonblock()?


because we are uneducated :)

Anyway, qemu_set_nonblock() does not handle error
and resides in a strange header aka  "include/qemu/sockets.h"
Technically I can switch to it immediately. Though error
check condition will be lost.

What is better at your opinion?

a) return error from qemu_set_nonblock()/qemu_set_block()
b) drop error check here. The descriptor is just opened
and we know that it is valid. I could not imagine real
error other than broken descriptor for this exact fcntl.

Regards,
Den



Re: [Qemu-devel] [PATCH 03/10] guest agent: guest-file-open: refactoring

2015-02-17 Thread Denis V. Lunev

On 17/02/15 20:56, Michael Roth wrote:

Quoting Denis V. Lunev (2015-02-17 10:06:49)

On 17/02/15 18:27, Eric Blake wrote:

On 02/16/2015 08:14 PM, Michael Roth wrote:

From: Simon Zolin 

Moved the code that sets non-blocking flag on fd into a separate function.

Signed-off-by: Simon Zolin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Michael Roth 
CC: Eric Blake 
Signed-off-by: Michael Roth 
---
   qga/commands-posix.c | 31 +++
   1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 57409d0..ed527a3 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -376,13 +376,33 @@ safe_open_or_create(const char *path, const char *mode, 
Error **errp)
   return NULL;
   }

+static int guest_file_toggle_flags(int fd, int flags, bool set, Error **err)
+{

Why are you reinventing qemu_set_nonblock()?


because we are uneducated :)

Anyway, qemu_set_nonblock() does not handle error
and resides in a strange header aka  "include/qemu/sockets.h"
Technically I can switch to it immediately. Though error
check condition will be lost.

What is better at your opinion?

a) return error from qemu_set_nonblock()/qemu_set_block()


I think making the existing functions a non-error-checking
wrapper around qemu_set_{block,nonblock}_error or something
would be best.

These are meant to be os-agnostic utility functions though,
but in the case of qemu-ga the win32 variant won't work as
expected, so I'm not sure it's a good idea to rely on them.

If the lack of it's usage in net/tap* compared to other parts
of QEMU that build on w32 is any indication, that seems to
be the pattern followed by other users.

In any case, since I was actually the one who re-invented it,
and this code just moves it to another function, I think we
can address it as a seperate patch and leave the PULL
intact (unless there are other objections).



OK to me.

I'll check Win32 version with Olga to be somewhat
sure with upcoming cleanup.



Re: [Qemu-devel] [RESEND PATCH v3 0/2] balloon: add a feature bit to let Guest OS deflate virtio_balloon on OOM

2015-02-18 Thread Denis V. Lunev

On 29/01/15 17:24, Denis V. Lunev wrote:

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system to return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux (actually in
linux-next at the moment)

   commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
   Author: Raushaniya Maksudova 
   Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for baloon device which do the trick.

Changes from v2:
- fixed mistake with bit number in virtio_balloon_get_features

Changes from v1:
- From: in patch 1 according to the original ownership
- feature processing in patch 2 as suggested by Michael. It could be done
   without additional field, but this will require to move the property
   level up, i.e. to PCI & CCW level.

Signed-off-by: Raushaniya Maksudova 
Signed-off-by: Denis V. Lunev 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 


ping

any opinion on this? At least patch 1 should be committed.



Re: [Qemu-devel] [RESEND PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2015-02-19 Thread Denis V. Lunev

On 19/02/15 12:25, Michael S. Tsirkin wrote:

On Thu, Jan 29, 2015 at 05:24:41PM +0300, Denis V. Lunev wrote:

The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
 commit 91ba21208839643603e7f7fa5864723c3f371ebe
 Author: Gonglei 
 Date:   Tue Sep 30 14:10:35 2014 +0800
 virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Signed-off-by: Raushaniya Maksudova 
Revieved-by: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
  hw/s390x/virtio-ccw.c  | 5 ++---
  hw/virtio/virtio-pci.c | 5 ++---
  2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..82da894 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -899,9 +899,8 @@ static void balloon_ccw_stats_set_poll_interval(Object 
*obj, struct Visitor *v,
  static void virtio_ccw_balloon_instance_init(Object *obj)
  {
  VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
  
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c

index dde1d73..745324b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1316,9 +1316,8 @@ static void virtio_balloon_pci_class_init(ObjectClass 
*klass, void *data)
  static void virtio_balloon_pci_instance_init(Object *obj)
  {
  VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_pci_stats_get_all, NULL, NULL, dev,
  NULL);

OK, but what about this guest-stats property?
Should it get the same treatment?


--
1.9.1

hmm, IMHO no. init_common is actually do the following

void virtio_instance_init_common(Object *proxy_obj, void *data,
 size_t vdev_size, const char *vdev_name)
{
DeviceState *vdev = data;

object_initialize(vdev, vdev_size, vdev_name);
object_property_add_child(proxy_obj, "virtio-backend", 
OBJECT(vdev), NULL);

object_unref(OBJECT(vdev));
qdev_alias_all_properties(vdev, proxy_obj);
}

on the other hand there is the following code in s390

static void s390_virtio_net_instance_init(Object *obj)
{
VirtIONetS390 *dev = VIRTIO_NET_S390(obj);

virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
TYPE_VIRTIO_NET);
object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
  "bootindex", &error_abort);
}

which does not contain guest-stats property.



Re: [Qemu-devel] [RESEND PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2015-02-19 Thread Denis V. Lunev

On 19/02/15 12:39, Michael S. Tsirkin wrote:

On Thu, Feb 19, 2015 at 12:36:37PM +0300, Denis V. Lunev wrote:

On 19/02/15 12:25, Michael S. Tsirkin wrote:

On Thu, Jan 29, 2015 at 05:24:41PM +0300, Denis V. Lunev wrote:

The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
 commit 91ba21208839643603e7f7fa5864723c3f371ebe
 Author: Gonglei 
 Date:   Tue Sep 30 14:10:35 2014 +0800
 virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Signed-off-by: Raushaniya Maksudova 
Revieved-by: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
  hw/s390x/virtio-ccw.c  | 5 ++---
  hw/virtio/virtio-pci.c | 5 ++---
  2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..82da894 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -899,9 +899,8 @@ static void balloon_ccw_stats_set_poll_interval(Object 
*obj, struct Visitor *v,
  static void virtio_ccw_balloon_instance_init(Object *obj)
  {
  VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index dde1d73..745324b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1316,9 +1316,8 @@ static void virtio_balloon_pci_class_init(ObjectClass 
*klass, void *data)
  static void virtio_balloon_pci_instance_init(Object *obj)
  {
  VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_pci_stats_get_all, NULL, NULL, dev,
  NULL);

OK, but what about this guest-stats property?
Should it get the same treatment?


--
1.9.1

hmm, IMHO no. init_common is actually do the following

void virtio_instance_init_common(Object *proxy_obj, void *data,
  size_t vdev_size, const char *vdev_name)
{
 DeviceState *vdev = data;

 object_initialize(vdev, vdev_size, vdev_name);
 object_property_add_child(proxy_obj, "virtio-backend", OBJECT(vdev),
NULL);
 object_unref(OBJECT(vdev));
 qdev_alias_all_properties(vdev, proxy_obj);
}

on the other hand there is the following code in s390

static void s390_virtio_net_instance_init(Object *obj)
{
 VirtIONetS390 *dev = VIRTIO_NET_S390(obj);

 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_NET);
 object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
   "bootindex", &error_abort);
}

which does not contain guest-stats property.

But why doesn't it?
Seems like an obvious omission?


no it is not

cfind . | xargs fgrep "guest-stats"
./hw/virtio/virtio-pci.c: object_property_get(OBJECT(&dev->vdev), v, 
"guest-stats", errp);
./hw/virtio/virtio-pci.c: object_property_get(OBJECT(&dev->vdev), v, 
"guest-stats-polling-interval",
./hw/virtio/virtio-pci.c: object_property_set(OBJECT(&dev->vdev), v, 
"guest-stats-polling-interval",
./hw/virtio/virtio-pci.c:object_property_add(obj, "guest-stats", 
"guest statistics",
./hw/virtio/virtio-pci.c:object_property_add(obj, 
"guest-stats-polling-interval", "int",
./hw/virtio/virtio-balloon.c:visit_start_struct(v, NULL, 
"guest-stats", name, 0, &err);
./hw/virtio/virtio-balloon.c:object_property_add(OBJECT(dev), 
"guest-stats", "guest statistics",
./hw/virtio/virtio-balloon.c:object_property_add(OBJECT(dev), 
"guest-stats-polling-interval", "int",
./hw/s390x/virtio-ccw.c: obje

Re: [Qemu-devel] [RESEND PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2015-02-19 Thread Denis V. Lunev

On 19/02/15 13:17, Michael S. Tsirkin wrote:

On Thu, Feb 19, 2015 at 12:46:34PM +0300, Denis V. Lunev wrote:

On 19/02/15 12:39, Michael S. Tsirkin wrote:

On Thu, Feb 19, 2015 at 12:36:37PM +0300, Denis V. Lunev wrote:

On 19/02/15 12:25, Michael S. Tsirkin wrote:

On Thu, Jan 29, 2015 at 05:24:41PM +0300, Denis V. Lunev wrote:

The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
 commit 91ba21208839643603e7f7fa5864723c3f371ebe
 Author: Gonglei 
 Date:   Tue Sep 30 14:10:35 2014 +0800
 virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Signed-off-by: Raushaniya Maksudova 
Revieved-by: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
  hw/s390x/virtio-ccw.c  | 5 ++---
  hw/virtio/virtio-pci.c | 5 ++---
  2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..82da894 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -899,9 +899,8 @@ static void balloon_ccw_stats_set_poll_interval(Object 
*obj, struct Visitor *v,
  static void virtio_ccw_balloon_instance_init(Object *obj)
  {
  VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index dde1d73..745324b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1316,9 +1316,8 @@ static void virtio_balloon_pci_class_init(ObjectClass 
*klass, void *data)
  static void virtio_balloon_pci_instance_init(Object *obj)
  {
  VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
  object_property_add(obj, "guest-stats", "guest statistics",
  balloon_pci_stats_get_all, NULL, NULL, dev,
  NULL);

OK, but what about this guest-stats property?
Should it get the same treatment?


--
1.9.1

hmm, IMHO no. init_common is actually do the following

void virtio_instance_init_common(Object *proxy_obj, void *data,
  size_t vdev_size, const char *vdev_name)
{
 DeviceState *vdev = data;

 object_initialize(vdev, vdev_size, vdev_name);
 object_property_add_child(proxy_obj, "virtio-backend", OBJECT(vdev),
NULL);
 object_unref(OBJECT(vdev));
 qdev_alias_all_properties(vdev, proxy_obj);
}

on the other hand there is the following code in s390

static void s390_virtio_net_instance_init(Object *obj)
{
 VirtIONetS390 *dev = VIRTIO_NET_S390(obj);

 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_NET);
 object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
   "bootindex", &error_abort);
}

which does not contain guest-stats property.

But why doesn't it?
Seems like an obvious omission?


no it is not

cfind . | xargs fgrep "guest-stats"
./hw/virtio/virtio-pci.c: object_property_get(OBJECT(&dev->vdev), v,
"guest-stats", errp);
./hw/virtio/virtio-pci.c: object_property_get(OBJECT(&dev->vdev), v,
"guest-stats-polling-interval",
./hw/virtio/virtio-pci.c: object_property_set(OBJECT(&dev->vdev), v,
"guest-stats-polling-interval",
./hw/virtio/virtio-pci.c:object_property_add(obj, "guest-stats", "guest
statistics",
./hw/virtio/virtio-pci.c:object_property_add(obj,
"guest-stats-polling-interval", "int",
./hw/virtio/virtio-balloon.c:visit_start_struct(v, NULL, "guest-stats",
name, 0, &err);
./hw/virtio/virtio-balloon.c:object_property_add(OBJECT(dev),
"guest-stats", "guest statistics",
./hw/virtio/virtio-balloon.c:object_property_a

Re: [Qemu-devel] [PATCH 03/10] guest agent: guest-file-open: refactoring

2015-02-19 Thread Denis V. Lunev

On 17/02/15 20:59, Eric Blake wrote:

On 02/17/2015 10:56 AM, Michael Roth wrote:

In any case, since I was actually the one who re-invented it,
and this code just moves it to another function, I think we
can address it as a seperate patch and leave the PULL
intact (unless there are other objections).

Agree that a separate cleanup patch would be okay.


OK. There is a portable way to detect whether
the descriptor is a pipe or socket on Windows.
Thus generic helper is possible.

We'll provide you with one. ETA is something around
next week.



[Qemu-devel] [PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2015-02-26 Thread Denis V. Lunev
The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
commit 91ba21208839643603e7f7fa5864723c3f371ebe
Author: Gonglei 
Date:   Tue Sep 30 14:10:35 2014 +0800
virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Signed-off-by: Raushaniya Maksudova 
Revieved-by: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
Signed-off-by: Denis V. Lunev 
---
 hw/s390x/virtio-ccw.c  | 5 ++---
 hw/virtio/virtio-pci.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 3fee4aa..ffbb9c2 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -898,9 +898,8 @@ static void balloon_ccw_stats_set_poll_interval(Object 
*obj, struct Visitor *v,
 static void virtio_ccw_balloon_instance_init(Object *obj)
 {
 VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 6dd41b9..e7baf7b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1274,9 +1274,8 @@ static void virtio_balloon_pci_class_init(ObjectClass 
*klass, void *data)
 static void virtio_balloon_pci_instance_init(Object *obj)
 {
 VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_pci_stats_get_all, NULL, NULL, dev,
 NULL);
-- 
1.9.1




[Qemu-devel] [PATCH 2/2] balloon: add a feature bit to let Guest OS deflate balloon on oom

2015-02-26 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux.

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova 
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for balloon device which does the trick.

Signed-off-by: Denis V. Lunev 
CC: Raushaniya Maksudova 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
 hw/virtio/virtio-balloon.c | 6 --
 include/hw/virtio/virtio-balloon.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 21e449a..cbc5f7f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -305,8 +305,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
 {
-f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
-return f;
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+return f | (1u << VIRTIO_BALLOON_F_STATS_VQ) | dev->host_features;
 }
 
 static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
@@ -409,6 +409,8 @@ static void virtio_balloon_device_unrealize(DeviceState 
*dev, Error **errp)
 }
 
 static Property virtio_balloon_properties[] = {
+DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 4ab8f54..7f49b1f 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -36,6 +36,7 @@ typedef struct VirtIOBalloon {
 QEMUTimer *stats_timer;
 int64_t stats_last_update;
 int64_t stats_poll_interval;
+uint32_t host_features;
 } VirtIOBalloon;
 
 #endif
-- 
1.9.1




Re: [Qemu-devel] [PATCH v2 0/8] qemu: guest agent: implement guest-exec command for Linux

2015-01-27 Thread Denis V. Lunev

On 31/12/14 16:06, Denis V. Lunev wrote:

hese patches for guest-agent add the functionality to execute commands on
a guest UNIX machine.

These patches add the following interfaces:

guest-pipe-open
guest-exec
guest-exec-status

With these interfaces it's possible to:

* Open an anonymous pipe and work with it as with a file using already
implemented interfaces guest-file-{read,write,flush,close}.

* Execute a binary or a script on a guest machine.
We can pass arbitrary arguments and environment to a new child process.

* Pass redirected IO from/to stdin, stdout, stderr from a child process to a
local file or an anonymous pipe.

* Check the status of a child process, get its exit status if it exited or get
signal number if it was killed.

We plan to add support for Windows in the near future using the same interfaces
we introduce here.

Example of usage:

{"execute": "guest-pipe-open", "arguments":{"mode": "r"}}
{'return': 1000}

{"execute":"guest-exec", "arguments":{ "path": "id", "params": ["user"],
 "env": ["MYENV=myvalue"], "handle_stdout": 1000 }}'
{"return": 2636}

{"execute": "guest-exec-status", "arguments": {"pid": 2636}}
{"return":{"exit":0,"handle_stderr":-1,"handle_stdin":-1,"handle_stdout":1000,"signal":-1}}

{"execute": "guest-file-read", "arguments": {"handle": 1000, "count":128}}
{"return":{"count":58,"buf-b64":"dWlk...","eof":true}}

{"execute": "guest-file-close", "arguments": {"handle": 1000}}
{"return":{}}


These patches are based on the patches proposed by Michael Roth in 2011:
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00722.html

We made several modifications to the interfaces proposed by Michael to simplify
their usage and we also added several new functions:

* Arguments to an executable file are passed as an array of strings.

 Before that we had to pass parameters like this:
 'params': [ {'param': '-b'}, {'param': '-n1'} ]

 Now we may pass them just as an array of strings:
 'params': [ '-b', '-n1' ]

* Environment can be passed to a child process.

 "env": ["MYENV=myvalue"]

* Removed "detach" argument from "guest-exec" - it never waits for a child
process to signal.  With this change it's possible to return just PID from
"guest-exec", a user must call "guest-exec-status" to get the status of a child.

* Removed "wait" argument from "guest-exec-status" - waiting for a child process
to signal is dangerous, because it may block the whole qemu-ga.  Instead, the
command polls just once a status of a child.

* "guest-exec-status" returns exit status of a child process or a signal number,
in case a process was killed.

* Simplified the command "guest-pipe-open" - the way how it stores the pipe
fd's.  No additional logic is needed about which end of pipe should be closed
with "guest-file-close".  This way we avoid modifying the interface of the
existing "guest-file-close".

In the conversation about the original patches there was a suggestion to merge
"path" and "params" into one single list.  But we didn't do so, because having
separate "path" is similar to exec*() family functions in UNIX.  That way it
looks more consistent.

Changes from v1:
- Windows version of the patchset is added
- SIGPIPE processing is added for Unix version of the patchset
- added guest_exec_file_busy() to prevent GuestFileHandle* object from being
   deleted in case it's used in guest-exec command.

Signed-off-by: Semen Zolin 
Signed-off-by: Olga Krishtal 
Signed-off-by: Denis V. Lunev 


ping



[Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-27 Thread Denis V. Lunev
fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index fa05239..e0b35c9 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -292,6 +292,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct stat st;
+
+if (fstat(s->fd, &st) < 0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+return;
+}
+bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -598,6 +612,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* Fail already reopen_prepare() if we can't get a working O_DIRECT
  * alignment with the new fd. */
 if (raw_s->fd != -1) {
+raw_probe_max_write_zeroes(state->bs);
 raw_probe_alignment(state->bs, raw_s->fd, &local_err);
 if (local_err) {
 qemu_close(raw_s->fd);
@@ -651,6 +666,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.opt_mem_alignment = s->buf_align;
+
+raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
-- 
1.9.1




[Qemu-devel] [PATCH 3/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit

2015-01-27 Thread Denis V. Lunev
move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.

Pls note, that xfs_code has been moved before checking for
s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside.
This makes code a bit more consistent.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 48 +---
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2aa268a..24e1fab 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -914,41 +914,51 @@ static int do_fallocate(int fd, int mode, off_t offset, 
off_t len)
 }
 #endif
 
-static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 {
-int ret = -EOPNOTSUPP;
+int ret = -ENOTSUP;
 BDRVRawState *s = aiocb->bs->opaque;
 
-if (s->has_write_zeroes == 0) {
+if (!s->has_write_zeroes) {
 return -ENOTSUP;
 }
 
-if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKZEROOUT
-do {
-uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
-if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
-#endif
-} else {
-#ifdef CONFIG_XFS
-if (s->is_xfs) {
-return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+do {
+uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
+if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
+return 0;
 }
+} while (errno == EINTR);
+
+ret = translate_err(-errno);
 #endif
-}
 
-ret = translate_err(ret);
 if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
 }
 return ret;
 }
 
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+int ret = -ENOTSUP;
+BDRVRawState *s = aiocb->bs->opaque;
+
+if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+return handle_aiocb_write_zeroes_block(aiocb);
+}
+
+#ifdef CONFIG_XFS
+if (s->is_xfs) {
+return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
+s->has_write_zeroes = false;
+return ret;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
-- 
1.9.1




[Qemu-devel] [PATCH 1/7] block/raw-posix: create translate_err helper to merge errno values

2015-01-27 Thread Denis V. Lunev
actually the code
if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
ret == -ENOTTY) {
ret = -ENOTSUP;
}
is present twice and will be added a couple more times. Create helper
for this.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..24300d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -893,6 +893,15 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, 
uint64_t bytes)
 }
 #endif
 
+static int translate_err(int err)
+{
+if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
+err == -ENOTTY) {
+err = -ENOTSUP;
+}
+return err;
+}
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -921,10 +930,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
@@ -968,10 +976,9 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_discard = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
-- 
1.9.1




[Qemu-devel] [PATCH 6/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-27 Thread Denis V. Lunev
There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply call
fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index c039bef..fa05239 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 #include 
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -981,6 +981,12 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE
+if (aiocb->aio_offset >= aiocb->bs->total_sectors << BDRV_SECTOR_BITS) {
+return do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
 s->has_write_zeroes = false;
 return ret;
 }
-- 
1.9.1




[Qemu-devel] [PATCH 4/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes

2015-01-27 Thread Denis V. Lunev
This efficiently writes zeroes on Linux if the kernel is capable enough.
FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not
including file expansion.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 16 ++--
 configure | 19 +++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24e1fab..3c35b2f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 #include 
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -955,6 +955,18 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_ZERO_RANGE
+if (s->has_write_zeroes) {
+ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_write_zeroes = false;
+return ret;
+}
+#endif
+
 s->has_write_zeroes = false;
 return ret;
 }
diff --git a/configure b/configure
index f185dd0..e00e03a 100755
--- a/configure
+++ b/configure
@@ -3335,6 +3335,22 @@ if compile_prog "" "" ; then
   fallocate_punch_hole=yes
 fi
 
+# check that fallocate supports range zeroing inside the file
+fallocate_zero_range=no
+cat > $TMPC << EOF
+#include 
+#include 
+
+int main(void)
+{
+fallocate(0, FALLOC_FL_ZERO_RANGE, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  fallocate_zero_range=yes
+fi
+
 # check for posix_fallocate
 posix_fallocate=no
 cat > $TMPC << EOF
@@ -4567,6 +4583,9 @@ fi
 if test "$fallocate_punch_hole" = "yes" ; then
   echo "CONFIG_FALLOCATE_PUNCH_HOLE=y" >> $config_host_mak
 fi
+if test "$fallocate_zero_range" = "yes" ; then
+  echo "CONFIG_FALLOCATE_ZERO_RANGE=y" >> $config_host_mak
+fi
 if test "$posix_fallocate" = "yes" ; then
   echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
 fi
-- 
1.9.1




[Qemu-devel] [PATCH v4 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c

2015-01-27 Thread Denis V. Lunev
These patches eliminate data writes completely on Linux if fallocate
FALLOC_FL_ZERO_RANGE or FALLOC_FL_PUNCH_HOLE are  supported on
underlying filesystem.

I have performed several tests with non-aligned fallocate calls and
in all cases (with non-aligned fallocates) Linux performs fine, i.e.
areas are zeroed correctly. Checks were made on
Linux 3.16.0-28-generic #38-Ubuntu SMP

This should seriously increase performance of bdrv_write_zeroes

Changes from v3:
- dropped original patch 1, equivalent stuff was merged already
- reordered patches as suggested by Fam
- fixes spelling errors as suggested by Fam
- fixed not initialized value in handle_aiocb_write_zeroes
- fixed wrong error processing from do_fallocate(FALLOC_FL_ZERO_RANGE)

Changes from v2:
- added Peter Lieven to CC
- added CONFIG_FALLOCATE check to call do_fallocate in patch 7
- dropped patch 1 as NACK-ed
- added processing of very large data areas in bdrv_co_write_zeroes (new
 patch 1)
- set bl.max_write_zeroes to INT_MAX in raw-posix.c for regular files
 (new patch 8)

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 



[Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-27 Thread Denis V. Lunev
The pattern
do {
if (fallocate(s->fd, mode, offset, len) == 0) {
return 0;
}
} while (errno == EINTR);
ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
 return err;
 }
 
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
 #endif
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 5/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes

2015-01-27 Thread Denis V. Lunev
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported.
Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems
and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more
mature.

The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due
to the following reasons:
- FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes
  sparse. In order to retain original functionality we must allocate
  disk space afterwards. This is done using fallocate(0) call
- fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing
  if called above already allocated areas of the file, i.e. the content
  will not be zeroed

This should increase the performance a bit for not-so-modern kernels.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3c35b2f..c039bef 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -967,6 +967,20 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+if (s->has_discard) {
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret < 0) {
+if (ret == -ENOTSUP) {
+s->has_discard = false;
+}
+return ret;
+}
+return do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
 s->has_write_zeroes = false;
 return ret;
 }
-- 
1.9.1




Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-27 Thread Denis V. Lunev

On 27/01/15 21:05, Max Reitz wrote:

On 2015-01-27 at 08:51, Denis V. Lunev wrote:

fallocate() works fine and could handle properly with arbitrary size
requests.


Maybe "could properly handle arbitrary size requests" (or 
"...arbitrarily sized requests")?



There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.


True for fallocate(). But is it true for xfs_write_zeroes(), too? I 
guess so, but I don't know.


If it does, the patch looks good to me.

Max


Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 17 +
  1 file changed, 17 insertions(+)

thank you very much for a review. I will proceed with these findings,
they look quite reasonable.



Re: [Qemu-devel] [PATCH 6/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-27 Thread Denis V. Lunev

On 27/01/15 20:57, Max Reitz wrote:

On 2015-01-27 at 08:51, Denis V. Lunev wrote:

There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply 
call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index c039bef..fa05239 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
  #define FS_NOCOW_FL 0x0080 /* Do not cow 
file */

  #endif
  #endif
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)

+#ifdef CONFIG_FALLOCATE


This change doesn't seem right; CONFIG_FALLOCATE is set if 
posix_fallocate() is available, not for the Linux-specific fallocate() 
from linux/falloc.h.




here is a check for fallocate and posix_fallocate in configure script

# check for fallocate
fallocate=no
cat > $TMPC << EOF
#include 

int main(void)
{
fallocate(0, 0, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
  fallocate=yes
fi
...
# check for posix_fallocate
posix_fallocate=no
cat > $TMPC << EOF
#include 

int main(void)
{
posix_fallocate(0, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
posix_fallocate=yes
fi
...
if test "$fallocate" = "yes" ; then
  echo "CONFIG_FALLOCATE=y" >> $config_host_mak
fi
...
if test "$posix_fallocate" = "yes" ; then
  echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
fi

Thus my check looks correct to me.


  #include 
  #endif
  #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
  return err;
  }
  -#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)

+#ifdef CONFIG_FALLOCATE


Same here.


  static int do_fallocate(int fd, int mode, off_t offset, off_t len)
  {
  do {
@@ -981,6 +981,12 @@ static ssize_t 
handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)

  }
  #endif
  +#ifdef CONFIG_FALLOCATE
+if (aiocb->aio_offset >= aiocb->bs->total_sectors << 
BDRV_SECTOR_BITS) {
+return do_fallocate(s->fd, 0, aiocb->aio_offset, 
aiocb->aio_nbytes);

+}
+#endif
+


This seems fine though, but as I've asked in patch 5: Do we want to 
have a "has_fallocate"?


Other than that, this is the first usage of bs->total_sectors in this 
file; raw_co_get_block_status() does a similar check, but it uses 
bdrv_getlength() instead. If bs->total_sectors is correct, 
bdrv_getlength() will actually do nothing but return bs->total_sectors 
* BDRV_SECTOR_SIZE; it will only do more (that is, update 
bs->total_sectors) if it is not correct to use bs->total_sectors (and 
I feel like it may not be correct because 
BlockDriver.has_variable_length is true).


Max


ok, will do



Re: [Qemu-devel] [PATCH 6/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-27 Thread Denis V. Lunev

On 27/01/15 21:24, Max Reitz wrote:

On 2015-01-27 at 13:19, Denis V. Lunev wrote:

On 27/01/15 20:57, Max Reitz wrote:

On 2015-01-27 at 08:51, Denis V. Lunev wrote:
There is a possibility that we are extending our image and thus 
writing

zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could 
simply call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index c039bef..fa05239 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
  #define FS_NOCOW_FL 0x0080 /* Do not cow 
file */

  #endif
  #endif
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)

+#ifdef CONFIG_FALLOCATE


This change doesn't seem right; CONFIG_FALLOCATE is set if 
posix_fallocate() is available, not for the Linux-specific 
fallocate() from linux/falloc.h.




here is a check for fallocate and posix_fallocate in configure script

# check for fallocate
fallocate=no
cat > $TMPC << EOF
#include 

int main(void)
{
fallocate(0, 0, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
  fallocate=yes
fi
...
# check for posix_fallocate
posix_fallocate=no
cat > $TMPC << EOF
#include 

int main(void)
{
posix_fallocate(0, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
posix_fallocate=yes
fi
...
if test "$fallocate" = "yes" ; then
  echo "CONFIG_FALLOCATE=y" >> $config_host_mak
fi
...
if test "$posix_fallocate" = "yes" ; then
  echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
fi

Thus my check looks correct to me.


Oh, sorry, I somehow mixed those checks. You're right.

Very well then; maybe you want to mention this change in the commit 
message, though?


Max


no prob



Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-27 Thread Denis V. Lunev

On 27/01/15 21:05, Max Reitz wrote:

On 2015-01-27 at 08:51, Denis V. Lunev wrote:

fallocate() works fine and could handle properly with arbitrary size
requests.


Maybe "could properly handle arbitrary size requests" (or 
"...arbitrarily sized requests")?



There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.


True for fallocate(). But is it true for xfs_write_zeroes(), too? I 
guess so, but I don't know.


If it does, the patch looks good to me.

Max



checked.

xfs_ioc_space (ioctl handler) calls exactly the same xfs_zero_file_space as
performed by xfs_file_fallocate on fallocate path.

I will reflect this in the description

Regards,
Den



[Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-28 Thread Denis V. Lunev
fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of falocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3db911a..ec38fee 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct stat st;
+
+if (fstat(s->fd, &st) < 0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+return;
+}
+bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -600,6 +614,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* Fail already reopen_prepare() if we can't get a working O_DIRECT
  * alignment with the new fd. */
 if (raw_s->fd != -1) {
+raw_probe_max_write_zeroes(state->bs);
 raw_probe_alignment(state->bs, raw_s->fd, &local_err);
 if (local_err) {
 qemu_close(raw_s->fd);
@@ -653,6 +668,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.opt_mem_alignment = s->buf_align;
+
+raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
-- 
1.9.1




[Qemu-devel] [PATCH 3/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit

2015-01-28 Thread Denis V. Lunev
move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.

Please note, that xfs_code has been moved before checking for
s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside.
This makes code a bit more consistent.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2aa268a..d3910fd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -914,41 +914,49 @@ static int do_fallocate(int fd, int mode, off_t offset, 
off_t len)
 }
 #endif
 
-static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 {
-int ret = -EOPNOTSUPP;
+int ret = -ENOTSUP;
 BDRVRawState *s = aiocb->bs->opaque;
 
-if (s->has_write_zeroes == 0) {
+if (!s->has_write_zeroes) {
 return -ENOTSUP;
 }
 
-if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKZEROOUT
-do {
-uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
-if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
-#endif
-} else {
-#ifdef CONFIG_XFS
-if (s->is_xfs) {
-return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+do {
+uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
+if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
+return 0;
 }
+} while (errno == EINTR);
+
+ret = translate_err(-errno);
 #endif
-}
 
-ret = translate_err(ret);
 if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
 }
 return ret;
 }
 
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+BDRVRawState *s = aiocb->bs->opaque;
+
+if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+return handle_aiocb_write_zeroes_block(aiocb);
+}
+
+#ifdef CONFIG_XFS
+if (s->is_xfs) {
+return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
+return -ENOTSUP;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
-- 
1.9.1




[Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-28 Thread Denis V. Lunev
The pattern
do {
if (fallocate(s->fd, mode, offset, len) == 0) {
return 0;
}
} while (errno == EINTR);
ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
 return err;
 }
 
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
 #endif
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH v5 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c

2015-01-28 Thread Denis V. Lunev
These patches eliminate data writes completely on Linux if fallocate
FALLOC_FL_ZERO_RANGE or FALLOC_FL_PUNCH_HOLE are  supported on
underlying filesystem.

I have performed several tests with non-aligned fallocate calls and
in all cases (with non-aligned fallocates) Linux performs fine, i.e.
areas are zeroed correctly. Checks were made on
Linux 3.16.0-28-generic #38-Ubuntu SMP

This should seriously increase performance of bdrv_write_zeroes

Changes from v4:
- comments to patches are improved by Max Reitz suggestions
- replaced 'return ret' with 'return -ENOTSUP' handle_aiocb_write_zeroes
  The idea is that we can reach that point only if original ret was
  equal to -ENOTSUP
- added has_fallocate flag to indicate that fallocate is working for
  given BDS
- checked file length with bdrv_getlength

Changes from v3:
- dropped original patch 1, equivalent stuff was merged already
- reordered patches as suggested by Fam
- fixes spelling errors as suggested by Fam
- fixed not initialized value in handle_aiocb_write_zeroes
- fixed wrong error processing from do_fallocate(FALLOC_FL_ZERO_RANGE)

Changes from v2:
- added Peter Lieven to CC
- added CONFIG_FALLOCATE check to call do_fallocate in patch 7
- dropped patch 1 as NACK-ed
- added processing of very large data areas in bdrv_co_write_zeroes (new
  patch 1)
- set bl.max_write_zeroes to INT_MAX in raw-posix.c for regular files
  (new patch 8)

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 




[Qemu-devel] [PATCH 5/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes

2015-01-28 Thread Denis V. Lunev
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported.
Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems
and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more
mature.

The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due
to the following reasons:
- FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes
  sparse. In order to retain original functionality we must allocate
  disk space afterwards. This is done using fallocate(0) call
- fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing
  if called above already allocated areas of the file, i.e. the content
  will not be zeroed

This should increase the performance a bit for not-so-modern kernels.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..2e24829 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -965,6 +965,21 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+if (s->has_discard) {
+int ret = do_fallocate(s->fd,
+   FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0) {
+ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+if (ret != -ENOTSUP) {
+return ret;
+}
+s->has_discard = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 6/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-28 Thread Denis V. Lunev
There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply call
fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2e24829..3db911a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+bool has_fallocate;
 bool needs_alignment;
 } BDRVRawState;
 
@@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 if (S_ISREG(st.st_mode)) {
 s->discard_zeroes = true;
+s->has_fallocate = true;
 }
 if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -902,7 +904,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -980,6 +982,16 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE
+if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
+int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 1/7] block/raw-posix: create translate_err helper to merge errno values

2015-01-28 Thread Denis V. Lunev
actually the code
if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
ret == -ENOTTY) {
ret = -ENOTSUP;
}
is present twice and will be added a couple more times. Create helper
for this.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..24300d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -893,6 +893,15 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, 
uint64_t bytes)
 }
 #endif
 
+static int translate_err(int err)
+{
+if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
+err == -ENOTTY) {
+err = -ENOTSUP;
+}
+return err;
+}
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -921,10 +930,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
@@ -968,10 +976,9 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_discard = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
-- 
1.9.1




[Qemu-devel] [PATCH 4/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes

2015-01-28 Thread Denis V. Lunev
This efficiently writes zeroes on Linux if the kernel is capable enough.
FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not
including file expansion.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 15 +--
 configure | 19 +++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d3910fd..5a777e7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 #include 
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -954,6 +954,17 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_ZERO_RANGE
+if (s->has_write_zeroes) {
+int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_write_zeroes = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
diff --git a/configure b/configure
index f185dd0..e00e03a 100755
--- a/configure
+++ b/configure
@@ -3335,6 +3335,22 @@ if compile_prog "" "" ; then
   fallocate_punch_hole=yes
 fi
 
+# check that fallocate supports range zeroing inside the file
+fallocate_zero_range=no
+cat > $TMPC << EOF
+#include 
+#include 
+
+int main(void)
+{
+fallocate(0, FALLOC_FL_ZERO_RANGE, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  fallocate_zero_range=yes
+fi
+
 # check for posix_fallocate
 posix_fallocate=no
 cat > $TMPC << EOF
@@ -4567,6 +4583,9 @@ fi
 if test "$fallocate_punch_hole" = "yes" ; then
   echo "CONFIG_FALLOCATE_PUNCH_HOLE=y" >> $config_host_mak
 fi
+if test "$fallocate_zero_range" = "yes" ; then
+  echo "CONFIG_FALLOCATE_ZERO_RANGE=y" >> $config_host_mak
+fi
 if test "$posix_fallocate" = "yes" ; then
   echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
 fi
-- 
1.9.1




[Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096

2015-01-28 Thread Denis V. Lunev
The following sequence
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
for (i = 0; i < 10; i++)
write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block.c   | 4 ++--
 block/raw-posix.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
 bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
 } else {
-bs->bl.opt_mem_alignment = 512;
+bs->bl.opt_mem_alignment = 4096;
 }
 
 if (bs->backing_hd) {
@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 
 bs->open_flags = flags;
 bs->guest_block_size = 512;
-bs->request_alignment = 512;
+bs->request_alignment = 4096;
 bs->zero_beyond_eof = true;
 open_flags = bdrv_open_flags(bs, flags);
 bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 if (!s->buf_align) {
 size_t align;
 buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
 if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
 s->buf_align = align;
 break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 if (!bs->request_alignment) {
 size_t align;
 buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
 if (pread(fd, buf, align, 0) >= 0) {
 bs->request_alignment = align;
 break;
-- 
1.9.1




[Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests

2015-01-28 Thread Denis V. Lunev
The following sequence
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
for (i = 0; i < 10; i++)
write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

I have used the following program to test
#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
void *buf;
int i = 0;

do {
buf = memalign(512, 4096); <--- replace 512 with 4096
if ((unsigned long)buf & 4095)
break;
i++;
} while (1);
printf("%d\n", i);

memset(buf, 0x11, 4096);

for (i = 0; i < 10; i++)
write(fd, buf, 4096);

close(fd);
return 0;
}
time for in in `seq 1 30` ; do a.out aa ; done

The file was placed into 8 GB partition on HDD below to avoid speed
change due to different offset on disk. Results are reliable:
- 189 vs 180 seconds on Linux 3.16

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 

hades ~/src/qemu # hdparm -I /dev/sdg

/dev/sdg:

ATA device, with non-removable media
Model Number:   WDC WD20EZRX-07D8PB0
Serial Number:  WD-WCC4M5LVSAEP
Firmware Revision:  80.00A80
Transport:  Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, 
SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders   16383   16383
heads   16  16
sectors/track   63  63
--
CHS current addressable sectors:   16514064
LBAuser addressable sectors:  268435455
LBA48  user addressable sectors: 3907029168
Logical  Sector size:   512 bytes
Physical Sector size:  4096 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size  = unknown
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16  Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *Host Protected Area feature set
   *WRITE_BUFFER command
   *READ_BUFFER command
   *NOP cmd
   *DOWNLOAD_MICROCODE
Power-Up In Standby feature set
   *SET_FEATURES required to spinup after power up
SET_MAX security extension
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
   *SMART error logging
   *SMART self-test
   *General Purpose Logging feature set
   *64-bit World wide name
   *WRITE_UNCORRECTABLE_EXT command
   *{READ,WRITE}_DMA_EXT_GPL commands
   *Segmented DOWNLOAD_MICROCODE
   *Gen1 signaling speed (1.5Gb/s)
   *Gen2 signaling speed (3.0Gb/s)
   *Gen3 signaling speed (6.0Gb/s)
   *Native Command Queueing (NCQ)
   *Host-initiated interface power management
   *Phy event counters
   *NCQ priority information
   *READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
   *DMA Setup Auto-Activate optimization
Device-initiated interface power management
   *Software settings preservation
   *SMART Command Transport (SCT) feature set
   *SCT Write Same (AC2)
   *SCT Features Control (AC4)
   *SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
unknown 206[14] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
NAA : 5
IEEE OUI: 0014ee
Unique ID   : 2b5da838c
Checksum: correct
hades ~/src/qemu #




Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096

2015-01-28 Thread Denis V. Lunev

On 28/01/15 21:49, Denis V. Lunev wrote:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i < 10; i++)
 write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block.c   | 4 ++--
  block/raw-posix.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
  bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
  bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
  } else {
-bs->bl.opt_mem_alignment = 512;
+bs->bl.opt_mem_alignment = 4096;
  }
  
  if (bs->backing_hd) {

@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
  
  bs->open_flags = flags;

  bs->guest_block_size = 512;
-bs->request_alignment = 512;
+bs->request_alignment = 4096;
  bs->zero_beyond_eof = true;
  open_flags = bdrv_open_flags(bs, flags);
  bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  if (!s->buf_align) {
  size_t align;
  buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
  if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
  s->buf_align = align;
  break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  if (!bs->request_alignment) {
  size_t align;
  buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
  if (pread(fd, buf, align, 0) >= 0) {
  bs->request_alignment = align;
  break;

sorry, the patch is wrong. It breaks 'make check-block'.
I will redo it and perform more testing.

request-alignment related changes are wrong :(
I have run tests without them but added them as
a obvious last minute addition.



Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096

2015-01-28 Thread Denis V. Lunev

On 28/01/15 23:07, Paolo Bonzini wrote:


On 28/01/2015 19:49, Denis V. Lunev wrote:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i < 10; i++)
 write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

The 10% difference, however, is probably not enough to cover the cost of
providing a bounce buffer if a guest is (rightfully) using a 512-byte
aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
should be bs->bl.min_mem_alignment instead.

Instead, you probably should patch bdrv_opt_mem_align to return at least
4096, and leave the detection logic intact.  This will let
qemu_blockalign return a properly aligned buffer to qemu-img and other
in-process allocations, without negatively affecting the guest.

Thanks,

Paolo

ok, this looks good to me :)



Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block.c   | 4 ++--
  block/raw-posix.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
  bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
  bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
  } else {
-bs->bl.opt_mem_alignment = 512;
+bs->bl.opt_mem_alignment = 4096;
  }
  
  if (bs->backing_hd) {

@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
  
  bs->open_flags = flags;

  bs->guest_block_size = 512;
-bs->request_alignment = 512;
+bs->request_alignment = 4096;
  bs->zero_beyond_eof = true;
  open_flags = bdrv_open_flags(bs, flags);
  bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  if (!s->buf_align) {
  size_t align;
  buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
  if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
  s->buf_align = align;
  break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  if (!bs->request_alignment) {
  size_t align;
  buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
  if (pread(fd, buf, align, 0) >= 0) {
  bs->request_alignment = align;
  break;






[Qemu-devel] [PATCH v2 0/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-29 Thread Denis V. Lunev
The following sequence
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
for (i = 0; i < 10; i++)
write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

I have used the following program to test
#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
void *buf;
int i = 0;

do {
buf = memalign(512, 4096); <--- replace 512 with 4096
if ((unsigned long)buf & 4095)
break;
i++;
} while (1);
printf("%d\n", i);

memset(buf, 0x11, 4096);

for (i = 0; i < 10; i++)
write(fd, buf, 4096);

close(fd);
return 0;
}
time for in in `seq 1 30` ; do a.out aa ; done

The file was placed into 8 GB partition on HDD below to avoid speed
change due to different offset on disk. Results are reliable:
- 189 vs 180 seconds on Linux 3.16

Changes from v1:
- enforces 4096 alignment in qemu_(try_)blockalign, avoid touching of
  bdrv_qiov_is_aligned path not to enforce additional bounce buffering
  as suggested by Paolo
- reduces 10% to 5% in patch description to better fit 180 vs 189
  difference

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 

hades ~/src/qemu # hdparm -I /dev/sdg

/dev/sdg:

ATA device, with non-removable media
Model Number:   WDC WD20EZRX-07D8PB0
Serial Number:  WD-WCC4M5LVSAEP
Firmware Revision:  80.00A80
Transport:  Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, 
SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders   16383   16383
heads   16  16
sectors/track   63  63
--
CHS current addressable sectors:   16514064
LBAuser addressable sectors:  268435455
LBA48  user addressable sectors: 3907029168
Logical  Sector size:   512 bytes
Physical Sector size:  4096 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size  = unknown
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16  Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *Host Protected Area feature set
   *WRITE_BUFFER command
   *READ_BUFFER command
   *NOP cmd
   *DOWNLOAD_MICROCODE
Power-Up In Standby feature set
   *SET_FEATURES required to spinup after power up
SET_MAX security extension
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
   *SMART error logging
   *SMART self-test
   *General Purpose Logging feature set
   *64-bit World wide name
   *WRITE_UNCORRECTABLE_EXT command
   *{READ,WRITE}_DMA_EXT_GPL commands
   *Segmented DOWNLOAD_MICROCODE
   *Gen1 signaling speed (1.5Gb/s)
   *Gen2 signaling speed (3.0Gb/s)
   *Gen3 signaling speed (6.0Gb/s)
   *Native Command Queueing (NCQ)
   *Host-initiated interface power management
   *Phy event counters
   *NCQ priority information
   *READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
   *DMA Setup Auto-Activate optimization
Device-initiated interface power management
   *Software settings preservation
   *SMART Command Transport (SCT) feature set
   *SCT Write Same (AC2)
   *SCT Features Control (AC4)
   *SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
unknown 206[14] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
NAA : 5
IEEE OUI: 0014ee
Unique ID   : 2b5da838c
Checksum: correct
hades ~/src/qemu #



[Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-29 Thread Denis V. Lunev
The following sequence
int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
for (i = 0; i < 10; i++)
write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes. This patch
forces page alignment when we really forced to perform memory
allocation.

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
 block.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index d45e4dd..38cf73f 100644
--- a/block.c
+++ b/block.c
@@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int 
align)
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-return qemu_memalign(bdrv_opt_mem_align(bs), size);
+size_t align = bdrv_opt_mem_align(bs);
+if (align < 4096) {
+align = 4096;
+}
+return qemu_memalign(align, size);
 }
 
 void *qemu_blockalign0(BlockDriverState *bs, size_t size)
@@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t 
size)
 
 /* Ensure that NULL is never returned on success */
 assert(align > 0);
+if (align < 4096) {
+align = 4096;
+}
 if (size == 0) {
 size = align;
 }
-- 
1.9.1




Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-29 Thread Denis V. Lunev

On 29/01/15 16:18, Kevin Wolf wrote:

Am 29.01.2015 um 11:50 hat Denis V. Lunev geschrieben:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i < 10; i++)
 write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes. This patch
forces page alignment when we really forced to perform memory
allocation.

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
---
  block.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index d45e4dd..38cf73f 100644
--- a/block.c
+++ b/block.c
@@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int 
align)
  
  void *qemu_blockalign(BlockDriverState *bs, size_t size)

  {
-return qemu_memalign(bdrv_opt_mem_align(bs), size);
+size_t align = bdrv_opt_mem_align(bs);
+if (align < 4096) {
+align = 4096;
+}
+return qemu_memalign(align, size);
  }
  
  void *qemu_blockalign0(BlockDriverState *bs, size_t size)

@@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t 
size)
  
  /* Ensure that NULL is never returned on success */

  assert(align > 0);
+if (align < 4096) {
+align = 4096;
+}
  if (size == 0) {
  size = align;
  }

This is the wrong place to make this change. First you're duplicating
logic in the callers of bdrv_opt_mem_align() instead of making it return
the right thing in the first place.

This has been actually done in the first iteration. bdrv_opt_mem_align
is called actually three times in:
  qemu_blockalign
  qemu_try_blockalign
  bdrv_qiov_is_aligned
Paolo says that he does not want to have bdrv_qiov_is_aligned affected
to avoid extra bounce buffering.

From my point of view this extra bounce buffering is better than unaligned
pointer during write to the disk as 512/4096 logical/physical sectors size
disks are mainstream now. Though I don't want to specially argue here.
Normal guest operations results in page aligned requests and this is not
a problem at all. The amount of 512 aligned requests from guest side is
quite negligible.

  Second, you're arguing with numbers
from a simple test case for O_DIRECT on Linux, but you're changing the
alignment for everyone instead of just the raw-posix driver which is
responsible for accessing Linux files.

This should not be a real problem. We are allocation memory for the
buffer. A little bit stricter alignment is not a big overhead for any libc
implementation thus this kludge will not produce any significant overhead.

Also, what's the real reason for the performance improvement? Having
page alignment? If so, actually querying the page size instead of
assuming 4k might be worth a thought.

Kevin

Most likely the problem comes from the read-modify-write pattern
either in kernel or in disk. Actually my experience says that it is a
bad idea to supply 512 byte aligned buffer for O_DIRECT IO.
ABI technically allows this but in general it is much less tested.

Yes, this synthetic test shows some difference here. In terms of
qemu-io the result is also visible, but less
  qemu-img create -f qcow2 ./1.img 64G
  qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
performs 1% better.

There is also similar kludge here
size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
if (!bs || !bs->drv) {
/* 4k should be on the safe side */
return 4096;
}

return bs->bl.opt_mem_alignment;
}
which just uses 4096 constant.

Yes, I could agree that queering page size could be a good idea, but
I do not know at the moment how to do that. Can you pls share your
opinion if you have any.

Regards,
Den



[Qemu-devel] [RESEND PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2015-01-29 Thread Denis V. Lunev
The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
commit 91ba21208839643603e7f7fa5864723c3f371ebe
Author: Gonglei 
Date:   Tue Sep 30 14:10:35 2014 +0800
virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Signed-off-by: Raushaniya Maksudova 
Revieved-by: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
 hw/s390x/virtio-ccw.c  | 5 ++---
 hw/virtio/virtio-pci.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..82da894 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -899,9 +899,8 @@ static void balloon_ccw_stats_set_poll_interval(Object 
*obj, struct Visitor *v,
 static void virtio_ccw_balloon_instance_init(Object *obj)
 {
 VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index dde1d73..745324b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1316,9 +1316,8 @@ static void virtio_balloon_pci_class_init(ObjectClass 
*klass, void *data)
 static void virtio_balloon_pci_instance_init(Object *obj)
 {
 VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_pci_stats_get_all, NULL, NULL, dev,
 NULL);
-- 
1.9.1




[Qemu-devel] [RESEND PATCH v3 0/2] balloon: add a feature bit to let Guest OS deflate virtio_balloon on OOM

2015-01-29 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system to return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux (actually in
linux-next at the moment)

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova 
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for baloon device which do the trick.

Changes from v2:
- fixed mistake with bit number in virtio_balloon_get_features

Changes from v1:
- From: in patch 1 according to the original ownership
- feature processing in patch 2 as suggested by Michael. It could be done
  without additional field, but this will require to move the property
  level up, i.e. to PCI & CCW level.

Signed-off-by: Raushaniya Maksudova 
Signed-off-by: Denis V. Lunev 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 




[Qemu-devel] [RESEND PATCH 2/2] balloon: add a feature bit to let Guest OS deflate balloon on oom

2015-01-29 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system to return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux.

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova 
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for baloon device which do the trick.

Signed-off-by: Denis V. Lunev 
CC: Raushaniya Maksudova 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
 hw/virtio/virtio-balloon.c | 6 --
 include/hw/virtio/virtio-balloon.h | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 7bfbb75..a54d026 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -305,8 +305,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
 {
-f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
-return f;
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+return f | (1u << VIRTIO_BALLOON_F_STATS_VQ) | dev->host_features;
 }
 
 static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
@@ -409,6 +409,8 @@ static void virtio_balloon_device_unrealize(DeviceState 
*dev, Error **errp)
 }
 
 static Property virtio_balloon_properties[] = {
+DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index f863bfe..2e1ccd9 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -30,6 +30,7 @@
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ 1   /* Memory stats virtqueue */
+#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -67,6 +68,7 @@ typedef struct VirtIOBalloon {
 QEMUTimer *stats_timer;
 int64_t stats_last_update;
 int64_t stats_poll_interval;
+uint32_t host_features;
 } VirtIOBalloon;
 
 #endif
-- 
1.9.1




Re: [Qemu-devel] [PATCH 6/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-29 Thread Denis V. Lunev

On 30/01/15 01:50, Max Reitz wrote:

On 2015-01-28 at 13:38, Denis V. Lunev wrote:

There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply 
call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2e24829..3db911a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+bool has_fallocate;
  bool needs_alignment;
  } BDRVRawState;
  @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState 
*bs, QDict *options,

  }
  if (S_ISREG(st.st_mode)) {
  s->discard_zeroes = true;
+s->has_fallocate = true;
  }
  if (S_ISBLK(st.st_mode)) {
  #ifdef BLKDISCARDZEROES
@@ -902,7 +904,7 @@ static int translate_err(int err)
  return err;
  }
  -#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)

+#ifdef CONFIG_FALLOCATE
  static int do_fallocate(int fd, int mode, off_t offset, off_t len)
  {
  do {
@@ -980,6 +982,16 @@ static ssize_t 
handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)

  }
  #endif
  +#ifdef CONFIG_FALLOCATE
+if (s->has_fallocate && aiocb->aio_offset >= 
bdrv_getlength(aiocb->bs)) {
+int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
aiocb->aio_nbytes);

+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+}
+#endif
+
  return -ENOTSUP;
  }


Now that you do have has_fallocate, I think you should be using it in 
patch 5 as well. So I think you should either you make this patch add 
it in the area touched by patch 5, or you introduce has_fallocate in 
patch 5 already and use it there.


Max

OK. No problem. I do not think that it is ever possible, but
why not?

I will reorder these patches in the patchset to minimize changes.



[Qemu-devel] [PATCH 6/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes

2015-01-30 Thread Denis V. Lunev
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported.
Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems
and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more
mature.

The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due
to the following reasons:
- FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes
  sparse. In order to retain original functionality we must allocate
  disk space afterwards. This is done using fallocate(0) call
- fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing
  if called above already allocated areas of the file, i.e. the content
  will not be zeroed

This should increase the performance a bit for not-so-modern kernels.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 1c88ad8..7b42f37 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -967,6 +967,25 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+if (s->has_discard && s->has_fallocate) {
+int ret = do_fallocate(s->fd,
+   FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0) {
+ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+} else if (ret != -ENOTSUP) {
+return ret;
+} else {
+s->has_discard = false;
+}
+}
+#endif
+
 #ifdef CONFIG_FALLOCATE
 if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
 int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
-- 
1.9.1




[Qemu-devel] [PATCH 4/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev
This efficiently writes zeroes on Linux if the kernel is capable enough.
FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not
including file expansion.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 15 +--
 configure | 19 +++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d3910fd..5a777e7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 #include 
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -954,6 +954,17 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_ZERO_RANGE
+if (s->has_write_zeroes) {
+int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_write_zeroes = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
diff --git a/configure b/configure
index f185dd0..e00e03a 100755
--- a/configure
+++ b/configure
@@ -3335,6 +3335,22 @@ if compile_prog "" "" ; then
   fallocate_punch_hole=yes
 fi
 
+# check that fallocate supports range zeroing inside the file
+fallocate_zero_range=no
+cat > $TMPC << EOF
+#include 
+#include 
+
+int main(void)
+{
+fallocate(0, FALLOC_FL_ZERO_RANGE, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  fallocate_zero_range=yes
+fi
+
 # check for posix_fallocate
 posix_fallocate=no
 cat > $TMPC << EOF
@@ -4567,6 +4583,9 @@ fi
 if test "$fallocate_punch_hole" = "yes" ; then
   echo "CONFIG_FALLOCATE_PUNCH_HOLE=y" >> $config_host_mak
 fi
+if test "$fallocate_zero_range" = "yes" ; then
+  echo "CONFIG_FALLOCATE_ZERO_RANGE=y" >> $config_host_mak
+fi
 if test "$posix_fallocate" = "yes" ; then
   echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
 fi
-- 
1.9.1




[Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Denis V. Lunev
The pattern
do {
if (fallocate(s->fd, mode, offset, len) == 0) {
return 0;
}
} while (errno == EINTR);
ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
 return err;
 }
 
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
 #endif
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev
There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply call
fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+bool has_fallocate;
 bool needs_alignment;
 } BDRVRawState;
 
@@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 if (S_ISREG(st.st_mode)) {
 s->discard_zeroes = true;
+s->has_fallocate = true;
 }
 if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -902,7 +904,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -965,6 +967,16 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE
+if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
+int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH v6 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c

2015-01-30 Thread Denis V. Lunev
I have performed several tests with non-aligned fallocate calls and
in all cases (with non-aligned fallocates) Linux performs fine, i.e.
areas are zeroed correctly. Checks were made on
Linux 3.16.0-28-generic #38-Ubuntu SMP

This should seriously increase performance of bdrv_write_zeroes

Changes from v5:
- changed order between patches 5/6
- check has_fallocate in FALLOC_FL_PUNCH_HOLE branch of the code
- minor comment tweaks as pointed out by Max Reitz

Changes from v4:
- comments to patches are improved by Max Reitz suggestions
- replaced 'return ret' with 'return -ENOTSUP' handle_aiocb_write_zeroes
  The idea is that we can reach that point only if original ret was
  equal to -ENOTSUP
- added has_fallocate flag to indicate that fallocate is working for
  given BDS
- checked file length with bdrv_getlength

Changes from v3:
- dropped original patch 1, equivalent stuff was merged already
- reordered patches as suggested by Fam
- fixes spelling errors as suggested by Fam
- fixed not initialized value in handle_aiocb_write_zeroes
- fixed wrong error processing from do_fallocate(FALLOC_FL_ZERO_RANGE)

Changes from v2:
- added Peter Lieven to CC
- added CONFIG_FALLOCATE check to call do_fallocate in patch 7
- dropped patch 1 as NACK-ed
- added processing of very large data areas in bdrv_co_write_zeroes (new
  patch 1)
- set bl.max_write_zeroes to INT_MAX in raw-posix.c for regular files
  (new patch 8)

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 




[Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-30 Thread Denis V. Lunev
fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct stat st;
+
+if (fstat(s->fd, &st) < 0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+return;
+}
+bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -600,6 +614,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* Fail already reopen_prepare() if we can't get a working O_DIRECT
  * alignment with the new fd. */
 if (raw_s->fd != -1) {
+raw_probe_max_write_zeroes(state->bs);
 raw_probe_alignment(state->bs, raw_s->fd, &local_err);
 if (local_err) {
 qemu_close(raw_s->fd);
@@ -653,6 +668,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.opt_mem_alignment = s->buf_align;
+
+raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
-- 
1.9.1




[Qemu-devel] [PATCH 1/7] block/raw-posix: create translate_err helper to merge errno values

2015-01-30 Thread Denis V. Lunev
actually the code
if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
ret == -ENOTTY) {
ret = -ENOTSUP;
}
is present twice and will be added a couple more times. Create helper
for this.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..24300d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -893,6 +893,15 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, 
uint64_t bytes)
 }
 #endif
 
+static int translate_err(int err)
+{
+if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
+err == -ENOTTY) {
+err = -ENOTSUP;
+}
+return err;
+}
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -921,10 +930,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
@@ -968,10 +976,9 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_discard = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
-- 
1.9.1




[Qemu-devel] [PATCH 3/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit

2015-01-30 Thread Denis V. Lunev
move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.

Please note, that xfs_code has been moved before checking for
s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside.
This makes code a bit more consistent.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2aa268a..d3910fd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -914,41 +914,49 @@ static int do_fallocate(int fd, int mode, off_t offset, 
off_t len)
 }
 #endif
 
-static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 {
-int ret = -EOPNOTSUPP;
+int ret = -ENOTSUP;
 BDRVRawState *s = aiocb->bs->opaque;
 
-if (s->has_write_zeroes == 0) {
+if (!s->has_write_zeroes) {
 return -ENOTSUP;
 }
 
-if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKZEROOUT
-do {
-uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
-if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
-#endif
-} else {
-#ifdef CONFIG_XFS
-if (s->is_xfs) {
-return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+do {
+uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
+if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
+return 0;
 }
+} while (errno == EINTR);
+
+ret = translate_err(-errno);
 #endif
-}
 
-ret = translate_err(ret);
 if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
 }
 return ret;
 }
 
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+BDRVRawState *s = aiocb->bs->opaque;
+
+if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+return handle_aiocb_write_zeroes_block(aiocb);
+}
+
+#ifdef CONFIG_XFS
+if (s->is_xfs) {
+return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
+return -ENOTSUP;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
-- 
1.9.1




Re: [Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Denis V. Lunev

On 30/01/15 11:47, Peter Lieven wrote:

Am 30.01.2015 um 09:42 schrieb Denis V. Lunev:

The pattern
 do {
 if (fallocate(s->fd, mode, offset, len) == 0) {
 return 0;
 }
 } while (errno == EINTR);
 ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
  return err;
  }
  
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)

+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
  {
  int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
  #endif
  
  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE

-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);

You also introduce the conversion via translate_err here. Is that ok?

Peter


yes, this is redundant but harmless



Re: [Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev

On 30/01/15 17:58, Max Reitz wrote:

On 2015-01-30 at 03:42, Denis V. Lunev wrote:

There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply 
call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+bool has_fallocate;
  bool needs_alignment;
  } BDRVRawState;
  @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState 
*bs, QDict *options,

  }
  if (S_ISREG(st.st_mode)) {
  s->discard_zeroes = true;
+s->has_fallocate = true;


This could be moved upwards where has_discard and has_write_zeroes are 
initialized; but it won't matter in practice, I hope. Thus:


Reviewed-by: Max Reitz 


This does matter as has_discard and has_write_zeroes are bit fields
thus I can not insert something useful into the middle of those
fields.



  1   2   3   4   5   6   7   8   9   10   >