date:20200428

Re: [PATCH v4 6/7] wire in the dwc-hsotg (dwc2) USB host controller emulation

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/28/20 4:22 AM, Paul Zimmerman wrote:
> Wire the dwc-hsotg (dwc2) emulation into QEMU
> 
> Signed-off-by: Paul Zimmerman 
> ---
>  hw/arm/bcm2835_peripherals.c | 21 -
>  include/hw/arm/bcm2835_peripherals.h |  3 ++-
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
> index 5e2c832d95..3b554cfac0 100644
> --- a/hw/arm/bcm2835_peripherals.c
> +++ b/hw/arm/bcm2835_peripherals.c
> @@ -128,6 +128,13 @@ static void bcm2835_peripherals_init(Object *obj)
>  sysbus_init_child_obj(obj, "mphi", &s->mphi, sizeof(s->mphi),
>TYPE_BCM2835_MPHI);
>  
> +/* DWC2 */
> +sysbus_init_child_obj(obj, "dwc2", &s->dwc2, sizeof(s->dwc2),
> +  TYPE_DWC2_USB);
> +
> +object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
> +   OBJECT(&s->gpu_bus_mr), &error_abort);
> +
>  object_property_add_const_link(OBJECT(&s->gpio), "sdbus-sdhci",
> OBJECT(&s->sdhci.sdbus), &error_abort);
>  object_property_add_const_link(OBJECT(&s->gpio), "sdbus-sdhost",
> @@ -385,6 +392,19 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
>  qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
> INTERRUPT_HOSTPORT));
>  
> +/* DWC2 */
> +object_property_set_bool(OBJECT(&s->dwc2), true, "realized", &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +memory_region_add_subregion(&s->peri_mr, USB_OTG_OFFSET,
> +sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dwc2), 0));
> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->dwc2), 0,
> +qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
> +   INTERRUPT_USB));
> +
>  create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
>  create_unimp(s, &s->cprman, "bcm2835-cprman", CPRMAN_OFFSET, 0x1000);
>  create_unimp(s, &s->a2w, "bcm2835-a2w", A2W_OFFSET, 0x1000);
> @@ -398,7 +418,6 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
>  create_unimp(s, &s->otp, "bcm2835-otp", OTP_OFFSET, 0x80);
>  create_unimp(s, &s->dbus, "bcm2835-dbus", DBUS_OFFSET, 0x8000);
>  create_unimp(s, &s->ave0, "bcm2835-ave0", AVE0_OFFSET, 0x8000);
> -create_unimp(s, &s->dwc2, "dwc-usb2", USB_OTG_OFFSET, 0x1000);
>  create_unimp(s, &s->sdramc, "bcm2835-sdramc", SDRAMC_OFFSET, 0x100);
>  }
>  
> diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
> index 7a7a8f6141..48a0ad1633 100644
> --- a/include/hw/arm/bcm2835_peripherals.h
> +++ b/include/hw/arm/bcm2835_peripherals.h
> @@ -27,6 +27,7 @@
>  #include "hw/sd/bcm2835_sdhost.h"
>  #include "hw/gpio/bcm2835_gpio.h"
>  #include "hw/timer/bcm2835_systmr.h"
> +#include "hw/usb/hcd-dwc2.h"
>  #include "hw/misc/unimp.h"
>  
>  #define TYPE_BCM2835_PERIPHERALS "bcm2835-peripherals"
> @@ -67,7 +68,7 @@ typedef struct BCM2835PeripheralState {
>  UnimplementedDeviceState ave0;
>  UnimplementedDeviceState bscsl;
>  UnimplementedDeviceState smi;
> -UnimplementedDeviceState dwc2;
> +DWC2State dwc2;
>  UnimplementedDeviceState sdramc;
>  } BCM2835PeripheralState;
>  
> 

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v4 4/7] dwc-hsotg (dwc2) USB host controller emulation

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/28/20 4:22 AM, Paul Zimmerman wrote:
> Add the dwc-hsotg (dwc2) USB host controller emulation code.
> Based on hw/usb/hcd-ehci.c and hw/usb/hcd-ohci.c.
> 
> Note that to use this with the dwc-otg driver in the Raspbian
> kernel, you must pass the option "dwc_otg.fiq_fsm_enable=0" on
> the kernel command line.
> 
> Emulation of slave mode and of descriptor-DMA mode has not been
> implemented yet. These modes are seldom used.
> 
> I have used some on-line sources of information while developing
> this emulation, including:
> 
> http://www.capital-micro.com/PDF/CME-M7_Family_User_Guide_EN.pdf
> which has a pretty complete description of the controller starting
> on page 370.
> 
> https://sourceforge.net/p/wive-ng/wive-ng-mt/ci/master/tree/docs/DataSheets/RT3050_5x_V2.0_081408_0902.pdf
> which has a description of the controller registers starting on
> page 130.
> 
> Thanks to Philippe Mathieu-Daudé for providing a cleaner method
> of implementing the memory regions for the controller registers.
> 
> Signed-off-by: Paul Zimmerman 
> ---
>  hw/usb/Kconfig   |5 +
>  hw/usb/Makefile.objs |1 +
>  hw/usb/hcd-dwc2.c| 1378 ++
>  hw/usb/trace-events  |   47 ++
>  4 files changed, 1431 insertions(+)
>  create mode 100644 hw/usb/hcd-dwc2.c
> 
> diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
> index 464348ba14..d4d8c37c28 100644
> --- a/hw/usb/Kconfig
> +++ b/hw/usb/Kconfig
> @@ -46,6 +46,11 @@ config USB_MUSB
>  bool
>  select USB
>  
> +config USB_DWC2
> +bool
> +default y
> +select USB
> +
>  config TUSB6010
>  bool
>  select USB_MUSB
> diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
> index 66835e5bf7..fa5c3fa1b8 100644
> --- a/hw/usb/Makefile.objs
> +++ b/hw/usb/Makefile.objs
> @@ -12,6 +12,7 @@ common-obj-$(CONFIG_USB_EHCI_SYSBUS) += hcd-ehci-sysbus.o
>  common-obj-$(CONFIG_USB_XHCI) += hcd-xhci.o
>  common-obj-$(CONFIG_USB_XHCI_NEC) += hcd-xhci-nec.o
>  common-obj-$(CONFIG_USB_MUSB) += hcd-musb.o
> +common-obj-$(CONFIG_USB_DWC2) += hcd-dwc2.o
>  
>  common-obj-$(CONFIG_TUSB6010) += tusb6010.o
>  common-obj-$(CONFIG_IMX)  += chipidea.o
> diff --git a/hw/usb/hcd-dwc2.c b/hw/usb/hcd-dwc2.c
> new file mode 100644
> index 00..59c2caa6c6
> --- /dev/null
> +++ b/hw/usb/hcd-dwc2.c
> @@ -0,0 +1,1378 @@
> +/*
> + * dwc-hsotg (dwc2) USB host controller emulation
> + *
> + * Based on hw/usb/hcd-ehci.c and hw/usb/hcd-ohci.c
> + *
> + * Note that to use this emulation with the dwc-otg driver in the
> + * Raspbian kernel, you must pass the option "dwc_otg.fiq_fsm_enable=0"
> + * on the kernel command line.
> + *
> + * Some useful documentation used to develop this emulation can be
> + * found online (as of April 2020) at:
> + *
> + * http://www.capital-micro.com/PDF/CME-M7_Family_User_Guide_EN.pdf
> + * which has a pretty complete description of the controller starting
> + * on page 370.
> + *
> + * https://sourceforge.net/p/wive-ng/wive-ng-mt/ci/master/tree/docs/DataSheets/RT3050_5x_V2.0_081408_0902.pdf
> + * which has a description of the controller registers starting on
> + * page 130.
> + *
> + * Copyright (c) 2020 Paul Zimmerman 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "hw/usb/dwc2-regs.h"
> +#include "hw/usb/hcd-dwc2.h"
> +#include "migration/vmstate.h"
> +#include "trace.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +#include "hw/qdev-properties.h"
> +
> +#define USB_HZ_FS       12000000
> +#define USB_HZ_HS       96000000
> +#define USB_FRMINTVL    12000
> +
> +/* nifty macros from Arnon's EHCI version  */
> +#define get_field(data, field) \
> +(((data) & field##_MASK) >> field##_SHIFT)
> +
> +#define set_field(data, newval, field) do { \
> +uint32_t val = *(data); \
> +val &= ~field##_MASK; \
> +val |= ((newval) << field##_SHIFT) & field##_MASK; \
> +*(data) = val; \
> +} while (0)
> +
> +#define get_bit(data, bitmask) \
> +(!!((data) & (bitmask)))
> +
> +/* update irq line */
> +static inline void dwc2_update_irq(DWC2State *s)
> +{
> +static int oldlevel;
> +int level = 0;
> +
> +if ((s->gintsts & s->gintmsk) && (s->gahbcfg & GAHBCFG_GLBL_INTR_EN)) {
> +level = 1;
> +}
> +if (level != oldlevel) {
> +oldlevel = level;
> +trace_usb_dwc2_update_irq(level);
> +qemu_set_irq(s->irq, level);
> +}
> +}
> +
> +/* flag interrupt cond
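The IRQ logic in dwc2_update_irq() drives the line high only when a pending source in GINTSTS is also unmasked in GINTMSK and the global enable bit in GAHBCFG is set, and it only calls qemu_set_irq() when the level actually changes. The sketch below models that in plain C under stated assumptions: DemoDWC2, demo_update_irq(), the DEMO_FIELD field and the bit position chosen for GAHBCFG_GLBL_INTR_EN are hypothetical stand-ins (the real definitions live in hw/usb/dwc2-regs.h), and a counter replaces qemu_set_irq(). One difference is deliberate: the last level is kept in the per-controller state instead of a function-local static, which in the quoted code is shared by every DWC2State instance. The sketch also exercises the patch's get_field()/set_field() helpers on a hypothetical field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position, for this sketch only. */
#define GAHBCFG_GLBL_INTR_EN (1u << 0)

/* Field helpers in the same style as the patch: FIELD##_MASK / FIELD##_SHIFT. */
#define get_field(data, field) \
    (((data) & field##_MASK) >> field##_SHIFT)

#define set_field(data, newval, field) do { \
    uint32_t val = *(data); \
    val &= ~field##_MASK; \
    val |= ((newval) << field##_SHIFT) & field##_MASK; \
    *(data) = val; \
} while (0)

/* Hypothetical 4-bit field occupying bits [7:4]. */
#define DEMO_FIELD_MASK  0xf0u
#define DEMO_FIELD_SHIFT 4

typedef struct {
    uint32_t gintsts;   /* pending interrupt sources */
    uint32_t gintmsk;   /* unmasked (enabled) interrupt sources */
    uint32_t gahbcfg;   /* holds the global interrupt enable bit */
    int irq_level;      /* last level driven, kept per controller */
    int irq_changes;    /* counts level transitions (stands in for qemu_set_irq) */
} DemoDWC2;

/* Recompute the level and "signal" only on a transition. */
static void demo_update_irq(DemoDWC2 *s)
{
    int level = (s->gintsts & s->gintmsk) &&
                (s->gahbcfg & GAHBCFG_GLBL_INTR_EN);

    if (level != s->irq_level) {
        s->irq_level = level;
        s->irq_changes++;
    }
}
```

Keeping the level in the device state also makes it a natural candidate for the migrated VMState, which a function-local static cannot be.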

Re: [PATCH v4 7/7] raspi2 acceptance test: add test for dwc-hsotg (dwc2) USB host

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/28/20 4:22 AM, Paul Zimmerman wrote:
> Add a check for functional dwc-hsotg (dwc2) USB host emulation to
> the Raspi 2 acceptance test
> 
> Signed-off-by: Paul Zimmerman 
> ---
>  tests/acceptance/boot_linux_console.py | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
> index f825cd9ef5..efa4803642 100644
> --- a/tests/acceptance/boot_linux_console.py
> +++ b/tests/acceptance/boot_linux_console.py
> @@ -373,13 +373,18 @@ class BootLinuxConsole(Test):
>  
>  self.vm.set_console()
>  kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> -   serial_kernel_cmdline[uart_id])
> +   serial_kernel_cmdline[uart_id] +
> +   ' root=/dev/mmcblk0p2 rootwait ' +
> +   'dwc_otg.fiq_fsm_enable=0')
>  self.vm.add_args('-kernel', kernel_path,
>   '-dtb', dtb_path,
> - '-append', kernel_command_line)
> + '-append', kernel_command_line,
> + '-device', 'usb-kbd')
>  self.vm.launch()
>  console_pattern = 'Kernel command line: %s' % kernel_command_line
>  self.wait_for_console_pattern(console_pattern)
> +console_pattern = 'Product: QEMU USB Keyboard'
> +self.wait_for_console_pattern(console_pattern)

Awesome, thanks for this patch!

Reviewed-by: Philippe Mathieu-Daudé 

>  
>  def test_arm_raspi2_uart0(self):
>  """
>

Re: [PATCH v2 1/6] block/block-copy: rename in-flight requests to tasks

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> We are going to use aio-task-pool API and extend in-flight request
> structure to be a successor of AioTask, so rename things appropriately.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 99 +++---
>  1 file changed, 49 insertions(+), 50 deletions(-)
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index 05227e18bf..61d1d26991 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c

[...]

> -static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
> -BlockCopyInFlightReq *req, int64_t new_bytes)
> +static void coroutine_fn block_copy_task_shrink(BlockCopyState *s,
> +BlockCopyTask *task,
> +int64_t new_bytes)
>  {
> -if (new_bytes == req->bytes) {
> +if (new_bytes == task->bytes) {
>  return;
>  }
>  
> -assert(new_bytes > 0 && new_bytes < req->bytes);
> +assert(new_bytes > 0 && new_bytes < task->bytes);
>  
> -s->in_flight_bytes -= req->bytes - new_bytes;
> +s->in_flight_bytes -= task->bytes - new_bytes;
>  bdrv_set_dirty_bitmap(s->copy_bitmap,
> -  req->offset + new_bytes, req->bytes - new_bytes);
> +  task->offset + new_bytes, task->bytes - new_bytes);
> +s->in_flight_bytes -= task->bytes - new_bytes;

This line doesn’t seem right.

(The rest does.)

Max
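The accounting in block_copy_task_shrink() can be checked with a little arithmetic: shrinking a task from task->bytes to new_bytes hands a tail of task->bytes - new_bytes bytes back to the dirty bitmap, so in_flight_bytes should drop by exactly that amount, once. The sketch below is a simplified model, not the block-copy code: DemoCopyState and DemoCopyTask are hypothetical, the dirty bitmap is reduced to a byte counter, and it assumes the task's length is updated to new_bytes after shrinking (that part of the function is not in the quoted hunk).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for BlockCopyState / BlockCopyTask. */
typedef struct {
    int64_t in_flight_bytes;  /* bytes currently owned by tasks */
    int64_t dirty_bytes;      /* bytes still waiting to be copied */
} DemoCopyState;

typedef struct {
    int64_t offset;
    int64_t bytes;
} DemoCopyTask;

static void demo_task_shrink(DemoCopyState *s, DemoCopyTask *task,
                             int64_t new_bytes)
{
    if (new_bytes == task->bytes) {
        return;
    }
    assert(new_bytes > 0 && new_bytes < task->bytes);

    /* The dropped tail leaves the in-flight set exactly once... */
    s->in_flight_bytes -= task->bytes - new_bytes;
    /* ...and returns to the dirty set so a later task copies it. */
    s->dirty_bytes += task->bytes - new_bytes;
    task->bytes = new_bytes;
}
```

With a single decrement, in_flight_bytes + dirty_bytes stays constant across a shrink; a second decrement would make in_flight_bytes undercount.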




Re: [Bug 1874674] Re: [Feature request] acceptance test class to run user-mode binaries

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/24/20 9:14 PM, Richard Henderson wrote:
> What user-mode testing do you think might be improved by using avocado?

Test unmodified real-world binaries, known to work in the field.

Tests can be added by users without having to be a TCG developer, see
https://www.mail-archive.com/qemu-devel@nongnu.org/msg626608.html:

  class LoadBFLT(LinuxUserTest):
      def test_stm32(self):
          rootfs_url = ('https://elinux.org/images/5/51/'
                        'Stm32_mini_rootfs.cpio.bz2')
          rootfs_path_bz2 = self.fetch_asset(rootfs_url, ...)
          busybox_path = self.workdir + "/bin/busybox"

          res = self.run("%s %s" % (busybox_path, cmd))
          ver = 'BusyBox v1.24.0.git (2015-02-03 22:17:13 CET) ...'
          self.assertIn(ver, res.stdout_text)

          cmd = 'uname -a'
          res = self.run("%s %s" % (busybox_path, cmd))
          unm = 'armv7l GNU/Linux'
          self.assertIn(unm, res.stdout_text)

This is a fairly trivial test, cheap (no need to cross-build), yet it
still covers quite a lot of QEMU code.

> IMO at present we have a fairly comprehensive testing infrastructure for
> user-mode that is simply underused.  With docker, we have a set of
> cross-compilers for most guest architectures, and we are able to build
> statically linked binaries that are copied out of the container for
> testing by the just-built qemu binaries on the host.  This
> infrastructure is used by check-tcg.  It's fairly easy to add new test
> cases to be run on one or all guests.

What you describe is a different and complementary test set: tests
crafted specifically for QEMU and built along with it.

-- 
https://bugs.launchpad.net/bugs/1874674

Title:
  [Feature request] acceptance test class to run user-mode binaries

Status in QEMU:
  New

Bug description:
  Currently the acceptance test framework only targets system-mode emulation.
  It would be useful to test user-mode too.

  Ref: https://www.mail-archive.com/qemu-devel@nongnu.org/msg626610.html


[PATCH 1/2] Fix undefined behaviour

2020-04-28 Thread Grzegorz Uriasz

Signed-off-by: Grzegorz Uriasz 
---
 hw/xen/xen_pt_load_rom.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index a50a80837e..9f100dc159 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -38,12 +38,12 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 fp = fopen(rom_file, "r+");
 if (fp == NULL) {
 if (errno != ENOENT) {
-error_report("pci-assign: Cannot open %s: %s", rom_file, strerror(errno));
+warn_report("pci-assign: Cannot open %s: %s", rom_file, strerror(errno));
 }
 return NULL;
 }
 if (fstat(fileno(fp), &st) == -1) {
-error_report("pci-assign: Cannot stat %s: %s", rom_file, strerror(errno));
+warn_report("pci-assign: Cannot stat %s: %s", rom_file, strerror(errno));
 goto close_rom;
 }
 
@@ -59,10 +59,9 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 memset(ptr, 0xff, st.st_size);
 
 if (!fread(ptr, 1, st.st_size, fp)) {
-error_report("pci-assign: Cannot read from host %s", rom_file);
-error_printf("Device option ROM contents are probably invalid "
- "(check dmesg).\nSkip option ROM probe with rombar=0, "
- "or load from file with romfile=\n");
+warn_report("pci-assign: Cannot read from host %s", rom_file);
+memory_region_unref(&dev->rom);
+ptr = NULL;
 goto close_rom;
 }
 
-- 
2.26.1

[PATCH 2/2] Improve legacy vbios handling

2020-04-28 Thread Grzegorz Uriasz

Signed-off-by: Grzegorz Uriasz 
---
 hw/xen/xen_pt.c  |  8 +--
 hw/xen/xen_pt_graphics.c | 48 +---
 hw/xen/xen_pt_load_rom.c |  2 +-
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index b91082cb8b..ffc3559dd4 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -483,8 +483,12 @@ static int xen_pt_register_regions(XenPCIPassthroughState *s, uint16_t *cmd)
i, r->size, r->base_addr, type);
 }
 
-/* Register expansion ROM address */
-if (d->rom.base_addr && d->rom.size) {
+/*
+ * Register expansion ROM address. If we are dealing with a ROM
+ * shadow copy for legacy vga devices then don't bother to map it
+ * as previous code creates a proper shadow copy
+ */
+if (d->rom.base_addr && d->rom.size && !(is_igd_vga_passthrough(d))) {
 uint32_t bar_data = 0;
 
 /* Re-set BAR reported by OS, otherwise ROM can't be read. */
diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c
index a3bc7e3921..fe0ef2685c 100644
--- a/hw/xen/xen_pt_graphics.c
+++ b/hw/xen/xen_pt_graphics.c
@@ -129,7 +129,7 @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev)
 return 0;
 }
 
-static void *get_vgabios(XenPCIPassthroughState *s, int *size,
+static void *get_sysfs_vgabios(XenPCIPassthroughState *s, int *size,
XenHostPCIDevice *dev)
 {
 return pci_assign_dev_load_option_rom(&s->dev, size,
@@ -137,6 +137,45 @@ static void *get_vgabios(XenPCIPassthroughState *s, int *size,
   dev->dev, dev->func);
 }
 
+static void xen_pt_direct_vbios_copy(XenPCIPassthroughState *s, Error **errp)
+{
+int fd = -1;
+void *guest_bios = NULL;
+void *host_vbios = NULL;
+/* This is always 32 pages in the real mode reserved region */
+int bios_size = 32 << XC_PAGE_SHIFT;
+int vbios_addr = 0xc0000;
+
+fd = open("/dev/mem", O_RDONLY);
+if (fd == -1) {
+error_setg(errp, "Can't open /dev/mem: %s", strerror(errno));
+return;
+}
+host_vbios = mmap(NULL, bios_size,
+PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, fd, vbios_addr);
+close(fd);
+
+if (host_vbios == MAP_FAILED) {
+error_setg(errp, "Failed to mmap host vbios: %s", strerror(errno));
+return;
+}
+
+memory_region_init_ram(&s->dev.rom, OBJECT(&s->dev),
+"legacy_vbios.rom", bios_size, &error_abort);
+guest_bios = memory_region_get_ram_ptr(&s->dev.rom);
+memcpy(guest_bios, host_vbios, bios_size);
+
+if (munmap(host_vbios, bios_size) == -1) {
+XEN_PT_LOG(&s->dev, "Failed to unmap host vbios: %s\n", strerror(errno));
+}
+
+cpu_physical_memory_write(vbios_addr, guest_bios, bios_size);
+memory_region_set_address(&s->dev.rom, vbios_addr);
+pci_register_bar(&s->dev, PCI_ROM_SLOT, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->dev.rom);
+s->dev.has_rom = true;
+XEN_PT_LOG(&s->dev, "Legacy VBIOS registered\n");
+}
+
 /* Refer to Seabios. */
 struct rom_header {
 uint16_t signature;
@@ -179,9 +218,11 @@ void xen_pt_setup_vga(XenPCIPassthroughState *s, XenHostPCIDevice *dev,
 return;
 }
 
-bios = get_vgabios(s, &bios_size, dev);
+bios = get_sysfs_vgabios(s, &bios_size, dev);
 if (!bios) {
-error_setg(errp, "VGA: Can't get VBIOS");
+XEN_PT_LOG(&s->dev, "Unable to get host VBIOS from sysfs - "
+"falling back to a direct copy of memory ranges\n");
+xen_pt_direct_vbios_copy(s, errp);
 return;
 }
 
@@ -223,6 +264,7 @@ void xen_pt_setup_vga(XenPCIPassthroughState *s, XenHostPCIDevice *dev,
 
 /* Currently we fixed this address as a primary for legacy BIOS. */
 cpu_physical_memory_write(0xc0000, bios, bios_size);
+XEN_PT_LOG(&s->dev, "Legacy VBIOS registered\n");
 }
 
 uint32_t igd_read_opregion(XenPCIPassthroughState *s)
diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index 9f100dc159..8cd9aa84dc 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -65,7 +65,7 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 goto close_rom;
 }
 
-pci_register_bar(dev, PCI_ROM_SLOT, 0, &dev->rom);
+pci_register_bar(dev, PCI_ROM_SLOT, PCI_BASE_ADDRESS_SPACE_MEMORY, &dev->rom);
 dev->has_rom = true;
 *size = st.st_size;
 close_rom:
-- 
2.26.1

[PATCH 0/2] Fix QEMU crashes when passing IGD to a guest VM under XEN

2020-04-28 Thread Grzegorz Uriasz



Hi,

This patch series is a small subset of a bigger patch set spanning a few
projects, aiming to isolate the GPU in Qubes OS to a dedicated security
domain. I'm doing this together with 3 colleagues as part of our
Bachelor's thesis.

When passing an Intel Graphics Device (IGD) to an HVM guest under XEN,
QEMU sometimes crashes when starting the VM. It turns out that the code
responsible for setting up the legacy VBIOS for the IGD contains a bug
which results in a memcpy of an undefined size between the QEMU heap and
the physical memory of the guest.

If the size of the memcpy is small enough, QEMU does not crash. This means
the bug is actually a small security issue: a hostile guest kernel might
determine the memory layout of QEMU simply by looking at physical memory
beyond 0xd0000. This defeats ASLR and might make exploitation easier if
other issues were to be found.

The problem is the current mechanism for obtaining a copy of the IGD's ROM.
We first allocate a buffer to hold the VBIOS, whose size is obtained from
sysfs. We then try to read the ROM from sysfs; if that fails, we just
return without setting the size of the buffer. This would be OK if the ROM
size reported by sysfs were 0, but it is always 32 pages, as this
corresponds to the legacy memory ranges. It turns out that reading the ROM
fails on every single device I've tested (spanning a few generations of
IGD), which means QEMU never sets the size of the buffer and returns a
valid pointer to code which then does a memcpy of an undefined size.
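The failure mode described above is a valid pointer returned while its size is left unset. A minimal sketch of the opposite contract, where the size is written on every path and a failed read returns NULL, is shown here; demo_load_rom() is hypothetical and only loosely modelled on pci_assign_dev_load_option_rom(), using plain stdio instead of QEMU memory regions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read up to `expected` bytes from fp.
 * On any failure it returns NULL with *size == 0, so a caller can never
 * receive a usable pointer paired with an unset size.
 */
static void *demo_load_rom(FILE *fp, size_t expected, int *size)
{
    unsigned char *buf;
    size_t got;

    *size = 0;                  /* defined on every path */
    buf = malloc(expected);
    if (!buf) {
        return NULL;
    }
    got = fread(buf, 1, expected, fp);
    if (got == 0) {
        free(buf);              /* do not hand back an unfilled buffer */
        return NULL;
    }
    *size = (int)got;           /* report what was actually read */
    return buf;
}
```

A caller then only copies *size bytes when the pointer is non-NULL, which rules out the undefined-size memcpy.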

I'm including two patches.
The first one fixes the security issue by making a failure to read the ROM
from sysfs fatal.
The second patch introduces a better method for obtaining the VBIOS. I
haven't yet seen a single device on which the old code works; the new code
creates a shadow copy directly by reading from /dev/mem. This should be
fine, as a quick grep of the codebase shows that this approach is already
used to handle MSI. I've tested the new code on a few different laptops: it
works fine, and the guest VMs finally stopped complaining that the VBIOS
tables are missing.
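Taking a shadow copy from /dev/mem relies on a file-backed mapping: mmap(2) ignores the file descriptor when MAP_ANONYMOUS is set and returns zero-filled memory, and the offset passed to mmap() must be a multiple of the page size. A minimal sketch under those assumptions; demo_copy_mapped_range() is a hypothetical helper that works on any file, /dev/mem being the special case where the offset is a physical address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes at byte offset off of path into dst
 * through a read-only, file-backed (not anonymous) mapping.
 * The mapping starts at the page boundary at or below off, as mmap()
 * requires. Returns 0 on success, -1 on failure.
 */
static int demo_copy_mapped_range(const char *path, off_t off,
                                  void *dst, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base = off - (off % page);
    size_t delta = (size_t)(off - base);
    void *map;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    map = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd, base);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED) {
        return -1;
    }
    memcpy(dst, (const char *)map + delta, len);
    munmap(map, len + delta);
    return 0;
}
```

For the legacy VGA BIOS area this would be called as demo_copy_mapped_range("/dev/mem", 0xC0000, buf, 32 * 4096): 32 << XC_PAGE_SHIFT is 128 KiB with Xen's 4 KiB pages. Opening /dev/mem typically requires root.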

Grzegorz Uriasz (2):
  Fix undefined behaviour
  Improve legacy vbios handling

 hw/xen/xen_pt.c  |  8 +--
 hw/xen/xen_pt_graphics.c | 48 +---
 hw/xen/xen_pt_load_rom.c | 13 +--
 3 files changed, 57 insertions(+), 12 deletions(-)

-- 
2.26.1

Re: [PATCH v2 2/6] block/block-copy: alloc task on each iteration

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> We are going to use aio-task-pool API, so tasks will be handled in
> parallel. We need therefore separate allocated task on each iteration.
> Introduce this logic now.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz 
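The reason for a fresh allocation per iteration is that a task handed to a parallel pool may still be running when the loop moves on; if every iteration reused one task variable, the next iteration would overwrite a task that another coroutine is still using. A minimal plain-C sketch of the allocation pattern, assuming a hypothetical DemoTask type and demo_split_tasks() helper; the real code hands each task to QEMU's aio-task-pool, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockCopyTask. */
typedef struct {
    int64_t offset;
    int64_t bytes;
} DemoTask;

/*
 * Split [offset, offset + bytes) into chunks of at most `chunk` bytes,
 * allocating a separate task for each chunk so that tasks running in
 * parallel never share storage. Returns the number of tasks stored in
 * tasks[] (at most max_tasks); the caller frees them.
 */
static size_t demo_split_tasks(int64_t offset, int64_t bytes, int64_t chunk,
                               DemoTask **tasks, size_t max_tasks)
{
    size_t n = 0;

    assert(chunk > 0);
    while (bytes > 0 && n < max_tasks) {
        DemoTask *t = malloc(sizeof(*t));   /* fresh task per iteration */
        if (!t) {
            break;
        }
        t->offset = offset;
        t->bytes = bytes < chunk ? bytes : chunk;
        tasks[n++] = t;
        offset += t->bytes;
        bytes -= t->bytes;
    }
    return n;
}
```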




Re: [PATCH v4 1/7] raspi: add BCM2835 SOC MPHI emulation

2020-04-28 Thread Philippe Mathieu-Daudé

Hi Paul,

On 4/28/20 4:22 AM, Paul Zimmerman wrote:
> Add BCM2835 SOC MPHI (Message-based Parallel Host Interface)
> emulation. It is very basic, only providing the FIQ interrupt
> needed to allow the dwc-otg USB host controller driver in the
> Raspbian kernel to function.
> 
> Signed-off-by: Paul Zimmerman 
> ---
>  hw/arm/bcm2835_peripherals.c |  17 +++
>  hw/misc/Makefile.objs|   1 +
>  hw/misc/bcm2835_mphi.c   | 190 +++
>  include/hw/arm/bcm2835_peripherals.h |   2 +
>  include/hw/misc/bcm2835_mphi.h   |  48 +++
>  5 files changed, 258 insertions(+)
>  create mode 100644 hw/misc/bcm2835_mphi.c
>  create mode 100644 include/hw/misc/bcm2835_mphi.h
> 
> diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
> index edcaa4916d..5e2c832d95 100644
> --- a/hw/arm/bcm2835_peripherals.c
> +++ b/hw/arm/bcm2835_peripherals.c
> @@ -124,6 +124,10 @@ static void bcm2835_peripherals_init(Object *obj)
>  sysbus_init_child_obj(obj, "gpio", &s->gpio, sizeof(s->gpio),
>TYPE_BCM2835_GPIO);
>  
> +/* Mphi */
> +sysbus_init_child_obj(obj, "mphi", &s->mphi, sizeof(s->mphi),
> +  TYPE_BCM2835_MPHI);
> +
>  object_property_add_const_link(OBJECT(&s->gpio), "sdbus-sdhci",
> OBJECT(&s->sdhci.sdbus), &error_abort);
>  object_property_add_const_link(OBJECT(&s->gpio), "sdbus-sdhost",
> @@ -368,6 +372,19 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
>  return;
>  }
>  
> +/* Mphi */
> +object_property_set_bool(OBJECT(&s->mphi), true, "realized", &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +memory_region_add_subregion(&s->peri_mr, MPHI_OFFSET,
> +sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mphi), 0));
> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->mphi), 0,
> +qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
> +   INTERRUPT_HOSTPORT));
> +
>  create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
>  create_unimp(s, &s->cprman, "bcm2835-cprman", CPRMAN_OFFSET, 0x1000);
>  create_unimp(s, &s->a2w, "bcm2835-a2w", A2W_OFFSET, 0x1000);
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index 68aae2eabb..91085cc21b 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -57,6 +57,7 @@ common-obj-$(CONFIG_OMAP) += omap_l4.o
>  common-obj-$(CONFIG_OMAP) += omap_sdrc.o
>  common-obj-$(CONFIG_OMAP) += omap_tap.o
>  common-obj-$(CONFIG_RASPI) += bcm2835_mbox.o
> +common-obj-$(CONFIG_RASPI) += bcm2835_mphi.o
>  common-obj-$(CONFIG_RASPI) += bcm2835_property.o
>  common-obj-$(CONFIG_RASPI) += bcm2835_rng.o
>  common-obj-$(CONFIG_RASPI) += bcm2835_thermal.o
> diff --git a/hw/misc/bcm2835_mphi.c b/hw/misc/bcm2835_mphi.c
> new file mode 100644
> index 00..66fc4a9cd3
> --- /dev/null
> +++ b/hw/misc/bcm2835_mphi.c
> @@ -0,0 +1,190 @@
> +/*
> + * BCM2835 SOC MPHI emulation
> + *
> + * Very basic emulation, only providing the FIQ interrupt needed to
> + * allow the dwc-otg USB host controller driver in the Raspbian kernel
> + * to function.
> + *
> + * Copyright (c) 2020 Paul Zimmerman 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/misc/bcm2835_mphi.h"
> +#include "migration/vmstate.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +
> +static inline void mphi_raise_irq(BCM2835MphiState *s)
> +{
> +qemu_set_irq(s->irq, 1);
> +}
> +
> +static inline void mphi_lower_irq(BCM2835MphiState *s)
> +{
> +qemu_set_irq(s->irq, 0);
> +}
> +
> +static uint64_t mphi_reg_read(void *ptr, hwaddr addr, unsigned size)
> +{
> +BCM2835MphiState *s = ptr;
> +uint32_t reg = s->regbase + addr;
> +uint32_t val = 0;
> +
> +switch (reg) {
> +case 0x28:  /* outdda */
> +val = s->outdda;
> +break;
> +case 0x2c:  /* outddb */
> +val = s->outddb;
> +break;
> +case 0x4c:  /* ctrl */
> +val = s->ctrl;
> +val |= 1 << 17;
> +break;
> +case 0x50:  /* intstat */
> +val = s->intstat;
> +break;
> +case 0x1f0: /* swirq_set */
> +val = s->swirq_set;
> +break;
> +case 0x1f4: /* swirq_clr */
> +val = s->swirq_clr;
> +break;

I'm surprised this
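The quoted mphi_reg_read() decodes the register offset with a switch and forces bit 17 of the ctrl register on every read. A self-contained sketch of that dispatch pattern, using the offsets from the quoted code; DemoMphi and demo_mphi_read() are hypothetical, and returning 0 for unlisted offsets is an assumption based on the val = 0 initialiser, since the rest of the function is cut off here.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as used in the quoted mphi_reg_read(). */
enum {
    DEMO_MPHI_OUTDDA    = 0x28,
    DEMO_MPHI_OUTDDB    = 0x2c,
    DEMO_MPHI_CTRL      = 0x4c,
    DEMO_MPHI_INTSTAT   = 0x50,
    DEMO_MPHI_SWIRQ_SET = 0x1f0,
    DEMO_MPHI_SWIRQ_CLR = 0x1f4,
};

/* Hypothetical register file mirroring the fields read in the patch. */
typedef struct {
    uint32_t regbase;
    uint32_t outdda, outddb, ctrl, intstat, swirq_set, swirq_clr;
} DemoMphi;

static uint32_t demo_mphi_read(const DemoMphi *s, uint32_t addr)
{
    switch (s->regbase + addr) {
    case DEMO_MPHI_OUTDDA:    return s->outdda;
    case DEMO_MPHI_OUTDDB:    return s->outddb;
    case DEMO_MPHI_CTRL:      return s->ctrl | (1u << 17); /* bit 17 forced set, as quoted */
    case DEMO_MPHI_INTSTAT:   return s->intstat;
    case DEMO_MPHI_SWIRQ_SET: return s->swirq_set;
    case DEMO_MPHI_SWIRQ_CLR: return s->swirq_clr;
    default:                  return 0;  /* assumption: unlisted offsets read as 0 */
    }
}
```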

Re: [PATCH v1 03/11] hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Fix typo xlnx-ve -> xlnx-versal.
> 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal-virt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index 878a275140..8a608074d1 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -440,7 +440,7 @@ static void versal_virt_init(MachineState *machine)
>  psci_conduit = QEMU_PSCI_CONDUIT_SMC;
>  }
>  
> -sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
> +sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
>sizeof(s->soc), TYPE_XLNX_VERSAL);
>  object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
>   "ddr", &error_abort);
> 

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v1 02/11] hw/arm: versal: Move misplaced comment

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Move misplaced comment.
> 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index c73b2fe755..cc696e44c0 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -36,7 +36,6 @@ static void versal_create_apu_cpus(Versal *s)
>  
>  obj = object_new(XLNX_VERSAL_ACPU_TYPE);
>  if (!obj) {
> -/* Secondary CPUs start in PSCI powered-down state */
>  error_report("Unable to create apu.cpu[%d] of type %s",
>   i, XLNX_VERSAL_ACPU_TYPE);
>  exit(EXIT_FAILURE);
> @@ -49,6 +48,7 @@ static void versal_create_apu_cpus(Versal *s)
>  object_property_set_int(obj, s->cfg.psci_conduit,
>  "psci-conduit", &error_abort);
>  if (i) {
> +/* Secondary CPUs start in PSCI powered-down state */
>  object_property_set_bool(obj, true,
>   "start-powered-off", &error_abort);
>  }
> 

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v1 07/11] hw/arm: versal: Embed the APUs into the SoC type

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Embed the APUs into the SoC type.
> 
> Suggested-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal-virt.c|  4 ++--
>  hw/arm/xlnx-versal.c | 19 +--
>  include/hw/arm/xlnx-versal.h |  2 +-
>  3 files changed, 8 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index 8a608074d1..d7be1ad494 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -469,9 +469,9 @@ static void versal_virt_init(MachineState *machine)
>  s->binfo.get_dtb = versal_virt_get_dtb;
>  s->binfo.modify_dtb = versal_virt_modify_dtb;
>  if (machine->kernel_filename) {
> -arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
> +arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
>  } else {
> -AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
> +AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
>&s->binfo);
>  /* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
>   * Offset things by 4K.  */
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index ebd2dc51be..c8a296e2e0 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -31,19 +31,11 @@ static void versal_create_apu_cpus(Versal *s)
>  
>  for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
>  Object *obj;
> -char *name;
> -
> -obj = object_new(XLNX_VERSAL_ACPU_TYPE);
> -if (!obj) {
> -error_report("Unable to create apu.cpu[%d] of type %s",
> - i, XLNX_VERSAL_ACPU_TYPE);
> -exit(EXIT_FAILURE);
> -}
> -
> -name = g_strdup_printf("apu-cpu[%d]", i);
> -object_property_add_child(OBJECT(s), name, obj, &error_fatal);
> -g_free(name);
>  
> +object_initialize_child(OBJECT(s), "apu-cpu[*]",
> +&s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
> +XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);

:)

Reviewed-by: Philippe Mathieu-Daudé 

> +obj = OBJECT(&s->fpd.apu.cpu[i]);
>  object_property_set_int(obj, s->cfg.psci_conduit,
>  "psci-conduit", &error_abort);
>  if (i) {
> @@ -57,7 +49,6 @@ static void versal_create_apu_cpus(Versal *s)
>  object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
>   &error_abort);
>  object_property_set_bool(obj, true, "realized", &error_fatal);
> -s->fpd.apu.cpu[i] = ARM_CPU(obj);
>  }
>  }
>  
> @@ -95,7 +86,7 @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
>  }
>  
>  for (i = 0; i < nr_apu_cpus; i++) {
> -DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
> +DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
>  int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
>  qemu_irq maint_irq;
>  int ti;
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index 94b7826fd4..426b66449d 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -36,7 +36,7 @@ typedef struct Versal {
>  struct {
>  struct {
>  MemoryRegion mr;
> -ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
> +ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
>  GICv3State gic;
>  } apu;
>  } fpd;
>
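Changing ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS] to ARMCPU cpu[XLNX_VERSAL_NR_ACPUS] moves the CPU objects' storage into the Versal state itself, which is why object_new() plus object_property_add_child() becomes object_initialize_child(), and why call sites such as arm_load_kernel() now take &s->soc.fpd.apu.cpu[0]. A plain-C sketch of the same embedding, with hypothetical DemoSoC/DemoCPU types standing in for the QOM objects:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: DemoCPU for ARMCPU, DemoSoC for Versal. */
typedef struct {
    int index;
    int powered_on;
} DemoCPU;

#define DEMO_NR_CPUS 2

typedef struct {
    DemoCPU cpu[DEMO_NR_CPUS];  /* embedded: storage lives inside the SoC */
} DemoSoC;

/* Initialise in place, in the spirit of object_initialize_child():
 * no separate allocation per CPU, so nothing to free or leak. */
static void demo_soc_init(DemoSoC *s)
{
    for (size_t i = 0; i < DEMO_NR_CPUS; i++) {
        DemoCPU *cpu = &s->cpu[i];   /* address-of replaces the stored pointer */
        cpu->index = (int)i;
        cpu->powered_on = (i == 0);  /* secondaries start powered off */
    }
}

/* Callers that used to receive s->cpu[0] (a pointer) now pass &s->cpu[0]. */
static int demo_boot_cpu_index(const DemoCPU *cpu)
{
    return cpu->index;
}
```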

Re: [PATCH v1 08/11] hw/arm: versal: Add support for SD

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Add support for SD.
> 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 31 +++
>  include/hw/arm/xlnx-versal.h | 12 
>  2 files changed, 43 insertions(+)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index c8a296e2e0..e263bdf77a 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -210,6 +210,36 @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
>  }
>  }
>  
> +#define SDHCI_CAPABILITIES  0x280737ec6481 /* Same as on ZynqMP.  */
> +static void versal_create_sds(Versal *s, qemu_irq *pic)
> +{
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
> +DeviceState *dev;
> +MemoryRegion *mr;
> +
> +sysbus_init_child_obj(OBJECT(s), "sd[*]",
> +  &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
> +  TYPE_SYSBUS_SDHCI);
> +dev = DEVICE(&s->pmc.iou.sd[i]);
> +
> +object_property_set_uint(OBJECT(dev),
> + 3, "sd-spec-version", &error_fatal);
> +object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
> + &error_fatal);
> +object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
> +qdev_init_nofail(dev);
> +
> +mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
> +memory_region_add_subregion(&s->mr_ps,
> +MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
> +
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
> +   pic[VERSAL_SD0_IRQ_0 + i * 2]);

Reviewed-by: Philippe Mathieu-Daudé 

> +}
> +}
> +
>  /* This takes the board allocated linear DDR memory and creates aliases
>   * for each split DDR range/aperture on the Versal address map.
>   */
> @@ -292,6 +322,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
>  versal_create_uarts(s, pic);
>  versal_create_gems(s, pic);
>  versal_create_admas(s, pic);
> +versal_create_sds(s, pic);
>  versal_map_ddr(s);
>  versal_unimp(s);
>  
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index 426b66449d..e11693e29d 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -14,6 +14,7 @@
>  
>  #include "hw/sysbus.h"
>  #include "hw/arm/boot.h"
> +#include "hw/sd/sdhci.h"
>  #include "hw/intc/arm_gicv3.h"
>  #include "hw/char/pl011.h"
>  #include "hw/dma/xlnx-zdma.h"
> @@ -26,6 +27,7 @@
>  #define XLNX_VERSAL_NR_UARTS   2
>  #define XLNX_VERSAL_NR_GEMS2
>  #define XLNX_VERSAL_NR_ADMAS   8
> +#define XLNX_VERSAL_NR_SDS 2
>  #define XLNX_VERSAL_NR_IRQS192
>  
>  typedef struct Versal {
> @@ -58,6 +60,13 @@ typedef struct Versal {
>  } iou;
>  } lpd;
>  
> +/* The Platform Management Controller subsystem.  */
> +struct {
> +struct {
> +SDHCIState sd[XLNX_VERSAL_NR_SDS];
> +} iou;
> +} pmc;
> +
>  struct {
>  MemoryRegion *mr_ddr;
>  uint32_t psci_conduit;
> @@ -80,6 +89,7 @@ typedef struct Versal {
>  #define VERSAL_GEM1_IRQ_0  58
>  #define VERSAL_GEM1_WAKE_IRQ_0 59
>  #define VERSAL_ADMA_IRQ_0  60
> +#define VERSAL_SD0_IRQ_0   126
>  
>  /* Architecturally reserved IRQs suitable for virtualization.  */
>  #define VERSAL_RSVD_IRQ_FIRST 111
> @@ -129,6 +139,8 @@ typedef struct Versal {
>  #define MM_FPD_CRF  0xfd1aU
>  #define MM_FPD_CRF_SIZE 0x14
>  
> +#define MM_PMC_SD0  0xf104U
> +#define MM_PMC_SD0_SIZE 0x1
>  #define MM_PMC_CRP  0xf126U
>  #define MM_PMC_CRP_SIZE 0x1
>  #endif
>

Re: [PATCH for-5.1 7/7] MAINTAINERS: Add myself as Loongson-3 maintainer

2020-04-28 Thread Philippe Mathieu-Daudé

Hi Huacai,

On 4/27/20 11:33 AM, Huacai Chen wrote:
> Signed-off-by: Huacai Chen 
> Co-developed-by: Jiaxun Yang 
> ---
>  MAINTAINERS | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aa9a057..efe840b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1080,6 +1080,12 @@ F: hw/isa/vt82c686.c
>  F: hw/pci-host/bonito.c
>  F: include/hw/isa/vt82c686.h
>  
> +Loongson-3
> +M: Huacai Chen 
> +S: Maintained
> +F: hw/mips/mips_loongson3.c
> +F: hw/pci-host/ls7a.c

I still haven't received the series cover, so I'm not sure if you
intended to include the LS7A bridge chip here, but if so it seems you
forgot to include it.

> +
>  Boston
>  M: Paul Burton 
>  R: Aleksandar Rikalo 
>

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-28 Thread Denis Plotnikov

On 27.04.2020 15:14, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:

The patch adds the ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asynchronous mode is managed by a new
"enabled_buffered" callback.

It's a bit invasive isn't it - changes a lot of functions in a lot of
places!


If you mean changing the qemu-file code - yes, it is.

If you mean changing the qemu-file usage in the code - no.
The only place that changes is the snapshot code, where the buffered mode
is enabled with a callback.

The change is in patch 03 of the series.


The multifd code separated the control headers from the data on separate
fd's - but that doesn't help your case.


yes, that doesn't help


Is there any chance you could do this by using the existing 'save_page'
hook (that RDMA uses).


I don't think so. My goal is to improve the writing performance of
the internal snapshot to a qcow2 image. The snapshot is saved in qcow2 as a
continuous stream placed at the end of the address space.
To achieve the best writing speed I need a size- and base-aligned buffer
containing the vm state (with ram), which looks like this (related to ram):

... | ram page header | ram page | ram page header | ram page | ... and so on

so that the buffer can be stored in qcow2 with a single operation.

'save_page' would allow me not to store 'ram page' in the qemu-file internal
structures, and to write my own ram page storing logic. I don't think that
would help me much because:

1. I need a page with the ram page header
2. I want to reduce the number of io operations
3. I want to save other parts of vm state as fast as possible

Maybe I can't see a better way of using the 'save_page' callback.
Could you suggest anything?

Denis

In the cover letter you mention direct qemu_fflush calls - have we got a
few too many in some places that you think we can clean out?


I'm not sure that any of them are excessive. To the best of my knowledge,
qemu-file is used for the source-destination communication on migration,
and removing some qemu_fflush calls may break the communication logic.

Snapshot is just a special case (if not the only one) where we know that we
can do buffered (cached) writes. Do you know of any other cases where the
buffered (cached) mode could be useful?

Dave


Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
  #include "qemu-file.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "block/aio_task.h"
  
-#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2
+#define IO_BUF_ALIGNMENT 512
  
-QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
+QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
+QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
  
  struct QEMUFileBuffer {
  int buf_index;
-int buf_size; /* 0 when writing */
+int buf_size; /* 0 when non-buffered writing */
  uint8_t *buf;
  unsigned long *may_free;
  struct iovec *iov;
  unsigned int iovcnt;
+QLIST_ENTRY(QEMUFileBuffer) link;
  };
  
  struct QEMUFile {
@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;
+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */
+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
  };
  
  /*
@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
  f->opaque = opaque;
  f->ops = ops;
  
-f->current_buf = g_new0(QEMUFileBuffer, 1);
-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-f->current_buf->may_fr

Re: [PATCH v2] [Qemu-devel] target/i386: HAX: Enable ROM/ROM device memory region support

2020-04-28 Thread Philippe Mathieu-Daudé


On 4/28/20 4:45 AM, Colin Xu wrote:


Hi Paolo,

Would you please queue this one?
--
Best Regards,
Colin Xu

On Mon, 30 Mar 2020, Colin Xu wrote:


Looks good to me.

Reviewed-by: Colin Xu 

On 2020-03-30 11:25, hang.y...@linux.intel.com wrote:

From: Hang Yuan 

Add ROM and ROM device memory region support in HAX. Their memory region is
read only and write access will generate EPT violation. The violation will be
handled in the HAX kernel with the following patch.


"will be"? This patch is 10 months old.



https://github.com/intel/haxm/commit/33ceea09a1655fca12c47f1e112b1d269357ff28 



v2: fix coding style problems


^ This line goes ...



Signed-off-by: Hang Yuan 
---


... here after the '---'.


  target/i386/hax-mem.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/target/i386/hax-mem.c b/target/i386/hax-mem.c
index 6bb5a24917..a8bfd37977 100644
--- a/target/i386/hax-mem.c
+++ b/target/i386/hax-mem.c
@@ -175,13 +175,10 @@ static void hax_process_section(MemoryRegionSection *section, uint8_t flags)
  uint64_t host_va;
  uint32_t max_mapping_size;

-    /* We only care about RAM and ROM regions */
-    if (!memory_region_is_ram(mr)) {
-    if (memory_region_is_romd(mr)) {
-    /* HAXM kernel module does not support ROMD yet  */
-    warn_report("Ignoring ROMD region 0x%016" PRIx64 "->0x%016" PRIx64,
-    start_pa, start_pa + size);

Don't you need to check for some kmod version before removing this check?


-    }
+    /* We only care about RAM/ROM regions and ROM device */
+    if (memory_region_is_rom(mr) || (memory_region_is_romd(mr))) {


Redundant parentheses.


+    flags |= HAX_RAM_INFO_ROM;
+    } else if (!memory_region_is_ram(mr)) {
  return;
  }


If you move the 'if (RAM) return' first, the code becomes easier to review.





--
Best Regards,
Colin Xu

Re: [PATCH v1 04/11] hw/arm: versal: Embedd the UARTs into the SoC type

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Embedd the UARTs into the SoC type.
> 
> Suggested-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 12 ++--
>  include/hw/arm/xlnx-versal.h |  3 ++-
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index cc696e44c0..dbde03b7e6 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -21,7 +21,6 @@
>  #include "kvm_arm.h"
>  #include "hw/misc/unimp.h"
>  #include "hw/arm/xlnx-versal.h"
> -#include "hw/char/pl011.h"
>  
>  #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
>  #define GEM_REVISION0x40070106
> @@ -144,16 +143,17 @@ static void versal_create_uarts(Versal *s, qemu_irq *pic)
>  DeviceState *dev;
>  MemoryRegion *mr;
>  
> -dev = qdev_create(NULL, TYPE_PL011);
> -s->lpd.iou.uart[i] = SYS_BUS_DEVICE(dev);
> +sysbus_init_child_obj(OBJECT(s), name,
> +  &s->lpd.iou.uart[i], sizeof(s->lpd.iou.uart[i]),
> +  TYPE_PL011);
> +dev = DEVICE(&s->lpd.iou.uart[i]);
>  qdev_prop_set_chr(dev, "chardev", serial_hd(i));
> -object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
>  qdev_init_nofail(dev);
>  
> -mr = sysbus_mmio_get_region(s->lpd.iou.uart[i], 0);
> +mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
>  memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
>  
> -sysbus_connect_irq(s->lpd.iou.uart[i], 0, pic[irqs[i]]);
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);

Cleaner :)

Reviewed-by: Philippe Mathieu-Daudé 

>  g_free(name);
>  }
>  }
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index 6c0a692b2f..a3dfd064b3 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -15,6 +15,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/arm/boot.h"
>  #include "hw/intc/arm_gicv3.h"
> +#include "hw/char/pl011.h"
>  
>  #define TYPE_XLNX_VERSAL "xlnx-versal"
>  #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
> @@ -49,7 +50,7 @@ typedef struct Versal {
>  MemoryRegion mr_ocm;
>  
>  struct {
> -SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
> +PL011State uart[XLNX_VERSAL_NR_UARTS];
>  SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
>  SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
>  } iou;
>

Re: [PATCH v1 09/11] hw/arm: versal: Add support for the RTC

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> hw/arm: versal: Add support for the RTC.
> 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 21 +
>  include/hw/arm/xlnx-versal.h |  8 
>  2 files changed, 29 insertions(+)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index e263bdf77a..321171bcce 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -240,6 +240,26 @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
>  }
>  }
>  
> +static void versal_create_rtc(Versal *s, qemu_irq *pic)
> +{
> +SysBusDevice *sbd;
> +MemoryRegion *mr;
> +
> +sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
> +  TYPE_XLNX_ZYNQMP_RTC);
> +sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
> +qdev_init_nofail(DEVICE(sbd));
> +
> +mr = sysbus_mmio_get_region(sbd, 0);
> +memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
> +
> +/*
> + * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
> + * supports them.
> + */
> +sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);

RTC IRQ#1 is 'irq_addr_error_int', OK. Maybe worth later switching to
the qdev gpio API using qdev_init_gpio_out_named() in
hw/rtc/xlnx-zynqmp-rtc.c and then qdev_get_gpio_in_named() here.

Reviewed-by: Philippe Mathieu-Daudé 

> +}
> +
>  /* This takes the board allocated linear DDR memory and creates aliases
>   * for each split DDR range/aperture on the Versal address map.
>   */
> @@ -323,6 +343,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
>  versal_create_gems(s, pic);
>  versal_create_admas(s, pic);
>  versal_create_sds(s, pic);
> +versal_create_rtc(s, pic);
>  versal_map_ddr(s);
>  versal_unimp(s);
>  
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index e11693e29d..9c9f47ba9d 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -19,6 +19,7 @@
>  #include "hw/char/pl011.h"
>  #include "hw/dma/xlnx-zdma.h"
>  #include "hw/net/cadence_gem.h"
> +#include "hw/rtc/xlnx-zynqmp-rtc.h"
>  
>  #define TYPE_XLNX_VERSAL "xlnx-versal"
>  #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
> @@ -65,6 +66,8 @@ typedef struct Versal {
>  struct {
>  SDHCIState sd[XLNX_VERSAL_NR_SDS];
>  } iou;
> +
> +XlnxZynqMPRTC rtc;
>  } pmc;
>  
>  struct {
> @@ -89,7 +92,10 @@ typedef struct Versal {
>  #define VERSAL_GEM1_IRQ_0  58
>  #define VERSAL_GEM1_WAKE_IRQ_0 59
>  #define VERSAL_ADMA_IRQ_0  60
> +#define VERSAL_RTC_APB_ERR_IRQ 121
>  #define VERSAL_SD0_IRQ_0   126
> +#define VERSAL_RTC_ALARM_IRQ   142
> +#define VERSAL_RTC_SECONDS_IRQ 143
>  
>  /* Architecturally reserved IRQs suitable for virtualization.  */
>  #define VERSAL_RSVD_IRQ_FIRST 111
> @@ -143,4 +149,6 @@ typedef struct Versal {
>  #define MM_PMC_SD0_SIZE 0x1
>  #define MM_PMC_CRP  0xf126U
>  #define MM_PMC_CRP_SIZE 0x1
> +#define MM_PMC_RTC  0xf12a
> +#define MM_PMC_RTC_SIZE 0x1
>  #endif
>

Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Jason Wang

On 2020/4/27 6:24 PM, Cornelia Huck wrote:

VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.

Signed-off-by: Cornelia Huck 
---
  hw/net/virtio-net.c | 8 
  1 file changed, 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e85d902588b3..7449570c7123 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -77,14 +77,6 @@
 tso/gso/gro 'off'. */
  #define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
  
-/* temporary until standard header include it */
-#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
-
-#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
-#define VIRTIO_NET_F_RSC_EXT   61
-
-#endif
-
  static inline __virtio16 *virtio_net_rsc_ext_num_packets(
  struct virtio_net_hdr *hdr)
  {

I think we should not keep those tricky num_packets/dup_acks.

Thanks

Re: [PATCH v2 3/6] block/block-copy: add state pointer to BlockCopyTask

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> We are going to use aio-task-pool API, so we'll need state pointer in
> BlockCopyTask anyway. Add it now and use where possible.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 30 --
>  1 file changed, 16 insertions(+), 14 deletions(-)

Reviewed-by: Max Reitz 


Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Cornelia Huck

On Tue, 28 Apr 2020 16:19:15 +0800
Jason Wang  wrote:

> On 2020/4/27 6:24 PM, Cornelia Huck wrote:
> > VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.
> >
> > Signed-off-by: Cornelia Huck 
> > ---
> >   hw/net/virtio-net.c | 8 
> >   1 file changed, 8 deletions(-)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index e85d902588b3..7449570c7123 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -77,14 +77,6 @@
> >  tso/gso/gro 'off'. */
> >   #define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
> >   
> > -/* temporary until standard header include it */
> > -#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
> > -
> > -#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
> > -#define VIRTIO_NET_F_RSC_EXT   61
> > -
> > -#endif
> > -
> >   static inline __virtio16 *virtio_net_rsc_ext_num_packets(
> >   struct virtio_net_hdr *hdr)
> >   {  
> 
> 
> I think we should not keep the those tricky num_packets/dup_acks.

No real opinion here, patch 3 is only a cleanup.

The important one is patch 1, because without it I cannot do a headers
update.

Re: [PATCH for-5.1 7/7] MAINTAINERS: Add myself as Loongson-3 maintainer

2020-04-28 Thread chen huacai

Hi, Philippe,

On Tue, Apr 28, 2020 at 2:18 PM Philippe Mathieu-Daudé  wrote:
>
> Hi Huacai,
>
> On 4/27/20 11:33 AM, Huacai Chen wrote:
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > ---
> >  MAINTAINERS | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index aa9a057..efe840b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1080,6 +1080,12 @@ F: hw/isa/vt82c686.c
> >  F: hw/pci-host/bonito.c
> >  F: include/hw/isa/vt82c686.h
> >
> > +Loongson-3
> > +M: Huacai Chen 
> > +S: Maintained
> > +F: hw/mips/mips_loongson3.c
> > +F: hw/pci-host/ls7a.c
>
> I still haven't received the series cover, so I'm not sure if you
> intended to include the LS7A bridge chip here, but if so it seems you
> forgot to include it.
Sorry, I've reworked the QEMU patchset together with Jiaxun
Yang. We have now dropped the LS7A bridge and use GPEX instead. This patch
should have been updated too, but I forgot.

>
> > +
> >  Boston
> >  M: Paul Burton 
> >  R: Aleksandar Rikalo 
> >

-- 
Huacai Chen

Re: About hardfloat in ppc

2020-04-28 Thread Alex Bennée

罗勇刚(Yonggang Luo)  writes:

> I am confused about why we can use hard-float only when the inexact flag is set.

The inexact behaviour of the host hardware may be different from the
guest architecture we are trying to emulate and the host hardware may
not be configurable to emulate the guest mode.

Have a look in softfloat.c and see all the places where
float_flag_inexact is set. Can you convince yourself that the host
hardware will do the same?

> And PPC always clears the inexact flag before calling the soft-float
> functions, so we cannot optimize it with hard-float.
> I need some resources about the inexact flag and why PPC FP simulation
> always clears it.

Because that is the behaviour of the PPC floating point unit. The
inexact flag will represent the last operation done.

> I am looking for two possible solutions:
> 1. do not clear the inexact flag in PPC simulation
> 2. even if inexact is cleared, we can still use an alternative hard-float approach.
>
> But for now I am a beginner and have no clue about any of this.

Well you'll need to learn about floating point because these are rather
fundamental aspects of its behaviour. In the old days QEMU used
the host floating point processor with its template-based translation.
However this led to lots of weird bugs because the floating point
answers under qemu were different from the target it was trying to
emulate. It was for this reason softfloat was introduced. The hardfloat
optimisation can only be done when we are confident that we will get
exactly the same answer as the target we are trying to emulate; a "faster but
incorrect" mode is just going to cause confusion as discussed in the
previous thread. Have you read that yet?

>
> On Mon, Apr 27, 2020 at 7:10 PM Alex Bennée  wrote:
>
>>
>> BALATON Zoltan  writes:
>>
>> > On Mon, 27 Apr 2020, Alex Bennée wrote:
>> >> 罗勇刚(Yonggang Luo)  writes:
>> >>> Because ppc fpu-helper are always clearing float_flag_inexact,
>> >>> So is that possible to optimize the performance when float_flag_inexact
>> >>> are cleared?
>> >>
>> >> There was some discussion about this in the last thread about enabling
>> >> hardfloat for PPC. See the thread:
>> >>
>> >>  Subject: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
>> >>  Date: Tue, 18 Feb 2020 18:10:16 +0100
>> >>  Message-Id: <20200218171702.979f0746...@zero.eik.bme.hu>
>> >
>> > I've answered this already with link to that thread here:
>> >
>> > On Fri, 10 Apr 2020, BALATON Zoltan wrote:
>> > : Date: Fri, 10 Apr 2020 20:04:53 +0200 (CEST)
>> > : From: BALATON Zoltan 
>> > : To: "罗勇刚(Yonggang Luo)" 
>> > : Cc: qemu-devel@nongnu.org, Mark Cave-Ayland, John Arbuckle, qemu-...@nongnu.org, Paul Clarke, Howard Spoelstra, David Gibson
>> > : Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
>> > :
>> > : On Fri, 10 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
>> > :> Are this stable now? I'd like to see hard float to be landed:)
>> > :
>> > : If you want to see hardfloat for PPC then you should read the
>> > : replies to this patch which can be found here:
>> > :
>> > : http://patchwork.ozlabs.org/patch/1240235/
>> > :
>> > : to understand what's needed then try to implement the solution with
>> > : FP exceptions cached in a global that maybe could work. I won't be
>> > : able to do that as said here:
>> > :
>> > : https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg6.html
>> > :
>> > : because I don't have time to learn all the details needed. I think
>> > : others are in the same situation so unless somebody puts in the
>> > : necessary effort this won't change.
>> >
>> > Which also had a proposed solution to the problem that you could try
>> > to implement, in particular see this message:
>> >
>> >
>> http://patchwork.ozlabs.org/project/qemu-devel/patch/20200218171702.979f0746...@zero.eik.bme.hu/#2375124
>> >
>> > and Richard's reply immediately below that. In short, to optimise FPU
>> > emulation we would either need to find a way to compute the inexact flag
>> > quickly without reading the FPU status (this may not be possible), or
>> > somehow get the status from the FPU; but the obvious way of clearing the
>> > flags and reading them after each operation is too slow. So maybe using
>> > exceptions and only clearing when there's actually a change could be
>> > faster.
>> >
>> > As to how to use exceptions see this message in above thread:
>> >
>> > https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg5.html
>> >
>> > But that's only to show how to hook in an exception handler; what it
>> > does still needs to be implemented, then tested and benchmarked.
>> >
>> > I still don't know where to find the extensive PPC floating point tests
>> > for checking results, though, as that was never answered.
>>
>> Specifically for PPC we don't have them. We use the softfloat test cases
>> to exercise our softfloat/hardfloat code as part of "make
>> check-softfloat". You can also re-build fp-bench for each guest target
>> to measure raw throughput.
>>
>> >> However in short the problem

Re: [PATCH v1 06/11] hw/arm: versal: Embedd the ADMAs into the SoC type

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Embedd the ADMAs into the SoC type.
> 
> Suggested-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 14 +++---
>  include/hw/arm/xlnx-versal.h |  3 ++-
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index e424aa789e..ebd2dc51be 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -203,18 +203,18 @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
>  DeviceState *dev;
>  MemoryRegion *mr;
>  
> -dev = qdev_create(NULL, "xlnx.zdma");
> -s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
> -object_property_set_int(OBJECT(s->lpd.iou.adma[i]), 128, "bus-width",
> -&error_abort);
> -object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
> +sysbus_init_child_obj(OBJECT(s), name,
> +  &s->lpd.iou.adma[i], sizeof(s->lpd.iou.adma[i]),
> +  TYPE_XLNX_ZDMA);
> +dev = DEVICE(&s->lpd.iou.adma[i]);
> +object_property_set_int(OBJECT(dev), 128, "bus-width", &error_abort);
>  qdev_init_nofail(dev);
>  
> -mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
> +mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
>  memory_region_add_subregion(&s->mr_ps,
>  MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
>  
> -sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[VERSAL_ADMA_IRQ_0 + i]);
>  g_free(name);
>  }
>  }
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index 01da736a5b..94b7826fd4 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -16,6 +16,7 @@
>  #include "hw/arm/boot.h"
>  #include "hw/intc/arm_gicv3.h"
>  #include "hw/char/pl011.h"
> +#include "hw/dma/xlnx-zdma.h"
>  #include "hw/net/cadence_gem.h"
>  
>  #define TYPE_XLNX_VERSAL "xlnx-versal"
> @@ -53,7 +54,7 @@ typedef struct Versal {
>  struct {
>  PL011State uart[XLNX_VERSAL_NR_UARTS];
>  CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
> -SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
> +XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS];
>  } iou;
>  } lpd;
>  
> 

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] hax: Dynamic allocate vcpu state structure

2020-04-28 Thread Philippe Mathieu-Daudé


On 4/28/20 4:47 AM, Colin Xu wrote:


And this one. 3 patches for HAX.

Thanks in advance.
--
Best Regards,
Colin Xu

On Mon, 20 Apr 2020, Colin Xu wrote:



Looks good to me.

Reviewed-by: Colin Xu 

--
Best Regards,
Colin Xu

On Mon, 6 Apr 2020, WangBowen wrote:


Dynamically allocate the vcpu state structure according to the smp value,
to be more precise and safe. Previously a fixed-size array of
HAX_MAX_VCPU entries was allocated.

This is achieved by using g_new0 to dynamically allocate the array. The
allocated size is obtained from smp.max_cpus in MachineState. Also, the
size is compared with HAX_MAX_VCPU when creating the vm. A dynamic array
was chosen over a linked list because the status is always accessed by
index.

With this change, QEMU checks whether the smp value is larger than
HAX_MAX_VCPU when creating the vm; if it is larger, the process
terminates, otherwise an array of size smp is allocated to store the status.

Signed-off-by: WangBowen 
---
target/i386/hax-all.c  | 25 +++--
target/i386/hax-i386.h |  5 +++--
2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
index a8b6e5aeb8..a22adec5da 100644
--- a/target/i386/hax-all.c
+++ b/target/i386/hax-all.c
@@ -232,10 +232,10 @@ int hax_init_vcpu(CPUState *cpu)
    return ret;
}

-struct hax_vm *hax_vm_create(struct hax_state *hax)
+struct hax_vm *hax_vm_create(struct hax_state *hax, int max_cpus)
{
    struct hax_vm *vm;
-    int vm_id = 0, ret;
+    int vm_id = 0, ret, i;

    if (hax_invalid_fd(hax->fd)) {
    return NULL;
@@ -259,6 +259,17 @@ struct hax_vm *hax_vm_create(struct hax_state *hax)
    goto error;
    }

+    if (max_cpus > HAX_MAX_VCPU) {
+    fprintf(stderr, "Maximum VCPU number QEMU supported is %d\n", HAX_MAX_VCPU);
+    goto error;
+    }
+


Move this check before 'vm = g_new0(struct hax_vm, 1);' and simply
return NULL. There is no need to goto error; otherwise you are leaking vm->fd.



+    vm->numvcpus = max_cpus;
+    vm->vcpus = g_new0(struct hax_vcpu_state *, vm->numvcpus);
+    for (i = 0; i < vm->numvcpus; i++) {
+    vm->vcpus[i] = NULL;
+    }
+
    hax->vm = vm;
    return vm;

@@ -272,12 +283,14 @@ int hax_vm_destroy(struct hax_vm *vm)
{
    int i;

-    for (i = 0; i < HAX_MAX_VCPU; i++)
+    for (i = 0; i < vm->numvcpus; i++)
    if (vm->vcpus[i]) {
    fprintf(stderr, "VCPU should be cleaned before vm clean\n");
    return -1;
    }
    hax_close_fd(vm->fd);
+    vm->numvcpus = 0;
+    g_free(vm->vcpus);
    g_free(vm);
    hax_global.vm = NULL;
    return 0;
@@ -292,7 +305,7 @@ static void hax_handle_interrupt(CPUState *cpu, int mask)
    }
}

-static int hax_init(ram_addr_t ram_size)
+static int hax_init(ram_addr_t ram_size, int max_cpus)
{
    struct hax_state *hax = NULL;
    struct hax_qemu_version qversion;
@@ -324,7 +337,7 @@ static int hax_init(ram_addr_t ram_size)
    goto error;
    }

-    hax->vm = hax_vm_create(hax);
+    hax->vm = hax_vm_create(hax, max_cpus);
    if (!hax->vm) {
    fprintf(stderr, "Failed to create HAX VM\n");
    ret = -EINVAL;
@@ -352,7 +365,7 @@ static int hax_init(ram_addr_t ram_size)

static int hax_accel_init(MachineState *ms)
{
-    int ret = hax_init(ms->ram_size);
+    int ret = hax_init(ms->ram_size, (int)ms->smp.max_cpus);

    if (ret && (ret != -ENOSPC)) {
    fprintf(stderr, "No accelerator found.\n");
diff --git a/target/i386/hax-i386.h b/target/i386/hax-i386.h
index 54e9d8b057..7d988f81da 100644
--- a/target/i386/hax-i386.h
+++ b/target/i386/hax-i386.h
@@ -47,7 +47,8 @@ struct hax_state {
struct hax_vm {
    hax_fd fd;
    int id;
-    struct hax_vcpu_state *vcpus[HAX_MAX_VCPU];
+    int numvcpus;
+    struct hax_vcpu_state **vcpus;
};

#ifdef NEED_CPU_H
@@ -58,7 +59,7 @@ int valid_hax_tunnel_size(uint16_t size);
/* Host specific functions */
int hax_mod_version(struct hax_state *hax, struct hax_module_version *version);
int hax_inject_interrupt(CPUArchState *env, int vector);
-struct hax_vm *hax_vm_create(struct hax_state *hax);
+struct hax_vm *hax_vm_create(struct hax_state *hax, int max_cpus);
int hax_vcpu_run(struct hax_vcpu_state *vcpu);
int hax_vcpu_create(int id);
int hax_sync_vcpu_state(CPUArchState *env, struct vcpu_state_t *state,
--
2.24.1

Re: [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Daniel P . Berrangé

On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> * Max Reitz (mre...@redhat.com) wrote:
> > Currently, setup_mounts() bind-mounts the shared directory without
> > MS_REC.  This makes all submounts disappear.
> > 
> > Pass MS_REC so that the guest can see submounts again.
> 
> Thanks!
> 
> > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> 
> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> > Signed-off-by: Max Reitz 
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 4c35c95b25..9d7f863e66 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >  int oldroot;
> >  int newroot;
> >  
> > -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >  fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >  exit(1);
> >  }
> 
> Do we want MS_SLAVE to pick up future mounts that might happen from the
> host?

I think that probably makes sense to have MS_SLAVE by default, as that
means the set of files exposed to the guest is consistent across a QEMU
restart. Without MS_SLAVE new mounts aren't visible until after QEMU
is restarted which is likely surprising/undesirable to admins.
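A minimal sketch of the combination discussed here: a recursive bind mount (MS_BIND | MS_REC) followed by a second mount() call that only changes propagation to MS_SLAVE | MS_REC. The helper names are hypothetical, and it assumes the process has CAP_SYS_ADMIN in a private mount namespace. Per mount(2), MS_SLAVE only turns a mount into a slave if it is a shared mount with peers, so new host mounts propagate in only when the host side is shared.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Bind the shared directory onto itself, including all submounts. */
static unsigned long shared_dir_bind_flags(void)
{
    return MS_BIND | MS_REC;
}

/*
 * Propagation-only change: mount events from the master (host) side
 * propagate into this tree, but not the other way round.
 */
static unsigned long shared_dir_propagation_flags(void)
{
    return MS_SLAVE | MS_REC;
}

/* Hypothetical helper; needs CAP_SYS_ADMIN, typically in a new mount namespace. */
int setup_shared_dir_sketch(const char *source)
{
    if (mount(source, source, NULL, shared_dir_bind_flags(), NULL) < 0) {
        perror("mount(MS_BIND | MS_REC)");
        return -1;
    }
    /* For a propagation change, source, fstype and data are ignored. */
    if (mount(NULL, source, NULL, shared_dir_propagation_flags(), NULL) < 0) {
        perror("mount(MS_SLAVE | MS_REC)");
        return -1;
    }
    return 0;
}
```

Keeping the propagation change as a separate call is required: mount(2) does not accept a propagation flag in the same call as MS_BIND.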


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH for-5.1 4/7] target/mips: Add Loongson-3 CPU definition

2020-04-28 Thread chen huacai

Hi, Philippe,

On Tue, Apr 28, 2020 at 2:34 PM Philippe Mathieu-Daudé  wrote:
>
> Hi Huacai,
>
> On 4/27/20 11:33 AM, Huacai Chen wrote:
> > The Loongson-3 CPU family includes Loongson-3A R1/R2/R3/R4 and Loongson-3B
> > R1/R2. Loongson-3A R4 is the newest, and its ISA is almost a superset
> > of all the others. To reduce complexity, we just define a "Loongson-3A" CPU
> > which corresponds to Loongson-3A R4. Loongson-3A has CONFIG6 and
> > CONFIG7, so add their bit-fields as well.
>
> Is there a public datasheet for R4? (If possible in English).
Sorry, we only have a Chinese datasheet at www.loongson.cn.

>
> >
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > ---
> >  target/mips/cpu.h| 28 ++
> >  target/mips/internal.h   |  2 ++
> >  target/mips/mips-defs.h  |  7 --
> >  target/mips/translate.c  |  2 ++
> >  target/mips/translate_init.inc.c | 51
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> > index 94d01ea..0b3c987 100644
> > --- a/target/mips/cpu.h
> > +++ b/target/mips/cpu.h
> > @@ -940,7 +940,35 @@ struct CPUMIPSState {
> >  #define CP0C5_UFR  2
> >  #define CP0C5_NFExists 0
> >  int32_t CP0_Config6;
> > +int32_t CP0_Config6_rw_bitmask;
> > +#define CP0C6_BPPASS  31
> > +#define CP0C6_KPOS    24
> > +#define CP0C6_KE  23
> > +#define CP0C6_VTLBONLY    22
> > +#define CP0C6_LASX    21
> > +#define CP0C6_SSEN    20
> > +#define CP0C6_DISDRTIME   19
> > +#define CP0C6_PIXNUEN 18
> > +#define CP0C6_SCRAND  17
> > +#define CP0C6_LLEXCEN 16
> > +#define CP0C6_DISVC   15
> > +#define CP0C6_VCLRU   14
> > +#define CP0C6_DCLRU   13
> > +#define CP0C6_PIXUEN  12
> > +#define CP0C6_DISBLKLYEN  11
> > +#define CP0C6_UMEMUALEN   10
> > +#define CP0C6_SFBEN   8
> > +#define CP0C6_FLTINT  7
> > +#define CP0C6_VLTINT  6
> > +#define CP0C6_DISBTB  5
> > +#define CP0C6_STPREFCTL   2
> > +#define CP0C6_INSTPREF    1
> > +#define CP0C6_DATAPREF    0
> >  int32_t CP0_Config7;
> > +int64_t CP0_Config7_rw_bitmask;
> > +#define CP0C7_NAPCGEN   2
> > +#define CP0C7_UNIMUEN   1
> > +#define CP0C7_VFPUCGEN  0
> >  uint64_t CP0_LLAddr;
> >  uint64_t CP0_MAAR[MIPS_MAAR_MAX];
> >  int32_t CP0_MAARI;
> > diff --git a/target/mips/internal.h b/target/mips/internal.h
> > index 1bf274b..7853cb1 100644
> > --- a/target/mips/internal.h
> > +++ b/target/mips/internal.h
> > @@ -36,7 +36,9 @@ struct mips_def_t {
> >  int32_t CP0_Config5;
> >  int32_t CP0_Config5_rw_bitmask;
> >  int32_t CP0_Config6;
> > +int32_t CP0_Config6_rw_bitmask;
> >  int32_t CP0_Config7;
> > +int32_t CP0_Config7_rw_bitmask;
> >  target_ulong CP0_LLAddr_rw_bitmask;
> >  int CP0_LLAddr_shift;
> >  int32_t SYNCI_Step;
> > diff --git a/target/mips/mips-defs.h b/target/mips/mips-defs.h
> > index a831bb4..c2c96db 100644
> > --- a/target/mips/mips-defs.h
> > +++ b/target/mips/mips-defs.h
> > @@ -51,8 +51,9 @@
> >   */
> >  #define INSN_LOONGSON2E   0x0001ULL
> >  #define INSN_LOONGSON2F   0x0002ULL
> > -#define INSN_VR54XX   0x0004ULL
> > -#define INSN_R5900    0x0008ULL
> > +#define INSN_LOONGSON3A   0x0004ULL
> > +#define INSN_VR54XX   0x0008ULL
> > +#define INSN_R5900    0x0010ULL
> >  /*
> >   *   bits 56-63: vendor-specific ASEs
> >   */
> > @@ -94,6 +95,8 @@
> >  /* Wave Computing: "nanoMIPS" */
> >  #define CPU_NANOMIPS32  (CPU_MIPS32R6 | ISA_NANOMIPS32)
> >
> > +#define CPU_LOONGSON3A  (CPU_MIPS64R2 | INSN_LOONGSON3A)
> > +
> >  /*
> >   * Strictly follow the architecture standard:
> >   * - Disallow "special" instruction handling for PMON/SPIM.
> > diff --git a/target/mips/translate.c b/target/mips/translate.c
> > index 25b595a..2caf4cb 100644
> > --- a/target/mips/translate.c
> > +++ b/target/mips/translate.c
> > @@ -31206,7 +31206,9 @@ void cpu_state_reset(CPUMIPSState *env)
> >  env->CP0_Config5 = env->cpu_model->CP0_Config5;
> >  env->CP0_Config5_rw_bitmask = env->cpu_model->CP0_Config5_rw_bitmask;
> >  env->CP0_Config6 = env->cpu_model->CP0_Config6;
> > +env->CP0_Config6_rw_bitmask = env->cpu_model->CP0_Config6_rw_bitmask;
> >  env->CP0_Config7 = env->cpu_model->CP0_Config7;
> > +env->CP0_Config7_rw_bitmask = env->cpu_model->CP0_Config7_rw_bitmask;
> >  env->CP0_LLAddr_rw_bitmask = env->cpu_model->CP0_LLAddr_rw_bitmask
> >   << env->cpu_model->CP0_LLAddr_shift;
> >  env->CP0_LLAddr_shift = env->cpu_model->CP0_LLAddr_shift;
> > diff --git a/target/mips/translate_init.inc.c b/target/mips/translate_init.inc.c
> > index 6d145a9..a32412d 100644
> > --- a/tar
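
The new CP0_Config6_rw_bitmask and CP0_Config7_rw_bitmask fields are only copied in cpu_state_reset() in the hunks shown. A minimal sketch of how such a mask is typically applied to a guest write follows, assuming it is used the same way QEMU already applies CP0_Config5_rw_bitmask (bits in the mask take the new value, all others keep the old one). The helper name is hypothetical; the two bit positions come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Two Config6 bit positions taken from the quoted patch. */
#define CP0C6_SFBEN    8
#define CP0C6_INSTPREF 1

/*
 * Hypothetical helper: apply a guest write to a CP0 Config register.
 * Only bits set in rw_bitmask take the new value; all other bits keep
 * their current (read-only) value.
 */
static int32_t cp0_config_masked_write(int32_t cur, int32_t val,
                                       int32_t rw_bitmask)
{
    return (cur & ~rw_bitmask) | (val & rw_bitmask);
}
```

With a mask of (1 << CP0C6_SFBEN) | (1 << CP0C6_INSTPREF), writing all ones sets only those two bits, and a read-only bit such as DATAPREF (bit 0) keeps whatever value it had.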

Re: [PATCH v1 05/11] hw/arm: versal: Embedd the GEMs into the SoC type

2020-04-28 Thread Philippe Mathieu-Daudé

On 4/27/20 8:16 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Embed the GEMs into the SoC type.
> 
> Suggested-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 15 ---
>  include/hw/arm/xlnx-versal.h |  3 ++-
>  2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index dbde03b7e6..e424aa789e 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -170,25 +170,26 @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
>  DeviceState *dev;
>  MemoryRegion *mr;
>  
> -dev = qdev_create(NULL, "cadence_gem");
> -s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
> -object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
> +sysbus_init_child_obj(OBJECT(s), name,
> +  &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
> +  TYPE_CADENCE_GEM);
> +dev = DEVICE(&s->lpd.iou.gem[i]);
>  if (nd->used) {
>  qemu_check_nic_model(nd, "cadence_gem");
>  qdev_set_nic_properties(dev, nd);
>  }
> -object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
> +object_property_set_int(OBJECT(dev),
>  2, "num-priority-queues",
>  &error_abort);
> -object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
> +object_property_set_link(OBJECT(dev),
>   OBJECT(&s->mr_ps), "dma",
>   &error_abort);
>  qdev_init_nofail(dev);
>  
> -mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
> +mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
>  memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
>  
> -sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
>  g_free(name);
>  }
>  }
> diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
> index a3dfd064b3..01da736a5b 100644
> --- a/include/hw/arm/xlnx-versal.h
> +++ b/include/hw/arm/xlnx-versal.h
> @@ -16,6 +16,7 @@
>  #include "hw/arm/boot.h"
>  #include "hw/intc/arm_gicv3.h"
>  #include "hw/char/pl011.h"
> +#include "hw/net/cadence_gem.h"
>  
>  #define TYPE_XLNX_VERSAL "xlnx-versal"
>  #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
> @@ -51,7 +52,7 @@ typedef struct Versal {
>  
>  struct {
>  PL011State uart[XLNX_VERSAL_NR_UARTS];
> -SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
> +CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
>  SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
>  } iou;
>  } lpd;
> 

Reviewed-by: Philippe Mathieu-Daudé
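The core of this change is a data-layout switch: the SoC state now embeds CadenceGEMState values instead of holding SysBusDevice pointers, so the devices live as long as the SoC and need no separate allocation. A plain-C sketch of the two layouts follows; the type names are hypothetical and QOM (sysbus_init_child_obj(), DEVICE()) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_GEMS_SKETCH 2

/* Hypothetical stand-in for a device state structure such as CadenceGEMState. */
typedef struct {
    int num_priority_queues;
} GemSketch;

/* Before: the SoC holds pointers to separately allocated devices. */
typedef struct {
    GemSketch *gem[NR_GEMS_SKETCH];
} SocWithPointersSketch;

/* After: the device state is embedded and shares the SoC's lifetime. */
typedef struct {
    GemSketch gem[NR_GEMS_SKETCH];
} SocEmbeddedSketch;

static void soc_init_gems_sketch(SocEmbeddedSketch *s)
{
    for (size_t i = 0; i < NR_GEMS_SKETCH; i++) {
        /* Plays the role of dev = DEVICE(&s->lpd.iou.gem[i]) in the patch. */
        GemSketch *dev = &s->gem[i];

        dev->num_priority_queues = 2;
    }
}
```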

Re: [PATCH v20 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Denis Plotnikov





On 28.04.2020 09:16, Max Reitz wrote:

On 27.04.20 21:26, Denis Plotnikov wrote:


On 27.04.2020 15:35, Max Reitz wrote:

On 21.04.20 10:11, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which, at the moment,
is the only compression method available.

The performance test results:
The test compresses and decompresses a QEMU qcow2 image with a freshly
installed RHEL 7.6 guest.
Image cluster size: 64K. Image on-disk size: 2.2G

The test was conducted on a brd (RAM block device) disk to reduce the
influence of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
    time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
    src.img [zlib|zstd]_compressed.img
decompress cmd:
    time ./qemu-img convert -O qcow2
    [zlib|zstd]_compressed.img uncompressed.img

             compression              decompression
           zlib    zstd              zlib    zstd

real       65.5    16.3 (-75 %)       1.9     1.6 (-16 %)
user       65.0    15.8               5.3     2.5
sys         3.3     0.2               2.0     2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G
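The percentages in the "real" row are relative changes against zlib, (zstd - zlib) / zlib, and the 1.57 ratio is consistent with 2.2G / 1.4G. A small sketch reproducing those figures from the numbers in the table (helper names are hypothetical):

```c
#include <assert.h>

/* Relative change from baseline to candidate, in percent. */
static double percent_change(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* Compression ratio: uncompressed size divided by compressed size. */
static double compression_ratio(double uncompressed, double compressed)
{
    return uncompressed / compressed;
}
```

percent_change(65.5, 16.3) is about -75.1 and percent_change(1.9, 1.6) is about -15.8, which round to the -75 % and -16 % shown above.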

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
QAPI part:
Acked-by: Markus Armbruster 
---
   docs/interop/qcow2.txt |   1 +
   configure  |   2 +-
   qapi/block-core.json   |   3 +-
   block/qcow2-threads.c  | 157 +
   block/qcow2.c  |   7 ++
   5 files changed, 168 insertions(+), 2 deletions(-)

[...]


diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0525718704 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
   #define ZLIB_CONST
   #include 
   +#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
   #include "qcow2.h"
   #include "block/thread-pool.h"
   #include "crypto.h"
@@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void
*dest, size_t dest_size,
   return ret;
   }
   +#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+    ssize_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };

Minor style note: I think it’d be nicer to use designated initializers
here.


+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+    return -EIO;
+    }
+    /*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * In the loop, we try to compress all the data into one zstd frame.
+ * ZSTD_compressStream2 potentially can finish a frame earlier
+ * than the full input data is consumed. That's why we are looping
+ * until all the input data is consumed.
+ */
+    while (input.pos < input.size) {
+    size_t zstd_ret;
+    /*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ * free to start a new frame". We assume that "start a new frame"
+ * means call ZSTD_compressStream2 in the very beginning or when
+ * ZSTD_compressStream2 has returned with 0.
+ */
+    do {
+    zstd_ret = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);

The spec makes it sound to me like ZSTD_e_end will always complete in a
single call if there’s enough space in the output buffer.  So the only
time we have to loop would be when there isn’t enough space anyway:

It says this about ZSTD_e_end:

flush operation is the same, and follows same rules as calling
ZSTD_compressStream2() with ZSTD_e_flush.

Those rules being:

Note that, if `output->size` is too small, a single invocation with
ZSTD_e_flush might not be enough (return code > 0).

So it seems like it will only return a value > 0 if the output buffer is
definitely too small.

The spec also notes that the return value is greater than 0 if:

>0 if some data still present within internal buffer (the value is
minimal estimation of remaining size),

So it’s a minimum estimate.  That’s another point that heavily implies
to me that if the return value were less than what’s left in the buffer,
the function w

Re: [PATCH for-5.1 3/7] hw/mips: Add CPU IRQ3 delivery for KVM

2020-04-28 Thread chen huacai

Hi, Philippe,

On Mon, Apr 27, 2020 at 5:57 PM Philippe Mathieu-Daudé  wrote:
>
> On 4/27/20 11:33 AM, Huacai Chen wrote:
> > Currently, KVM/MIPS only delivers I/O interrupts via IP2; this patch adds
> > IP3 delivery as well, because Loongson-3 based machines use both IRQ2
> > (CPU's IP2) and IRQ3 (CPU's IP3).
> >
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > ---
> >  hw/mips/mips_int.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
> > index 796730b..5526219 100644
> > --- a/hw/mips/mips_int.c
> > +++ b/hw/mips/mips_int.c
> > @@ -48,16 +48,14 @@ static void cpu_mips_irq_request(void *opaque, int irq, int level)
> >  if (level) {
> >  env->CP0_Cause |= 1 << (irq + CP0Ca_IP);
> >
> > -if (kvm_enabled() && irq == 2) {
> > +if (kvm_enabled() && (irq == 2 || irq == 3))
>
> Shouldn't we check env->CP0_Config6 (or Config7) has the required
> feature first?
Sorry, I don't understand what IRQ delivery has to do with
Config6/Config7. Do you mean checking them to identify Loongson-3?

>
> >  kvm_mips_set_interrupt(cpu, irq, level);
> > -}
> >
> >  } else {
> >  env->CP0_Cause &= ~(1 << (irq + CP0Ca_IP));
> >
> > -if (kvm_enabled() && irq == 2) {
> > +if (kvm_enabled() && (irq == 2 || irq == 3))
> >  kvm_mips_set_interrupt(cpu, irq, level);
> > -}
> >  }
> >
> >  if (env->CP0_Cause & CP0Ca_IP_mask) {
> >



-- 
Huacai Chen
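A minimal sketch of what cpu_mips_irq_request() does with the Cause register and which lines the patch forwards to KVM. CP0Ca_IP (8) and CP0Ca_IP_mask (0x0000FF00) match QEMU's target/mips/cpu.h; the helper names are hypothetical and the actual kvm_mips_set_interrupt() call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CP0Ca_IP      8          /* Cause.IP field starts at bit 8 */
#define CP0Ca_IP_mask 0x0000FF00 /* IP0..IP7 */

/* After the patch, both IP2 and IP3 are forwarded to KVM. */
static bool irq_forwarded_to_kvm_sketch(int irq)
{
    return irq == 2 || irq == 3;
}

/* Set or clear the pending bit for irq in a copy of CP0_Cause. */
static int32_t cause_update_sketch(int32_t cause, int irq, int level)
{
    if (level) {
        return cause | (1 << (irq + CP0Ca_IP));
    }
    return cause & ~(1 << (irq + CP0Ca_IP));
}
```

Raising irq 3 sets bit 11 (0x800) of Cause; lowering it clears only that bit and leaves IP2 (0x400) untouched.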

Re: [PATCH v2 1/6] block/block-copy: rename in-flight requests to tasks

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 10:30, Max Reitz wrote:

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:

We are going to use aio-task-pool API and extend in-flight request
structure to be a successor of AioTask, so rename things appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 99 +++---
  1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 05227e18bf..61d1d26991 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]


-static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
-BlockCopyInFlightReq *req, int64_t new_bytes)
+static void coroutine_fn block_copy_task_shrink(BlockCopyState *s,
+BlockCopyTask *task,
+int64_t new_bytes)
  {
-if (new_bytes == req->bytes) {
+if (new_bytes == task->bytes) {
  return;
  }
  
-assert(new_bytes > 0 && new_bytes < req->bytes);
+assert(new_bytes > 0 && new_bytes < task->bytes);
  
-s->in_flight_bytes -= req->bytes - new_bytes;
+s->in_flight_bytes -= task->bytes - new_bytes;
  bdrv_set_dirty_bitmap(s->copy_bitmap,
-  req->offset + new_bytes, req->bytes - new_bytes);
+  task->offset + new_bytes, task->bytes - new_bytes);
+s->in_flight_bytes -= task->bytes - new_bytes;


This line doesn’t seem right.



Hmm yes.. A kind of copy-paste or rebase artifact.


--
Best regards,
Vladimir
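The line Max points at subtracts the released tail from in_flight_bytes a second time. A simplified sketch of the intended bookkeeping follows, where the released bytes leave the in-flight count exactly once and go back to the dirty set. The types are hypothetical stand-ins (a byte counter replaces the copy bitmap), and the final task->bytes update is an assumption about the elided part of the function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for BlockCopyState/BlockCopyTask. */
typedef struct {
    int64_t in_flight_bytes; /* bytes claimed by running tasks */
    int64_t dirty_bytes;     /* stands in for the copy bitmap */
} CopyStateSketch;

typedef struct {
    int64_t offset;
    int64_t bytes;
} CopyTaskSketch;

/* Give the tail [offset + new_bytes, offset + bytes) back to the dirty set. */
static void task_shrink_sketch(CopyStateSketch *s, CopyTaskSketch *task,
                               int64_t new_bytes)
{
    int64_t released;

    if (new_bytes == task->bytes) {
        return;
    }
    assert(new_bytes > 0 && new_bytes < task->bytes);

    released = task->bytes - new_bytes;
    s->in_flight_bytes -= released; /* exactly once */
    s->dirty_bytes += released;     /* re-mark the tail as dirty */
    task->bytes = new_bytes;
}
```

Shrinking a 100-byte task to 60 leaves 60 bytes in flight; with the duplicated subtraction it would drop to 20.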

RE: [PATCH 1/2] Fix undefined behaviour

2020-04-28 Thread Paul Durrant

> -Original Message-
> From: Grzegorz Uriasz 
> Sent: 28 April 2020 07:29
> To: qemu-devel@nongnu.org
> Cc: Grzegorz Uriasz ; marma...@invisiblethingslab.com; 
> ar...@puzio.waw.pl;
> ja...@bartmin.ski; j.nowa...@student.uw.edu.pl; Stefano Stabellini 
> ; Anthony
> Perard ; Paul Durrant ; 
> xen-de...@lists.xenproject.org
> Subject: [PATCH 1/2] Fix undefined behaviour
> 
> Signed-off-by: Grzegorz Uriasz 

I think we need more of a commit message, for both this and patch #2, to explain
why you are making the changes.

  Paul

Re: [PATCH v2 5/6] block/block-copy: move block_copy_task_create down

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 12:06, Max Reitz wrote:

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:

Simple movement without any change. It's needed for the following
patch, as this function will need to use some staff which is currently


*stuff


below it.


Wouldn’t it be simpler to just declare block_copy_task_entry()?



I just think it's good to keep the natural order of functions and avoid extra
declarations. Still, maybe I care too much. There's no actual difference; if you
prefer a declaration, I can drop this patch.




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 66 +++---
  1 file changed, 33 insertions(+), 33 deletions(-)





--
Best regards,
Vladimir
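Max's alternative is a forward declaration: declare the callee before its first use so its definition can stay below the caller, and no code has to move. A minimal sketch with hypothetical function names:

```c
#include <assert.h>

/* Forward declaration: lets the caller below use a function defined later. */
static int copy_task_entry_sketch(int arg);

static int copy_task_create_sketch(int arg)
{
    return copy_task_entry_sketch(arg) + 1;
}

static int copy_task_entry_sketch(int arg)
{
    return arg * 2;
}
```

The trade-off is the one discussed in the thread: one extra declaration versus a larger diff that only moves code.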

Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Cornelia Huck

On Tue, 28 Apr 2020 16:58:44 +0800
Jason Wang  wrote:

> On 2020/4/28 4:34 PM, Cornelia Huck wrote:
> > On Tue, 28 Apr 2020 16:19:15 +0800
> > Jason Wang  wrote:
> >  
> >> On 2020/4/27 6:24 PM, Cornelia Huck wrote:
> >>> VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.
> >>>
> >>> Signed-off-by: Cornelia Huck 
> >>> ---
> >>>hw/net/virtio-net.c | 8 
> >>>1 file changed, 8 deletions(-)
> >>>
> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>> index e85d902588b3..7449570c7123 100644
> >>> --- a/hw/net/virtio-net.c
> >>> +++ b/hw/net/virtio-net.c
> >>> @@ -77,14 +77,6 @@
> >>>   tso/gso/gro 'off'. */
> >>>#define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
> >>>
> >>> -/* temporary until standard header include it */
> >>> -#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
> >>> -
> >>> -#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
> >>> -#define VIRTIO_NET_F_RSC_EXT   61
> >>> -
> >>> -#endif
> >>> -
> >>>static inline __virtio16 *virtio_net_rsc_ext_num_packets(
> >>>struct virtio_net_hdr *hdr)
> >>>{  
> >>
> >> I think we should not keep those tricky num_packets/dup_acks.
> > No real opinion here, patch 3 is only a cleanup.
> >
> > The important one is patch 1, because without it I cannot do a headers
> > update.  
> 
> 
> Yes, at least we should dereference segments/dup_acks instead of 
> csum_start/csum_offsets since the header has been synced.

So what about:

- I merge patch 1 and the header sync now (because I have a bunch of
  patches that depend on it...)
- We change virtio-net to handle that properly on top (probably best
  done by someone familiar with the code base ;)
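Jason's point is that, with the synced header, the RSC counters can be accessed as named fields instead of by reinterpreting csum_start/csum_offset. A simplified sketch of such a layout follows, loosely modelled on the Linux uapi virtio_net_hdr_v1, where an rsc struct shares a union with the checksum fields. Field types are simplified to uint16_t, byte-order conversion is omitted, and the struct and helper names are hypothetical; VIRTIO_NET_HDR_F_RSC_INFO uses the value from the removed compat block.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_HDR_F_RSC_INFO 4 /* value from the removed compat block */

/* Hypothetical, simplified header: RSC counters overlay the csum fields. */
struct net_hdr_sketch {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    union {
        struct {
            uint16_t csum_start;
            uint16_t csum_offset;
        };
        struct {
            uint16_t segments;
            uint16_t dup_acks;
        } rsc;
    };
    uint16_t num_buffers;
};

/* Record coalescing info through the named fields, not the csum ones. */
static void set_rsc_info_sketch(struct net_hdr_sketch *hdr,
                                uint16_t segments, uint16_t dup_acks)
{
    hdr->flags |= VIRTIO_NET_HDR_F_RSC_INFO;
    hdr->rsc.segments = segments;
    hdr->rsc.dup_acks = dup_acks;
}
```

Because the two structs share storage, code that still reads csum_start sees the segment count, which is exactly the aliasing the old num_packets/dup_acks helpers relied on.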

Re: [PATCH 1/2] Fix undefined behaviour

2020-04-28 Thread Peter Maydell

On Tue, 28 Apr 2020 at 08:50, Grzegorz Uriasz  wrote:
>
> Signed-off-by: Grzegorz Uriasz 
> ---
>  hw/xen/xen_pt_load_rom.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)

The subject doesn't match the patch contents and there is
no long-form part of the commit message explaining why...

thanks
-- PMM

Re: [PATCH v2] char-socket: initialize reconnect timer only when the timer doesn't start

2020-04-28 Thread Li Feng

Sorry for sending the same mail out twice.
Please ignore the second one.

Hi Lureau,

I found another issue when running my new test:  tests/test-char -p
/char/socket/client/reconnect-error/unix
The backtrace looks like this:
#0  0x75ac3277 in raise () from /lib64/libc.so.6
#1  0x75ac4968 in abort () from /lib64/libc.so.6
#2  0x555aaa34 in error_handle_fatal (errp=,
err=0x7fffec0012d0) at util/error.c:40
#3  0x555aab0d in error_setv (errp=0x55802a08
, src=0x555c4220 "io/channel.c", line=148,
func=0x555c4520 <__func__.17450> "qio_channel_readv_all",
err_class=ERROR_CLASS_GENERIC_ERROR, fmt=,
ap=0x7423bb10, suffix=0x0) at util/error.c:73
#4  0x555aac90 in error_setg_internal
(errp=errp@entry=0x55802a08 ,
src=src@entry=0x555c4220 "io/channel.c", line=line@entry=148,
func=func@entry=0x555c4520 <__func__.17450>
"qio_channel_readv_all", fmt=fmt@entry=0x555c4340 "Unexpected
end-of-file before all bytes were read") at util/error.c:97
#5  0x5556c1fc in qio_channel_readv_all (ioc=,
iov=, niov=, errp=0x55802a08
) at io/channel.c:147
#6  0x5556c23a in qio_channel_read_all (ioc=,
buf=, buflen=, errp=) at
io/channel.c:247
#7  0x5556ad0e in char_socket_ping_pong (ioc=0x7fffec0008c0)
at tests/test-char.c:727
#8  0x5556adcf in char_socket_client_server_thread
(data=data@entry=0x5582e350) at tests/test-char.c:881
#9  0x555a9556 in qemu_thread_start (args=) at
util/qemu-thread-posix.c:519
#10 0x75e61e25 in start_thread () from /lib64/libpthread.so.0
#11 0x75b8bbad in clone () from /lib64/libc.so.6

I think this is a new issue in QEMU, not an issue with my test.
What do you think?

Thanks,

Feng Li

On Tue, Apr 28, 2020 at 4:53 PM Li Feng wrote:

>
> When the disconnect event is triggered in the connecting stage,
> tcp_chr_disconnect_locked() may be called twice.
>
> The first call:
> #0  qemu_chr_socket_restart_timer (chr=0x5582ee90) at 
> chardev/char-socket.c:120
> #1  0x5558e38c in tcp_chr_disconnect_locked (chr=) 
> at chardev/char-socket.c:490
> #2  0x5558e3cd in tcp_chr_disconnect (chr=0x5582ee90) at 
> chardev/char-socket.c:497
> #3  0x5558ea32 in tcp_chr_new_client 
> (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at 
> chardev/char-socket.c:892
> #4  0x5558eeb8 in qemu_chr_socket_connected (task=0x5582f300, 
> opaque=) at chardev/char-socket.c:1090
> #5  0x55574352 in qio_task_complete 
> (task=task@entry=0x5582f300) at io/task.c:196
> #6  0x555745f4 in qio_task_thread_result (opaque=0x5582f300) 
> at io/task.c:111
> #7  qio_task_wait_thread (task=0x5582f300) at io/task.c:190
> #8  0x5558f17e in tcp_chr_wait_connected (chr=0x5582ee90, 
> errp=0x55802a08 ) at chardev/char-socket.c:1013
> #9  0x55567cbd in char_socket_client_reconnect_test 
> (opaque=0x557fe020 ) at tests/test-char.c:1152
> The second call:
> #0  0x75ac3277 in raise () from /lib64/libc.so.6
> #1  0x75ac4968 in abort () from /lib64/libc.so.6
> #2  0x75abc096 in __assert_fail_base () from /lib64/libc.so.6
> #3  0x75abc142 in __assert_fail () from /lib64/libc.so.6
> #4  0x5558d10a in qemu_chr_socket_restart_timer 
> (chr=0x5582ee90) at chardev/char-socket.c:125
> #5  0x5558df0c in tcp_chr_disconnect_locked (chr=) 
> at chardev/char-socket.c:490
> #6  0x5558df4d in tcp_chr_disconnect (chr=0x5582ee90) at 
> chardev/char-socket.c:497
> #7  0x5558e5b2 in tcp_chr_new_client 
> (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at 
> chardev/char-socket.c:892
> #8  0x5558e93a in tcp_chr_connect_client_sync 
> (chr=chr@entry=0x5582ee90, errp=errp@entry=0x7fffd178) at 
> chardev/char-socket.c:944
> #9  0x5558ec78 in tcp_chr_wait_connected (chr=0x5582ee90, 
> errp=0x55802a08 ) at chardev/char-socket.c:1035
> #10 0x5556804b in char_socket_client_test (opaque=0x557fe020 
> ) at tests/test-char.c:1023
>
> Run tests/test-char to reproduce this issue.
>
> test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: 
> Assertion `!s->reconnect_timer' failed.
>
> Signed-off-by: Li Feng 
> ---
> v2:
> - Rewrite the solution.
> - Add test to reproduce this issue.
>
>  chardev/char-socket.c |  2 +-
>  tests/test-char.c | 48 ++--
>  2 files changed, 39 insertions(+), 11 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 1f14c2c7c8..d84330b3c9 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -486,7 +486,7 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
>  if (emit_close) {
>  qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
>  }
> -if (s->reconnect_time) {
> +if (s->reconnect_time && !s->reconnect_timer) {
>

Re: [PATCH v2 5/6] block/block-copy: move block_copy_task_create down

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> Simple movement without any change. It's needed for the following
> patch, as this function will need to use some staff which is currently

*stuff

> below it.

Wouldn’t it be simpler to just declare block_copy_task_entry()?

Max

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 66 +++---
>  1 file changed, 33 insertions(+), 33 deletions(-)




[PATCH v2] char-socket: initialize reconnect timer only when the timer doesn't start

2020-04-28 Thread Li Feng

When the disconnect event is triggered in the connecting stage,
tcp_chr_disconnect_locked() may be called twice.

The first call:
#0  qemu_chr_socket_restart_timer (chr=0x5582ee90) at 
chardev/char-socket.c:120
#1  0x5558e38c in tcp_chr_disconnect_locked (chr=) 
at chardev/char-socket.c:490
#2  0x5558e3cd in tcp_chr_disconnect (chr=0x5582ee90) at 
chardev/char-socket.c:497
#3  0x5558ea32 in tcp_chr_new_client (chr=chr@entry=0x5582ee90, 
sioc=sioc@entry=0x5582f0b0) at chardev/char-socket.c:892
#4  0x5558eeb8 in qemu_chr_socket_connected (task=0x5582f300, 
opaque=) at chardev/char-socket.c:1090
#5  0x55574352 in qio_task_complete 
(task=task@entry=0x5582f300) at io/task.c:196
#6  0x555745f4 in qio_task_thread_result (opaque=0x5582f300) at 
io/task.c:111
#7  qio_task_wait_thread (task=0x5582f300) at io/task.c:190
#8  0x5558f17e in tcp_chr_wait_connected (chr=0x5582ee90, 
errp=0x55802a08 ) at chardev/char-socket.c:1013
#9  0x55567cbd in char_socket_client_reconnect_test 
(opaque=0x557fe020 ) at tests/test-char.c:1152
The second call:
#0  0x75ac3277 in raise () from /lib64/libc.so.6
#1  0x75ac4968 in abort () from /lib64/libc.so.6
#2  0x75abc096 in __assert_fail_base () from /lib64/libc.so.6
#3  0x75abc142 in __assert_fail () from /lib64/libc.so.6
#4  0x5558d10a in qemu_chr_socket_restart_timer 
(chr=0x5582ee90) at chardev/char-socket.c:125
#5  0x5558df0c in tcp_chr_disconnect_locked (chr=) 
at chardev/char-socket.c:490
#6  0x5558df4d in tcp_chr_disconnect (chr=0x5582ee90) at 
chardev/char-socket.c:497
#7  0x5558e5b2 in tcp_chr_new_client (chr=chr@entry=0x5582ee90, 
sioc=sioc@entry=0x5582f0b0) at chardev/char-socket.c:892
#8  0x5558e93a in tcp_chr_connect_client_sync 
(chr=chr@entry=0x5582ee90, errp=errp@entry=0x7fffd178) at 
chardev/char-socket.c:944
#9  0x5558ec78 in tcp_chr_wait_connected (chr=0x5582ee90, 
errp=0x55802a08 ) at chardev/char-socket.c:1035
#10 0x5556804b in char_socket_client_test (opaque=0x557fe020 
) at tests/test-char.c:1023

Run tests/test-char to reproduce this issue.

test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: Assertion 
`!s->reconnect_timer' failed.
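
A simplified sketch of the guard added in tcp_chr_disconnect_locked(): the reconnect timer is armed only when reconnecting is enabled and no timer is pending yet, so a second disconnect during the connecting stage no longer trips the assertion in the restart path. The struct, the dummy timer and the counter are hypothetical stand-ins for SocketChardev and its GSource.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reconnect fields of SocketChardev. */
typedef struct {
    long reconnect_time;   /* 0 means reconnect is disabled */
    void *reconnect_timer; /* non-NULL while a reconnect is pending */
    int timers_started;    /* instrumentation for the sketch only */
} SocketSketch;

static int dummy_timer;

static void restart_timer_sketch(SocketSketch *s)
{
    assert(!s->reconnect_timer); /* same invariant as the failing assertion */
    s->reconnect_timer = &dummy_timer;
    s->timers_started++;
}

/* The fix: only arm the timer if reconnect is enabled and none is pending. */
static void disconnect_locked_sketch(SocketSketch *s)
{
    if (s->reconnect_time && !s->reconnect_timer) {
        restart_timer_sketch(s);
    }
}
```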

Signed-off-by: Li Feng 
---
v2:
- Rewrite the solution.
- Add test to reproduce this issue.

 chardev/char-socket.c |  2 +-
 tests/test-char.c | 48 ++--
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 1f14c2c7c8..d84330b3c9 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -486,7 +486,7 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
 if (emit_close) {
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
-if (s->reconnect_time) {
+if (s->reconnect_time && !s->reconnect_timer) {
 qemu_chr_socket_restart_timer(chr);
 }
 }
diff --git a/tests/test-char.c b/tests/test-char.c
index 8d39bdc9fa..13dbbfe2a3 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -625,12 +625,14 @@ static void char_udp_test(void)
 typedef struct {
 int event;
 bool got_pong;
+CharBackend *be;
 } CharSocketTestData;
 
 
 #define SOCKET_PING "Hello"
 #define SOCKET_PONG "World"
 
+typedef void (*char_socket_cb)(void *opaque, QEMUChrEvent event);
 
 static void
 char_socket_event(void *opaque, QEMUChrEvent event)
@@ -639,6 +641,23 @@ char_socket_event(void *opaque, QEMUChrEvent event)
 data->event = event;
 }
 
+static void
+char_socket_event_with_error(void *opaque, QEMUChrEvent event)
+{
+CharSocketTestData *data = opaque;
+CharBackend *be = data->be;
+data->event = event;
+switch (event) {
+case CHR_EVENT_OPENED:
+qemu_chr_fe_disconnect(be);
+return;
+case CHR_EVENT_CLOSED:
+return;
+default:
+return;
+}
+}
+
 
 static void
 char_socket_read(void *opaque, const uint8_t *buf, int size)
@@ -783,6 +802,7 @@ static void char_socket_server_test(gconstpointer opaque)
 
  reconnect:
 data.event = -1;
+data.be = &be;
 qemu_chr_fe_set_handlers(&be, NULL, NULL,
  char_socket_event, NULL,
  &data, NULL, true);
@@ -869,6 +889,7 @@ typedef struct {
 const char *reconnect;
 bool wait_connected;
 bool fd_pass;
+char_socket_cb event_cb;
 } CharSocketClientTestConfig;
 
 static void char_socket_client_dupid_test(gconstpointer opaque)
@@ -920,6 +941,7 @@ static void char_socket_client_dupid_test(gconstpointer 
opaque)
 static void char_socket_client_test(gconstpointer opaque)
 {
 const CharSocketClientTestConfig *config = opaque;
+const char_socket_cb event_cb = config->event_cb;
 QIOChannelSocket *ioc;

Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Jason Wang




On 2020/4/28 4:34 PM, Cornelia Huck wrote:

On Tue, 28 Apr 2020 16:19:15 +0800
Jason Wang  wrote:


On 2020/4/27 6:24 PM, Cornelia Huck wrote:

VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.

Signed-off-by: Cornelia Huck 
---
   hw/net/virtio-net.c | 8 
   1 file changed, 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e85d902588b3..7449570c7123 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -77,14 +77,6 @@
  tso/gso/gro 'off'. */
   #define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
   
-/* temporary until standard header include it */
-#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
-
-#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
-#define VIRTIO_NET_F_RSC_EXT   61
-
-#endif
-
   static inline __virtio16 *virtio_net_rsc_ext_num_packets(
   struct virtio_net_hdr *hdr)
   {


I think we should not keep those tricky num_packets/dup_acks.

No real opinion here, patch 3 is only a cleanup.

The important one is patch 1, because without it I cannot do a headers
update.



Yes, at least we should dereference segments/dup_acks instead of 
csum_start/csum_offsets since the header has been synced.


Thanks

Re: [PATCH v2 4/6] block/block-copy: move task size initial calculation to _task_create

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 11:52, Max Reitz wrote:

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:

A comment "Called only on full-dirty region" without a corresponding
assertion is a very unsafe thing.


Not sure whether it’s that unsafe for a static function with a single
caller, but, well.


Adding an assertion means calling
bdrv_dirty_bitmap_next_zero twice. Instead, let's move the
bdrv_dirty_bitmap_next_zero call to block_copy_task_create. This also
allows dropping the cur_bytes variable, which partly duplicates task->bytes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 47 --
  1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 63d8468b27..dd406eb4bb 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -106,12 +106,23 @@ static bool coroutine_fn 
block_copy_wait_one(BlockCopyState *s, int64_t offset,
  return true;
  }
  
-/* Called only on full-dirty region */
  static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
   int64_t offset, int64_t bytes)


A bit of documentation on the new interface might be nice.  For one
thing, that @offset must be dirty, although there is an assertion that,
well, checks it.  (An assertion doesn’t really check anything, it rather
verifies a contract.  And violation is fatal.)


Still, good to document that.



For another, what the range [offset, offset + bytes) is; namely
basically the whole range of data that we might potentially copy, only
that the head must be dirty, but the tail may be clean.
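The contract described above (head of [offset, offset + bytes) dirty, tail possibly clean, task shrunk to the dirty run) can be sketched on a toy bitmap. This is not the block-copy code: demo_task_len is a hypothetical helper, units are clusters rather than bytes, and the clamp to copy_size simply mirrors the MIN(bytes, s->copy_size) step in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, in clusters: the head of [offset, offset + bytes) must be
 * dirty, the tail may be clean, and the task ends at the first clean
 * cluster. Names are hypothetical.
 */
static int64_t demo_task_len(const bool *dirty, int64_t offset,
                             int64_t bytes, int64_t copy_size)
{
    int64_t i;

    assert(dirty[offset]);            /* contract: head is dirty */

    if (bytes > copy_size) {          /* same role as MIN(bytes, copy_size) */
        bytes = copy_size;
    }
    for (i = offset; i < offset + bytes; i++) {
        if (!dirty[i]) {              /* first clean cluster ends the task */
            return i - offset;
        }
    }
    return bytes;                     /* whole (clamped) range is dirty */
}
```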


Right



Which makes me think that the interface is maybe less than intuitive.
It would make more sense if we could just call this function on the
whole region and it would figure out whether @offset is dirty by itself
(and return NULL if it isn’t).
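The alternative interface suggested here, where the caller passes the whole region and the function decides by itself whether @offset is dirty, could look roughly like this. It is a hypothetical sketch (DemoCopyTask and demo_copy_task_create are made-up names, units are clusters), not a proposal for the actual signature.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoCopyTask {
    int64_t offset;
    int64_t len;
} DemoCopyTask;

/*
 * Hypothetical variant: the function itself checks whether @offset is
 * dirty and returns NULL if it is not, instead of asserting it.
 */
static DemoCopyTask *demo_copy_task_create(const bool *dirty, int64_t offset,
                                           int64_t bytes)
{
    DemoCopyTask *task;
    int64_t len = 0;

    if (bytes <= 0 || !dirty[offset]) {
        return NULL;                 /* nothing to copy at @offset */
    }
    while (len < bytes && dirty[offset + len]) {
        len++;                       /* extend over the dirty run */
    }
    task = malloc(sizeof(*task));
    if (!task) {
        return NULL;                 /* toy: allocation failure */
    }
    task->offset = offset;
    task->len = len;
    return task;
}
```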


Hmm. Actually, I didn't touch the very inefficient "if 
(!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) { continue }" construct, because I 
was waiting for my series about refactoring bdrv_dirty_bitmap_next_dirty_area to be merged. 
But now it is merged, and I should refactor this thing. And maybe you are right that it 
may be done inside block_copy_task_create.



OTOH I suppose the interface as it is here is more useful for
task-ification.  But maybe that should be documented.


At first glance, it should not really matter.

OK, I'll try to improve it somehow for v3

--
Best regards,
Vladimir

[PATCH v2] char-socket: initialize reconnect timer only when the timer doesn't start

2020-04-28 Thread Li Feng

When the disconnect event is triggered in the connecting stage,
tcp_chr_disconnect_locked() may be called twice.

The first call:
#0  qemu_chr_socket_restart_timer (chr=0x5582ee90) at chardev/char-socket.c:120
#1  0x5558e38c in tcp_chr_disconnect_locked (chr=) at chardev/char-socket.c:490
#2  0x5558e3cd in tcp_chr_disconnect (chr=0x5582ee90) at chardev/char-socket.c:497
#3  0x5558ea32 in tcp_chr_new_client (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at chardev/char-socket.c:892
#4  0x5558eeb8 in qemu_chr_socket_connected (task=0x5582f300, opaque=) at chardev/char-socket.c:1090
#5  0x55574352 in qio_task_complete (task=task@entry=0x5582f300) at io/task.c:196
#6  0x555745f4 in qio_task_thread_result (opaque=0x5582f300) at io/task.c:111
#7  qio_task_wait_thread (task=0x5582f300) at io/task.c:190
#8  0x5558f17e in tcp_chr_wait_connected (chr=0x5582ee90, errp=0x55802a08 ) at chardev/char-socket.c:1013
#9  0x55567cbd in char_socket_client_reconnect_test (opaque=0x557fe020 ) at tests/test-char.c:1152

The second call:
#0  0x75ac3277 in raise () from /lib64/libc.so.6
#1  0x75ac4968 in abort () from /lib64/libc.so.6
#2  0x75abc096 in __assert_fail_base () from /lib64/libc.so.6
#3  0x75abc142 in __assert_fail () from /lib64/libc.so.6
#4  0x5558d10a in qemu_chr_socket_restart_timer (chr=0x5582ee90) at chardev/char-socket.c:125
#5  0x5558df0c in tcp_chr_disconnect_locked (chr=) at chardev/char-socket.c:490
#6  0x5558df4d in tcp_chr_disconnect (chr=0x5582ee90) at chardev/char-socket.c:497
#7  0x5558e5b2 in tcp_chr_new_client (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at chardev/char-socket.c:892
#8  0x5558e93a in tcp_chr_connect_client_sync (chr=chr@entry=0x5582ee90, errp=errp@entry=0x7fffd178) at chardev/char-socket.c:944
#9  0x5558ec78 in tcp_chr_wait_connected (chr=0x5582ee90, errp=0x55802a08 ) at chardev/char-socket.c:1035
#10 0x5556804b in char_socket_client_test (opaque=0x557fe020 ) at tests/test-char.c:1023

Run tests/test-char to reproduce this issue.

test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.
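The idea of the fix, arming the reconnect timer only if none is already pending, can be modelled with a toy struct. DemoChr, demo_restart_timer and demo_disconnect are hypothetical names; the assertion inside demo_restart_timer plays the role of the one that fired above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket chardev state. */
typedef struct DemoChr {
    int reconnect_time;      /* seconds; 0 means reconnect disabled */
    void *reconnect_timer;   /* non-NULL while a timer is pending */
    int timers_started;      /* how often a timer was armed */
} DemoChr;

static int demo_timer_token;

static void demo_restart_timer(DemoChr *s)
{
    assert(!s->reconnect_timer);   /* mirrors the assertion that fired */
    s->reconnect_timer = &demo_timer_token;
    s->timers_started++;
}

static void demo_disconnect(DemoChr *s)
{
    /* Same guard as the patch: never arm a second timer. */
    if (s->reconnect_time && !s->reconnect_timer) {
        demo_restart_timer(s);
    }
}
```

Calling demo_disconnect() twice, as in the two backtraces, now arms the timer once instead of aborting.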

Signed-off-by: Li Feng 
---
v2:
- Rewrite the solution.
- Add test to reproduce this issue.

 chardev/char-socket.c |  2 +-
 tests/test-char.c | 48 ++--
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 1f14c2c7c8..d84330b3c9 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -486,7 +486,7 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
 if (emit_close) {
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
-if (s->reconnect_time) {
+if (s->reconnect_time && !s->reconnect_timer) {
 qemu_chr_socket_restart_timer(chr);
 }
 }
diff --git a/tests/test-char.c b/tests/test-char.c
index 8d39bdc9fa..13dbbfe2a3 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -625,12 +625,14 @@ static void char_udp_test(void)
 typedef struct {
 int event;
 bool got_pong;
+CharBackend *be;
 } CharSocketTestData;
 
 
 #define SOCKET_PING "Hello"
 #define SOCKET_PONG "World"
 
+typedef void (*char_socket_cb)(void *opaque, QEMUChrEvent event);
 
 static void
 char_socket_event(void *opaque, QEMUChrEvent event)
@@ -639,6 +641,23 @@ char_socket_event(void *opaque, QEMUChrEvent event)
 data->event = event;
 }
 
+static void
+char_socket_event_with_error(void *opaque, QEMUChrEvent event)
+{
+CharSocketTestData *data = opaque;
+CharBackend *be = data->be;
+data->event = event;
+switch (event) {
+case CHR_EVENT_OPENED:
+qemu_chr_fe_disconnect(be);
+return;
+case CHR_EVENT_CLOSED:
+return;
+default:
+return;
+}
+}
+
 
 static void
 char_socket_read(void *opaque, const uint8_t *buf, int size)
@@ -783,6 +802,7 @@ static void char_socket_server_test(gconstpointer opaque)
 
  reconnect:
 data.event = -1;
+data.be = &be;
 qemu_chr_fe_set_handlers(&be, NULL, NULL,
  char_socket_event, NULL,
  &data, NULL, true);
@@ -869,6 +889,7 @@ typedef struct {
 const char *reconnect;
 bool wait_connected;
 bool fd_pass;
+char_socket_cb event_cb;
 } CharSocketClientTestConfig;
 
 static void char_socket_client_dupid_test(gconstpointer opaque)
@@ -920,6 +941,7 @@ static void char_socket_client_dupid_test(gconstpointer 
opaque)
 static void char_socket_client_test(gconstpointer opaque)
 {
 const CharSocketClientTestConfig *config = opaque;
+const char_socket_cb event_cb = config->event_cb;
 QIOChannelSocket *ioc;

Re: [PATCH v2 4/6] block/block-copy: move task size initial calculation to _task_create

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> A comment "Called only on full-dirty region" without a corresponding
> assertion is a very unsafe thing.

Not sure whether it’s that unsafe for a static function with a single
caller, but, well.

> Adding an assertion means calling
> bdrv_dirty_bitmap_next_zero twice. Instead, let's move the
> bdrv_dirty_bitmap_next_zero call to block_copy_task_create. It also
> allows dropping the cur_bytes variable, which partly duplicates task->bytes.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 47 --
>  1 file changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index 63d8468b27..dd406eb4bb 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -106,12 +106,23 @@ static bool coroutine_fn 
> block_copy_wait_one(BlockCopyState *s, int64_t offset,
>  return true;
>  }
>  
> -/* Called only on full-dirty region */
>  static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>   int64_t offset, int64_t bytes)

A bit of documentation on the new interface might be nice.  For one
thing, that @offset must be dirty, although there is an assertion that,
well, checks it.  (An assertion doesn’t really check anything, it rather
verifies a contract.  And violation is fatal.)

For another, what the range [offset, offset + bytes) is; namely
basically the whole range of data that we might potentially copy, only
that the head must be dirty, but the tail may be clean.

Which makes me think that the interface is maybe less than intuitive.
It would make more sense if we could just call this function on the
whole region and it would figure out whether @offset is dirty by itself
(and return NULL if it isn’t).

OTOH I suppose the interface as it is here is more useful for
task-ification.  But maybe that should be documented.

>  {
> +int64_t next_zero;
>  BlockCopyTask *task = g_new(BlockCopyTask, 1);
>  
> +assert(bdrv_dirty_bitmap_get(s->copy_bitmap, offset));
> +
> +bytes = MIN(bytes, s->copy_size);
> +next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset, bytes);
> +if (next_zero >= 0) {
> +assert(next_zero > offset); /* offset is dirty */
> +assert(next_zero < offset + bytes); /* no need to do MIN() */
> +bytes = next_zero - offset;
> +}
> +
> +/* region is dirty, so no existent tasks possible in it */

s/existent/existing/?

(The code movement and how you replaced cur_bytes by task->bytes looks
good.)

Max

>  assert(!find_conflicting_task(s, offset, bytes));
>  
>  bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);

signature.asc
Description: OpenPGP digital signature

backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and 
is_allocated_above" and came back to all the inconsistencies around block-status. I keep 
in mind Max's series about child-access functions, and Andrey's work on using the COR filter 
in block-stream, which depends on Max's series (because, without them, a COR filter with a file 
child breaks backing chains). And it seems better to discuss some questions 
before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and the flags are documented as follows:

   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
   * BDRV_BLOCK_ZERO: offset reads as zero
   * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   *   layer rather than any backing, set by block layer

This actually means that we should always have BDRV_BLOCK_ALLOCATED for 
formats that don't support backing. So all such format drivers must return 
ZERO or DATA (or both?), yes? It seems file-posix does so, but iscsi, for example, 
doesn't.

2. ZERO. The meaning differs a bit between generic block_status and drivers. I 
think we should at least document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if a driver returns ZERO, then the region is allocated at this 
layer and reads as zero. If generic block_status returns ZERO, it only means that 
it reads as zero, but the region may be allocated on an underlying level.

3. bdi.unallocated_blocks_are_zero

I think it's very bad that we have formats that support backing but don't 
report bdi.unallocated_blocks_are_zero as true. It means that an UNALLOCATED 
region reads as zero if we have a short backing file, and not as zero if we 
remove this short backing file. I can live with it, but this is weird logic. 
These bad drivers are qcow (not qcow2), parallels and vmdk. I hope they 
simply forgot to set unallocated_blocks_are_zero to true.

Next: what about drivers that don't support backing? As discussed 
above, they should always return ZERO or DATA, as everything is allocated at 
this level of the backing chain (the last level, of course). So again 
unallocated_blocks_are_zero is meaningless. So I think that drivers which 
don't support backing should be fixed to always return ZERO or DATA; then 
we don't need unallocated_blocks_are_zero at all.

4. Short backing files in allocated_above: we must consider the space after EOF as ALLOCATED if the 
short backing file is inside the requested part of the backing chain, as that space is produced exactly by 
this short file (and we never go to its backing). (The current implementation of allocated_above is 
buggy; see my outdated series "[PATCH 0/4] fix & merge block_status_above and 
is_allocated_above".)

5. Long ago we discussed problems with BDRV_BLOCK_RAW when we have a 
backing chain of a non-backing child. I just remember that we didn't reach a 
consensus.

6. Filters. OK, we have two functions for them: bdrv_co_block_status_from_file 
and bdrv_co_block_status_from_backing. I think both are wrong:

bdrv_co_block_status_from_file leads to problem [5], where we can report 
UNALLOCATED which refers not to the current backing chain but to the sub-backing-chain 
of the file child, which is inconsistent with the block_status_above and 
is_allocated_above iteration.

bdrv_co_block_status_from_backing is also not consistent with the 
block_status_above iteration. At the very least, it leads to querying the same node 
twice.
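Two of the rules above can be combined in a toy is_allocated_above: per layer, a cluster counts as allocated if it reports ZERO or DATA (ALLOCATED = ZERO | DATA), and an offset beyond the EOF of a short layer inside the requested range counts as allocated too. Everything here is hypothetical: the names, the per-cluster status arrays, and the flag values, which are illustrative and not QEMU's BDRV_BLOCK_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only. */
enum {
    DEMO_DATA = 1 << 0,
    DEMO_ZERO = 1 << 1,
};

typedef struct DemoLayer {
    int64_t length;      /* size of this layer, in clusters */
    const int *status;   /* DEMO_DATA / DEMO_ZERO per cluster */
} DemoLayer;

/*
 * layers[0] is the top; the walk covers n layers and stops before the
 * base. Callers are expected to pass offsets inside the top layer.
 */
static bool demo_is_allocated_above(const DemoLayer *layers, int n,
                                    int64_t offset)
{
    for (int i = 0; i < n; i++) {
        if (offset >= layers[i].length) {
            /* Past EOF of a short layer: it reads as zero, and we never
             * descend further, so treat it as allocated. */
            return true;
        }
        if (layers[i].status[offset] & (DEMO_DATA | DEMO_ZERO)) {
            return true;   /* ALLOCATED = ZERO | DATA */
        }
    }
    return false;
}
```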

=

So, about filters and backing chains. Keeping (OK, just trying to keep) all these things in mind, 
I think that it's better to keep backing chains exactly *backing* chains, so that the 
"backing" child is the only "not own" child of the node. This is close to the 
current behavior, and we don't need child-access functions. Then, here is how filters should work:

A filter with a backing child should always return UNALLOCATED (i.e. no DATA, no 
ZERO). This is honest: everything is on another level of the backing chain.

A filter with a file child should always return BDRV_BLOCK_DATA | 
BDRV_BLOCK_RECURSE, to show that:
1. everything is allocated at *this* level of the backing chain
2. the filter is too lazy to dig into its file child (and maybe the whole subtree 
of it) and asks the generic layer to do it by itself, if it wants zeroes.
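The proposed filter behaviour reduces to a choice on the child role. The sketch is hypothetical: the names are made up, and the flag values are illustrative rather than the real BDRV_BLOCK_* constants from QEMU's block layer headers.

```c
#include <assert.h>

/* Illustrative flag values only. */
enum {
    DEMO_BLOCK_DATA    = 1 << 0,
    DEMO_BLOCK_ZERO    = 1 << 1,
    DEMO_BLOCK_RECURSE = 1 << 6,
};

typedef enum { DEMO_CHILD_FILE, DEMO_CHILD_BACKING } DemoChildRole;

/* Status a filter would report under the proposal above. */
static int demo_filter_block_status(DemoChildRole role)
{
    if (role == DEMO_CHILD_BACKING) {
        /* UNALLOCATED: everything lives on another level of the chain. */
        return 0;
    }
    /* Allocated at this level; the generic layer may recurse into the
     * file child if it wants to know about zeroes. */
    return DEMO_BLOCK_DATA | DEMO_BLOCK_RECURSE;
}
```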

Then, of course, if we want some filter to be inside a backing chain, it should have not a 
"file" child but a "backing" one. For this, we may support both variants in current public filters: 
backing or file, as the user prefers. I.e., a filter is opened either with the file option or 
with backing and operates correspondingly. And newer filters (like backup-top) may support only 
the backing variant.

=

So, I propose to complicate the code of old file-based filters a bit, to support a 
backing child (which may be done on demand, when needed. For example, Virtuozzo 
now needs only

Re: [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Dr. David Alan Gilbert

* Max Reitz (mre...@redhat.com) wrote:
> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> >> Currently, setup_mounts() bind-mounts the shared directory without
> >> MS_REC.  This makes all submounts disappear.
> >>
> >> Pass MS_REC so that the guest can see submounts again.
> > 
> > Thanks!
> > 
> >> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > 
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.

OK, I'd rather stick with the Fixes: for the patch that was actually
wrong.

> >> Signed-off-by: Max Reitz 
> >> ---
> >>  tools/virtiofsd/passthrough_ll.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tools/virtiofsd/passthrough_ll.c 
> >> b/tools/virtiofsd/passthrough_ll.c
> >> index 4c35c95b25..9d7f863e66 100644
> >> --- a/tools/virtiofsd/passthrough_ll.c
> >> +++ b/tools/virtiofsd/passthrough_ll.c
> >> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >>  int oldroot;
> >>  int newroot;
> >>  
> >> -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> >> +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >>  fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, 
> >> source);
> >>  exit(1);
> >>  }
> > 
> > Do we want MS_SLAVE to pick up future mounts that might happen from the
> > host?
> 
> Hm.  So first it looks to me from the man page like one shouldn’t give
> MS_SLAVE on the first mount() call but kind of only use it for remounts
> (in the list at the start, “Create a bind mount” is separate from
> “Change the propagation type of an existing mount”, and the man page
> later says “The only other flags that can be specified while changing
> the propagation type are MS_REC (described below) and MS_SILENT (which
> is ignored).”).
> 
> Second, even if I do change the propagation type to MS_SLAVE in a second
> call, mounts done after qemu has been started don’t show up in the guest
> (for me).
> 
> So while it sounds correct, I can’t see it having an effect, actually.

That's unfortunate; but I guess we can debug that separately

> > What's the interaction between this and the MS_REC|MS_SLAVE that we have
> > a few lines above for / ?
> 
> Good question.  It would seem to me that there isn’t any.  That previous
> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
> and then we do a new mount here (by default from / to /) that needs its
> own flags.
> 
> (More interesting is perhaps why we have that other mount() call below,
> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
>  I can’t imagine that to have any effect.)

Is that just trying to be careful before the umount2 so it doesn't try
to unmount something useful?

Dave

> Max
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v2 6/6] block/block-copy: use aio-task-pool API

2020-04-28 Thread Max Reitz

On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
> Run block_copy iterations in parallel in aio tasks.
> 
> Changes:
>   - BlockCopyTask becomes aio task structure. Add zeroes field to pass
> it to block_copy_do_copy
>   - add call state - it's a state of one call of block_copy(), shared
> between parallel tasks. For now used only to keep information about
> first error: is it read or not.
>   - convert block_copy_dirty_clusters to aio-task loop.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/block-copy.c | 104 +++--
>  1 file changed, 91 insertions(+), 13 deletions(-)

Looks good, just some nits:

> diff --git a/block/block-copy.c b/block/block-copy.c
> index 910947cb43..9994598eb7 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c

[...]

> @@ -225,6 +237,30 @@ void block_copy_set_progress_meter(BlockCopyState *s, 
> ProgressMeter *pm)
>  s->progress = pm;
>  }
>  
> +/* Takes ownership on @task */

*of

> +static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
> +BlockCopyTask *task)
> +{
> +if (!pool) {
> +int ret = task->task.func(&task->task);
> +
> +g_free(task);
> +return ret;
> +}
> +
> +aio_task_pool_wait_slot(pool);
> +if (aio_task_pool_status(pool) < 0) {
> +co_put_to_shres(task->s->mem, task->bytes);
> +block_copy_task_end(task, -EAGAIN);

Hm, I think -ECANCELED might be better.  Not that it really matters...

> +g_free(task);
> +return aio_task_pool_status(pool);

Here, too.  (Here, there’s also the fact that this task doesn’t really
fail because of the same reason that the other task failed, so we should
have our own error code here.)

> +}
> +
> +aio_task_pool_start_task(pool, &task->task);
> +
> +return 0;
> +}
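The control flow being reviewed, with the task's own error code suggested in the review, can be modelled without the real aio-task-pool API. Everything below is hypothetical (DemoPool, DemoAioTask, demo_task_run); the toy "pool" runs a started task immediately instead of asynchronously and records the first failure, roughly the way a pool status would.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct DemoPool {
    int status;   /* 0, or the first error reported by a task */
    int started;  /* number of tasks handed to the pool */
} DemoPool;

typedef struct DemoAioTask DemoAioTask;
struct DemoAioTask {
    int (*func)(DemoAioTask *task);
};

static int demo_ok(DemoAioTask *task) { (void)task; return 0; }
static int demo_fail_eio(DemoAioTask *task) { (void)task; return -EIO; }

/* Takes ownership of @task. */
static int demo_task_run(DemoPool *pool, DemoAioTask *task)
{
    int ret;

    if (!pool) {
        /* No pool: run synchronously and return the task's result. */
        ret = task->func(task);
        free(task);
        return ret;
    }
    if (pool->status < 0) {
        /* An earlier task failed; this one never ran, so report
         * cancellation instead of the other task's error. */
        free(task);
        return -ECANCELED;
    }
    /* Toy: "start" the task by running it right away. */
    pool->started++;
    ret = task->func(task);
    if (ret < 0 && pool->status == 0) {
        pool->status = ret;
    }
    free(task);
    return 0;
}
```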
> +
>  /*
>   * block_copy_do_copy
>   *

[...]

> @@ -519,25 +584,38 @@ static int coroutine_fn 
> block_copy_dirty_clusters(BlockCopyState *s,

[...]

> +out:
> +if (aio) {
> +aio_task_pool_wait_all(aio);
> +if (ret == 0) {
> +ret = aio_task_pool_status(aio);
> +}
> +g_free(aio);

aio_task_pool_free()?

Max




Re: [PATCH v2 5/6] block/block-copy: move block_copy_task_create down

2020-04-28 Thread Max Reitz

On 28.04.20 11:17, Vladimir Sementsov-Ogievskiy wrote:
> 28.04.2020 12:06, Max Reitz wrote:
>> On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
>>> Simple movement without any change. It's needed for the following
>>> patch, as this function will need to use some staff which is currently
>>
>> *stuff
>>
>>> below it.
>>
>> Wouldn’t it be simpler to just declare block_copy_task_entry()?
>>
> 
> I just think that it's good to keep the native order of functions and avoid
> extra declarations. Still, maybe I care too much. There's no actual difference;
> if you prefer a declaration, I can drop this patch.

Personally, the native order doesn’t do me any good (cscope doesn’t
really care where the definition is), and also having functions in order
seems just like a C artifact.

I just prefer declarations because otherwise we end up moving functions
all the time with no real benefit.  Furthermore, moving functions has
the drawback of polluting git blame.
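A forward declaration lets a function call one that is defined further down, so nothing has to be moved. The names below are hypothetical stand-ins for block_copy_task_create() and block_copy_task_entry().

```c
#include <assert.h>

/* Forward declaration: the definition stays where it already is. */
static int demo_entry(int bytes);

static int demo_create(int bytes)
{
    return demo_entry(bytes);   /* usable before its definition */
}

static int demo_entry(int bytes)
{
    return bytes * 2;
}
```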

Max


Re: [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Max Reitz

On 28.04.20 11:59, Dr. David Alan Gilbert wrote:
> * Max Reitz (mre...@redhat.com) wrote:
>> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
>>> * Max Reitz (mre...@redhat.com) wrote:
 Currently, setup_mounts() bind-mounts the shared directory without
 MS_REC.  This makes all submounts disappear.

 Pass MS_REC so that the guest can see submounts again.
>>>
>>> Thanks!
>>>
 Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>>
>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>
>> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
>> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.
> 
> OK, I'd rather stick with the Fixes: for the patch that was actually
> wrong.

Why not both? :)

 Signed-off-by: Max Reitz 
 ---
  tools/virtiofsd/passthrough_ll.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/tools/virtiofsd/passthrough_ll.c 
 b/tools/virtiofsd/passthrough_ll.c
 index 4c35c95b25..9d7f863e66 100644
 --- a/tools/virtiofsd/passthrough_ll.c
 +++ b/tools/virtiofsd/passthrough_ll.c
 @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
  int oldroot;
  int newroot;
  
 -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
 +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
  fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, 
 source);
  exit(1);
  }
>>>
>>> Do we want MS_SLAVE to pick up future mounts that might happen from the
>>> host?
>>
>> Hm.  So first it looks to me from the man page like one shouldn’t give
>> MS_SLAVE on the first mount() call but kind of only use it for remounts
>> (in the list at the start, “Create a bind mount” is separate from
>> “Change the propagation type of an existing mount”, and the man page
>> later says “The only other flags that can be specified while changing
>> the propagation type are MS_REC (described below) and MS_SILENT (which
>> is ignored).”).
>>
>> Second, even if I do change the propagation type to MS_SLAVE in a second
>> call, mounts done after qemu has been started don’t show up in the guest
>> (for me).
>>
>> So while it sounds correct, I can’t see it having an effect, actually.
> 
> That's unfortunate; but I guess we can debug that separately
> 
>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
>>> a few lines above for / ?
>>
>> Good question.  It would seem to me that there isn’t any.  That previous
>> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
>> and then we do a new mount here (by default from / to /) that needs its
>> own flags.
>>
>> (More interesting is perhaps why we have that other mount() call below,
>> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
>>  I can’t imagine that to have any effect.)
> 
> Is that just trying to be careful before the umount2 so it doesn't try
> to unmount something useful?

Perhaps, but still, it shouldn’t matter.  I rather suspect that
setup_namespaces() and setup_mounts() were developed (or taken from
elsewhere) independently, so they both have to work independently, and
thus they do overlapping stuff.

Max




Re: [PATCH v20 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Max Reitz

On 28.04.20 09:23, Denis Plotnikov wrote:
> 
> 
> On 28.04.2020 09:16, Max Reitz wrote:
>> On 27.04.20 21:26, Denis Plotnikov wrote:
>>>
>>> On 27.04.2020 15:35, Max Reitz wrote:
 On 21.04.20 10:11, Denis Plotnikov wrote:
> zstd significantly reduces cluster compression time.
> It provides better compression performance maintaining
> the same level of the compression ratio in comparison with
> zlib, which, at the moment, is the only compression
> method available.
>
> The performance test results:
> Test compresses and decompresses qemu qcow2 image with just
> installed rhel-7.6 guest.
> Image cluster size: 64K. Image on disk size: 2.2G
>
> The test was conducted with a brd disk to reduce the influence
> of the disk subsystem on the test results.
> The results are given in seconds.
>
> compress cmd:
>     time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
>     src.img [zlib|zstd]_compressed.img
> decompress cmd:
>     time ./qemu-img convert -O qcow2
>     [zlib|zstd]_compressed.img uncompressed.img
>
>          compression            decompression
>          zlib    zstd           zlib    zstd
> --------------------------------------------------
> real     65.5    16.3 (-75 %)   1.9     1.6 (-16 %)
> user     65.0    15.8           5.3     2.5
> sys       3.3     0.2           2.0     2.0
>
> Both ZLIB and ZSTD gave the same compression ratio: 1.57
> compressed image size in both cases: 1.4G
>
> Signed-off-by: Denis Plotnikov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Alberto Garcia 
> QAPI part:
> Acked-by: Markus Armbruster 
> ---
>    docs/interop/qcow2.txt |   1 +
>    configure  |   2 +-
>    qapi/block-core.json   |   3 +-
>    block/qcow2-threads.c  | 157
> +
>    block/qcow2.c  |   7 ++
>    5 files changed, 168 insertions(+), 2 deletions(-)
 [...]

> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
> index 7dbaf53489..0525718704 100644
> --- a/block/qcow2-threads.c
> +++ b/block/qcow2-threads.c
> @@ -28,6 +28,11 @@
>    #define ZLIB_CONST
>    #include 
>    +#ifdef CONFIG_ZSTD
> +#include 
> +#include 
> +#endif
> +
>    #include "qcow2.h"
>    #include "block/thread-pool.h"
>    #include "crypto.h"
> @@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void
> *dest, size_t dest_size,
>    return ret;
>    }
>    +#ifdef CONFIG_ZSTD
> +
> +/*
> + * qcow2_zstd_compress()
> + *
> + * Compress @src_size bytes of data using zstd compression method
> + *
> + * @dest - destination buffer, @dest_size bytes
> + * @src - source buffer, @src_size bytes
> + *
> + * Returns: compressed size on success
> + *  -ENOMEM destination buffer is not enough to store compressed data
> + *  -EIO    on any other error
> + */
> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
> +   const void *src, size_t src_size)
> +{
> +    ssize_t ret;
> +    ZSTD_outBuffer output = { dest, dest_size, 0 };
> +    ZSTD_inBuffer input = { src, src_size, 0 };
 Minor style note: I think it’d be nicer to use designated initializers
 here.
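With designated initializers, each buffer member is named explicitly. To keep the sketch buildable without libzstd it uses local mirror types (hypothetical names) with the same members as zstd's ZSTD_outBuffer (dst, size, pos) and ZSTD_inBuffer (src, size, pos); with the real types the two declarations would read the same.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirrors of ZSTD_outBuffer / ZSTD_inBuffer. */
typedef struct { void *dst; size_t size; size_t pos; } DemoOutBuffer;
typedef struct { const void *src; size_t size; size_t pos; } DemoInBuffer;

static void demo_init_buffers(void *dest, size_t dest_size,
                              const void *src, size_t src_size,
                              DemoOutBuffer *out, DemoInBuffer *in)
{
    /* Designated initializers: each member named, order-independent. */
    DemoOutBuffer output = { .dst = dest, .size = dest_size, .pos = 0 };
    DemoInBuffer input = { .src = src, .size = src_size, .pos = 0 };

    *out = output;
    *in = input;
}
```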

> +    ZSTD_CCtx *cctx = ZSTD_createCCtx();
> +
> +    if (!cctx) {
> +    return -EIO;
> +    }
> +    /*
> + * Use the zstd streamed interface for symmetry with decompression,
> + * where streaming is essential since we don't record the exact
> + * compressed size.
> + *
> + * In the loop, we try to compress all the data into one zstd frame.
> + * ZSTD_compressStream2 potentially can finish a frame earlier
> + * than the full input data is consumed. That's why we are looping
> + * until all the input data is consumed.
> + */
> +    while (input.pos < input.size) {
> +    size_t zstd_ret;
> +    /*
> + * ZSTD spec: "You must continue calling ZSTD_compressStream2()
> + * with ZSTD_e_end until it returns 0, at which point you are
> + * free to start a new frame". We assume that "start a new frame"
> + * means call ZSTD_compressStream2 in the very beginning or when
> + * ZSTD_compressStream2 has returned with 0.
> + */
> +    do {
> +    zstd_ret = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
 The spec makes it sound to me like ZSTD_e_end will always complete in a
 single call if there’s enough space in the output bu

Re: [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Dr. David Alan Gilbert

* Max Reitz (mre...@redhat.com) wrote:
> On 28.04.20 11:59, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> >> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
> >>> * Max Reitz (mre...@redhat.com) wrote:
>  Currently, setup_mounts() bind-mounts the shared directory without
>  MS_REC.  This makes all submounts disappear.
> 
>  Pass MS_REC so that the guest can see submounts again.
> >>>
> >>> Thanks!
> >>>
>  Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> >>>
> >>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> >>
> >> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
> >> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.
> > 
> > OK, I'd rather stick with the Fixes: for the patch that was actually
> > wrong.
> 
> Why not both? :)
> 
>  Signed-off-by: Max Reitz 
>  ---
>   tools/virtiofsd/passthrough_ll.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/tools/virtiofsd/passthrough_ll.c 
>  b/tools/virtiofsd/passthrough_ll.c
>  index 4c35c95b25..9d7f863e66 100644
>  --- a/tools/virtiofsd/passthrough_ll.c
>  +++ b/tools/virtiofsd/passthrough_ll.c
>  @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>   int oldroot;
>   int newroot;
>   
>  -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>  +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>   fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, 
>  source);
>   exit(1);
>   }
> >>>
> >>> Do we want MS_SLAVE to pick up future mounts that might happen from the
> >>> host?
> >>
> >> Hm.  So first it looks to me from the man page like one shouldn’t give
> >> MS_SLAVE on the first mount() call but kind of only use it for remounts
> >> (in the list at the start, “Create a bind mount” is separate from
> >> “Change the propagation type of an existing mount”, and the man page
> >> later says “The only other flags that can be specified while changing
> >> the propagation type are MS_REC (described below) and MS_SILENT (which
> >> is ignored).”).
> >>
> >> Second, even if I do change the propagation type to MS_SLAVE in a second
> >> call, mounts done after qemu has been started don’t show up in the guest
> >> (for me).
> >>
> >> So while it sounds correct, I can’t see it having an effect, actually.
> > 
> > That's unfortunate; but I guess we can debug that separately
> > 
> >>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
> >>> a few lines above for / ?
> >>
> >> Good question.  It would seem to me that there isn’t any.  That previous
> >> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
> >> and then we do a new mount here (by default from / to /) that needs its
> >> own flags.
> >>
> >> (More interesting is perhaps why we have that other mount() call below,
> >> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
> >>  I can’t imagine that to have any effect.)
> > 
> > Is that just trying to be careful before the umount2 so it doesn't try
> > to unmount something useful?
> 
> Perhaps, but still, it shouldn’t matter.  I rather suspect that
> setup_namespaces() and setup_mounts() were developed (or taken from
> elsewhere) independently, so they both have to work independently, and
> thus they do overlapping stuff.

Yep, agreed.

Dave

> Max
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Ani Sinha

+julie
+ laine

Rebased the patch onto the latest QEMU master. Only minimal changes.


> On Apr 28, 2020, at 3:46 PM, Ani Sinha  wrote:
> 
> A new option "use_acpi_unplug" is introduced for PIIX which will
> selectively disable only hot unplugging of both hot-plugged and
> cold-plugged PCI devices on non-root PCI buses. This will prevent
> hot unplugging of devices from the system tray in Windows-based
> guests, but will not prevent devices from being hot plugged into
> the guest.
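One way to picture the intended behaviour, under the assumption that the generated ACPI tables simply omit a slot's _EJ0 (eject) method when unplug is disabled for non-root buses, is a single predicate. demo_slot_gets_ej0 is a hypothetical name; this is not the acpi-build.c logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: a slot gets an _EJ0 method (and is thus offered
 * for hot unplug by the guest OS) unless it sits on a non-root bus and
 * the new option is switched off.
 */
static bool demo_slot_gets_ej0(bool is_root_bus, bool hotunplug_bridge_en)
{
    return is_root_bus || hotunplug_bridge_en;
}
```

With the patch applied, the option would presumably be toggled as `-global PIIX4_PM.acpi-pci-hotunplug-enable-bridge=off`; that command line is an untested assumption based on the property name in the diff.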
> 
> It has been tested on Windows guests.
> 
> Signed-off-by: Ani Sinha 
> ---
> hw/acpi/piix4.c  |  3 +++
> hw/i386/acpi-build.c | 40 ++--
> 2 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 964d6f5..59fa707 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
> 
> AcpiPciHpState acpi_pci_hotplug;
> bool use_acpi_pci_hotplug;
> +bool use_acpi_unplug;
> 
> uint8_t disable_s3;
> uint8_t disable_s4;
> @@ -633,6 +634,8 @@ static Property piix4_pm_properties[] = {
> DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
> DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PIIX4PMState,
>  use_acpi_pci_hotplug, true),
> +DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge", PIIX4PMState,
> + use_acpi_unplug, true),
> DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
>  acpi_memory_hotplug.is_enabled, true),
> DEFINE_PROP_END_OF_LIST(),
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 23c77ee..71b3ac3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> bool s3_disabled;
> bool s4_disabled;
> bool pcihp_bridge_en;
> +bool pcihup_bridge_en;
> uint8_t s4_val;
> AcpiFadtData fadt;
> uint16_t cpu_hp_io_base;
> @@ -240,6 +241,9 @@ static void acpi_get_pm_info(MachineState *machine, 
> AcpiPmInfo *pm)
> pm->pcihp_bridge_en =
> object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
>  NULL);
> +pm->pcihup_bridge_en =
> +object_property_get_bool(obj, "acpi-pci-hotunplug-enable-bridge",
> + NULL);
> }
> 
> static void acpi_get_misc_info(AcpiMiscInfo *info)
> @@ -451,7 +455,8 @@ static void build_append_pcihp_notify_entry(Aml *method, 
> int slot)
> }
> 
> static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
> - bool pcihp_bridge_en)
> + bool pcihp_bridge_en,
> + bool pcihup_bridge_en)
> {
> Aml *dev, *notify_method = NULL, *method;
> QObject *bsel;
> @@ -479,11 +484,14 @@ static void build_append_pci_bus_devices(Aml 
> *parent_scope, PCIBus *bus,
> dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
> aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
> aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
> -method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> -aml_append(method,
> -aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
> -);
> -aml_append(dev, method);
> +if (pcihup_bridge_en || pci_bus_is_root(bus)) {
> +method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> +aml_append(method,
> +   aml_call2("PCEJ", aml_name("BSEL"),
> + aml_name("_SUN"))
> +);
> +aml_append(dev, method);
> +}
> aml_append(parent_scope, dev);
> 
> build_append_pcihp_notify_entry(notify_method, slot);
> @@ -537,12 +545,14 @@ static void build_append_pci_bus_devices(Aml 
> *parent_scope, PCIBus *bus,
> /* add _SUN/_EJ0 to make slot hotpluggable  */
> aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
> 
> -method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> -aml_append(method,
> -aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
> -);
> -aml_append(dev, method);
> -
> +if (pcihup_bridge_en || pci_bus_is_root(bus)) {
> +method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> +aml_append(method,
> +   aml_call2("PCEJ", aml_name("BSEL"),
> + aml_name("_SUN"))
> +);
> +aml_append(dev, method);
> +}
> if (bsel) {
> build_append_pcihp_notify_entry(notify_method, slot);
> }
> @@ -553,7 +563,8 @@ static void build_append_pci_bus_d

[PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Ani Sinha

A new option, "use_acpi_unplug", is introduced for PIIX; it selectively
disables hot unplugging of both hot-plugged and cold-plugged PCI devices
on non-root PCI buses. This prevents Windows guests from hot unplugging
devices via the system tray, but does not prevent devices from being hot
plugged into the guest.

It has been tested on Windows guests.

Signed-off-by: Ani Sinha 
---
 hw/acpi/piix4.c  |  3 +++
 hw/i386/acpi-build.c | 40 ++--
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 964d6f5..59fa707 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
 
 AcpiPciHpState acpi_pci_hotplug;
 bool use_acpi_pci_hotplug;
+bool use_acpi_unplug;
 
 uint8_t disable_s3;
 uint8_t disable_s4;
@@ -633,6 +634,8 @@ static Property piix4_pm_properties[] = {
 DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
 DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PIIX4PMState,
  use_acpi_pci_hotplug, true),
+DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge", PIIX4PMState,
+ use_acpi_unplug, true),
 DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
  acpi_memory_hotplug.is_enabled, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 23c77ee..71b3ac3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
 bool s3_disabled;
 bool s4_disabled;
 bool pcihp_bridge_en;
+bool pcihup_bridge_en;
 uint8_t s4_val;
 AcpiFadtData fadt;
 uint16_t cpu_hp_io_base;
@@ -240,6 +241,9 @@ static void acpi_get_pm_info(MachineState *machine, 
AcpiPmInfo *pm)
 pm->pcihp_bridge_en =
 object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
  NULL);
+pm->pcihup_bridge_en =
+object_property_get_bool(obj, "acpi-pci-hotunplug-enable-bridge",
+ NULL);
 }
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
@@ -451,7 +455,8 @@ static void build_append_pcihp_notify_entry(Aml *method, 
int slot)
 }
 
 static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
- bool pcihp_bridge_en)
+ bool pcihp_bridge_en,
+ bool pcihup_bridge_en)
 {
 Aml *dev, *notify_method = NULL, *method;
 QObject *bsel;
@@ -479,11 +484,14 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
 aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
 aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
-method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
-aml_append(method,
-aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
-);
-aml_append(dev, method);
+if (pcihup_bridge_en || pci_bus_is_root(bus)) {
+method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
+aml_append(method,
+   aml_call2("PCEJ", aml_name("BSEL"),
+ aml_name("_SUN"))
+);
+aml_append(dev, method);
+}
 aml_append(parent_scope, dev);
 
 build_append_pcihp_notify_entry(notify_method, slot);
@@ -537,12 +545,14 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 /* add _SUN/_EJ0 to make slot hotpluggable  */
 aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
 
-method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
-aml_append(method,
-aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
-);
-aml_append(dev, method);
-
+if (pcihup_bridge_en || pci_bus_is_root(bus)) {
+method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
+aml_append(method,
+   aml_call2("PCEJ", aml_name("BSEL"),
+ aml_name("_SUN"))
+);
+aml_append(dev, method);
+}
 if (bsel) {
 build_append_pcihp_notify_entry(notify_method, slot);
 }
@@ -553,7 +563,8 @@ static void build_append_pci_bus_devices(Aml *parent_scope, 
PCIBus *bus,
  */
 PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
 
-build_append_pci_bus_devices(dev, sec_bus, pcihp_bridge_en);
+build_append_pci_bus_devices(dev, sec_bus, pcihp_bridge_en,
+

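The condition the patch wraps around each `_EJ0` method can be read as a small truth table. The sketch below is a hypothetical stand-in (`sketch_emit_ej0()` is not a QEMU function) for `pcihup_bridge_en || pci_bus_is_root(bus)`: with the new property at its default of `true`, every slot keeps `_EJ0`; with it set to `false`, only slots on the root bus do, which per the commit message is what stops Windows from offering bridge devices for ejection while leaving hotplug alone.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the patch's gate: emit an _EJ0 (eject) method for
 * a slot when unplug on bridges is enabled, or when the slot sits on the
 * root bus, which is never affected by the new property.
 */
static bool sketch_emit_ej0(bool pcihup_bridge_en, bool bus_is_root)
{
    return pcihup_bridge_en || bus_is_root;
}
```

Only the combination "property off, non-root bus" drops the eject method.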
Re: [PATCH 1/2] tpm: tpm-tis-device: set PPI to false by default

2020-04-28 Thread Cornelia Huck

On Mon, 27 Apr 2020 16:31:44 +0200
Eric Auger  wrote:

> The tpm-tis-device device does not support PPI. Let's
> change the default value for the corresponding property
> instead of tricking this latter in the mach-virt machine.
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/tpm/tpm_tis_sysbus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/tpm/tpm_tis_sysbus.c b/hw/tpm/tpm_tis_sysbus.c
> index 18c02aed67..eced1fc843 100644
> --- a/hw/tpm/tpm_tis_sysbus.c
> +++ b/hw/tpm/tpm_tis_sysbus.c
> @@ -91,7 +91,7 @@ static void tpm_tis_sysbus_reset(DeviceState *dev)
>  static Property tpm_tis_sysbus_properties[] = {
>  DEFINE_PROP_UINT32("irq", TPMStateSysBus, state.irq_num, TPM_TIS_IRQ),
>  DEFINE_PROP_TPMBE("tpmdev", TPMStateSysBus, state.be_driver),
> -DEFINE_PROP_BOOL("ppi", TPMStateSysBus, state.ppi_enabled, true),
> +DEFINE_PROP_BOOL("ppi", TPMStateSysBus, state.ppi_enabled, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  

This looks like a better place to do this than in the virt compat
machines, and should get us the same result, leaving compatibility
intact.

Reviewed-by: Cornelia Huck

Re: [PATCH 2/2] hw/arm/virt: Remove the compat forcing tpm-tis-device PPI to off

2020-04-28 Thread Cornelia Huck

On Mon, 27 Apr 2020 16:31:45 +0200
Eric Auger  wrote:

> Now that the tpm-tis-device device PPI property is off by default,
> we can remove the compat used for the same goal.
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/arm/virt.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7dc96abf72..2a68306f28 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2311,11 +2311,6 @@ type_init(machvirt_machine_init);
>  
>  static void virt_machine_5_0_options(MachineClass *mc)
>  {
> -static GlobalProperty compat[] = {
> -{ TYPE_TPM_TIS_SYSBUS, "ppi", "false" },
> -};
> -
> -compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
>  }
>  DEFINE_VIRT_MACHINE_AS_LATEST(5, 0)
>  

Reviewed-by: Cornelia Huck

Re: [PATCH 0/2] virt: Set tpm-tis-device ppi property to off by default

2020-04-28 Thread Cornelia Huck

On Mon, 27 Apr 2020 16:31:43 +0200
Eric Auger  wrote:

> Instead of using a compat in the mach-virt machine to force
> PPI off for all virt machines (PPI not supported by the
> tpm-tis-device device), let's simply change the default value
> in the sysbus device.
> 
> Best Regards
> 
> Eric
> 
> Eric Auger (2):
>   tpm: tpm-tis-device: set PPI to false by default
>   hw/arm/virt: Remove the compat forcing tpm-tis-device PPI to off
> 
>  hw/arm/virt.c   | 5 -
>  hw/tpm/tpm_tis_sysbus.c | 2 +-
>  2 files changed, 1 insertion(+), 6 deletions(-)
> 

I think we can apply the compat machines patch on top of these two
patches.

Q: Who will queue this and the machine types patch? It feels a bit
weird taking arm patches through the s390 tree :)

Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Yuri Benditovich

On Tue, Apr 28, 2020 at 12:18 PM Cornelia Huck  wrote:

> On Tue, 28 Apr 2020 16:58:44 +0800
> Jason Wang  wrote:
>
> > On 2020/4/28 4:34 PM, Cornelia Huck wrote:
> > > On Tue, 28 Apr 2020 16:19:15 +0800
> > > Jason Wang  wrote:
> > >
> > >> On 2020/4/27 6:24 PM, Cornelia Huck wrote:
> > >>> VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.
> > >>>
> > >>> Signed-off-by: Cornelia Huck 
> > >>> ---
> > >>>hw/net/virtio-net.c | 8 
> > >>>1 file changed, 8 deletions(-)
> > >>>
> > >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > >>> index e85d902588b3..7449570c7123 100644
> > >>> --- a/hw/net/virtio-net.c
> > >>> +++ b/hw/net/virtio-net.c
> > >>> @@ -77,14 +77,6 @@
> > >>>   tso/gso/gro 'off'. */
> > >>>#define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
> > >>>
> > >>> -/* temporary until standard header include it */
> > >>> -#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
> > >>> -
> > >>> -#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_
> fields */
> > >>> -#define VIRTIO_NET_F_RSC_EXT   61
> > >>> -
> > >>> -#endif
> > >>> -
> > >>>static inline __virtio16 *virtio_net_rsc_ext_num_packets(
> > >>>struct virtio_net_hdr *hdr)
> > >>>{
> > >>
> > >> I think we should not keep those tricky num_packets/dup_acks.
> > > No real opinion here, patch 3 is only a cleanup.
> > >
> > > The important one is patch 1, because without it I cannot do a headers
> > > update.
> >
> >
> > Yes, at least we should dereference segments/dup_acks instead of
> > csum_start/csum_offsets since the header has been synced.
>
> So what about:
>
> - I merge patch 1 and the header sync now (because I have a bunch of
>   patches that depend on it...)
> - We change virtio-net to handle that properly on top (probably best
>   done by someone familiar with the code base ;)
>
>
Jason,
This series just solves the conflict caused by the update of the Linux headers.
After this series is applied, I can submit a further patch that uses the actual
RSC definitions from the Linux headers.

Thanks,
Yuri

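Jason's point (read `segments`/`dup_acks` rather than `csum_start`/`csum_offset`) can be illustrated with a simplified, hypothetical header in which, as his remark implies, the RSC counters share storage with the checksum fields. `struct sketch_net_hdr` and `sketch_rsc_num_packets()` are illustrative names only, not the synced Linux definitions, and endianness handling is omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified header: the RSC counters overlay the checksum
 * fields through a union, so code can name the field it actually means.
 */
struct sketch_net_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    union {
        struct {
            uint16_t csum_start;
            uint16_t csum_offset;
        };
        struct {
            uint16_t segments;
            uint16_t dup_acks;
        } rsc;
    };
};

/* Preferred accessor: dereference the RSC name instead of csum_start. */
static uint16_t *sketch_rsc_num_packets(struct sketch_net_hdr *hdr)
{
    return &hdr->rsc.segments;
}
```

Because both names alias the same bytes, switching accessors changes readability, not the wire layout.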
Re: [PATCH 0/2] virt: Set tpm-tis-device ppi property to off by default

2020-04-28 Thread Auger Eric

Hi Stefan,

On 4/27/20 10:21 PM, Stefan Berger wrote:
> On 4/27/20 10:31 AM, Eric Auger wrote:
>> Instead of using a compat in the mach-virt machine to force
>> PPI off for all virt machines (PPI not supported by the
>> tpm-tis-device device), let's simply change the default value
>> in the sysbus device.
> 
> There is no change in behavior on any arm machine due to this patch,
> right? So backporting would not be necessary?

Indeed, there is no functional change, as we keep PPI off. My
understanding is that we can safely remove the compat mechanism. I tested
migration from a virt machine with the compat to a virt machine without
it, and vice versa (without tpm-tis-device though), and it passed.

Thanks

Eric
> 
> 
>    Stefan
> 
>

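The "no functional change" argument can be checked with a toy model, a minimal sketch assuming only that a machine compat entry, when present, overrides the device default. The names are hypothetical, and this deliberately ignores how QEMU actually orders global and compat properties. Before the series the default was `true` with a compat forcing `false`; after it, the default is `false` and there is no compat; both resolve to `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compat entry for a single boolean property. */
struct sketch_compat {
    bool present;
    bool value;
};

/* A compat entry, if present, wins over the device's built-in default. */
static bool sketch_effective_ppi(bool device_default,
                                 const struct sketch_compat *compat)
{
    if (compat != NULL && compat->present) {
        return compat->value;
    }
    return device_default;
}
```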
Re: [PATCH v2] char-socket: initialize reconnect timer only when the timer doesn't start

2020-04-28 Thread Marc-André Lureau

Hi

On Tue, Apr 28, 2020 at 10:59 AM Li Feng  wrote:
>
> Sorry for sending the same mail out twice.
> Ignore the second one.
>
> Hi Lureau,
>
> I found another issue when running my new test: tests/test-char -p
> /char/socket/client/reconnect-error/unix
> The backtrace looks like this:
> #0  0x75ac3277 in raise () from /lib64/libc.so.6
> #1  0x75ac4968 in abort () from /lib64/libc.so.6
> #2  0x555aaa34 in error_handle_fatal (errp=,
> err=0x7fffec0012d0) at util/error.c:40
> #3  0x555aab0d in error_setv (errp=0x55802a08
> , src=0x555c4220 "io/channel.c", line=148,
> func=0x555c4520 <__func__.17450> "qio_channel_readv_all",
> err_class=ERROR_CLASS_GENERIC_ERROR, fmt=,
> ap=0x7423bb10, suffix=0x0) at util/error.c:73
> #4  0x555aac90 in error_setg_internal
> (errp=errp@entry=0x55802a08 ,
> src=src@entry=0x555c4220 "io/channel.c", line=line@entry=148,
> func=func@entry=0x555c4520 <__func__.17450>
> "qio_channel_readv_all", fmt=fmt@entry=0x555c4340 "Unexpected
> end-of-file before all bytes were read") at util/error.c:97
> #5  0x5556c1fc in qio_channel_readv_all (ioc=,
> iov=, niov=, errp=0x55802a08
> ) at io/channel.c:147
> #6  0x5556c23a in qio_channel_read_all (ioc=,
> buf=, buflen=, errp=) at
> io/channel.c:247
> #7  0x5556ad0e in char_socket_ping_pong (ioc=0x7fffec0008c0)
> at tests/test-char.c:727
> #8  0x5556adcf in char_socket_client_server_thread
> (data=data@entry=0x5582e350) at tests/test-char.c:881
> #9  0x555a9556 in qemu_thread_start (args=) at
> util/qemu-thread-posix.c:519
> #10 0x75e61e25 in start_thread () from /lib64/libpthread.so.0
> #11 0x75b8bbad in clone () from /lib64/libc.so.6
>
> I think this is a new QEMU issue, not an issue with my test.
> What do you think?


No idea, it could be a bug in the test itself. How did you produce it?

>
> Thanks,
>
> Feng Li
>
> On Tue, Apr 28, 2020 at 4:53 PM, Li Feng wrote:
>
> >
> > When the disconnect event is triggered in the connecting stage,
> > the tcp_chr_disconnect_locked may be called twice.
> >
> > The first call:
> > #0  qemu_chr_socket_restart_timer (chr=0x5582ee90) at 
> > chardev/char-socket.c:120
> > #1  0x5558e38c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
> > #2  0x5558e3cd in tcp_chr_disconnect (chr=0x5582ee90) at 
> > chardev/char-socket.c:497
> > #3  0x5558ea32 in tcp_chr_new_client 
> > (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at 
> > chardev/char-socket.c:892
> > #4  0x5558eeb8 in qemu_chr_socket_connected 
> > (task=0x5582f300, opaque=) at chardev/char-socket.c:1090
> > #5  0x55574352 in qio_task_complete 
> > (task=task@entry=0x5582f300) at io/task.c:196
> > #6  0x555745f4 in qio_task_thread_result 
> > (opaque=0x5582f300) at io/task.c:111
> > #7  qio_task_wait_thread (task=0x5582f300) at io/task.c:190
> > #8  0x5558f17e in tcp_chr_wait_connected (chr=0x5582ee90, 
> > errp=0x55802a08 ) at chardev/char-socket.c:1013
> > #9  0x55567cbd in char_socket_client_reconnect_test 
> > (opaque=0x557fe020 ) at tests/test-char.c:1152
> > The second call:
> > #0  0x75ac3277 in raise () from /lib64/libc.so.6
> > #1  0x75ac4968 in abort () from /lib64/libc.so.6
> > #2  0x75abc096 in __assert_fail_base () from /lib64/libc.so.6
> > #3  0x75abc142 in __assert_fail () from /lib64/libc.so.6
> > #4  0x5558d10a in qemu_chr_socket_restart_timer 
> > (chr=0x5582ee90) at chardev/char-socket.c:125
> > #5  0x5558df0c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
> > #6  0x5558df4d in tcp_chr_disconnect (chr=0x5582ee90) at 
> > chardev/char-socket.c:497
> > #7  0x5558e5b2 in tcp_chr_new_client 
> > (chr=chr@entry=0x5582ee90, sioc=sioc@entry=0x5582f0b0) at 
> > chardev/char-socket.c:892
> > #8  0x5558e93a in tcp_chr_connect_client_sync 
> > (chr=chr@entry=0x5582ee90, errp=errp@entry=0x7fffd178) at 
> > chardev/char-socket.c:944
> > #9  0x5558ec78 in tcp_chr_wait_connected (chr=0x5582ee90, 
> > errp=0x55802a08 ) at chardev/char-socket.c:1035
> > #10 0x5556804b in char_socket_client_test 
> > (opaque=0x557fe020 ) at tests/test-char.c:1023
> >
> > Run test/test-char to reproduce this issue.
> >
> > test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: 
> > Assertion `!s->reconnect_timer' failed.
> >
> > Signed-off-by: Li Feng 
> > ---
> > v2:
> > - Rewrite the solution.
> > - Add test to reproduce this issue.
> >
> >  chardev/char-socket.c |  2 +-
> >  tests/test-char.c | 48 ++--
> >  2 files changed, 39 insertions(+), 11 deletions(-)
> >
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > in

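The assertion `!s->reconnect_timer` fires because the disconnect path is reached twice while connecting. A generic guard for that situation is sketched below; it is not necessarily what v2 does (the diff is truncated above), and every name in it is hypothetical. The restart helper keeps the "never armed twice" invariant as an assertion, and the disconnect path only arms a timer if none is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket chardev state; only the timer handle matters here. */
struct sketch_sock_state {
    void *reconnect_timer;  /* non-NULL while a reconnect is pending */
    int timers_armed;       /* how many times a timer was created */
};

static int sketch_timer_token;

/* Arms the timer; asserts the same invariant the backtrace trips over. */
static void sketch_restart_timer(struct sketch_sock_state *s)
{
    assert(s->reconnect_timer == NULL);
    s->reconnect_timer = &sketch_timer_token;
    s->timers_armed++;
}

/*
 * Disconnect path that tolerates being reached twice while connecting:
 * only arm the reconnect timer if one is not already pending.
 */
static void sketch_disconnect(struct sketch_sock_state *s, bool reconnect)
{
    if (reconnect && s->reconnect_timer == NULL) {
        sketch_restart_timer(s);
    }
}
```

Calling `sketch_disconnect()` twice in a row now arms exactly one timer instead of aborting.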
Re: backing chain & block status & filters

2020-04-28 Thread Max Reitz

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> is_allocated_above", and returned to all the inconsistencies about
> block-status. I keep in mind Max's series about child-access functions,
> and Andrey's work about using COR filter in block-stream, which depends
> on Max's series (because, without them COR fitler with file child breaks
> backing chains).. And, it seems that it's better to discuss some
> questions before resending.
> 
> First, problems about block-status:
> 
> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> 
>    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>    * BDRV_BLOCK_ZERO: offset reads as zero
>    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> raw data
>    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>    *   layer rather than any backing, set by block
> layer
> 
> This actually means that we should always have BDRV_BLOCK_ALLOCATED for
> formats which don't support backing. So all such format drivers must
> return ZERO or DATA (or both?), yes? file-posix seems to do so, but, for
> example, iscsi doesn't.

Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.
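The rule being discussed can be sketched as follows, assuming flag values that mirror the BDRV_BLOCK_* bits (treat the numbers as illustrative) and a hypothetical generic wrapper: the generic layer sets ALLOCATED whenever a driver reports ZERO or DATA, so a driver without backing support that reports neither leaves the range looking unallocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, assumed to mirror the BDRV_BLOCK_* bits. */
enum {
    SKETCH_BLOCK_DATA      = 0x01, /* allocation for data is tied to this layer */
    SKETCH_BLOCK_ZERO      = 0x02, /* range reads as zero */
    SKETCH_BLOCK_ALLOCATED = 0x10, /* content determined by this layer */
};

/* Hypothetical generic wrapper: ALLOCATED = ZERO | DATA. */
static int sketch_generic_status(int driver_ret)
{
    if (driver_ret & (SKETCH_BLOCK_ZERO | SKETCH_BLOCK_DATA)) {
        driver_ret |= SKETCH_BLOCK_ALLOCATED;
    }
    return driver_ret;
}

/*
 * The proposed invariant: a layer with no backing (or filtered) child
 * should never leave a range unallocated.
 */
static bool sketch_leaf_status_ok(int driver_ret)
{
    return (sketch_generic_status(driver_ret) & SKETCH_BLOCK_ALLOCATED) != 0;
}
```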

> 2. ZERO. The meaning differs a bit for generic block_status and for
> drivers. I think we should at least document it like this:
> 
> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> BDRV_BLOCK_ZERO: if a driver returns ZERO, then the region is allocated at
> this layer and reads as zero. If generic block_status returns ZERO, it
> only means that the region reads as zero, but it may be allocated on an
> underlying level.

Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)

> 3. bdi.unallocated_blocks_are_zero
> 
> I think it's very bad that we have formats that support backing but
> don't report bdi.unallocated_blocks_are_zero as true. It means that an
> UNALLOCATED region reads as zero if we have a short backing file, and not
> as zero if we remove this short backing file.

What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream then, i.e. shouldn’t it stream zeros after the end of the
backing file?

> I can live with it, but
> this is weird logic. These bad drivers are qcow (not qcow2), parallels
> and vmdk. I hope they actually just forgot to set
> unallocated_blocks_are_zero to true.

qcow definitely sounds like it.

> Next. But what about drivers which don't support backing? As we
> considered above, they should always return ZERO or DATA, as everything
> is allocated at this backing-chain level (the last level, of course). So
> again unallocated_blocks_are_zero is meaningless. So I think that
> drivers which don't support backing should be fixed to always return
> ZERO or DATA; then we don't need unallocated_blocks_are_zero at all.

Agreed.

> 3.

The second 3.? :)

> Short backing files in allocated_above: we must consider the space after
> EOF as ALLOCATED if the short backing file is inside the requested part of
> the backing chain, as those zeroes are produced exactly because of this
> short file (and we never go to its backing).

Sounds correct.
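The short-backing-file rule can be sketched with a toy chain, a minimal model assuming a hypothetical per-cluster allocation map (`sketch_layer`, `sketch_is_allocated_above` are illustrative names, not QEMU code). The walk goes from the top towards the base over only the requested part of the chain; an offset past a layer's EOF counts as allocated at that layer, because the zeroes come from that short file and the read never falls through to the layers below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CLUSTER 65536u
#define SKETCH_MAX_CLUSTERS 64

/* Hypothetical layer model: a length plus a per-cluster allocation map. */
struct sketch_layer {
    uint64_t length;                      /* image size in bytes */
    bool allocated[SKETCH_MAX_CLUSTERS];  /* content defined at this layer? */
};

/* chain[0] is the top; only the first `count` layers are examined. */
static bool sketch_is_allocated_above(const struct sketch_layer *chain,
                                      size_t count, uint64_t offset)
{
    for (size_t i = 0; i < count; i++) {
        if (offset >= chain[i].length) {
            return true;  /* reads as zero because this layer is short */
        }
        assert(offset / SKETCH_CLUSTER < SKETCH_MAX_CLUSTERS);
        if (chain[i].allocated[offset / SKETCH_CLUSTER]) {
            return true;
        }
    }
    return false;
}
```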

> (the current implementation of allocated_above is
> buggy, see my outdated series "[PATCH 0/4] fix & merge
> block_status_above and is_allocated_above")
> 
> 4. Long ago we discussed problems with BDRV_BLOCK_RAW when we have
> a backing chain of a non-backing child. I just remember that we didn't
> reach a consensus.

Possible? :)

> 5. Filters. OK, we have two functions for them:
> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
> think both are wrong:
> 
> bdrv_co_block_status_from_file leads to problem [4], when we can report
> UNALLOCATED, which refers not to the current backing chain but to the sub
> backing chain of the file child, which is inconsistent with the
> block_status_above and is_allocated_above iteration.
> 
> bdrv_co_block_status_from_backing is also not consistent with
> block_status_ab

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 13:08 hat Max Reitz geschrieben:
> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> > Hi!
> > 
> > I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> > is_allocated_above", and returned to all the inconsistencies about
> > block-status. I keep in mind Max's series about child-access functions,
> > and Andrey's work about using COR filter in block-stream, which depends
> > on Max's series (because, without them COR fitler with file child breaks
> > backing chains).. And, it seems that it's better to discuss some
> > questions before resending.
> > 
> > First, problems about block-status:
> > 
> > 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> > 
> >    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> >    * BDRV_BLOCK_ZERO: offset reads as zero
> >    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> > raw data
> >    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> >    *   layer rather than any backing, set by block
> > layer
> > 
> > This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
> > formats which doesn't support backing. So, all such format drivers must
> > return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
> > example, iscsi - doesn't.
> 
> Hm.  I could imagine that there are formats that have non-zero holes
> (e.g. 0xff or just garbage).  It would be a bit wrong for them to return
> ZERO or DATA then.
> 
> But OTOH we don’t care about such cases, do we?  We need to know whether
> ranges are zero, data, or unallocated.  If they aren’t zero, we only
> care about whether reading from it will return data from this layer or not.
> 
> So I suppose that anything that doesn’t support backing files (or
> filtered children) should always return ZERO and/or DATA.

I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.

Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.
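Kevin's description can be modelled with a toy resolver, a minimal sketch assuming one byte per layer and hypothetical names (`sketch_view`, `sketch_read_byte`). A read walks the chain top-down and, if no layer claims the offset, falls through to an implicit zeroed layer, so even the bottom layer can legitimately report a hole while callers still read zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer view of a single byte offset. */
struct sketch_view {
    bool allocated;   /* does this layer define the content? */
    uint8_t value;    /* content if allocated */
};

/*
 * Resolve a read top-down; if nothing in the chain, including the bottom
 * layer, claims the offset, return the implicit zeroed backing layer.
 */
static uint8_t sketch_read_byte(const struct sketch_view *chain, size_t n)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (chain[i].allocated) {
            return chain[i].value;
        }
    }
    return 0; /* implicit zeroed backing layer */
}
```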

Kevin



Re: [PATCH 0/3] headers update and virtio-net fixup

2020-04-28 Thread Cornelia Huck

On Mon, 27 Apr 2020 12:24:12 +0200
Cornelia Huck  wrote:

> This updates the headers to Linux 5.7-rc3. Doing so exposes
> a problem in virtio-net (the #define for compat covers too much),
> fix it.
> 
> Note 1: I'd like this to go through s390-next so that I can go
> ahead with protected virtualization, which needs a headers
> update.
> 
> Note 2: Why has the feature been merged in the first place without the
> kernel part being upstream yet?
> 
> Cornelia Huck (3):
>   virtio-net: fix rsc_ext compat handling
>   linux-headers: update against Linux 5.7-rc3
>   virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling
> 
>  hw/net/virtio-net.c   |   8 --
>  include/standard-headers/linux/ethtool.h  |  10 +-
>  .../linux/input-event-codes.h |   5 +-
>  include/standard-headers/linux/pci_regs.h |   2 +
>  include/standard-headers/linux/vhost_types.h  |   8 ++
>  .../standard-headers/linux/virtio_balloon.h   |  12 ++-
>  include/standard-headers/linux/virtio_ids.h   |   1 +
>  include/standard-headers/linux/virtio_net.h   | 102 +-
>  linux-headers/COPYING |   2 +
>  linux-headers/asm-x86/kvm.h   |   1 +
>  linux-headers/asm-x86/unistd_32.h |   1 +
>  linux-headers/asm-x86/unistd_64.h |   1 +
>  linux-headers/asm-x86/unistd_x32.h|   1 +
>  linux-headers/linux/kvm.h |  47 +++-
>  linux-headers/linux/mman.h|   5 +-
>  linux-headers/linux/userfaultfd.h |  40 +--
>  linux-headers/linux/vfio.h|  37 +++
>  linux-headers/linux/vhost.h   |  24 +
>  18 files changed, 280 insertions(+), 27 deletions(-)
> 

Queued patches 1+2 to s390-next.

Re: [PATCH 1/2] virtiofsd: only retain file system capabilities

2020-04-28 Thread Dr. David Alan Gilbert

* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> virtiofsd runs as root but only needs a subset of root's Linux
> capabilities(7).  As a file server its purpose is to create and access
> files on behalf of a client.  It needs to be able to access files with
> arbitrary uid/gid owners.  It also needs to be able to create device nodes.
> 
> Introduce a Linux capabilities(7) whitelist and drop all capabilities
> that we don't need, making the virtiofsd process less powerful than a
> regular uid root process.
> 
>   # cat /proc/PID/status
>   ...
>   Before   After
>   CapInh:  
>   CapPrm: 003f 88df
>   CapEff: 003f 88df
>   CapBnd: 003f 
>   CapAmb:  
> 
> Note that file capabilities cannot be used to achieve the same effect on
> the virtiofsd executable because mount is used during sandbox setup.
> Therefore we drop capabilities programmatically at the right point
> during startup.
> 
> This patch only affects the sandboxed child process.  The parent process
> that sits in waitpid(2) still has full root capabilities and will be
> addressed in the next patch.
> 
> Signed-off-by: Stefan Hajnoczi 

Looks reasonable to me; I can't see any capabilities in the manpage that
you're missing that make sense.
They also look old enough not to be a problem with reasonably old
systems.



Reviewed-by: Dr. David Alan Gilbert 

> ---
>  tools/virtiofsd/passthrough_ll.c | 38 
>  1 file changed, 38 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 4c35c95b25..af97ba1c41 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2695,6 +2695,43 @@ static void setup_mounts(const char *source)
>  close(oldroot);
>  }
>  
> +/*
> + * Only keep whitelisted capabilities that are needed for file system 
> operation
> + */
> +static void setup_capabilities(void)
> +{
> +pthread_mutex_lock(&cap.mutex);
> +capng_restore_state(&cap.saved);
> +
> +/*
> + * Whitelist file system-related capabilities that are needed for a file
> + * server to act like root.  Drop everything else like networking and
> + * sysadmin capabilities.
> + *
> + * Exclusions:
> + * 1. CAP_LINUX_IMMUTABLE is not included because it's only used via 
> ioctl
> + *and we don't support that.
> + * 2. CAP_MAC_OVERRIDE is not included because it only seems to be
> + *used by the Smack LSM.  Omit it until there is demand for it.
> + */
> +capng_setpid(syscall(SYS_gettid));
> +capng_clear(CAPNG_SELECT_BOTH);
> +capng_updatev(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
> +CAP_CHOWN,
> +CAP_DAC_OVERRIDE,
> +CAP_DAC_READ_SEARCH,
> +CAP_FOWNER,
> +CAP_FSETID,
> +CAP_SETGID,
> +CAP_SETUID,
> +CAP_MKNOD,
> +CAP_SETFCAP);
> +capng_apply(CAPNG_SELECT_BOTH);
> +
> +cap.saved = capng_save_state();
> +pthread_mutex_unlock(&cap.mutex);
> +}
> +
>  /*
>   * Lock down this process to prevent access to other processes or files 
> outside
>   * source directory.  This reduces the impact of arbitrary code execution 
> bugs.
> @@ -2705,6 +2742,7 @@ static void setup_sandbox(struct lo_data *lo, struct 
> fuse_session *se,
>  setup_namespaces(lo, se);
>  setup_mounts(lo->source);
>  setup_seccomp(enable_syslog);
> +setup_capabilities();
>  }
>  
>  /* Raise the maximum number of open file descriptors */
> -- 
> 2.25.1
> 
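To read the Cap* lines of the status output, one can compute the mask the whitelist should produce. This is a minimal sketch: the constants copy the capability numbers from `<linux/capability.h>` under local, hypothetical names so the block builds without kernel headers, and `sketch_cap_mask()` is illustrative only. The result, 0x880000df, appears consistent with the "88df" digits that survive in the After column of the quoted output.

```c
#include <assert.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum {
    SK_CAP_CHOWN = 0,
    SK_CAP_DAC_OVERRIDE = 1,
    SK_CAP_DAC_READ_SEARCH = 2,
    SK_CAP_FOWNER = 3,
    SK_CAP_FSETID = 4,
    SK_CAP_SETGID = 6,
    SK_CAP_SETUID = 7,
    SK_CAP_MKNOD = 27,
    SK_CAP_SETFCAP = 31,
};

/* Build the 64-bit mask format used by the Cap* lines in /proc/PID/status. */
static uint64_t sketch_cap_mask(const int *caps, int n)
{
    uint64_t mask = 0;
    for (int i = 0; i < n; i++) {
        assert(caps[i] >= 0 && caps[i] < 64);
        mask |= UINT64_C(1) << caps[i];
    }
    return mask;
}

/* The whitelist passed to capng_updatev() in the patch. */
static const int sketch_whitelist[] = {
    SK_CAP_CHOWN, SK_CAP_DAC_OVERRIDE, SK_CAP_DAC_READ_SEARCH,
    SK_CAP_FOWNER, SK_CAP_FSETID, SK_CAP_SETGID, SK_CAP_SETUID,
    SK_CAP_MKNOD, SK_CAP_SETFCAP,
};
```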
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov





On 27.04.2020 16:29, Max Reitz wrote:

On 21.04.20 10:11, Denis Plotnikov wrote:

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
---
  tests/qemu-iotests/287 | 146 +
  tests/qemu-iotests/287.out |  67 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 214 insertions(+)
  create mode 100755 tests/qemu-iotests/287
  create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..156acc40ad
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,146 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as unsupported.

(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find any _supported_* related.

Denis



+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _cleanup_test_img
+   rm -f "$COMPR_IMG"

Using _rm_test_img() would be nicer.  There shouldn’t be a functional
difference here because there’d only be one with external data files (I
think), which won’t work with this test, but still.


+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+IMGOPTS='compression_type=zlib' _make_test_img 64M

Please use -o so user options are still considered.

(i.e., _make_test_img -o compression_type=zlib)

[...]


+echo
+echo "=== Testing incompressible cluster processing with zstd ==="
+echo
+# create a 2M image and fill it with 1M likely incompressible data
+# and 1M compressible data
+dd if=/dev/urandom of="$RAND_FILE" bs=1M count=1 seek=1
+QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" \
+$QEMU_IO -f raw -c "write -P 0xFA 0 1M" "$RAND_FILE" | _filter_qemu_io
+$QEMU_IMG convert -f raw -O $IMGFMT -c "$RAND_FILE" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG convert -O $IMGFMT -c -o compression_type=zstd \
+  "$TEST_IMG" "$COMPR_IMG"

Again, it would be nice to not discard the user-supplied options here,
and maybe it would also be nicer to explicitly pass the compression type
for the first convert, too.  So we’d use
   -o "$(_optstr_add "$IMGOPTS" "compression_type=zlib")"
for the first convert, and
   -o "$(_optstr_add "$IMGOPTS" "compression_type=zstd")"
for the second one.

Max

Re: [PATCH v10 00/14] iotests: use python logging

2020-04-28 Thread Max Reitz

On 31.03.20 02:00, John Snow wrote:
> This series uses python logging to enable output conditionally on
> iotests.log(). We unify an initialization call (which also enables
> debugging output for those tests with -d) and then make the switch
> inside of iotests.
> 
> It will help alleviate the need to create logged/unlogged versions
> of all the various helpers we have made.
> 
> Also, I got lost and accidentally delinted iotests while I was here.
> Sorry about that. By version 9, it's now the overwhelming focus of
> this series. No good deed, etc.

Seems like nobody else wants it, so I thank you and let you know that
I’ve applied this series to my block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max




Re: [RFC PATCH v1 20/26] kvm: vmi: intercept live migration

2020-04-28 Thread Adalbert Lazăr

On Mon, 27 Apr 2020 20:08:55 +0100, "Dr. David Alan Gilbert" wrote:
> * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > From: Marian Rotariu 
> > 
> > It is possible that the introspection tool has made some changes inside
> > the introspected VM which can make the guest crash if the introspection
> > connection is suddenly closed.
> > 
> > When the live migration starts, for now, the introspection tool is
> > signaled to remove its hooks from the introspected VM.
> > 
> > CC: Juan Quintela 
> > CC: "Dr. David Alan Gilbert" 
> > Signed-off-by: Marian Rotariu 
> > Signed-off-by: Adalbert Lazăr 
> 
> OK, so this isn't too intrusive to the migration code; and other than
> renaming 'start_live_migration_thread' to
> 'start_outgoing_migration_thread' I think I'd be OK with this,
> 
> but it might depend what your overall aim is.
> 
> For example, you might be better intercepting each migration_state
> change in your notifier, that's much finer grain than just the start of
> migration.

Thank you, Dave.

We want to intercept the live migration and 'block' it while the guest
is running (some changes made to the guest by the introspection app have
to be undone while the vCPUs are in certain states).

I'm not sure what the best way is to block these kinds of events
(including the pause/shutdown commands). If calling main_loop_wait()
is enough (patch [22/26] kvm: vmi: add 'async_unhook' property [1])
then we can drop a lot of code.

The use of a notifier will be nice, but from what I understand, we can't
block the migration from a notification callback.

> The other thing I worry about is that there doesn't seem to be much
> guard against odd orderings of things - for example, what happens
> if the introspection client was to issue the INTERCEPT_MIGRATE command
> twice while a migration was already running?  Or before an actual
> incoming channel connection had happened?
> 
> Dave

Sorry that I haven't described the interception. When we intercept
an action that we want to 'block', we set a static variable first,
regardless of whether the introspection channel is connected, and:

   - if the introspection channel is not connected we don't block the
   action, but this (variable) will prevent the activation of this
   channel until the action (i.e. migrate) is completed (a). I assume
   that there could be only one migrate (or suspend/pause) user command
   at any given time (b).

   - if the introspection channel is connected, the introspection app
   is signaled to start its unhook/undo process. We let the code flow
   continue, but the action (migrate/suspend/pause) is delayed until
   the introspection channel is closed. Meanwhile, any other intercepted
   action will not be blocked/delayed (c), but the fact that these actions
   are in progress is saved to static variables and the introspection
   channel won't be reactivated.

Indeed, there are cases that are not handled well:

  a) if the migration is started and canceled before the introspection
  object is created (through QMP), the introspection channel will be
  disabled until the next migration starts and finishes.

  b) if a migration command has been delayed, a following migrate command
  (if this is possible) won't be delayed and we will have two migration
  threads started.

  c) if a migration command has been delayed, a following suspend/pause
  command won't be delayed and the introspection app might not have
  enough time to finish its unhook/undo process.
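A minimal model of the interception flow described above may make cases (b) and (c) easier to see. This is a hypothetical sketch: every name in it (intercept_action(), channel_may_activate(), channel_closed(), the pending flags) is invented for illustration and does not come from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the interception flow described above.
 * None of these names come from the patch.
 */
typedef enum { ACTION_PROCEEDS, ACTION_DELAYED } ActionResult;

static bool channel_connected;  /* is the introspection channel up? */
static bool unhook_requested;   /* has the app been asked to undo its changes? */
static bool migrate_pending;    /* a migrate command is in progress */
static bool suspend_pending;    /* a suspend/pause command is in progress */

/* Common path for every intercepted user command. */
ActionResult intercept_action(bool *pending)
{
    *pending = true;  /* recorded whether or not the channel is connected */

    if (!channel_connected || unhook_requested) {
        /*
         * No channel, or an unhook is already in progress: the action is
         * not delayed. This is where cases (b) and (c) come from.
         */
        return ACTION_PROCEEDS;
    }

    unhook_requested = true;  /* signal the app to start its unhook/undo */
    return ACTION_DELAYED;    /* resumed once the channel is closed */
}

/* The channel must not be (re)activated while any action is pending. */
bool channel_may_activate(void)
{
    return !migrate_pending && !suspend_pending;
}

/* Called when the introspection app closes the channel after unhooking. */
void channel_closed(void)
{
    channel_connected = false;
    unhook_requested = false;
}
```

In this model a second intercept_action() call made while the first unhook is still running returns ACTION_PROCEEDS, which corresponds to the second migration thread in case (b) and the undelayed suspend/pause in case (c).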

[1]: 
https://lore.kernel.org/qemu-devel/20200415005938.23895-23-ala...@bitdefender.com/

> > ---
> >  accel/kvm/vmi.c| 31 +++
> >  include/sysemu/vmi-intercept.h |  1 +
> >  migration/migration.c  | 18 +++---
> >  migration/migration.h  |  2 ++
> >  4 files changed, 45 insertions(+), 7 deletions(-)
> > 
> > diff --git a/accel/kvm/vmi.c b/accel/kvm/vmi.c
> > index 90906478b4..ea7191e48d 100644
> > --- a/accel/kvm/vmi.c
> > +++ b/accel/kvm/vmi.c
> > @@ -21,6 +21,8 @@
> >  #include "chardev/char.h"
> >  #include "chardev/char-fe.h"
> >  #include "migration/vmstate.h"
> > +#include "migration/migration.h"
> > +#include "migration/misc.h"
> >  
> >  #include "sysemu/vmi-intercept.h"
> >  #include "sysemu/vmi-handshake.h"
> > @@ -58,6 +60,7 @@ typedef struct VMIntrospection {
> >  int64_t vm_start_time;
> >  
> >  Notifier machine_ready;
> > +Notifier migration_state_change;
> >  bool created_from_command_line;
> >  
> >  bool kvmi_hooked;
> > @@ -74,9 +77,11 @@ static const char *action_string[] = {
> >  "suspend",
> >  "resume",
> >  "force-reset",
> > +"migrate",
> >  };
> >  
> >  static bool suspend_pending;
> > +static bool migrate_pending;
> >  
> >  #define TYPE_VM_INTROSPECTION "introspection"
> >  
> > @@ -88,6 +93,15 @@ static bool suspend_pending;
> >  static Error *vm_introspection_init(VMIntrospection *i);
> >  static void vm_introspection_reset(void *opaque);
> >  
> > +static

Re: [PATCH v10 00/14] iotests: use python logging

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 13:46 hat Max Reitz geschrieben:
> On 31.03.20 02:00, John Snow wrote:
> > This series uses python logging to enable output conditionally on
> > iotests.log(). We unify an initialization call (which also enables
> > debugging output for those tests with -d) and then make the switch
> > inside of iotests.
> > 
> > It will help alleviate the need to create logged/unlogged versions
> > of all the various helpers we have made.
> > 
> > Also, I got lost and accidentally delinted iotests while I was here.
> > Sorry about that. By version 9, it's now the overwhelming focus of
> > this series. No good deed, etc.
> 
> Seems like nobody else wants it, so I thank you and let you know that
> I’ve applied this series to my block-next branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

John said he wanted to address my comment on patch 14, so I expected him
to send another version. This need not stop this series (we can still
fix that on top), but just as an explanation why I didn't take it yet.

Kevin



Re: [RFC PATCH v1 20/26] kvm: vmi: intercept live migration

2020-04-28 Thread Dr. David Alan Gilbert

* Adalbert Lazăr (ala...@bitdefender.com) wrote:
> On Mon, 27 Apr 2020 20:08:55 +0100, "Dr. David Alan Gilbert" wrote:
> > * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > > From: Marian Rotariu 
> > > 
> > > It is possible that the introspection tool has made some changes inside
> > > the introspected VM which can make the guest crash if the introspection
> > > connection is suddenly closed.
> > > 
> > > When the live migration starts, for now, the introspection tool is
> > > signaled to remove its hooks from the introspected VM.
> > > 
> > > CC: Juan Quintela 
> > > CC: "Dr. David Alan Gilbert" 
> > > Signed-off-by: Marian Rotariu 
> > > Signed-off-by: Adalbert Lazăr 
> > 
> > OK, so this isn't too intrusive to the migration code; and other than
> > renaming 'start_live_migration_thread' to
> > 'start_outgoing_migration_thread' I think I'd be OK with this,
> > 
> > but it might depend what your overall aim is.
> > 
> > For example, you might be better intercepting each migration_state
> > change in your notifier, that's much finer grain than just the start of
> > migration.
> 
> Thank you, Dave.
> 
> We want to intercept the live migration and 'block' it while the guest
> is running (some changes made to the guest by the introspection app have
> to be undone while the vCPUs are in certain states).
> 
> I'm not sure what the best way is to block these kinds of events
> (including the pause/shutdown commands). If calling main_loop_wait()
> is enough (patch [22/26] kvm: vmi: add 'async_unhook' property [1])
> then we can drop a lot of code.
> 
> The use of a notifier will be nice, but from what I understand, we can't
> block the migration from a notification callback.

Oh, if your intention is *just* to block a migration starting then you
can use 'migrate_add_blocker' - see hw/9pfs/9p.c for an example where
it's used and then removed; they use it to stop migration while the fs
is mounted.  That causes an attempt to start a migration to give an
error (of your choosing).
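For illustration, the effect described here can be modelled in a few lines. This is a hypothetical stand-in, not QEMU's API: the real calls are migrate_add_blocker() and migrate_del_blocker(), which work with an Error object describing the reason, while blocker_add(), blocker_del() and migration_start() below are invented names.

```c
#include <stddef.h>

/*
 * Hypothetical model of the "migration blocker" pattern: while at least
 * one blocker is registered, an attempt to start a migration fails and
 * reports the blocker's reason. Not QEMU's API.
 */
#define MAX_BLOCKERS 8

static const char *blockers[MAX_BLOCKERS];
static size_t n_blockers;

/* Register a reason that prevents migration; returns 0 on success. */
int blocker_add(const char *reason)
{
    if (n_blockers == MAX_BLOCKERS) {
        return -1;
    }
    blockers[n_blockers++] = reason;
    return 0;
}

/* Remove a previously registered reason (matched by pointer). */
void blocker_del(const char *reason)
{
    for (size_t i = 0; i < n_blockers; i++) {
        if (blockers[i] == reason) {
            blockers[i] = blockers[--n_blockers];
            return;
        }
    }
}

/* Try to start a migration; fails and reports the first reason if blocked. */
int migration_start(const char **why)
{
    if (n_blockers > 0) {
        *why = blockers[0];
        return -1;
    }
    return 0;
}
```

Note that a blocker only refuses new migration attempts with an error; unlike the delay approach described earlier in the thread, it does not pause a command until the introspection app has finished unhooking.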

> > The other thing I worry about is that there doesn't seem to be much
> > guard against odd orderings of things - for example, what happens
> > if the introspection client was to issue the INTERCEPT_MIGRATE command
> > twice while a migration was already running?  Or before an actual
> > incoming channel connection had happened?
> > 
> > Dave
> 
> Sorry that I haven't described the interception. When we intercept
> an action that we want to 'block', we set a static variable first,
> regardless of whether the introspection channel is connected, and:
> 
>- if the introspection channel is not connected we don't block the
>action, but this (variable) will prevent the activation of this
>channel until the action (i.e. migrate) is completed (a). I assume
>that there could be only one migrate (or suspend/pause) user command
>at any given time (b).
> 
>- if the introspection channel is connected, the introspection app
>is signaled to start its unhook/undo process. We let the code flow
>continue, but the action (migrate/suspend/pause) is delayed until
>the introspection channel is closed. Meanwhile, any other intercepted
>action will not be blocked/delayed (c), but the fact that these actions
>are in progress is saved to static variables and the introspection
>channel won't be reactivated.
> 
> Indeed, there are cases that are not handled well:
> 
>   a) if the migration is started and canceled before the introspection
>   object is created (through QMP), the introspection channel will be
>   disabled until the next migration starts and finishes.
> 
>   b) if a migration command has been delayed, a following migrate command
>   (if this is possible) won't be delayed and we will have two migration
>   threads started.
> 
>   c) if a migration command has been delayed, a following suspend/pause
>   command won't be delayed and the introspection app might not have
>   enough time to finish its unhook/undo process.

Yeh that sounds a bit messy.

Dave


> [1]: 
> https://lore.kernel.org/qemu-devel/20200415005938.23895-23-ala...@bitdefender.com/
> 
> > > ---
> > >  accel/kvm/vmi.c| 31 +++
> > >  include/sysemu/vmi-intercept.h |  1 +
> > >  migration/migration.c  | 18 +++---
> > >  migration/migration.h  |  2 ++
> > >  4 files changed, 45 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/accel/kvm/vmi.c b/accel/kvm/vmi.c
> > > index 90906478b4..ea7191e48d 100644
> > > --- a/accel/kvm/vmi.c
> > > +++ b/accel/kvm/vmi.c
> > > @@ -21,6 +21,8 @@
> > >  #include "chardev/char.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "migration/vmstate.h"
> > > +#include "migration/migration.h"
> > > +#include "migration/misc.h"
> > >  
> > >  #include "sysemu/vmi-intercept.h"
> > >  #include "sysemu/vmi-handshake.h"
> > > @@ -58,6 +60,7 @@ typedef struct VMIntrospection {
> > >  int64_t vm_start_time;
> > >  
> > >  Notifier ma

RE: [PATCH 1/2] Fix undefined behaviour

2020-04-28 Thread Paul Durrant

> -Original Message-
> From: Artur Puzio 
> Sent: 28 April 2020 10:41
> To: p...@xen.org; 'Grzegorz Uriasz' ; 
> qemu-devel@nongnu.org
> Cc: marma...@invisiblethingslab.com; ja...@bartmin.ski; 
> j.nowa...@student.uw.edu.pl; 'Stefano
> Stabellini' ; 'Anthony Perard' 
> ; xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH 1/2] Fix undefined behaviour
> 
> On 28.04.2020 10:10, Paul Durrant wrote:
> >> -Original Message-
> >> From: Grzegorz Uriasz 
> >> Sent: 28 April 2020 07:29
> >> To: qemu-devel@nongnu.org
> >> Cc: Grzegorz Uriasz ; marma...@invisiblethingslab.com; 
> >> ar...@puzio.waw.pl;
> >> ja...@bartmin.ski; j.nowa...@student.uw.edu.pl; Stefano Stabellini 
> >> ;
> Anthony
> >> Perard ; Paul Durrant ; 
> >> xen-de...@lists.xenproject.org
> >> Subject: [PATCH 1/2] Fix undefined behaviour
> >>
> >> Signed-off-by: Grzegorz Uriasz 
> > I think we need more of a commit comment for both this and patch #2 to
> > explain why you are making the changes.
> >
> >   Paul
> 
> I agree Grzegorz should improve the commit messages. In the meantime
> see email with subject "[PATCH 0/2] Fix QEMU crashes when passing IGD to
> a guest VM under XEN", it contains quite detailed explanation for both
> "Fix undefined behaviour" and "Improve legacy vbios handling" patches.
> 

Ok. Can you please make sure maintainers are cc-ed on patch #0 too.

  Paul

RE: [PATCH 1/2] Fix undefined behaviour

2020-04-28 Thread Paul Durrant

> -Original Message-
> From: Paul Durrant 
> Sent: 28 April 2020 13:33
> To: 'Artur Puzio' ; 'Grzegorz Uriasz' 
> ; qemu-devel@nongnu.org
> Cc: marma...@invisiblethingslab.com; ja...@bartmin.ski; 
> j.nowa...@student.uw.edu.pl; 'Stefano
> Stabellini' ; 'Anthony Perard' 
> ; xen-
> de...@lists.xenproject.org
> Subject: RE: [PATCH 1/2] Fix undefined behaviour
> 
> > -Original Message-
> > From: Artur Puzio 
> > Sent: 28 April 2020 10:41
> > To: p...@xen.org; 'Grzegorz Uriasz' ; 
> > qemu-devel@nongnu.org
> > Cc: marma...@invisiblethingslab.com; ja...@bartmin.ski; 
> > j.nowa...@student.uw.edu.pl; 'Stefano
> > Stabellini' ; 'Anthony Perard' 
> > ; xen-
> > de...@lists.xenproject.org
> > Subject: Re: [PATCH 1/2] Fix undefined behaviour
> >
> > On 28.04.2020 10:10, Paul Durrant wrote:
> > >> -Original Message-
> > >> From: Grzegorz Uriasz 
> > >> Sent: 28 April 2020 07:29
> > >> To: qemu-devel@nongnu.org
> > >> Cc: Grzegorz Uriasz ; 
> > >> marma...@invisiblethingslab.com; ar...@puzio.waw.pl;
> > >> ja...@bartmin.ski; j.nowa...@student.uw.edu.pl; Stefano Stabellini 
> > >> ;
> > Anthony
> > >> Perard ; Paul Durrant ; 
> > >> xen-de...@lists.xenproject.org
> > >> Subject: [PATCH 1/2] Fix undefined behaviour
> > >>
> > >> Signed-off-by: Grzegorz Uriasz 
> > > I think we need more of a commit comment for both this and patch #2 to
> > > explain why you are making the changes.
> > >
> > >   Paul
> >
> > I agree Grzegorz should improve the commit messages. In the meantime
> > see email with subject "[PATCH 0/2] Fix QEMU crashes when passing IGD to
> > a guest VM under XEN", it contains quite detailed explanation for both
> > "Fix undefined behaviour" and "Improve legacy vbios handling" patches.
> >
> 
> Ok. Can you please make sure maintainers are cc-ed on patch #0 too.
> 

Actually they are, sorry. My MUA is playing tricks on me.

  Paul

Re: [PATCH v3 1/3] block: Add blk_new_with_bs() helper

2020-04-28 Thread Eric Blake


On 4/28/20 1:34 AM, Max Reitz wrote:


block_crypto_co_create_generic(BlockDriverState *bs,
     PreallocMode prealloc,
     Error **errp)
   {
-    int ret;
+    int ret = -EPERM;


I’m not sure I’m a fan of this, because I feel like it makes the code
harder to read, due to having to look in three places (here, around the
blk_new_with_bs() call, and under the cleanup label) instead of in two
(not here) to verify that the error handling code is correct.

There’s also the fact that this is not really a default return value,
but one very specific error code for if one very specific function call
fails.

I suppose it comes down to whether one considers LoC a complexity
problem.  (I don’t, necessarily.)

(Also I realize it seems rather common in the kernel to set error return
variables before the function call, but I think the more common pattern
in qemu is to set it in the error path.)
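The two styles being weighed here can be compared side by side. This is a hypothetical sketch: open_handle() stands in for a constructor such as blk_new_with_bs() that returns NULL on failure, and -EPERM is used only because it is the code under discussion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical constructor: returns NULL on failure. */
static void *open_handle(bool fail)
{
    return fail ? NULL : malloc(1);
}

/*
 * Style preferred above: assign ret in the error path, next to the call
 * that failed, so each error code sits beside the call that produces it.
 */
int create_set_in_error_path(bool fail)
{
    int ret;
    void *h = open_handle(fail);

    if (!h) {
        ret = -EPERM;
        goto cleanup;
    }
    ret = 0;
cleanup:
    free(h);  /* free(NULL) is a no-op */
    return ret;
}

/*
 * Alternative: a default set up front. Shorter, but the reader must look
 * at the declaration, the failing call and the cleanup label to verify it.
 */
int create_default_up_front(bool fail)
{
    int ret = -EPERM;
    void *h = open_handle(fail);

    if (!h) {
        goto cleanup;
    }
    ret = 0;
cleanup:
    free(h);
    return ret;
}
```

Both functions behave identically; the difference is only in where a reader has to look to confirm which error is returned for which failure.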


I'm fine with either style.  Setting it up front is handy if that
particular error makes a good default, but in many of the functions I
touched, we were returning a variety of errors (-EIO, -EINVAL, -EPERM,
etc) such that there was no good default, and thus no reason to set a
default up front.  Is this something that would go through your tree,
and if so, are you okay making that tweak, or do I need to send v4?


I suppose I can do that, this is what I’d squash in, OK?


Yes, that change looks correct to me.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Max Reitz

On 28.04.20 13:41, Denis Plotnikov wrote:
> 
> 
> On 27.04.2020 16:29, Max Reitz wrote:
>> On 21.04.20 10:11, Denis Plotnikov wrote:
>>> The test checks fulfilling qcow2 requirements for the compression
>>> type feature and zstd compression type operability.
>>>
>>> Signed-off-by: Denis Plotnikov 
>>> ---
>>>   tests/qemu-iotests/287 | 146 +
>>>   tests/qemu-iotests/287.out |  67 +
>>>   tests/qemu-iotests/group   |   1 +
>>>   3 files changed, 214 insertions(+)
>>>   create mode 100755 tests/qemu-iotests/287
>>>   create mode 100644 tests/qemu-iotests/287.out
>>>
>>> diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
>>> new file mode 100755
>>> index 00..156acc40ad
>>> --- /dev/null
>>> +++ b/tests/qemu-iotests/287
>>> @@ -0,0 +1,146 @@
>>> +#!/usr/bin/env bash
>>> +#
>>> +# Test case for an image using zstd compression
>>> +#
>>> +# Copyright (c) 2020 Virtuozzo International GmbH
>>> +#
>>> +# This program is free software; you can redistribute it and/or modify
>>> +# it under the terms of the GNU General Public License as published by
>>> +# the Free Software Foundation; either version 2 of the License, or
>>> +# (at your option) any later version.
>>> +#
>>> +# This program is distributed in the hope that it will be useful,
>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +# GNU General Public License for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with this program.  If not, see .
>>> +#
>>> +
>>> +# creator
>>> +owner=dplotni...@virtuozzo.com
>>> +
>>> +seq="$(basename $0)"
>>> +echo "QA output created by $seq"
>>> +
>>> +status=1    # failure is the default!
>>> +
>>> +# standard environment
>>> +. ./common.rc
>>> +. ./common.filter
>>> +
>>> +# This tests qcow2-specific low-level functionality
>>> +_supported_fmt qcow2
>>> +_supported_proto file
>>> +_supported_os Linux
>> This test doesn’t work with compat=0.10 (because we can’t store a
>> non-default compression type there) or data_file (because those don’t
>> support compression), so those options should be marked as unsupported.
>>
>> (It does seem to work with any refcount_bits, though.)
> 
> Could I ask how to achieve that?
> I can't find any _supported_* related.


It’s _unsupported_imgopts.

Max




Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Jason Wang




On 2020/4/28 5:18 PM, Cornelia Huck wrote:

On Tue, 28 Apr 2020 16:58:44 +0800
Jason Wang  wrote:


On 2020/4/28 4:34 PM, Cornelia Huck wrote:

On Tue, 28 Apr 2020 16:19:15 +0800
Jason Wang  wrote:
  

On 2020/4/27 6:24 PM, Cornelia Huck wrote:

VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.

Signed-off-by: Cornelia Huck 
---
hw/net/virtio-net.c | 8 
1 file changed, 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e85d902588b3..7449570c7123 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -77,14 +77,6 @@
   tso/gso/gro 'off'. */
#define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30

-/* temporary until standard header include it */
-#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
-
-#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
-#define VIRTIO_NET_F_RSC_EXT   61
-
-#endif
-
static inline __virtio16 *virtio_net_rsc_ext_num_packets(
struct virtio_net_hdr *hdr)
{

I think we should not keep those tricky num_packets/dup_acks.

No real opinion here, patch 3 is only a cleanup.

The important one is patch 1, because without it I cannot do a headers
update.


Yes, at least we should dereference segments/dup_acks instead of
csum_start/csum_offsets since the header has been synced.
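As a sketch of what dereferencing the named fields could look like, the following assumes the synced Linux header overlays rsc.segments/rsc.dup_acks on csum_start/csum_offset in a union. The struct is a simplified, hypothetical mirror for illustration only (byte-order conversion of the real __virtio16 fields is omitted); actual code should use the definitions from the updated headers.

```c
#include <stdint.h>

/* Value taken from the compat block removed in the diff above. */
#define VIRTIO_NET_HDR_F_RSC_INFO 4

/*
 * Hypothetical, simplified mirror of the virtio-net header layout,
 * assuming the RSC fields share storage with csum_start/csum_offset.
 */
struct demo_net_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    union {
        struct {
            uint16_t csum_start;
            uint16_t csum_offset;
        };
        struct {
            uint16_t segments;
            uint16_t dup_acks;
        } rsc;
    };
};

/* Fill in RSC info through the named fields, not csum_start/csum_offset. */
void demo_set_rsc_info(struct demo_net_hdr *hdr, uint16_t segs, uint16_t acks)
{
    hdr->flags |= VIRTIO_NET_HDR_F_RSC_INFO;
    hdr->rsc.segments = segs;
    hdr->rsc.dup_acks = acks;
}
```

Because the fields alias, the bytes on the wire are the same as with the old csum_start/csum_offset accessors; the named fields only make the intent explicit.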

So what about:

- I merge patch 1 and the header sync now (because I have a bunch of
   patches that depend on it...)
- We change virtio-net to handle that properly on top (probably best
   done by someone familiar with the code base ;)



That's fine.

Thanks

Re: [PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Max Reitz

On 24.04.20 21:09, Eric Blake wrote:
> In v3:
> - patch 1: fix error returns [patchew, Max], R-b dropped
> - patch 2,3: unchanged, so add R-b
> 
> Eric Blake (3):
>   block: Add blk_new_with_bs() helper
>   qcow2: Allow resize of images with internal snapshots
>   qcow2: Tweak comment about bitmaps vs. resize

Thanks, I’ve squashed the diff into patch 1 and applied the series to my
block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max




Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Eric Blake


On 4/28/20 7:55 AM, Max Reitz wrote:


+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as unsupported.

(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find any _supported_* related.



It’s _unsupported_imgopts.


Test 036 is an example of this.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 3/3] virtio-net: remove VIRTIO_NET_HDR_F_RSC_INFO compat handling

2020-04-28 Thread Jason Wang




On 2020/4/28 6:55 PM, Yuri Benditovich wrote:


On Tue, Apr 28, 2020 at 12:18 PM Cornelia Huck <coh...@redhat.com> wrote:


On Tue, 28 Apr 2020 16:58:44 +0800
Jason Wang <jasow...@redhat.com> wrote:

> On 2020/4/28 4:34 PM, Cornelia Huck wrote:
> > On Tue, 28 Apr 2020 16:19:15 +0800
> > Jason Wang <jasow...@redhat.com> wrote:
> >
> >> On 2020/4/27 6:24 PM, Cornelia Huck wrote:
> >>> VIRTIO_NET_HDR_F_RSC_INFO is available in the headers now.
> >>>
> >>> Signed-off-by: Cornelia Huck <coh...@redhat.com>
> >>> ---
> >>>    hw/net/virtio-net.c | 8 
> >>>    1 file changed, 8 deletions(-)
> >>>
> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>> index e85d902588b3..7449570c7123 100644
> >>> --- a/hw/net/virtio-net.c
> >>> +++ b/hw/net/virtio-net.c
> >>> @@ -77,14 +77,6 @@
> >>>       tso/gso/gro 'off'. */
> >>>    #define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
> >>>
> >>> -/* temporary until standard header include it */
> >>> -#if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
> >>> -
> >>> -#define VIRTIO_NET_HDR_F_RSC_INFO  4 /* rsc_ext data in csum_ fields */
> >>> -#define VIRTIO_NET_F_RSC_EXT       61
> >>> -
> >>> -#endif
> >>> -
> >>>    static inline __virtio16 *virtio_net_rsc_ext_num_packets(
> >>>        struct virtio_net_hdr *hdr)
> >>>    {
> >>
> >> I think we should not keep those tricky num_packets/dup_acks.
> > No real opinion here, patch 3 is only a cleanup.
> >
> > The important one is patch 1, because without it I cannot do a
> > headers update.
>
>
> Yes, at least we should dereference segments/dup_acks instead of
> csum_start/csum_offsets since the header has been synced.

So what about:

- I merge patch 1 and the header sync now (because I have a bunch of
  patches that depend on it...)
- We change virtio-net to handle that properly on top (probably best
  done by someone familiar with the code base ;)


Jason,
This series just solves the conflict caused by the update of the Linux
headers.
After this series is applied, I can submit a further patch to use the actual
RSC definitions from the Linux headers.


Thanks,
Yuri



Yes, please.

Thanks

Re: [PATCH 4/5] ramfb: add sanity checks to ramfb_create_display_surface

2020-04-28 Thread Laszlo Ersek

On 04/27/20 13:11, Gerd Hoffmann wrote:
>>> -size = (hwaddr)linesize * height;
>>> -data = cpu_physical_memory_map(addr, &size, false);
>>> -if (size != (hwaddr)linesize * height) {
>>> -cpu_physical_memory_unmap(data, size, 0, 0);
>>> +mapsize = size = stride * (height - 1) + linesize;
>>> +data = cpu_physical_memory_map(addr, &mapsize, false);
>>> +if (size != mapsize) {
>>> +cpu_physical_memory_unmap(data, mapsize, 0, 0);
>>>  return NULL;
>>>  }
>>>  
>>>  surface = qemu_create_displaysurface_from(width, height,
>>> -  format, linesize, data);
>>> +  format, stride, data);
>>>  pixman_image_set_destroy_function(surface->image,
>>>ramfb_unmap_display_surface, NULL);
>>>  
>>>
>>
>> I don't understand two things about this patch:
>>
>> - "stride" can still be smaller than "linesize" (scanlines can still
>> overlap). Why is that not a problem?
> 
> Why should it be?  It is the guest's choice.  Not a very useful one, but
> hey, if the guest prefers it that way we are at your service ...
> 
> We only must make sure our size calculations are correct.  The patch
> does that.  I think we can also outlaw stride < linesize if you are
> happier with that alternative approach.  I doubt we have any guests
> relying on this working.

OK, thanks. I agree -- if it doesn't break QEMU, then we can let guests
break themselves.

> 
>> - assuming "stride > linesize" (i.e., strictly larger), we don't seem to
>> map the last complete stride, and that seems to be intentional. Is that
>> safe with regard to qemu_create_displaysurface_from()? Ultimately the
>> stride is passed to pixman_image_create_bits(), and the underlying
>> "data" may not be large enough for that. What am I missing?
> 
> Let's take a real-world example.  Wayland rounds up width and height to
> multiples of 64 (probably for tiling on modern GPUs).  So with 800x600
> you get an allocation of 832x640, like this:
> 
>  ###**   <- y 0
>  ###**
>  ###**
>  ###**
>  ###**   <- y 600
>  *   <- y 640
> 
>  ^ ^ ^- x 832
>  | +--- x 800
>  +- x 0
> 
> where "#" is image data and "*" is unused padding space.  Pixman will
> access all "#", so we are mapping the region from the first "#" to the
> last "#", including the unused padding on each scanline, except for the
> last scanline.  Any unused scanlines at the bottom are excluded too
> (ramfb doesn't even know whether they exist).
> 
> The unused padding is only mapped because it is the easiest way to
> handle things, not because we need it.  Also the padding is typically
> *a lot* smaller than PAGE_SIZE, so we couldn't exclude it from the
> mapping even if we wanted to ;)

OK. If pixman only accesses the "#" marks, then it should be OK.
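The mapping size from the patch can be checked against the Wayland example above. A minimal sketch, assuming 4 bytes per pixel (e.g. a 32-bit XRGB format) so that linesize = 800 * 4 = 3200 and stride = 832 * 4 = 3328; ramfb_map_size() is a hypothetical helper, not a ramfb function, and uint64_t stands in for QEMU's hwaddr.

```c
#include <stdint.h>

/*
 * Bytes that must be mapped for `height` scanlines of `linesize` visible
 * bytes each, placed `stride` bytes apart. Mirrors the patch's
 *     mapsize = stride * (height - 1) + linesize;
 * The trailing padding of the last scanline is not needed, so it is
 * left out. Assumes height >= 1 (hypothetical helper).
 */
uint64_t ramfb_map_size(uint64_t stride, uint64_t linesize, uint64_t height)
{
    return stride * (height - 1) + linesize;
}
```

For the 800x600 example this gives 3328 * 599 + 3200 = 1996672 bytes, which is 128 bytes (the 32 padding pixels of the last scanline) less than stride * height. With no padding (stride == linesize) the formula reduces to linesize * height, the size the old code mapped.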

> 
>> Hm... bonus question: qemu_create_displaysurface_from() still accepts
>> "linesize" as a signed int. (Not sure about pixman_image_create_bits().)
>> Should we do something specific to prevent that value from being
>> negative? The guest gives us a uint32_t.
> 
> Not fully sure we can do that without breaking something; it might be
> that a negative stride is used for upside-down images (last scanline
> comes first in memory).

Ugh... Upside down images???... Well, OK, I guess. :)

For the followup patch:

Acked-by: Laszlo Ersek 

Laszlo

Re: [RFC PATCH v1 20/26] kvm: vmi: intercept live migration

2020-04-28 Thread Adalbert Lazăr

On Tue, 28 Apr 2020 13:24:39 +0100, "Dr. David Alan Gilbert" wrote:
> * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > On Mon, 27 Apr 2020 20:08:55 +0100, "Dr. David Alan Gilbert" wrote:
> > > * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > > > From: Marian Rotariu 
> > > > 
> > > > It is possible that the introspection tool has made some changes inside
> > > > the introspected VM which can make the guest crash if the introspection
> > > > connection is suddenly closed.
> > > > 
> > > > When the live migration starts, for now, the introspection tool is
> > > > signaled to remove its hooks from the introspected VM.
> > > > 
> > > > CC: Juan Quintela 
> > > > CC: "Dr. David Alan Gilbert" 
> > > > Signed-off-by: Marian Rotariu 
> > > > Signed-off-by: Adalbert Lazăr 
> > > 
> > > OK, so this isn't too intrusive to the migration code; and other than
> > > renaming 'start_live_migration_thread' to
> > > 'start_outgoing_migration_thread' I think I'd be OK with this,
> > > 
> > > but it might depend what your overall aim is.
> > > 
> > > For example, you might be better intercepting each migration_state
> > > change in your notifier, that's much finer grain than just the start of
> > > migration.
> > 
> > Thank you, Dave.
> > 
> > We want to intercept the live migration and 'block' it while the guest
> > is running (some changes made to the guest by the introspection app have
> > to be undone while the vCPUs are in certain states).
> > 
> > I'm not sure what the best way is to block these kinds of events
> > (including the pause/shutdown commands). If calling main_loop_wait()
> > is enough (patch [22/26] kvm: vmi: add 'async_unhook' property [1])
> > then we can drop a lot of code.
> > 
> > The use of a notifier will be nice, but from what I understand, we can't
> > block the migration from a notification callback.
> 
> Oh, if your intention is *just* to block a migration starting then you
> can use 'migrate_add_blocker' - see hw/9pfs/9p.c for an example where
> it's used and then removed; they use it to stop migration while the fs
> is mounted.  That causes an attempt to start a migration to give an
> error (of your choosing).
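The blocker mechanism Dave describes can be modelled in a few lines. This is a standalone illustration with hypothetical names (`add_blocker()`, `del_blocker()`, `start_migration()`). QEMU's real entry points, such as the `migrate_add_blocker()` call used in hw/9pfs/9p.c, take `Error` objects rather than strings. The sketch shows only the semantics: while a reason is registered, a migration attempt fails with that reason, and removing the reason lets migration proceed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative model only: these names are hypothetical, not QEMU's API. */
#define MAX_BLOCKERS 8

static const char *blockers[MAX_BLOCKERS];
static int n_blockers;

/* Register a reason that prevents migration; returns -1 if the table is full. */
static int add_blocker(const char *reason)
{
    if (n_blockers == MAX_BLOCKERS) {
        return -1;
    }
    blockers[n_blockers++] = reason;
    return 0;
}

/* Remove a previously registered reason (matched by pointer). */
static void del_blocker(const char *reason)
{
    for (int i = 0; i < n_blockers; i++) {
        if (blockers[i] == reason) {
            /* Fill the hole with the last entry to keep the array packed. */
            blockers[i] = blockers[--n_blockers];
            return;
        }
    }
}

/* Returns 0 if migration may start, else -1 with a blocker's reason in errbuf. */
static int start_migration(char *errbuf, size_t len)
{
    if (n_blockers > 0) {
        snprintf(errbuf, len, "%s", blockers[0]);
        return -1;
    }
    return 0;
}
```

In the introspection use case, the blocker would be added when introspection starts and removed once the introspection app has reverted its changes.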

One use case is to do VM introspection for the whole time the guest is running.
From the user's perspective, the pause/suspend/shutdown/snapshot/migrate
commands should work regardless of whether the VM is currently being
introspected. Our first option was to delay these commands for a couple of
seconds while the VM is introspected, so that the introspection app can revert
its changes without blocking the vCPUs.

I'll see if we can combine the migration notifier with migrate_add_blocker(),
or add a new migration state. Blocking the migration (with an error)
is our second option, because the user doing this might not be allowed
to stop the VM introspection.

Thank you,
Adalbert

Re: [PATCH v2 02/14] qcrypto/luks: implement encryption key management

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:51PM +0200, Maxim Levitsky wrote:
> Next few patches will expose that functionality
> to the user.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  crypto/block-luks.c | 398 +++-
>  qapi/crypto.json|  61 ++-
>  2 files changed, 455 insertions(+), 4 deletions(-)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 4861db810c..b11ee08c6d 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c

> +/*
> + * Erases a keyslot given its index
> + * Returns:
> + *0 if the keyslot was erased successfully
> + *   -1 if an error occurred while erasing the keyslot
> + *
> + */
> +static int
> +qcrypto_block_luks_erase_key(QCryptoBlock *block,
> + unsigned int slot_idx,
> + QCryptoBlockWriteFunc writefunc,
> + void *opaque,
> + Error **errp)
> +{
> +QCryptoBlockLUKS *luks = block->opaque;
> +QCryptoBlockLUKSKeySlot *slot = &luks->header.key_slots[slot_idx];
> +g_autofree uint8_t *garbagesplitkey = NULL;
> +size_t splitkeylen = luks->header.master_key_len * slot->stripes;
> +size_t i;
> +
> +assert(slot_idx < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> +assert(splitkeylen > 0);
> +garbagesplitkey = g_new0(uint8_t, splitkeylen);
> +
> +/* Reset the key slot header */
> +memset(slot->salt, 0, QCRYPTO_BLOCK_LUKS_SALT_LEN);
> +slot->iterations = 0;
> +slot->active = QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED;
> +
> +qcrypto_block_luks_store_header(block,  writefunc, opaque, errp);

This may set errp and we don't return immediately, so

> +/*
> + * Now try to erase the key material, even if the header
> + * update failed
> + */
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS; i++) {
> +if (qcrypto_random_bytes(garbagesplitkey, splitkeylen, errp) < 0) {

...this may then set errp a second time, which is not permitted.

This call needs to use a "local_err", followed by error_propagate(errp, local_err).
The latter simply discards local_err if errp is already set.
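Here is a standalone sketch of the rule described above. It assumes a simplified `Error` type and hypothetical `set_error()` / `propagate_error()` helpers that mimic `error_setg()` and `error_propagate()`. Setting an error that is already set is a bug, so later failures are collected in a `local_err` and propagated, and only the first error reaches the caller. `erase_sketch()` is a hypothetical reduction of the erase loop's control flow, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error, for illustration only. */
typedef struct Error {
    char msg[128];
} Error;

/* Like error_setg(): report into *errp, which must not already hold an error. */
static void set_error(Error **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller does not want the error */
    }
    assert(*errp == NULL);      /* setting an error twice is a bug */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    strncpy((*errp)->msg, msg, sizeof((*errp)->msg) - 1);
}

/*
 * Like error_propagate(): hand local_err to the caller unless the caller
 * already has an error, in which case local_err is freed (first error wins).
 */
static void propagate_error(Error **errp, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (errp && !*errp) {
        *errp = local_err;
    } else {
        free(local_err);
    }
}

/* Shape of the fix: failures after a possible first error use local_err. */
static int erase_sketch(Error **errp, int header_fails, int random_fails)
{
    Error *local_err = NULL;
    int ret = 0;

    if (header_fails) {
        set_error(errp, "header update failed");
        ret = -1;               /* keep going: still try to wipe the key */
    }
    if (random_fails) {
        set_error(&local_err, "could not get random bytes");
        propagate_error(errp, local_err);
        ret = -1;
    }
    return ret;
}
```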

> +/*
> + * If we failed to get the random data, still write
> + * zeros to the key slot at least once
> + */
> +if (i > 0) {
> +return -1;
> +}
> +}
> +if (writefunc(block,
> +  slot->key_offset_sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
> +  garbagesplitkey,
> +  splitkeylen,
> +  opaque,
> +  errp) != splitkeylen) {

same issue with errp here too.

> +return -1;
> +}
> +}
> +return 0;
> +}


> +/*
> + * Given LUKSKeyslotUpdate command, set @slots_bitmap with all slots
> + * that will be updated with new password (or erased)
> + * returns 0 on success, and -1 on failure
> + */
> +static int
> +qcrypto_block_luks_get_update_bitmap(QCryptoBlock *block,
> + QCryptoBlockReadFunc readfunc,
> + void *opaque,
> + const QCryptoBlockAmendOptionsLUKS *opts,
> + unsigned long *slots_bitmap,
> + Error **errp)
> +{
> +const QCryptoBlockLUKS *luks = block->opaque;
> +size_t i;
> +
> +bitmap_zero(slots_bitmap, QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> +
> +if (opts->has_keyslot) {
> +/* keyslot set, select only this keyslot */
> +int keyslot = opts->keyslot;
> +
> +if (keyslot < 0 || keyslot >= QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS) {
> +error_setg(errp,
> +   "Invalid slot %u specified, must be between 0 and %u",
> +   keyslot, QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS - 1);
> +return -1;
> +}
> +bitmap_set(slots_bitmap, keyslot, 1);
> +
> +} else if (opts->has_old_secret) {
> +/* initially select all active keyslots */
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +if (qcrypto_block_luks_slot_active(luks, i)) {
> +bitmap_set(slots_bitmap, i, 1);
> +}
> +}
> +} else {
> +/* find a free keyslot */
> +int slot = qcrypto_block_luks_find_free_keyslot(luks);
> +
> +if (slot == -1) {
> +error_setg(errp,
> +   "Can't add a keyslot - all key slots are in use");
> +return -1;
> +}
> +bitmap_set(slots_bitmap, slot, 1);
> +}
> +
> +if (opts->has_old_secret) {
> +/* now deselect all keyslots that don't contain the password */
> +g_autofree uint8_t *tmpkey = g_new0(uint8_t,
> +luks->header.master_key_len);
> +
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
>

Re: [PATCH v1] target/m68k: fix gdb for m68xxx

2020-04-28 Thread KONRAD Frederic

On 4/27/20 9:53 AM, Laurent Vivier wrote:

On 20/04/2020 16:01, frederic.kon...@adacore.com wrote:

From: KONRAD Frederic 

Currently "cf-core.xml" is sent to GDB when using any m68k flavor.  The problem
is that it uses the "org.gnu.gdb.coldfire.core" feature name, so gdb 8.3 then
expects a ColdFire FPU instead of the default m68881 FPU.

This is not OK because the m68881 floating-point registers are 96 bits wide,
so GDB fails with the following error messages:

(gdb) target remote localhost:7960
Remote debugging using localhost:7960
warning: Register "fp0" has an unsupported size (96 bits)
warning: Register "fp1" has an unsupported size (96 bits)
...
Remote 'g' packet reply is too long (expected 148 bytes, got 180 bytes):\
   000[...]
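The two sizes in that error message follow from simple arithmetic. This sketch assumes the coldfire layout of 18 32-bit core registers (d0-d7, a0-a7, ps, pc), eight FP data registers, and three 32-bit FP control registers. `g_packet_bytes()` is a hypothetical helper. With 64-bit FP registers the payload is 148 bytes, which is what GDB expects. With the m68881's 96-bit registers it grows by 8 * 4 = 32 bytes to 180.

```c
#include <assert.h>

/*
 * Size in bytes of a gdb 'g' packet payload for an assumed register set:
 *  - 18 core registers (d0-d7, a0-a7, ps, pc), 4 bytes each
 *  - 8 FP data registers (fp0-fp7), fp_reg_bytes each
 *  - 3 FP control registers (fpcontrol, fpstatus, fpiaddr), 4 bytes each
 */
static unsigned g_packet_bytes(unsigned fp_reg_bytes)
{
    const unsigned core_regs = 18, fp_regs = 8, fp_ctrl_regs = 3;

    return core_regs * 4 + fp_regs * fp_reg_bytes + fp_ctrl_regs * 4;
}
```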

With this patch: qemu-system-m68k -M none -cpu m68020 -s -S

(gdb) tar rem :1234
Remote debugging using :1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x in ?? ()
(gdb) p $fp0
$1 = nan(0x)

Signed-off-by: KONRAD Frederic 
---
  configure |  2 +-
  gdb-xml/m68k-core.xml | 29 +
  target/m68k/cpu.c | 30 +-
  3 files changed, 55 insertions(+), 6 deletions(-)
  create mode 100644 gdb-xml/m68k-core.xml

diff --git a/configure b/configure
index 23b5e93..2b07b85 100755
--- a/configure
+++ b/configure
@@ -7825,7 +7825,7 @@ case "$target_name" in
;;
m68k)
  bflt="yes"
-gdb_xml_files="cf-core.xml cf-fp.xml m68k-fp.xml"
+gdb_xml_files="cf-core.xml cf-fp.xml m68k-core.xml m68k-fp.xml"
  TARGET_SYSTBL_ABI=common
;;
microblaze|microblazeel)
diff --git a/gdb-xml/m68k-core.xml b/gdb-xml/m68k-core.xml
new file mode 100644
index 000..5b092d2
--- /dev/null
+++ b/gdb-xml/m68k-core.xml
@@ -0,0 +1,29 @@
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
+  
+  
+
+
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 9445fcd..976e624 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -297,6 +297,21 @@ static void m68k_cpu_class_init(ObjectClass *c, void *data)
  dc->vmsd = &vmstate_m68k_cpu;
  }
  
+static void m68k_cpu_class_init_m68k_core(ObjectClass *c, void *data)
+{
+CPUClass *cc = CPU_CLASS(c);
+
+cc->gdb_core_xml_file = "m68k-core.xml";
+}


Could you also add a m68k_cpu_class_init_cf_core() and move the
cf-core.xml into it?


Yes, I can do that:
  - DEFINE_M68K_CPU_TYPE_M68K will use m68k_cpu_class_init_m68k_core.
  - DEFINE_M68K_CPU_TYPE_CF will use m68k_cpu_class_init_cf_core.
  - drop the xxx_WITH_CLASS variant.

+
+#define DEFINE_M68K_CPU_TYPE_WITH_CLASS(cpu_model, initfn, classinit)  \
+{  \
+.name = M68K_CPU_TYPE_NAME(cpu_model), \
+.instance_init = initfn,   \
+.parent = TYPE_M68K_CPU,   \
+.class_init = classinit,   \
+}
+


I would prefer to have two macros with no class parameter, something
like DEFINE_M68K_CPU_TYPE_M68K() and DEFINE_M68K_CPU_TYPE_CF().
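Here is a sketch of what the two suggested macros could look like. It uses a trimmed-down stand-in for `TypeInfo` so it compiles on its own. The `CPU_TYPE_NAME()` suffix, the class_init bodies, and the init functions are hypothetical placeholders. The point is only that each family macro hard-wires its own class_init, so call sites no longer need the `_WITH_CLASS` variant or its extra argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for QEMU's TypeInfo. */
typedef struct TypeInfo {
    const char *name;
    const char *parent;
    void (*instance_init)(void *obj);
    void (*class_init)(void *klass, void *data);
} TypeInfo;

/* Records which core XML the last class_init call selected. */
static const char *gdb_core_xml_file;

static void class_init_m68k_core(void *klass, void *data)
{
    (void)klass; (void)data;
    gdb_core_xml_file = "m68k-core.xml";
}

static void class_init_cf_core(void *klass, void *data)
{
    (void)klass; (void)data;
    gdb_core_xml_file = "cf-core.xml";
}

static void m68020_initfn(void *obj) { (void)obj; }
static void m5208_initfn(void *obj) { (void)obj; }

#define CPU_TYPE_NAME(model) model "-m68k-cpu"

/* Each family macro selects its own class_init. */
#define DEFINE_M68K_CPU_TYPE_M68K(model, initfn) \
    {                                            \
        .name = CPU_TYPE_NAME(model),            \
        .parent = "m68k-cpu",                    \
        .instance_init = initfn,                 \
        .class_init = class_init_m68k_core,      \
    }

#define DEFINE_M68K_CPU_TYPE_CF(model, initfn)   \
    {                                            \
        .name = CPU_TYPE_NAME(model),            \
        .parent = "m68k-cpu",                    \
        .instance_init = initfn,                 \
        .class_init = class_init_cf_core,        \
    }

static const TypeInfo cpu_types[] = {
    DEFINE_M68K_CPU_TYPE_M68K("m68020", m68020_initfn),
    DEFINE_M68K_CPU_TYPE_CF("m5208", m5208_initfn),
};
```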


  #define DEFINE_M68K_CPU_TYPE(cpu_model, initfn) \
  {   \
  .name = M68K_CPU_TYPE_NAME(cpu_model),  \
@@ -314,11 +329,16 @@ static const TypeInfo m68k_cpus_type_infos[] = {
  .class_size = sizeof(M68kCPUClass),
  .class_init = m68k_cpu_class_init,
  },
-DEFINE_M68K_CPU_TYPE("m68000", m68000_cpu_initfn),
-DEFINE_M68K_CPU_TYPE("m68020", m68020_cpu_initfn),
-DEFINE_M68K_CPU_TYPE("m68030", m68030_cpu_initfn),
-DEFINE_M68K_CPU_TYPE("m68040", m68040_cpu_initfn),
-DEFINE_M68K_CPU_TYPE("m68060", m68060_cpu_initfn),
+DEFINE_M68K_CPU_TYPE_WITH_CLASS("m68000", m68000_cpu_initfn,
+m68k_cpu_class_init_m68k_core),
+DEFINE_M68K_CPU_TYPE_WITH_CLASS("m68020", m68020_cpu_initfn,
+m68k_cpu_class_init_m68k_core),
+DEFINE_M68K_CPU_TYPE_WITH_CLASS("m68030", m68030_cpu_initfn,
+m68k_cpu_class_init_m68k_core),
+DEFINE_M68K_CPU_TYPE_WITH_CLASS("m68040", m68040_cpu_initfn,
+m68k_cpu_class_init_m68k_core),
+DEFINE_M68K_CPU_TYPE_WITH_CLASS("m68060", m68060_cpu_initfn,
+m68k_cpu_class_init_m68k_core),
  DEFINE_M68K_CPU_TYPE("m5206", m5206_cpu_initfn),
  DEFINE_M68K_CPU_TYPE("m5208", m5208_cpu_initfn),
  DEFINE_M68K_CPU_TYPE("cfv4e", cfv4e_cpu_initfn),


But what about the "any" CPU, which is outside the context of this hunk?

DEFINE_M68K_CPU_TYPE("any", any_cpu_initfn),

Should it be TYPE_M68K or TYPE_CF, i.e. which XML will it use?  I guess
we can make it TYPE_CF so that nothing changes from what is done today.

I think we c

[PATCH 4/4] block: Use blk_make_empty() after commits

2020-04-28 Thread Max Reitz

bdrv_commit() already has a BlockBackend pointing to the BDS that we
want to empty, it just has the wrong permissions.

qemu-img commit has no BlockBackend pointing to the old backing file
yet, but introducing one is simple.

After this commit, bdrv_make_empty() is the only remaining caller of
BlockDriver.bdrv_make_empty().

Signed-off-by: Max Reitz 
---
 block/commit.c |  8 +++-
 qemu-img.c | 19 ++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8e672799af..24720ba67d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -493,10 +493,16 @@ int bdrv_commit(BlockDriverState *bs)
 }
 
 if (drv->bdrv_make_empty) {
-ret = drv->bdrv_make_empty(bs);
+ret = blk_set_perm(src, BLK_PERM_WRITE, BLK_PERM_ALL, NULL);
 if (ret < 0) {
 goto ro_cleanup;
 }
+
+ret = blk_make_empty(src, NULL);
+if (ret < 0) {
+goto ro_cleanup;
+}
+
 blk_flush(src);
 }
 
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e..a5e8659867 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1065,11 +1065,20 @@ static int img_commit(int argc, char **argv)
 goto unref_backing;
 }
 
-if (!drop && bs->drv->bdrv_make_empty) {
-ret = bs->drv->bdrv_make_empty(bs);
-if (ret) {
-error_setg_errno(&local_err, -ret, "Could not empty %s",
- filename);
+if (!drop) {
+BlockBackend *old_backing_blk;
+
+old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
+  &local_err);
+if (!old_backing_blk) {
+goto unref_backing;
+}
+ret = blk_make_empty(old_backing_blk, &local_err);
+blk_unref(old_backing_blk);
+if (ret == -ENOTSUP) {
+error_free(local_err);
+local_err = NULL;
+} else if (ret < 0) {
 goto unref_backing;
 }
 }
-- 
2.25.4

[PATCH 2/4] block: Use bdrv_make_empty() where possible

2020-04-28 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block/replication.c | 6 ++
 block/vvfat.c   | 4 +---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index da013c2041..cc6a40d577 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -331,9 +331,8 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 return;
 }
 
-ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
+ret = bdrv_make_empty(s->active_disk, errp);
 if (ret < 0) {
-error_setg(errp, "Cannot make active disk empty");
 return;
 }
 
@@ -343,9 +342,8 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 return;
 }
 
-ret = s->hidden_disk->bs->drv->bdrv_make_empty(s->hidden_disk->bs);
+ret = bdrv_make_empty(s->hidden_disk, errp);
 if (ret < 0) {
-error_setg(errp, "Cannot make hidden disk empty");
 return;
 }
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index ab800c4887..e3020b65c8 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2960,9 +2960,7 @@ static int do_commit(BDRVVVFATState* s)
 return ret;
 }
 
-if (s->qcow->bs->drv && s->qcow->bs->drv->bdrv_make_empty) {
-s->qcow->bs->drv->bdrv_make_empty(s->qcow->bs);
-}
+bdrv_make_empty(s->qcow, NULL);
 
 memset(s->used_clusters, 0, sector2cluster(s, s->sector_count));
 
-- 
2.25.4

[PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Max Reitz

Right now, all users of BlockDriver.bdrv_make_empty() call the method
directly.  That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE permission.

Introduce bdrv_make_empty(), which verifies that it does.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  1 +
 block.c   | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..d947fb4080 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
+int bdrv_make_empty(BdrvChild *c, Error **errp);
 int bdrv_change_backing_file(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
diff --git a/block.c b/block.c
index 2e3905c99e..b0d5b98617 100644
--- a/block.c
+++ b/block.c
@@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 
 parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
 }
+
+int bdrv_make_empty(BdrvChild *c, Error **errp)
+{
+BlockDriver *drv = c->bs->drv;
+int ret;
+
+assert(c->perm & BLK_PERM_WRITE);
+
+if (!drv->bdrv_make_empty) {
+error_setg(errp, "%s does not support emptying nodes",
+   drv->format_name);
+return -ENOTSUP;
+}
+
+ret = drv->bdrv_make_empty(c->bs);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to empty %s",
+ c->bs->filename);
+return ret;
+}
+
+return 0;
+}
-- 
2.25.4

[PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Max Reitz

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.

This series fixes that.

Note that as far as I’m aware this series shouldn’t visibly fix anything
at this point; but “block: Introduce real BdrvChildRole”
(https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00737.html)
makes the iotest break when run with -o data_file=$SOMETHING, without
this series applied beforehand.  (That is because without that series,
external data files are treated much like metadata children, so the
format driver always takes the WRITE permission if the file is writable;
but after that series, it only does so when it itself has a parent
requesting the WRITE permission.)


Max Reitz (4):
  block: Add bdrv_make_empty()
  block: Use bdrv_make_empty() where possible
  block: Add blk_make_empty()
  block: Use blk_make_empty() after commits

 include/block/block.h  |  1 +
 include/sysemu/block-backend.h |  2 ++
 block.c| 23 +++
 block/block-backend.c  |  5 +
 block/commit.c |  8 +++-
 block/replication.c|  6 ++
 block/vvfat.c  |  4 +---
 qemu-img.c | 19 ++-
 8 files changed, 55 insertions(+), 13 deletions(-)

-- 
2.25.4

[PATCH 3/4] block: Add blk_make_empty()

2020-04-28 Thread Max Reitz

Two callers of BlockDriver.bdrv_make_empty() remain that should not call
this method directly.  Neither has access to a BdrvChild, but both can
use a BlockBackend, so add this function, which empties the node through
a BlockBackend.

Signed-off-by: Max Reitz 
---
 include/sysemu/block-backend.h | 2 ++
 block/block-backend.c  | 5 +
 2 files changed, 7 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d37c1244dd..14338b76dc 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -266,4 +266,6 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
+int blk_make_empty(BlockBackend *blk, Error **errp);
+
 #endif
diff --git a/block/block-backend.c b/block/block-backend.c
index 3592066b42..5d36efd32f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2402,3 +2402,8 @@ const BdrvChild *blk_root(BlockBackend *blk)
 {
 return blk->root;
 }
+
+int blk_make_empty(BlockBackend *blk, Error **errp)
+{
+return bdrv_make_empty(blk->root, errp);
+}
-- 
2.25.4

[PATCH v21 0/4] implement zstd cluster compression method

2020-04-28 Thread Denis Plotnikov

v21:
   03:
   * remove the loop on compression [Max]
   * use designated initializers [Max]
   04:
   * don't erase user's options [Max]
   * use _rm_test_img [Max]
   * add unsupported qcow2 options [Max]

v20:
   04: fix a number of flaws [Vladimir]
   * don't use $RAND_FILE passing to qemu-io,
 so check $TEST_DIR is redundant
   * re-arrange $RAND_FILE writing
   * fix a typo

v19:
   04: fix a number of flaws [Eric]
   * remove redundant test case descriptions
   * fix stdout redirect
   * don't use (())
   * use peek_file_be instead of od
   * check $TEST_DIR for spaces and other before using
   * use $RAND_FILE safer

v18:
   * 04: add quotes to all file name variables [Vladimir] 
   * 04: add Vladimir's comment regarding the "qemu-io write -s"
 option issue.

v17:
   * 03: remove incorrect comment in zstd decompress [Vladimir]
   * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]
   * 03: fix dead assignment [Vladimir]
   * 04: add and remove quotes [Vladimir]
   * 04: replace long offset form with the short one [Vladimir]

v16:
   * 03: ssize_t for ret, size_t for zstd_ret [Vladimir]
   * 04: small fixes according to the comments [Vladimir] 

v15:
   * 01: aiming qemu 5.1 [Eric]
   * 03: change zstd_res definition place [Vladimir]
   * 04: add two new test cases [Eric]
 1. test adjacent cluster compression with zstd
 2. test incompressible cluster processing
   * 03, 04: many rewordings and grammar fixes [Eric]

v14:
   * fix bug on compression - looping until compress == 0 [Me]
   * apply reworked Vladimir's suggestions:
  1. not mixing ssize_t with size_t
  2. safe check for ENOMEM in compression part - avoid overflow
  3. make the sanity check tolerant: allow zstd to make progress
 on only one of the buffers
v13:
   * 03: add progress sanity check to decompression loop [Vladimir]
 03: add successful decompression check [Me]

v12:
   * 03: again, rework compression and decompression loops
 to make them more correct [Vladimir]
 03: move assert in compression to more appropriate place
 [Vladimir]
v11:
   * 03: the loops don't need the "do{}while" form anymore and
 they were buggy (missing "do" at the beginning);
 replace them with usual "while(){}" loops [Vladimir]
v10:
   * 03: fix zstd (de)compression loops for multi-frame
 cases [Vladimir]
v9:
   * 01: fix error checking and reporting in qcow2_amend compression type part [Vladimir]
   * 03: replace asserts with -EIO in qcow2_zstd_decompression [Vladimir, Alberto]
   * 03: reword/amend/add comments, fix typos [Vladimir]

v8:
   * 03: switch zstd API from simple to stream [Eric]
 No need to state a special cluster layout for zstd
 compressed clusters.
v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile time checking for Qcow2Header to be a multiple of 8 [Max, Alberto]
   * report error on qcow2 amending when the compression type is actually changed [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may be merged separately

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes to the compression type introduction patch [Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessary casting [Eric]
   * remove redundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series

Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt   |   1 +
 configure|   2 +-
 qapi/block-core.json |  23 ++-
 block/qcow2.h|  20 ++-
 include/block/block_int.h|   1 +
 block/qcow2-threads.c

[PATCH v21 2/4] qcow2: rework the cluster compression routine

2020-04-28 Thread Denis Plotnikov

The patch makes qcow2 honor the compression type defined for the image
and choose the appropriate method for (de)compressing image clusters.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIOon any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0

[PATCH v21 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov

The test checks that the qcow2 requirements for the compression type
feature are fulfilled and that the zstd compression type works.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---
 slirp  |   2 +-
 tests/qemu-iotests/287 | 152 +
 tests/qemu-iotests/287.out |  67 
 tests/qemu-iotests/group   |   1 +
 4 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/slirp b/slirp
index 2faae0f778..55ab21c9a3 16
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit 2faae0f778f818fadc873308f983289df697eb93
+Subproject commit 55ab21c9a36852915b81f1b41ebaf3b6509dd8ba
diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..21fe1f19f5
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,152 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts 'compat=0.10' data_file
+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _rm_test_img
+   rm -f "$COMPR_IMG"
+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing zlib with incompatible bit set ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing zstd with incompatible bit unset ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing compression type values ==="
+echo
+# zlib=0
+_make_test_img -o compression_type=zlib 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+# zstd=1
+_make_test_img -o compression_type=zstd 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+echo
+echo "=== Testing simple reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+# read on the cluster boundaries
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing adjacent clusters reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAB 0 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAD 128K 64K " "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IO -c "read -P 0xAB 0 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAD 128K 64k " "$T

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov

On 28.04.2020 16:01, Eric Blake wrote:

On 4/28/20 7:55 AM, Max Reitz wrote:


+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as unsupported.


(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find anything related among the _supported_* helpers.



It’s _unsupported_imgopts.


Test 036 is an example of this.

Max, Eric

Thanks!

Denis

[PATCH v21 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Denis Plotnikov

zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which is currently
the only available compression method.

The performance test results:
The test compresses and decompresses a QEMU qcow2 image with a
freshly installed rhel-7.6 guest.
Image cluster size: 64K. Image on-disk size: 2.2G

The test was conducted on a brd (RAM block device) disk to reduce
the influence of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd:
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

       compression             decompression
       zlib   zstd             zlib   zstd

real   65.5   16.3 (-75 %)     1.9    1.6 (-16 %)
user   65.0   15.8             5.3    2.5
sys     3.3    0.2             2.0    2.0

Both zlib and zstd gave the same compression ratio: 1.57.
Compressed image size in both cases: 1.4G
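The percentages in the table are relative changes against zlib, and the compression ratio is the original size divided by the compressed size. Below is a small C sketch of that arithmetic. The helper names are illustrative only.

```c
#include <assert.h>

/* Relative change from @before to @after, in percent (negative = faster). */
double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Compression ratio: original size divided by compressed size. */
double compression_ratio(double original, double compressed)
{
    assert(compressed > 0.0);
    return original / compressed;
}
```

percent_change(65.5, 16.3) is about -75.1 and percent_change(1.9, 1.6) is about -15.8, which round to the quoted -75 % and -16 %. compression_ratio(2.2, 1.4) is about 1.57 when the rounded image sizes above are used.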

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |   1 +
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 167 +
 block/qcow2.c  |   7 ++
 5 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
diff --git a/configure b/configure
index 23b5e93752..4e3a1690ea 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
  zstd    support for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.1
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0591dafbc8 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include 
 
+#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,158 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
 return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *          -ENOMEM destination buffer is too small to store compressed data
+ *          -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+                                   const void *src, size_t src_size)
+{
+    ssize_t ret;
+    size_t zstd_ret;
+    ZSTD_outBuffer output = {
+        .dst = dest,
+        .size = dest_size,
+        .pos = 0
+    };
+    ZSTD_inBuffer input = {
+        .src = src,
+        .size = src_size,
+        .pos = 0
+    };
+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+        return -EIO;
+    }
+    /*
+     * Use the zstd streamed interface for symmetry with decompression,
+     * where streaming is essential since we don't record the exact
+     * compressed size.
+     *
+     * ZSTD_compressStream2() tries to compress everything it can
+     * in a single call. However, the zstd docs say:
+     * "You must continue calling ZSTD_compressStream2() with ZSTD_e_end
+     * until it returns 0, at which point you are free to start a new frame".
+     * In our tests, the only case in which it returned >0 was
+     * when the output buffer was too small. In that case,
+

[PATCH v21 1/4] qcow2: introduce compression type feature

2020-04-28 Thread Denis Plotnikov

The patch adds the groundwork for an incompatible compression type
feature in qcow2, allowing different compression methods to be used
for (de)compressing image clusters.

The compression type is set at image creation and can only be changed
later by converting the image. Thus, the compression type defines the
single compression algorithm used for the whole image, and therefore
for all of its clusters.

The goal of the feature is to add support for other compression methods
to qcow2, for example zstd, which compresses more efficiently than zlib.

The default compression type is zlib. Images created with the zlib
compression type are backward compatible with older QEMU versions.
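The backward compatibility follows from how qcow2 uses incompatible feature bits. A reader must refuse to open an image that has an unknown incompatible bit set. The compression bit (QCOW2_INCOMPAT_COMPRESSION_BITNR = 3 in the qcow2.h hunk of this patch) is only set for non-zlib images. Below is a minimal sketch of that rule. It assumes the type values 0 = zlib and 1 = zstd from docs/interop/qcow2.txt. The function names are hypothetical and are not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the qcow2.h hunk in this patch. */
#define INCOMPAT_COMPRESSION_BITNR 3
#define INCOMPAT_COMPRESSION (UINT64_C(1) << INCOMPAT_COMPRESSION_BITNR)

/* Compression type values as listed in docs/interop/qcow2.txt. */
enum { COMPRESSION_ZLIB = 0, COMPRESSION_ZSTD = 1 };

/* Incompatible feature bits a writer sets for a given compression type. */
uint64_t incompat_bits_for(uint8_t compression_type)
{
    return compression_type != COMPRESSION_ZLIB ? INCOMPAT_COMPRESSION : 0;
}

/*
 * Consistency check a reader could apply: the compression bit is set
 * if and only if a non-default (non-zlib) compression type is stored.
 */
bool header_is_consistent(uint64_t incompat, uint8_t compression_type)
{
    bool bit_set = (incompat & INCOMPAT_COMPRESSION) != 0;
    return bit_set == (compression_type != COMPRESSION_ZLIB);
}
```

An older QEMU therefore finds no unknown incompatible bit in a zlib image and opens it. It rejects a zstd image, which has bit 3 set, instead of misreading it.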

Adding the compression type breaks a number of tests, because the
compression type is now reported on image creation and the size and
offsets of the qcow2 header change.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size += 8: 1 byte compression type
                    7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255
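The size changes listed above (header_size += 8, backing_file_offset += 56) follow from the header layout. The sketch below assumes the qcow2 version 3 header fields from docs/interop/qcow2.txt, which total 104 bytes. It also assumes that a feature name table entry is 48 bytes: a 1-byte type, a 1-byte bit number and a 46-byte name. The struct is a stand-in, not QEMU's QCowHeader, and it relies on the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the v3 header plus the two fields added by this patch. */
typedef struct __attribute__((packed)) QCowHeaderSketch {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size;
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
    uint64_t incompatible_features;
    uint64_t compatible_features;
    uint64_t autoclear_features;
    uint32_t refcount_order;
    uint32_t header_length;
    uint8_t compression_type;   /* new: 1 byte */
    uint8_t padding[7];         /* new: keeps the header a multiple of 8 */
} QCowHeaderSketch;

/* Header growth from this patch: 1 byte type + 7 bytes padding. */
enum { HEADER_GROWTH = 1 + 7 };
/* One feature name table entry: type (1) + bit number (1) + name (46). */
enum { FEATURE_TABLE_ENTRY_SIZE = 1 + 1 + 46 };

/* Same intent as QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8)). */
_Static_assert(sizeof(QCowHeaderSketch) % 8 == 0, "header not 8-aligned");
_Static_assert(offsetof(QCowHeaderSketch, compression_type) == 104,
               "v3 header used to end at byte 104");
_Static_assert(sizeof(QCowHeaderSketch) == 104 + HEADER_GROWTH,
               "header grows by 8 bytes");
```

The backing file name follows the header and the header extensions, so it moves by both amounts: 8 + 48 = 56.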

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
QAPI part:
Acked-by: Markus Armbruster 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..1522e2983f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.1)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4284,6 +4287,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.1
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4307,6 +4322,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#                    (default: zlib, since 5.1)
 #
 # Since: 2.12
 ##
@@ -4322,7 +4339,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..6a8b82e6cc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
     uint32_t refcount_order;
     uint32_t header_length;
+
+    /* Additional fields */
+    uint8_t compression_type;
+
+    /* header must be a multiple of 8 */
+    uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8));
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
     QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
     QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
     QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+    QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
     QCOW2_INCOMPAT_DIRTY            = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_CORRUPT          = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_F

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  SIGN    pc-bios/optionrom/kvmvapic.bin
  BUILD   pc-bios/optionrom/pvh.raw
  SIGN    pc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
  ^
/tmp/qemu-test/src/qemu-img.c:1071:27: error: this function declaration is not 
a prototype [-Werror,-Wstrict-prototypes]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: incompatible integer to pointer 
conversion assigning to 'BlockBackend *' (aka 'struct BlockBackend *') from 
'int' [-Werror,-Wint-conversion]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
^ ~
3 errors generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=1896d2b1f1044091b832be313d66ac6e', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-fql9ti5h/src/docker-src.2020-04-28-09.34.00.5332:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=1896d2b1f1044091b832be313d66ac6e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fql9ti5h/src'
make: *** [docker-run-test-debug@fedora] Error 2

real    4m44.194s
user    0m8.084s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [RFC PATCH v1 20/26] kvm: vmi: intercept live migration

2020-04-28 Thread Dr. David Alan Gilbert

* Adalbert Lazăr (ala...@bitdefender.com) wrote:
> On Tue, 28 Apr 2020 13:24:39 +0100, "Dr. David Alan Gilbert" 
>  wrote:
> > * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > > On Mon, 27 Apr 2020 20:08:55 +0100, "Dr. David Alan Gilbert" 
> > >  wrote:
> > > > * Adalbert Lazăr (ala...@bitdefender.com) wrote:
> > > > > From: Marian Rotariu 
> > > > > 
> > > > > It is possible that the introspection tool has made some changes inside
> > > > > the introspected VM which can make the guest crash if the introspection
> > > > > connection is suddenly closed.
> > > > > 
> > > > > When the live migration starts, for now, the introspection tool is
> > > > > signaled to remove its hooks from the introspected VM.
> > > > > 
> > > > > CC: Juan Quintela 
> > > > > CC: "Dr. David Alan Gilbert" 
> > > > > Signed-off-by: Marian Rotariu 
> > > > > Signed-off-by: Adalbert Lazăr 
> > > > 
> > > > OK, so this isn't too intrusive to the migration code; and other than
> > > > renaming 'start_live_migration_thread' to
> > > > 'start_outgoing_migration_thread' I think I'd be OK with this,
> > > > 
> > > > but it might depend what your overall aim is.
> > > > 
> > > > For example, you might be better intercepting each migration_state
> > > > change in your notifier, that's much finer grain than just the start of
> > > > migration.
> > > 
> > > Thank you, Dave.
> > > 
> > > We want to intercept the live migration and 'block' it while the guest
> > > is running (some changes made to the guest by the introspection app have
> > > to be undone while the vCPUs are in certain states).
> > > 
> > > I'm not sure what the best way is to block these kinds of events
> > > (including the pause/shutdown commands). If calling main_loop_wait()
> > > is enough (patch [22/26] kvm: vmi: add 'async_unhook' property [1])
> > > then we can drop a lot of code.
> > > 
> > > The use of a notifier will be nice, but from what I understand, we can't
> > > block the migration from a notification callback.
> > 
> > Oh, if your intention is *just* to block a migration starting then you
> > can use 'migrate_add_blocker' - see hw/9pfs/9p.c for an example where
> > it's used and then removed; they use it to stop migration while the fs
> >  is mounted.  That causes an attempt to start a migration to give an
> > error (of your choosing).
> 
> One use case is to do VM introspection all the time the guest is running.
> From the user perspective, the pause/suspend/shutdown/snapshot/migrate
> commands should work regardless of whether the VM is currently introspected
> or not. Our first option was to delay these commands for a couple of
> seconds when the VM is introspected, while the introspection app reverts
> its changes, without blocking the vCPUs.

Ah OK, so it's not really about blocking it completely; just delaying it
a bit; in that case add_blocker is the wrong thing.

> I'll see if we can mix the migrate notifier with migrate_add_blocker(),
> or add a new migration state. To block the migration (with an error)
> is our second option, because the user doing this might not be allowed
> to stop the VM introspection.

Maybe the right thing is to do something just like
MIGRATION_STATUS_WAIT_UNPLUG, it's right near the start of the thread.
Again, its job is just to make the migration wait while it does some
stuff before it can let migration continue.

Dave

> Thank you,
> Adalbert
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
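Dave suggests delaying the outgoing migration in a wait state, much like MIGRATION_STATUS_WAIT_UNPLUG, instead of failing it. The toy C state machine below illustrates that idea. The type, state and field names are hypothetical, and this is not QEMU's migration code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; MIG_WAIT_UNHOOK plays the role of WAIT_UNPLUG. */
typedef enum {
    MIG_SETUP,
    MIG_WAIT_UNHOOK,
    MIG_ACTIVE,
} MigState;

typedef struct {
    MigState state;
    bool introspected;    /* an introspection tool is attached */
    bool hooks_removed;   /* the tool has reverted its guest changes */
} MigSketch;

/* Advance one step; migration proceeds only once the hooks are removed. */
MigState mig_step(MigSketch *m)
{
    assert(m != NULL);
    switch (m->state) {
    case MIG_SETUP:
        m->state = m->introspected ? MIG_WAIT_UNHOOK : MIG_ACTIVE;
        break;
    case MIG_WAIT_UNHOOK:
        if (m->hooks_removed) {
            m->state = MIG_ACTIVE;
        }
        break;
    case MIG_ACTIVE:
        break;
    }
    return m->state;
}
```

With migrate_add_blocker(), by contrast, the migration attempt would fail with an error instead of waiting in the equivalent of MIG_WAIT_UNHOOK.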

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  BUILD   pc-bios/optionrom/pvh.raw
  SIGN    pc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:9: error: implicit declaration of function 
'blk_new_with_bs' [-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
/tmp/qemu-test/src/qemu-img.c:1071:9: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment makes pointer from 
integer without a cast [-Werror]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=2d32da85331c4d51b4632262369586d1', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-mtvq5xk5/src/docker-src.2020-04-28-09.40.27.12753:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=2d32da85331c4d51b4632262369586d1
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-mtvq5xk5/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    2m58.028s
user    0m8.393s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-quick@centos7/?type=message.

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.


I'm also in the middle of writing a patch series that adds a 
corresponding .bdrv_make_empty driver callback.  I'll rebase that work 
on top of this, as part of my efforts at fixing more code to rely on 
bdrv_make_empty rather than directly querying bdrv_has_zero_init[_truncate].




This series fixes that.

Note that as far as I’m aware this series shouldn’t visibly fix anything
at this point; but “block: Introduce real BdrvChildRole”
(https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00737.html)
makes the iotest break when run with -o data_file=$SOMETHING, without
this series applied beforehand.  (That is because without that series,
external data files are treated much like metadata children, so the
format driver always takes the WRITE permission if the file is writable;
but after that series, it only does so when it itself has a parent
requesting the WRITE permission.)



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:53 hat Eric Blake geschrieben:
> On 4/28/20 8:26 AM, Max Reitz wrote:
> > Right now, all users of bdrv_make_empty() call the BlockDriver method
> > directly.  That is not only bad style, it is also wrong, unless the
> > caller has a BdrvChild with a WRITE permission.
> > 
> > Introduce bdrv_make_empty() that verifies that it does.
> > 
> > Signed-off-by: Max Reitz 
> > ---
> >   include/block/block.h |  1 +
> >   block.c   | 23 +++
> >   2 files changed, 24 insertions(+)
> > 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index b05995fe9c..d947fb4080 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
> >   void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
> >   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
> >   int bdrv_commit(BlockDriverState *bs);
> > +int bdrv_make_empty(BdrvChild *c, Error **errp);
> 
> Can we please fix this to take a flags parameter?  I want to make it easier
> for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
> callers where the image must be made empty (read as all zeroes) regardless
> of time spent, vs. made empty quickly (including if it is already all zero)
> but where the caller is prepared for the operation to fail and will write
> zeroes itself if fast bulk zeroing was not possible.

bdrv_make_empty() is not for making an image read as all zeroes, but to
make it fully unallocated so that the backing file becomes visible.

Are you confusing it with bdrv_make_zero(), which is just a wrapper
around bdrv_pwrite_zeroes() and does take flags?

Kevin
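Kevin's distinction between bdrv_make_empty() and bdrv_make_zero() can be illustrated with a toy overlay image. This is not QEMU code, and all names are hypothetical. Real qcow2 can also record zero clusters without allocating data, which the toy ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_CLUSTERS 4

/* Toy overlay over a backing file, one byte of data per cluster. */
typedef struct {
    uint8_t backing[NB_CLUSTERS];
    uint8_t data[NB_CLUSTERS];
    bool allocated[NB_CLUSTERS];
} ToyImage;

/* Unallocated clusters fall through to the backing file. */
uint8_t toy_read(const ToyImage *img, int cluster)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    return img->allocated[cluster] ? img->data[cluster]
                                   : img->backing[cluster];
}

/* "Make empty": drop every allocation so the backing file shows through. */
void toy_make_empty(ToyImage *img)
{
    memset(img->allocated, 0, sizeof(img->allocated));
}

/* "Make zero": every cluster reads as zero, regardless of the backing. */
void toy_make_zero(ToyImage *img)
{
    memset(img->data, 0, sizeof(img->data));
    for (int i = 0; i < NB_CLUSTERS; i++) {
        img->allocated[i] = true;
    }
}
```

After toy_make_empty() the overlay reads the backing file's data. After toy_make_zero() it reads 0 everywhere, which matches the "read as all zeroes" semantics Eric had in mind.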

Re: [PATCH 2/4] block: Use bdrv_make_empty() where possible

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Signed-off-by: Max Reitz 
---
  block/replication.c | 6 ++
  block/vvfat.c   | 4 +---
  2 files changed, 3 insertions(+), 7 deletions(-)



Yes, definitely nicer :)  May have some obvious fallout to add a 0 flag 
parameter, per my request on 1/4, but that doesn't stop me from giving:


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Right now, all users of bdrv_make_empty() call the BlockDriver method
directly.  That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE permission.

Introduce bdrv_make_empty() that verifies that it does.

Signed-off-by: Max Reitz 
---
  include/block/block.h |  1 +
  block.c   | 23 +++
  2 files changed, 24 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..d947fb4080 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
  int bdrv_commit(BlockDriverState *bs);
+int bdrv_make_empty(BdrvChild *c, Error **errp);


Can we please fix this to take a flags parameter?  I want to make it 
easier for callers to request BDRV_REQ_NO_FALLBACK for distinguishing 
between callers where the image must be made empty (read as all zeroes) 
regardless of time spent, vs. made empty quickly (including if it is 
already all zero) but where the caller is prepared for the operation to 
fail and will write zeroes itself if fast bulk zeroing was not possible.




+int bdrv_make_empty(BdrvChild *c, Error **errp)
+{
+    BlockDriver *drv = c->bs->drv;
+    int ret;
+
+    assert(c->perm & BLK_PERM_WRITE);
+
+    if (!drv->bdrv_make_empty) {
+        error_setg(errp, "%s does not support emptying nodes",
+                   drv->format_name);
+        return -ENOTSUP;
+    }


And here's where we can add some automatic fallbacks, such as 
recognizing if the image already reads as all zeroes.  But those 
optimizations can come in separate patches; for YOUR patch, just getting 
the proper API in place is fine.



+
+    ret = drv->bdrv_make_empty(c->bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to empty %s",
+                         c->bs->filename);
+        return ret;
+    }
+
+    return 0;
+}



Other than a missing flag parameter, this looks fine.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/exec-vary.o
  CC  aarch64-softmmu/tcg/tcg.o
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs'; did you mean 'blk_get_stats'? 
[-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
   ^~~
   blk_get_stats
/tmp/qemu-test/src/qemu-img.c:1071:27: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment to 'BlockBackend *' 
{aka 'struct BlockBackend *'} from 'int' makes pointer from integer without a 
cast [-Werror=int-conversion]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/tcg/tcg-op.o
  CC  aarch64-softmmu/tcg/tcg-op-vec.o
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=7e33797e28514c48a1cce7021d8b04fd', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-dh8ik377/src/docker-src.2020-04-28-09.44.17.22581:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=7e33797e28514c48a1cce7021d8b04fd
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-dh8ik377/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    4m4.967s
user    0m8.349s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-mingw@fedora/?type=message.

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 9:01 AM, Kevin Wolf wrote:


Can we please fix this to take a flags parameter?  I want to make it easier
for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
callers where the image must be made empty (read as all zeroes) regardless
of time spent, vs. made empty quickly (including if it is already all zero)
but where the caller is prepared for the operation to fail and will write
zeroes itself if fast bulk zeroing was not possible.


bdrv_make_empty() is not for making an image read as all zeroes, but to
make it fully unallocated so that the backing file becomes visible.

Are you confusing it with bdrv_make_zero(), which is just a wrapper
around bdrv_pwrite_zeroes() and does take flags?


Yes.  Although now I'm wondering if the two should remain separate or 
should just be a single driver callback where flags can include 
BDRV_REQ_ZERO_WRITE to distinguish whether exposing the backing file vs. 
reading as all zeroes is intended, or if that is merging too much.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Eric Blake


On 4/28/20 8:49 AM, Eric Blake wrote:

On 4/28/20 8:26 AM, Max Reitz wrote:

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.


I'm also in the middle of writing a patch series that adds a 
corresponding .bdrv_make_empty driver callback.  I'll rebase that work 
on top of this, as part of my efforts at fixing more code to rely on 
bdrv_make_empty rather than directly querying 
bdrv_has_zero_init[_truncate].


Correction - I'm working on adding .bdrv_make_zero, not .bdrv_make_empty
(which already exists), although maybe it really only needs to be one 
callback instead of two if we have decent flags.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
