date:20120614

Re: [Qemu-devel] [PATCH] ahci: add -drive support

2012-06-14 Thread Markus Armbruster

Alexander Graf  writes:

> We've had support for creating AHCI devices using -device for a while now,
> but it's cumbersome to users. We really should provide an easier way for
> them to leverage the power of AHCI!
>
> So let's introduce a new if= option to -drive, bumping it en par with virtio.

If I understand your patch correctly, this makes if=ahci work like
if=virtio and unlike if=ide:

* if=virtio: configure a new PCI device.

* if=ide: instruct the board to add an IDE device to its IDE controller.

For -M pc, the board's IDE controller happens to be piix3-ide.

For -M q35, I'd expect the board's IDE controller to be an ich9-ahci.

Once we switch to q35, if=ahci will become a redundant wart: to add
drives to the existing AHCI controller, you'll have to use if=ide.
if=ahci will create a new controller, which is generally not what you
want.  Ugh.

A few questions inline.

> Signed-off-by: Alexander Graf 
> ---
>  blockdev.c  |   25 +++--
>  blockdev.h  |1 +
>  qemu-options.hx |7 ++-
>  3 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 622ecba..5405f6c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -32,6 +32,7 @@ static const char *const if_name[IF_COUNT] = {
>  [IF_SD] = "sd",
>  [IF_VIRTIO] = "virtio",
>  [IF_XEN] = "xen",
> +[IF_AHCI] = "ahci",
>  };
>  
>  static const int if_max_devs[IF_COUNT] = {
> @@ -519,7 +520,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>  } else {
>  /* no id supplied -> create one */
>  dinfo->id = g_malloc0(32);
> -if (type == IF_IDE || type == IF_SCSI)
> +if (type == IF_IDE || type == IF_SCSI || type == IF_AHCI)
>  mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
>  if (max_devs)
>  snprintf(dinfo->id, 32, "%s%i%s%i",
> @@ -549,6 +550,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>  case IF_IDE:
>  case IF_SCSI:
>  case IF_XEN:
> +case IF_AHCI:
>  case IF_NONE:
>  switch(media) {
>   case MEDIA_DISK:
> @@ -582,6 +584,25 @@ DriveInfo *drive_init(QemuOpts *opts, int 
> default_to_scsi)
>  default:
>  abort();
>  }
> +
> +if (type == IF_AHCI) {
> +static int ahci_bus = 0;
> +char devname[] = "ahciXXX";
> +char busname[] = "ahciXXX.0";
> +snprintf(devname, sizeof(devname), "ahci%d", ahci_bus);
> +snprintf(busname, sizeof(busname), "ahci%d.0", ahci_bus++);
> +
> +/* add ahci host controller */
> +opts = qemu_opts_create(qemu_find_opts("device"), devname, 0, NULL);
> +qemu_opt_set(opts, "driver", "ich9-ahci");
> +
> +/* and attach a single ata disk to its bus */
> +opts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, NULL);
> +qemu_opt_set(opts, "driver", "ide-drive");
> +qemu_opt_set(opts, "bus", busname);
> +qemu_opt_set(opts, "drive", dinfo->id);
> +}
> +

Doesn't this create a new ich9-ahci controller per -drive?

If yes, it's problematic in practice, as you'll run out of PCI slots
pretty darn fast.  That problem made us replace virtio-blk by
virtio-scsi.  Let's not re-create it.

IF_VIRTIO device option creation is done in the preceeding switch.  You
don't do that for IF_AHCI because you already do something else there:
handling the media option.  I think one switch for media and a separate
one for device options would be clearer.

>  if (!file || !*file) {
>  return dinfo;
>  }
> @@ -604,7 +625,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>  ro = 1;
>  } else if (ro == 1) {
>  if (type != IF_SCSI && type != IF_VIRTIO && type != IF_FLOPPY &&
> -type != IF_NONE && type != IF_PFLASH) {
> +type != IF_NONE && type != IF_PFLASH && type != IF_AHCI) {
>  error_report("readonly not supported by this bus type");
>  goto err;
>  }

Are you sure AHCI can handle read-only?

[...]
> diff --git a/blockdev.h b/blockdev.h
> index 260e16b..e14c1d5 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -23,6 +23,7 @@ typedef enum {
>  IF_DEFAULT = -1,/* for use with drive_add() only */
>  IF_NONE,
>  IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN,
> +IF_AHCI,
>  IF_COUNT
>  } BlockInterfaceType;
>  
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b66264..9527c51 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -160,7 +160,7 @@ Special files such as iSCSI devices can be specified 
> using protocol
>  specific URLs. See the section for "Device URL Syntax" for more information.
>  @item if=@var{interface}
>  This option defines on which type on interface the drive is connected.
> -Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
> +Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio, ahci.
>  @item bus=@var{bus},

[Qemu-devel] [PATCH] qapi: converted commit

2012-06-14 Thread Pavel Hrdina


Signed-off-by: Pavel Hrdina 
---
 blockdev.c   |9 -
 blockdev.h   |1 -
 hmp-commands.hx  |2 +-
 hmp.c|9 +
 hmp.h|1 +
 qapi-schema.json |   15 +++
 qmp-commands.hx  |   24 
 7 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 622ecba..d78af31 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -631,27 +631,26 @@ err:
 return NULL;
 }
 
-void do_commit(Monitor *mon, const QDict *qdict)
+void qmp_commit(const char *device, Error **errp)
 {
-const char *device = qdict_get_str(qdict, "device");
 BlockDriverState *bs;
 int ret;
 
 if (!strcmp(device, "all")) {
 ret = bdrv_commit_all();
 if (ret == -EBUSY) {
-qerror_report(QERR_DEVICE_IN_USE, device);
+error_set(errp, QERR_DEVICE_IN_USE, device);
 return;
 }
 } else {
 bs = bdrv_find(device);
 if (!bs) {
-qerror_report(QERR_DEVICE_NOT_FOUND, device);
+error_set(errp, QERR_DEVICE_NOT_FOUND, device);
 return;
 }
 ret = bdrv_commit(bs);
 if (ret == -EBUSY) {
-qerror_report(QERR_DEVICE_IN_USE, device);
+error_set(errp, QERR_DEVICE_IN_USE, device);
 return;
 }
 }
diff --git a/blockdev.h b/blockdev.h
index 260e16b..c2e52bb 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -60,6 +60,5 @@ DriveInfo *add_init_drive(const char *opts);
 
 void qmp_change_blockdev(const char *device, const char *filename,
  bool has_format, const char *format, Error **errp);
-void do_commit(Monitor *mon, const QDict *qdict);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index f5d9d91..53d2c34 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -28,7 +28,7 @@ ETEXI
 .args_type  = "device:B",
 .params = "device|all",
 .help   = "commit changes to the disk images (if -snapshot is 
used) or backing files",
-.mhandler.cmd = do_commit,
+.mhandler.cmd = hmp_commit,
 },
 
 STEXI
diff --git a/hmp.c b/hmp.c
index 2ce8cb9..176defa 100644
--- a/hmp.c
+++ b/hmp.c
@@ -999,3 +999,12 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
 qmp_netdev_del(id, &err);
 hmp_handle_error(mon, &err);
 }
+
+void hmp_commit(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+Error *err = NULL;
+
+qmp_commit(device, &err);
+hmp_handle_error(mon, &err);
+}
diff --git a/hmp.h b/hmp.h
index 79d138d..99d57c0 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,5 +64,6 @@ void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
+void hmp_commit(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 3b6e346..10cb22b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1169,6 +1169,21 @@
 { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
 
 ##
+# @commit
+#
+# Commit changes to the disk images (if -snapshot is used) or backing files.
+#
+# @device: the name of the device or the "all" to commit all devices
+#
+# Returns: nothing on success
+#  If @device is not a valid block device, DeviceNotFound
+#  If a long-running operation is using the device, DeviceInUse
+#
+# Since: 1.1
+##
+{ 'command': 'commit', 'data': { 'device': 'str' }}
+
+##
 # @NewImageMode
 #
 # An enumeration that tells QEMU how to set the backing file path in
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e1a38e..8b6bfbc 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -714,6 +714,30 @@ Example:
 -> { "execute": "block_resize", "arguments": { "device": "scratch", "size": 
1073741824 } }
 <- { "return": {} }
 
+
+EQMP
+
+{
+.name   = "commit",
+.args_type  = "device:B",
+.mhandler.cmd_new = qmp_marshal_input_commit,
+},
+
+SQMP
+commit
+--
+
+Commit changes to the disk images (if -snapshot is used) or backing files.
+
+Arguments:
+
+- "device": the device's ID, or all (json-string)
+
+Example:
+
+-> { "execute": "commit", "arguments": { "device": "all" } }
+<- { "return": {} }
+
 EQMP
 
 {
-- 
1.7.7.6

Re: [Qemu-devel] [PATCH] ahci: add -drive support

2012-06-14 Thread Alexander Graf


On 14.06.2012, at 09:29, Markus Armbruster wrote:

> Alexander Graf  writes:
> 
>> We've had support for creating AHCI devices using -device for a while now,
>> but it's cumbersome to users. We really should provide an easier way for
>> them to leverage the power of AHCI!
>> 
>> So let's introduce a new if= option to -drive, bumping it en par with virtio.
> 
> If I understand your patch correctly, this makes if=ahci work like
> if=virtio and unlike if=ide:
> 
> * if=virtio: configure a new PCI device.
> 
> * if=ide: instruct the board to add an IDE device to its IDE controller.
> 
> For -M pc, the board's IDE controller happens to be piix3-ide.
> 
> For -M q35, I'd expect the board's IDE controller to be an ich9-ahci.
> 
> Once we switch to q35, if=ahci will become a redundant wart: to add
> drives to the existing AHCI controller, you'll have to use if=ide.
> if=ahci will create a new controller, which is generally not what you
> want.  Ugh.

Yeah, I couldn't come up with anything else that's not completely ugly like the 
IF_SCSI implementation.

> 
> A few questions inline.
> 
>> Signed-off-by: Alexander Graf 
>> ---
>> blockdev.c  |   25 +++--
>> blockdev.h  |1 +
>> qemu-options.hx |7 ++-
>> 3 files changed, 30 insertions(+), 3 deletions(-)
>> 
>> diff --git a/blockdev.c b/blockdev.c
>> index 622ecba..5405f6c 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -32,6 +32,7 @@ static const char *const if_name[IF_COUNT] = {
>> [IF_SD] = "sd",
>> [IF_VIRTIO] = "virtio",
>> [IF_XEN] = "xen",
>> +[IF_AHCI] = "ahci",
>> };
>> 
>> static const int if_max_devs[IF_COUNT] = {
>> @@ -519,7 +520,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>> default_to_scsi)
>> } else {
>> /* no id supplied -> create one */
>> dinfo->id = g_malloc0(32);
>> -if (type == IF_IDE || type == IF_SCSI)
>> +if (type == IF_IDE || type == IF_SCSI || type == IF_AHCI)
>> mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
>> if (max_devs)
>> snprintf(dinfo->id, 32, "%s%i%s%i",
>> @@ -549,6 +550,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>> default_to_scsi)
>> case IF_IDE:
>> case IF_SCSI:
>> case IF_XEN:
>> +case IF_AHCI:
>> case IF_NONE:
>> switch(media) {
>>  case MEDIA_DISK:
>> @@ -582,6 +584,25 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>> default_to_scsi)
>> default:
>> abort();
>> }
>> +
>> +if (type == IF_AHCI) {
>> +static int ahci_bus = 0;
>> +char devname[] = "ahciXXX";
>> +char busname[] = "ahciXXX.0";
>> +snprintf(devname, sizeof(devname), "ahci%d", ahci_bus);
>> +snprintf(busname, sizeof(busname), "ahci%d.0", ahci_bus++);
>> +
>> +/* add ahci host controller */
>> +opts = qemu_opts_create(qemu_find_opts("device"), devname, 0, NULL);
>> +qemu_opt_set(opts, "driver", "ich9-ahci");
>> +
>> +/* and attach a single ata disk to its bus */
>> +opts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, NULL);
>> +qemu_opt_set(opts, "driver", "ide-drive");
>> +qemu_opt_set(opts, "bus", busname);
>> +qemu_opt_set(opts, "drive", dinfo->id);
>> +}
>> +
> 
> Doesn't this create a new ich9-ahci controller per -drive?
> 
> If yes, it's problematic in practice, as you'll run out of PCI slots
> pretty darn fast.  That problem made us replace virtio-blk by
> virtio-scsi.  Let's not re-create it.

Hrm. If you have a great idea on how to implement it, I'm all open for it. 
Talking about it from a high level perspective I had the same feelings at 
first. Looking at how to implement index= and bus= for real, I quickly withdrew 
myself from the approach.

The good news is that the limitation here is only a -drive if=ahci limitation. 
It does not apply to -device. There you can still plug up to 6 ahci devices 
onto a single HBA.

> IF_VIRTIO device option creation is done in the preceeding switch.  You
> don't do that for IF_AHCI because you already do something else there:
> handling the media option.  I think one switch for media and a separate
> one for device options would be clearer.

We can do that, yeah :).

> 
>> if (!file || !*file) {
>> return dinfo;
>> }
>> @@ -604,7 +625,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>> default_to_scsi)
>> ro = 1;
>> } else if (ro == 1) {
>> if (type != IF_SCSI && type != IF_VIRTIO && type != IF_FLOPPY &&
>> -type != IF_NONE && type != IF_PFLASH) {
>> +type != IF_NONE && type != IF_PFLASH && type != IF_AHCI) {
>> error_report("readonly not supported by this bus type");
>> goto err;
>> }
> 
> Are you sure AHCI can handle read-only?

Isn't read-only handled in generic ATA code?

> 
> [...]
>> diff --git a/blockdev.h b/blockdev.h
>> index 260e16b..e14c1d5 100644
>> --- a/blockdev.h
>> +++ b/blockdev.h
>> @@ -23,6 +23,7 @@ typedef e

Re: [Qemu-devel] [PATCH] ahci: add -drive support

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 09:34, schrieb Alexander Graf:
> 
> On 14.06.2012, at 09:29, Markus Armbruster wrote:
> 
>> Alexander Graf  writes:
>>
>>> We've had support for creating AHCI devices using -device for a while now,
>>> but it's cumbersome to users. We really should provide an easier way for
>>> them to leverage the power of AHCI!
>>>
>>> So let's introduce a new if= option to -drive, bumping it en par with 
>>> virtio.
>>
>> If I understand your patch correctly, this makes if=ahci work like
>> if=virtio and unlike if=ide:
>>
>> * if=virtio: configure a new PCI device.
>>
>> * if=ide: instruct the board to add an IDE device to its IDE controller.
>>
>> For -M pc, the board's IDE controller happens to be piix3-ide.
>>
>> For -M q35, I'd expect the board's IDE controller to be an ich9-ahci.
>>
>> Once we switch to q35, if=ahci will become a redundant wart: to add
>> drives to the existing AHCI controller, you'll have to use if=ide.
>> if=ahci will create a new controller, which is generally not what you
>> want.  Ugh.
> 
> Yeah, I couldn't come up with anything else that's not completely ugly like 
> the IF_SCSI implementation.
> 
>>
>> A few questions inline.
>>
>>> Signed-off-by: Alexander Graf 
>>> ---
>>> blockdev.c  |   25 +++--
>>> blockdev.h  |1 +
>>> qemu-options.hx |7 ++-
>>> 3 files changed, 30 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 622ecba..5405f6c 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -32,6 +32,7 @@ static const char *const if_name[IF_COUNT] = {
>>> [IF_SD] = "sd",
>>> [IF_VIRTIO] = "virtio",
>>> [IF_XEN] = "xen",
>>> +[IF_AHCI] = "ahci",
>>> };
>>>
>>> static const int if_max_devs[IF_COUNT] = {
>>> @@ -519,7 +520,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> } else {
>>> /* no id supplied -> create one */
>>> dinfo->id = g_malloc0(32);
>>> -if (type == IF_IDE || type == IF_SCSI)
>>> +if (type == IF_IDE || type == IF_SCSI || type == IF_AHCI)
>>> mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
>>> if (max_devs)
>>> snprintf(dinfo->id, 32, "%s%i%s%i",
>>> @@ -549,6 +550,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> case IF_IDE:
>>> case IF_SCSI:
>>> case IF_XEN:
>>> +case IF_AHCI:
>>> case IF_NONE:
>>> switch(media) {
>>> case MEDIA_DISK:
>>> @@ -582,6 +584,25 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> default:
>>> abort();
>>> }
>>> +
>>> +if (type == IF_AHCI) {
>>> +static int ahci_bus = 0;
>>> +char devname[] = "ahciXXX";
>>> +char busname[] = "ahciXXX.0";
>>> +snprintf(devname, sizeof(devname), "ahci%d", ahci_bus);
>>> +snprintf(busname, sizeof(busname), "ahci%d.0", ahci_bus++);
>>> +
>>> +/* add ahci host controller */
>>> +opts = qemu_opts_create(qemu_find_opts("device"), devname, 0, 
>>> NULL);
>>> +qemu_opt_set(opts, "driver", "ich9-ahci");
>>> +
>>> +/* and attach a single ata disk to its bus */
>>> +opts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, NULL);
>>> +qemu_opt_set(opts, "driver", "ide-drive");
>>> +qemu_opt_set(opts, "bus", busname);
>>> +qemu_opt_set(opts, "drive", dinfo->id);
>>> +}
>>> +
>>
>> Doesn't this create a new ich9-ahci controller per -drive?
>>
>> If yes, it's problematic in practice, as you'll run out of PCI slots
>> pretty darn fast.  That problem made us replace virtio-blk by
>> virtio-scsi.  Let's not re-create it.
> 
> Hrm. If you have a great idea on how to implement it, I'm all open for it. 
> Talking about it from a high level perspective I had the same feelings at 
> first. Looking at how to implement index= and bus= for real, I quickly 
> withdrew myself from the approach.
> 
> The good news is that the limitation here is only a -drive if=ahci 
> limitation. It does not apply to -device. There you can still plug up to 6 
> ahci devices onto a single HBA.

That I get two controllers when I have two drivers is really not
something I would expect as a user. If we can, we should make it use up
any existing controllers. If we can't, maybe -device has to be good enough.

Kevin

Re: [Qemu-devel] [PATCH v2 5/6] msix: Allow full specification of MSIX layout

2012-06-14 Thread Michael S. Tsirkin

On Wed, Jun 13, 2012 at 10:51:56PM -0600, Alex Williamson wrote:
> Finally, complete the fully specified interface.  msix_add_config()
> gets moved to be closer to the setup functions where it's actually
> used.  msix_mmio_setup() gets folded into msix_init().  And
> msix_uninit() gets reworked a bit so we can call it as cleanup
> from msix_init().
> 
> Signed-off-by: Alex Williamson 

Some review comments seem to have been ignored: this patch -
uses hardcoded constants, still has duplicated parameters,
move functions in the same patch as adding functionality.
Do you intend to send another revision?

> ---
> 
>  hw/msix.c |  217 
> +++--
>  hw/msix.h |   10 ++-
>  2 files changed, 101 insertions(+), 126 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index d476d07..047646a 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -27,14 +27,6 @@
>  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
>  #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
>  
> -/* How much space does an MSIX table need. */
> -/* The spec requires giving the table structure
> - * a 4K aligned region all by itself. */
> -#define MSIX_PAGE_SIZE 0x1000
> -/* Reserve second half of the page for pending bits */
> -#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
> -#define MSIX_MAX_ENTRIES 32
> -
>  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>  {
>  uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
> @@ -45,47 +37,6 @@ static MSIMessage msix_get_message(PCIDevice *dev, 
> unsigned vector)
>  return msg;
>  }
>  
> -/* Add MSI-X capability to the config space for the device. */
> -/* Given a bar and its size, add MSI-X table on top of it
> - * and fill MSI-X capability in the config space.
> - * Original bar size must be a power of 2 or 0.
> - * New bar size is returned. */
> -static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> -   unsigned bar_nr, unsigned bar_size)
> -{
> -int config_offset;
> -uint8_t *config;
> -
> -if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
> -return -EINVAL;
> -if (bar_size > 0x8000)
> -return -ENOSPC;
> -
> -/* Require aligned offset for MSI-X structures */
> -if (bar_size & ~(MSIX_PAGE_SIZE - 1)) {
> -return -EINVAL;
> -}
> -
> -config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> -   0, MSIX_CAP_LENGTH);
> -if (config_offset < 0)
> -return config_offset;
> -config = pdev->config + config_offset;
> -
> -pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
> -/* Table on top of BAR */
> -pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
> -/* Pending bits on top of that */
> -pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
> - bar_nr);
> -pdev->msix_cap = config_offset;
> -/* Make flags bit writable. */
> -pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
> - MSIX_MASKALL_MASK;
> -pdev->msix_function_masked = true;
> -return 0;
> -}
> -
>  static uint8_t msix_pending_mask(int vector)
>  {
>  return 1 << (vector % 8);
> @@ -240,20 +191,6 @@ static const MemoryRegionOps msix_pba_mmio_ops = {
>  },
>  };
>  
> -static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
> -{
> -uint8_t *config = d->config + d->msix_cap;
> -uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> -uint32_t table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> -uint32_t pba = pci_get_long(config + PCI_MSIX_PBA);
> -uint32_t pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> -/* TODO: for assigned devices, we'll want to make it possible to map
> - * pending bits separately in case they are in a separate bar. */
> -
> -memory_region_add_subregion(bar, table_offset, &d->msix_table_mmio);
> -memory_region_add_subregion(bar, pba_offset, &d->msix_pba_mmio);
> -}
> -
>  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>  {
>  int vector;
> @@ -268,11 +205,72 @@ static void msix_mask_all(struct PCIDevice *dev, 
> unsigned nentries)
>  }
>  }
>  
> -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
> - * modified, it should be retrieved with msix_bar_size. */
> +/* Add MSI-X capability to the config space for the device. */
> +static int msix_add_config(struct PCIDevice *dev, unsigned short nentries,
> +   uint8_t table_bar, unsigned table_offset,
> +   uint8_t pba_bar, unsigned pba_offset, uint8_t pos)
> +{
> +int config_offset;
> +uint8_t *config;
> +
> +if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> +return -EINVAL;
> +}
> +
> +config_offset = pci_add_capability(dev, PCI_CAP_ID_MSIX,
> +   pos, MSIX_CAP_LENGTH);
> +if

Re: [Qemu-devel] [PATCH v2 5/6] msix: Allow full specification of MSIX layout

2012-06-14 Thread Michael S. Tsirkin

On Thu, Jun 14, 2012 at 08:17:22AM +0200, Jan Kiszka wrote:
> On 2012-06-14 06:51, Alex Williamson wrote:
> > Finally, complete the fully specified interface.  msix_add_config()
> > gets moved to be closer to the setup functions where it's actually
> > used.  msix_mmio_setup() gets folded into msix_init().  And
> > msix_uninit() gets reworked a bit so we can call it as cleanup
> > from msix_init().
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> > 
> >  hw/msix.c |  217 
> > +++--
> >  hw/msix.h |   10 ++-
> >  2 files changed, 101 insertions(+), 126 deletions(-)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index d476d07..047646a 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -27,14 +27,6 @@
> >  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
> >  #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
> >  
> > -/* How much space does an MSIX table need. */
> > -/* The spec requires giving the table structure
> > - * a 4K aligned region all by itself. */
> > -#define MSIX_PAGE_SIZE 0x1000
> > -/* Reserve second half of the page for pending bits */
> > -#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
> > -#define MSIX_MAX_ENTRIES 32
> > -
> >  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
> >  {
> >  uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
> > @@ -45,47 +37,6 @@ static MSIMessage msix_get_message(PCIDevice *dev, 
> > unsigned vector)
> >  return msg;
> >  }
> >  
> > -/* Add MSI-X capability to the config space for the device. */
> > -/* Given a bar and its size, add MSI-X table on top of it
> > - * and fill MSI-X capability in the config space.
> > - * Original bar size must be a power of 2 or 0.
> > - * New bar size is returned. */
> > -static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> > -   unsigned bar_nr, unsigned bar_size)
> > -{
> > -int config_offset;
> > -uint8_t *config;
> > -
> > -if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
> > -return -EINVAL;
> > -if (bar_size > 0x8000)
> > -return -ENOSPC;
> > -
> > -/* Require aligned offset for MSI-X structures */
> > -if (bar_size & ~(MSIX_PAGE_SIZE - 1)) {
> > -return -EINVAL;
> > -}
> > -
> > -config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> > -   0, MSIX_CAP_LENGTH);
> > -if (config_offset < 0)
> > -return config_offset;
> > -config = pdev->config + config_offset;
> > -
> > -pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
> > -/* Table on top of BAR */
> > -pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
> > -/* Pending bits on top of that */
> > -pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
> > - bar_nr);
> > -pdev->msix_cap = config_offset;
> > -/* Make flags bit writable. */
> > -pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
> > -   MSIX_MASKALL_MASK;
> > -pdev->msix_function_masked = true;
> > -return 0;
> > -}
> > -
> >  static uint8_t msix_pending_mask(int vector)
> >  {
> >  return 1 << (vector % 8);
> > @@ -240,20 +191,6 @@ static const MemoryRegionOps msix_pba_mmio_ops = {
> >  },
> >  };
> >  
> > -static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
> > -{
> > -uint8_t *config = d->config + d->msix_cap;
> > -uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> > -uint32_t table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> > -uint32_t pba = pci_get_long(config + PCI_MSIX_PBA);
> > -uint32_t pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> > -/* TODO: for assigned devices, we'll want to make it possible to map
> > - * pending bits separately in case they are in a separate bar. */
> > -
> > -memory_region_add_subregion(bar, table_offset, &d->msix_table_mmio);
> > -memory_region_add_subregion(bar, pba_offset, &d->msix_pba_mmio);
> > -}
> > -
> >  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
> >  {
> >  int vector;
> > @@ -268,11 +205,72 @@ static void msix_mask_all(struct PCIDevice *dev, 
> > unsigned nentries)
> >  }
> >  }
> >  
> > -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size 
> > is
> > - * modified, it should be retrieved with msix_bar_size. */
> > +/* Add MSI-X capability to the config space for the device. */
> > +static int msix_add_config(struct PCIDevice *dev, unsigned short nentries,
> > +   uint8_t table_bar, unsigned table_offset,
> > +   uint8_t pba_bar, unsigned pba_offset, uint8_t 
> > pos)
> 
> Why not fold msix_add_config into msix_init and move sanity checks to
> the beginning, i.e. before resource allocations? Then you also do not
> need to call msix_uninit from msix_init. Likely a matter of taste, so
> not a must-have.

More im

Re: [Qemu-devel] [PATCH v2 6/6] msix: Fix last PCIDevice naming inconsitency

2012-06-14 Thread Michael S. Tsirkin

On Wed, Jun 13, 2012 at 10:52:06PM -0600, Alex Williamson wrote:
> The previous patches fixed almost all the inconsistent names used
> for PCIDevice in msix.c, fix the one remaining transgression.
> 
> Signed-off-by: Alex Williamson 

Except, would be better to first rename parameters and
do other mechanical changes, and then apply your diff
on top.

As it is the patch looks like it rewrites everything while
it could be a very small one.

> ---
> 
>  hw/msix.h |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/msix.h b/hw/msix.h
> index 14b1a2e..1786e27 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -11,8 +11,7 @@ int msix_init(PCIDevice *dev, unsigned short nentries,
>  int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
>  uint8_t bar_nr);
>  
> -void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> -   uint32_t val, int len);
> +void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int 
> len);
>  
>  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
>   MemoryRegion *pba_bar);

[Qemu-devel] Does kvm_fd mean KVM or VCPU fd?

2012-06-14 Thread Wei-Ren Chen

Hi all,

  While reading KVM releated code in QEMU, I found the name of one
field in CPU_COMMON (cpu-defs.h), i.e. kvm_fd, might be misleading.
See the code below,

---
int kvm_init_vcpu(CPUArchState *env)
{
ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
env->kvm_fd = ret; /* VCPU fd? */
}
---

  I think KVM_CREATE_VCPU should return VCPU fd, right? AFAIK, in KVM
world, kvm_fd usually means the fd we get after opening "/dev/kvm".
Just want to make sure I understand the code correcly. Thanks.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Re: [Qemu-devel] IO performance test on the tcm-vhost scsi

2012-06-14 Thread Stefan Hajnoczi

On Wed, Jun 13, 2012 at 11:13 AM, mengcong  wrote:
>                    seq-read        seq-write       rand-read     rand-write
>                    8k     256k     8k     256k     8k   256k     8k   256k
> 
> bare-metal          67951  69802    67064  67075    1758 29284    1969 26360
> tcm-vhost-iblock    61501  66575    51775  67872    1011 22533    1851 28216
> tcm-vhost-pscsi     66479  68191    50873  67547    1008 22523    1818 28304
> virtio-blk          26284  66737    23373  65735    1724 28962    1805 27774
> scsi-disk           36013  60289    46222  62527    1663 12992    1804 27670
>
> unit: KB/s
> seq-read/write = sequential read/write
> rand-read/write = random read/write
> 8k,256k are blocksize of the IO

What strikes me is how virtio-blk performs significantly worse than
bare metal and tcm_vhost for seq-read/seq-write 8k.  The good
tcm_vhost results suggest that the overhead is not the virtio
interface itself, since tcm_vhost implements virtio-scsi.

To drill down on the tcm_vhost vs userspace performance gap we need
virtio-scsi userspace results.  QEMU needs to use the same block
device as the tcm-vhost-iblock benchmark.

Cong: Is it possible to collect the virtio-scsi userspace results
using the same block device as tcm-vhost-iblock and -drive
format=raw,aio=native,cache=none?

Stefan

Re: [Qemu-devel] [PATCH] ahci: add -drive support

2012-06-14 Thread Markus Armbruster

Alexander Graf  writes:

> On 14.06.2012, at 09:29, Markus Armbruster wrote:
>
>> Alexander Graf  writes:
>> 
>>> We've had support for creating AHCI devices using -device for a while now,
>>> but it's cumbersome to users. We really should provide an easier way for
>>> them to leverage the power of AHCI!
>>> 
>>> So let's introduce a new if= option to -drive, bumping it en par with 
>>> virtio.
>> 
>> If I understand your patch correctly, this makes if=ahci work like
>> if=virtio and unlike if=ide:
>> 
>> * if=virtio: configure a new PCI device.
>> 
>> * if=ide: instruct the board to add an IDE device to its IDE controller.
>> 
>> For -M pc, the board's IDE controller happens to be piix3-ide.
>> 
>> For -M q35, I'd expect the board's IDE controller to be an ich9-ahci.
>> 
>> Once we switch to q35, if=ahci will become a redundant wart: to add
>> drives to the existing AHCI controller, you'll have to use if=ide.
>> if=ahci will create a new controller, which is generally not what you
>> want.  Ugh.
>
> Yeah, I couldn't come up with anything else that's not completely ugly like 
> the IF_SCSI implementation.

Here's a non-ugly solution: finish the q35 job, and if=ide just works :)

Except for old VMs that still have piix3-ide.  And for those VMs an easy
way to add AHCI controllers and drives isn't exactly a priority.

>> A few questions inline.
>> 
>>> Signed-off-by: Alexander Graf 
>>> ---
>>> blockdev.c  |   25 +++--
>>> blockdev.h  |1 +
>>> qemu-options.hx |7 ++-
>>> 3 files changed, 30 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 622ecba..5405f6c 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -32,6 +32,7 @@ static const char *const if_name[IF_COUNT] = {
>>> [IF_SD] = "sd",
>>> [IF_VIRTIO] = "virtio",
>>> [IF_XEN] = "xen",
>>> +[IF_AHCI] = "ahci",
>>> };
>>> 
>>> static const int if_max_devs[IF_COUNT] = {
>>> @@ -519,7 +520,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> } else {
>>> /* no id supplied -> create one */
>>> dinfo->id = g_malloc0(32);
>>> -if (type == IF_IDE || type == IF_SCSI)
>>> +if (type == IF_IDE || type == IF_SCSI || type == IF_AHCI)
>>> mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
>>> if (max_devs)
>>> snprintf(dinfo->id, 32, "%s%i%s%i",
>>> @@ -549,6 +550,7 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> case IF_IDE:
>>> case IF_SCSI:
>>> case IF_XEN:
>>> +case IF_AHCI:
>>> case IF_NONE:
>>> switch(media) {
>>> case MEDIA_DISK:
>>> @@ -582,6 +584,25 @@ DriveInfo *drive_init(QemuOpts *opts, int 
>>> default_to_scsi)
>>> default:
>>> abort();
>>> }
>>> +
>>> +if (type == IF_AHCI) {
>>> +static int ahci_bus = 0;
>>> +char devname[] = "ahciXXX";
>>> +char busname[] = "ahciXXX.0";
>>> +snprintf(devname, sizeof(devname), "ahci%d", ahci_bus);
>>> +snprintf(busname, sizeof(busname), "ahci%d.0", ahci_bus++);
>>> +
>>> +/* add ahci host controller */
>>> +opts = qemu_opts_create(qemu_find_opts("device"), devname, 0, 
>>> NULL);
>>> +qemu_opt_set(opts, "driver", "ich9-ahci");
>>> +
>>> +/* and attach a single ata disk to its bus */
>>> +opts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, NULL);
>>> +qemu_opt_set(opts, "driver", "ide-drive");
>>> +qemu_opt_set(opts, "bus", busname);
>>> +qemu_opt_set(opts, "drive", dinfo->id);
>>> +}
>>> +
>> 
>> Doesn't this create a new ich9-ahci controller per -drive?
>> 
>> If yes, it's problematic in practice, as you'll run out of PCI slots
>> pretty darn fast.  That problem made us replace virtio-blk by
>> virtio-scsi.  Let's not re-create it.
>
> Hrm. If you have a great idea on how to implement it, I'm all open for it. 
> Talking about it from a high level perspective I had the same feelings at 
> first. Looking at how to implement index= and bus= for real, I quickly 
> withdrew myself from the approach.

I'm afraid the sane way to do this is going to be complicated, just like
if=scsi.  I understand your reluctance to do that; I feel the same.
That's why I'm suggesting to make the problem go away via -M q35.

> The good news is that the limitation here is only a -drive if=ahci 
> limitation. It does not apply to -device. There you can still plug up to 6 
> ahci devices onto a single HBA.

A good convenience option covers a useful subset of common cases neatly.

Your -drive if=ahci covers one common case: add the first AHCI drive
(automatically adding the controller for it as well).  It doesn't cover
the common case of adding a second AHCI drive (to the same controller,
of course).  If I have to resort to -device for something as simple as
that, then I wonder why it's worth having -drive if=ahci at all.

>> IF_VIRTIO device option creation is done in the preceeding swi

Re: [Qemu-devel] [PATCH 0/3] qom: refactor Interfaces

2012-06-14 Thread Paolo Bonzini

Il 13/06/2012 22:54, Anthony Liguori ha scritto:
> The interface implementation was pretty busted.  The way it created Objects 
> for
> each interface was extremely clumbsy and brittle.
> 
> This is a new implementation that does something quite a bit more natural.  It
> simply modifies classes such that they can affectively have more than one 
> super
> class.
> 
> Interfaces never get instantiated.  Instead an object's class just refers to 
> its
> parent class and it's implemented interfaces.
> 
> This should solve the issues Peter's run into and also eliminate the recursive
> call to object_new() (its no longer necessary to allocate anything when 
> creating
> an object).
> 
> This also comes with a test case for object.
> 

I gave it only a quick look, but the approach is much more sane.

Acked-by: Paolo Bonzini 

Paolo

Re: [Qemu-devel] Does kvm_fd mean KVM or VCPU fd?

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 10:24, 陳韋任 (Wei-Ren Chen) ha scritto:
> Hi all,
> 
>   While reading KVM releated code in QEMU, I found the name of one
> field in CPU_COMMON (cpu-defs.h), i.e. kvm_fd, might be misleading.
> See the code below,
> 
> ---
> int kvm_init_vcpu(CPUArchState *env)
> {
> ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
> env->kvm_fd = ret; /* VCPU fd? */
> }
> ---
> 
>   I think KVM_CREATE_VCPU should return VCPU fd, right? AFAIK, in KVM
> world, kvm_fd usually means the fd we get after opening "/dev/kvm".
> Just want to make sure I understand the code correcly. Thanks.

This is the kvm_fd inside struct CPUState, so it's per-CPU.

Paolo

Re: [Qemu-devel] Broken Microblaze timer

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 04:29, Peter Crosthwaite ha scritto:
> Obviously this sucks as a patch, but without this hack, the system
> freezes on boot. I managed to ascertain that its coming from the
> ptimer used by the system timer (hw/xilinx_timer.c). The CPU is either
> halted and never resumes, or the timer is flooding with halt requests
> and the CPU never gets another look in.
> 
> The question is, is this a failure in ptimer, xilinx_timer or the
> async framework? Can ptimer be misused such that the CPU is locked up?

Yes, this looks like the ptimer is flooding the iothread so that the CPU
does not get an occasion to run.

Perhaps you want to limit the rate of the hw/xilinx_timer.c to something
like 1000 Hz or something like that.

Paolo

Re: [Qemu-devel] [PATCH 0/2] qemu-iotests: Test COW from backing file

2012-06-14 Thread Paolo Bonzini

Il 13/06/2012 17:34, Kevin Wolf ha scritto:
> Thanks to Paolo who caught a bug that qemu-iotest would have let passed (well,
> almost, the zero cluster test case did accidentally catch it). I was surprised
> to find that there's no specific test for COW from backing files. Fixing this
> now...
> 
> Kevin Wolf (2):
>   qemu-iotests: Some backing file COW tests
>   qemu-iotests: COW with many AIO requests on the same cluster
> 
>  tests/qemu-iotests/037 |  119 ++
>  tests/qemu-iotests/037.out |  645 +++
>  tests/qemu-iotests/038 |  133 +++
>  tests/qemu-iotests/038.out |  909 
> 
>  tests/qemu-iotests/group   |2 +
>  5 files changed, 1808 insertions(+), 0 deletions(-)
>  create mode 100755 tests/qemu-iotests/037
>  create mode 100644 tests/qemu-iotests/037.out
>  create mode 100755 tests/qemu-iotests/038
>  create mode 100644 tests/qemu-iotests/038.out
> 

Reviewed-by: Paolo Bonzini 

(not the .out files :))

Paolo

Re: [Qemu-devel] [PATCH] block: Prevent /dev/fd/X filename from being detected as floppy

2012-06-14 Thread Paolo Bonzini

Il 13/06/2012 16:30, root ha scritto:
> From: Corey Bryant 
> 
> Reported-by: Kevin Wolf 
> Signed-off-by: Corey Bryant 
> ---
>  block/raw-posix.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index d8eff2f..68886cd 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -946,9 +946,11 @@ static int floppy_probe_device(const char *filename)
>  int prio = 0;
>  struct floppy_struct fdparam;
>  struct stat st;
> +const char *p;
>  
> -if (strstart(filename, "/dev/fd", NULL))
> +if (strstart(filename, "/dev/fd", &p) && p[0] != '/') {
>  prio = 50;
> +}
>  
>  fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
>  if (fd < 0) {
> 

Reviewed-by: Paolo Bonzini

Re: [Qemu-devel] IO performance test on the tcm-vhost scsi

2012-06-14 Thread Cong Meng

On Thu, 2012-06-14 at 09:30 +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 13, 2012 at 11:13 AM, mengcong  wrote:
> >seq-readseq-write   rand-read rand-write
> >8k 256k 8k 256k 8k   256k 8k   256k
> > 
> > bare-metal  67951  6980267064  670751758 292841969 26360
> > tcm-vhost-iblock61501  6657551775  678721011 225331851 28216
> > tcm-vhost-pscsi 66479  6819150873  675471008 225231818 28304
> > virtio-blk  26284  6673723373  657351724 289621805 27774
> > scsi-disk   36013  6028946222  625271663 129921804 27670
> 
> >
> > unit: KB/s
> > seq-read/write = sequential read/write
> > rand-read/write = random read/write
> > 8k,256k are blocksize of the IO
> 
> What strikes me is how virtio-blk performs significantly worse than
> bare metal and tcm_vhost for seq-read/seq-write 8k.  The good
> tcm_vhost results suggest that the overhead is not the virtio
> interface itself, since tcm_vhost implements virtio-scsi.
> 
> To drill down on the tcm_vhost vs userspace performance gap we need
> virtio-scsi userspace results.  QEMU needs to use the same block
> device as the tcm-vhost-iblock benchmark.
> 
> Cong: Is it possible to collect the virtio-scsi userspace results
> using the same block device as tcm-vhost-iblock and -drive
> format=raw,aio=native,cache=none?
> 

virtio-scsi-raw 43065  6972952052  673781757 294192024 28135

qemu \
-drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
-device virtio-scsi-pci,id=mcbus \
-device scsi-disk,drive=sdb

there is only one scsi HBA. 
/dev/sdb is the disk on which all tests have been done. 

Is this what you want?

Cong Meng


> Stefan
>

Re: [Qemu-devel] IO performance test on the tcm-vhost scsi

2012-06-14 Thread Cong Meng

On Wed, 2012-06-13 at 12:08 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2012-06-13 at 18:13 +0800, mengcong wrote:
> > Hi folks, I did an IO performance test on the tcm-vhost scsi. I want to 
> > share 
> > the test result data here.
> > 
> > 
> > seq-readseq-write   rand-read rand-write
> > 8k 256k 8k 256k 8k   256k 8k   256k
> > 
> > bare-metal  67951  6980267064  670751758 292841969 26360
> > tcm-vhost-iblock61501  6657551775  678721011 225331851 28216
> > tcm-vhost-pscsi 66479  6819150873  675471008 225231818 28304
> > virtio-blk  26284  6673723373  657351724 289621805 27774
> > scsi-disk   36013  6028946222  625271663 129921804 27670
> > 
> > unit: KB/s
> > seq-read/write = sequential read/write
> > rand-read/write = random read/write
> > 8k,256k are blocksize of the IO
> > 
> > In tcm-vhost-iblock test, the emulate_write_cache attr was enabled.
> > In virtio-blk test, cache=none,aio=native were set.
> > In scsi-disk test, cache=none,aio=native were set, and LSI HBA was used.
> > 
> > I also tried to do the test with a scsi-generic LUN (pass through the 
> > physical partition /dev/sgX device). But I couldn't setup it
> > successfully. It's a pity.
> > 
> > Benchmark tool: fio, with ioengine=aio,direct=1,iodepth=8 set for all tests.
> > kvm vm: 2 cpus and 2G ram
> > 
> 
> These initial performance results look quite promising for virtio-scsi.
> 
> I'd be really interested to see how a raw flash block device backend
> that locally can do ~100K 4k mixed R/W random IOPs compares with
> virtio-scsi guest performance as the random small block fio workload
> increases..
flash block == Solid state disk? I have no one on hand. 
> 
> Also note there is a bottleneck wrt to random small block I/O
> performance (per LUN) on the Linux/SCSI initiator side that is effecting
> things here.  We've run into this limitation numerous times with using
> SCSI LLDs as backend TCM devices, and I usually recommend using iblock
> export with raw block flash backends for achieving the best small block
> random I/O performance results.  A number of high performance flash
> storage folks do something similar with raw block access (Jen's CC'ed)
> 
> As per Stefan's earlier question, how does virtio-scsi to QEMU SCSI
> userspace compare with these results..?  Is there a reason why these
> where not included in the initial results..?
> 
This should be a mistake I made. I will do this pattern later.

> Thanks Meng!
> 
> --nab
>

[Qemu-devel] Compiling static

2012-06-14 Thread Davide Ferraretto


I want compile qemu with --static:
./configure --static --target-list=i386-linux-user,arm-linux-user 
--python=/usr/bin/python2.7 --prefix=/install_qemu


Qemu returns:
/usr/bin/ld: cannot find -lssl3
/usr/bin/ld: cannot find -lsmime3
/usr/bin/ld: cannot find -lnss3
/usr/bin/ld: cannot find -lnssutil3
collect2: error: ld returned 1 exit status

How resolve??

[Qemu-devel] Compiling static

2012-06-14 Thread Davide Ferraretto

I want compile qemu with --static: ./configure --static 
--target-list=i386-linux-user,arm-linux-user --python=/usr/bin/python2.7 
--prefix=/install_qemu



Qemu returns: /usr/bin/ld: cannot find -lssl3 /usr/bin/ld: cannot find 
-lsmime3 /usr/bin/ld: cannot find -lnss3 /usr/bin/ld: cannot find 
-lnssutil3 collect2: error: ld returned 1 exit status



How resolve??

Re: [Qemu-devel] [PATCH v1 3/4] vl.c: allow for repeated -sd arguments

2012-06-14 Thread Igor Mitsyanko


On 06/14/2012 06:20 AM, Peter A. G. Crosthwaite wrote:

Allows for repeating of -sd arugments in the same way as -pflash and -mtdblock.

Signed-off-by: Peter A. G. Crosthwaite
---
changed from v3:
fixed commit msg typo

  vl.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vl.c b/vl.c
index 204d85b..b6d624b 100644
--- a/vl.c
+++ b/vl.c
@@ -2431,7 +2431,7 @@ int main(int argc, char **argv, char **envp)
  drive_add(IF_MTD, -1, optarg, MTD_OPTS);
  break;
  case QEMU_OPTION_sd:
-drive_add(IF_SD, 0, optarg, SD_OPTS);
+drive_add(IF_SD, -1, optarg, SD_OPTS);
  break;
  case QEMU_OPTION_pflash:
  drive_add(IF_PFLASH, -1, optarg, PFLASH_OPTS);

Nice. Sometimes its really annoying to type full "-drive .." command

Acked-by: Igor Mitsyanko

Re: [Qemu-devel] Does kvm_fd mean KVM or VCPU fd?

2012-06-14 Thread Wei-Ren Chen

> >   I think KVM_CREATE_VCPU should return VCPU fd, right? AFAIK, in KVM
> > world, kvm_fd usually means the fd we get after opening "/dev/kvm".
> > Just want to make sure I understand the code correcly. Thanks.
> 
> This is the kvm_fd inside struct CPUState, so it's per-CPU.

  Okay, then it's VCPU fd. Don't know rename it or add a comment to indicate
it's a VCPU fd is a good idea. ;)

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Re: [Qemu-devel] [PATCH v2 1/6] qerror: add MAX_KEYCODES 16

2012-06-14 Thread Amos Kong


On 12/06/12 01:25, Luiz Capitulino wrote:




Hi Luiz, Anthony


 BTW, why is there a 16 keycode limit?


'Sendkey' command was added by this commit 
a3a91a355bc6107b7d06d722fb97d2b80065ee0b

Limitation of keycodes number (16) was also introduced here,
and I didn't find the root reason.




That's a good point. This comes form the array used by the original
implementation to store the keycodes.

Amos, is it still needed now that we're using qapi lists?



I try to drop this limitation from monitor.c [1], then execute

(qemu) sendkey 1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0

kbd_put_keycode() are called for all (20) keycodes,
but only '123456789012345' can be sent to guest.


It seems we need to notice user when inputted keys are more than 16.


[1] 
https://github.com/kongove/QEMU/commit/a198cdcce3ce4c1632221ac7f1e7c4e8efcd9c82




--
Amos.

Re: [Qemu-devel] [PATCH v2 4/6] msix: Split PBA into it's own MemoryRegion

2012-06-14 Thread Michael S. Tsirkin

On Wed, Jun 13, 2012 at 10:51:47PM -0600, Alex Williamson wrote:
> These don't have to be contiguous.  Size them to only what
> they need and use separate MemoryRegions for the vector
> table and PBA.
> 
> Signed-off-by: Alex Williamson 

Why is this still using NATIVE?

> ---
> 
>  hw/msix.c |  116 
> ++---
>  hw/pci.h  |   15 ++--
>  2 files changed, 83 insertions(+), 48 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index a4cdfb0..d476d07 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -37,7 +37,7 @@
>  
>  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>  {
> -uint8_t *table_entry = dev->msix_table_page + vector * 
> PCI_MSIX_ENTRY_SIZE;
> +uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
>  MSIMessage msg;
>  
>  msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
> @@ -86,16 +86,6 @@ static int msix_add_config(struct PCIDevice *pdev, 
> unsigned short nentries,
>  return 0;
>  }
>  
> -static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> -   unsigned size)
> -{
> -PCIDevice *dev = opaque;
> -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> -void *page = dev->msix_table_page;
> -
> -return pci_get_long(page + offset);
> -}
> -
>  static uint8_t msix_pending_mask(int vector)
>  {
>  return 1 << (vector % 8);
> @@ -103,7 +93,7 @@ static uint8_t msix_pending_mask(int vector)
>  
>  static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
>  {
> -return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
> +return dev->msix_pba + vector / 8;
>  }
>  
>  static int msix_is_pending(PCIDevice *dev, int vector)
> @@ -124,7 +114,7 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
>  static bool msix_vector_masked(PCIDevice *dev, int vector, bool fmask)
>  {
>  unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + 
> PCI_MSIX_ENTRY_VECTOR_CTRL;
> -return fmask || dev->msix_table_page[offset] & 
> PCI_MSIX_ENTRY_CTRL_MASKBIT;
> +return fmask || dev->msix_table[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
>  }
>  
>  static bool msix_is_masked(PCIDevice *dev, int vector)
> @@ -203,27 +193,46 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
>  }
>  }
>  
> -static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> -uint64_t val, unsigned size)
> +static uint64_t msix_table_mmio_read(void *opaque, target_phys_addr_t addr,
> + unsigned size)
>  {
>  PCIDevice *dev = opaque;
> -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> -int vector = offset / PCI_MSIX_ENTRY_SIZE;
> -bool was_masked;
>  
> -/* MSI-X page includes a read-only PBA and a writeable Vector Control. */
> -if (vector >= dev->msix_entries_nr) {
> -return;
> -}
> +return pci_get_long(dev->msix_table + addr);
> +}
> +
> +static void msix_table_mmio_write(void *opaque, target_phys_addr_t addr,
> +  uint64_t val, unsigned size)
> +{
> +PCIDevice *dev = opaque;
> +int vector = addr / PCI_MSIX_ENTRY_SIZE;
> +bool was_masked;
>  
>  was_masked = msix_is_masked(dev, vector);
> -pci_set_long(dev->msix_table_page + offset, val);
> +pci_set_long(dev->msix_table + addr, val);
>  msix_handle_mask_update(dev, vector, was_masked);
>  }
>  
> -static const MemoryRegionOps msix_mmio_ops = {
> -.read = msix_mmio_read,
> -.write = msix_mmio_write,
> +static const MemoryRegionOps msix_table_mmio_ops = {
> +.read = msix_table_mmio_read,
> +.write = msix_table_mmio_write,
> +.endianness = DEVICE_NATIVE_ENDIAN,
> +.valid = {
> +.min_access_size = 4,
> +.max_access_size = 4,
> +},
> +};
> +
> +static uint64_t msix_pba_mmio_read(void *opaque, target_phys_addr_t addr,
> +   unsigned size)
> +{
> +PCIDevice *dev = opaque;
> +
> +return pci_get_long(dev->msix_pba + addr);
> +}
> +
> +static const MemoryRegionOps msix_pba_mmio_ops = {
> +.read = msix_pba_mmio_read,
>  .endianness = DEVICE_NATIVE_ENDIAN,
>  .valid = {
>  .min_access_size = 4,
> @@ -235,11 +244,14 @@ static void msix_mmio_setup(PCIDevice *d, MemoryRegion 
> *bar)
>  {
>  uint8_t *config = d->config + d->msix_cap;
>  uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> -uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
> +uint32_t table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> +uint32_t pba = pci_get_long(config + PCI_MSIX_PBA);
> +uint32_t pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
>  /* TODO: for assigned devices, we'll want to make it possible to map
>   * pending bits separately in case they are in a separate bar. */
>  
> -memory_region_add_subregion(bar, offset, &d->msix_mmio);
> +memory_region_add_subregion(bar, table_o

Re: [Qemu-devel] [PATCH v2 1/6] msix: Add simple BAR allocation MSIX setup functions

2012-06-14 Thread Michael S. Tsirkin

On Wed, Jun 13, 2012 at 10:51:19PM -0600, Alex Williamson wrote:
> msi_init() takes over a BAR without really specifying or allowing
> specification of how it does so.  Instead, let's split it into
> two interfaces, one fully specified, and one trivially easy.  This
> implements the latter.  msix_init_exclusive_bar() takes over
> allocating and filling a PCI BAR _exclusively_ for the use of MSIX.
> When used, the matching msi_uninit_exclusive_bar() should be used
> to tear it down.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  hw/msix.c |   43 +++
>  hw/msix.h |3 +++
>  hw/pci.h  |2 ++
>  3 files changed, 48 insertions(+)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index b64f109..a4cdfb0 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -299,6 +299,41 @@ err_config:
>  return ret;
>  }
>  
> +int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> +uint8_t bar_nr)
> +{
> +int ret;
> +char *name;
> +
> +/*
> + * Migration compatibility dictates that this remains a 4k
> + * BAR with the vector table in the lower half and PBA in
> + * the upper half.
> + */
> +if (nentries * PCI_MSIX_ENTRY_SIZE > 2048) {
> +return -EINVAL;
> +}
> +
> +if (asprintf(&name, "%s MSIX BAR", dev->name) == -1) {
> +return -ENOMEM;
> +}
> +

I think we should avoid spaces in names.
Also let's use lower case. And BAR is confusing IMO:
it's not a base address register, it's the region.
So I would prefer simply: "%s-msix".


> +memory_region_init(&dev->msix_exclusive_bar, name, 4096);
> +
> +free(name);
> +
> +ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr, 4096);
> +if (ret) {
> +memory_region_destroy(&dev->msix_exclusive_bar);
> +return ret;
> +}
> +
> +pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
> + &dev->msix_exclusive_bar);
> +
> +return 0;
> +}
> +
>  static void msix_free_irq_entries(PCIDevice *dev)
>  {
>  int vector;
> @@ -329,6 +364,14 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>  return 0;
>  }
>  
> +void msix_uninit_exclusive_bar(PCIDevice *dev)
> +{
> +if (msix_present(dev)) {
> +msix_uninit(dev, &dev->msix_exclusive_bar);
> +memory_region_destroy(&dev->msix_exclusive_bar);
> +}
> +}
> +
>  void msix_save(PCIDevice *dev, QEMUFile *f)
>  {
>  unsigned n = dev->msix_entries_nr;
> diff --git a/hw/msix.h b/hw/msix.h
> index e5a488d..bed6bfb 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -7,11 +7,14 @@
>  int msix_init(PCIDevice *pdev, unsigned short nentries,
>MemoryRegion *bar,
>unsigned bar_nr, unsigned bar_size);
> +int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> +uint8_t bar_nr);
>  
>  void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> uint32_t val, int len);
>  
>  int msix_uninit(PCIDevice *d, MemoryRegion *bar);
> +void msix_uninit_exclusive_bar(PCIDevice *dev);
>  
>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index 4c96268..d517a54 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -226,6 +226,8 @@ struct PCIDevice {
>  
>  /* Space to store MSIX table */
>  uint8_t *msix_table_page;
> +/* MemoryRegion container for msix exclusive BAR setup */
> +MemoryRegion msix_exclusive_bar;
>  /* MMIO index used to map MSIX table and pending bit entries. */
>  MemoryRegion msix_mmio;
>  /* Reference-count for entries actually in use by driver. */

Re: [Qemu-devel] Compiling static

2012-06-14 Thread Dunrong Huang

2012/6/14 Davide Ferraretto :
> I want compile qemu with --static:
> ./configure --static --target-list=i386-linux-user,arm-linux-user
> --python=/usr/bin/python2.7 --prefix=/install_qemu
>
> Qemu returns:
> /usr/bin/ld: cannot find -lssl3
> /usr/bin/ld: cannot find -lsmime3
> /usr/bin/ld: cannot find -lnss3
> /usr/bin/ld: cannot find -lnssutil3
> collect2: error: ld returned 1 exit status
Note those libraries(libssl3.so, libsmime3 ...) belong to nss, did you
install nss header files and libraries?
Moreover, because you want to compile qemu statically, you must
install static nss libraries.
> How resolve??
>



-- 
linuxer, emacser and pythoner living in beijing
blog: http://mathslinux.org
twitter: https://twitter.com/mathslinux
google+: https://plus.google.com/118129852578326338750

[Qemu-devel] [PATCH] qcow2: fix autoclear image header update

2012-06-14 Thread Stefan Hajnoczi

The autoclear feature bits can be used for qcow2 file format features
that are safe to "drop" by old programs that do not understand the
feature.  Upon opening the image file unknown autoclear feature bits are
cleared and the image file header is rewritten, but this was happening
too early in the code when critical header fields were not yet loaded.

Process autoclear feature bits after all necessary header information
has been loaded.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c2e49cd..79201fc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -298,14 +298,6 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 goto fail;
 }
 
-if (!bs->read_only && s->autoclear_features != 0) {
-s->autoclear_features = 0;
-ret = qcow2_update_header(bs);
-if (ret < 0) {
-goto fail;
-}
-}
-
 /* Check support for various header values */
 if (header.refcount_order != 4) {
 report_unsupported(bs, "%d bit reference counts",
@@ -411,6 +403,15 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 goto fail;
 }
 
+/* Clear unknown autoclear feature bits */
+if (!bs->read_only && s->autoclear_features != 0) {
+s->autoclear_features = 0;
+ret = qcow2_update_header(bs);
+if (ret < 0) {
+goto fail;
+}
+}
+
 /* Initialise locks */
 qemu_co_mutex_init(&s->lock);
 
-- 
1.7.10

Re: [Qemu-devel] [PATCH 1/6 v10] docs: spec for add-cow file format

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 05:06, schrieb Dong Xu Wang:
> On Wed, Jun 13, 2012 at 11:10 PM, Eric Blake  wrote:
>> On 06/13/2012 08:36 AM, Dong Xu Wang wrote:
>>> Introduce a new file format:add-cow. The usage can be found at this patch.
>>>
>>> Signed-off-by: Dong Xu Wang 
>>> ---
>>>  docs/specs/add-cow.txt |   87 
>>> 
>>>  1 files changed, 87 insertions(+), 0 deletions(-)
>>>  create mode 100644 docs/specs/add-cow.txt

>>> +
>>> +== Header ==
>>> +
>>> +The Header is included in the first bytes:
>>> +
>>> +Byte0 -  7: magic
>>> +add-cow magic string ("ADD_COW\xff")
>>> +
>>> +8 -  11:version
>>> +Version number (only valid value is 1 now)
>>> +
>>> +12 - 15:backing_filename_offset
>>> +Offset in the add-cow file at which the backing 
>>> file name
>>> +is stored (NB: The string is not null terminated). 
>>> 0 if the
>>> +image doesn't have a backing file.
>>
>> Mention that if this is not 0, then it must be between 36 and 4094 (a
>> file name must be at least 1 byte).  What are the semantics if the
>> filename is relative?
> 
> relative filename is ok, I tested it just now.

I believe Eric wanted to know what a relative path means, i.e. that it's
relative to the image file rather than relative to the working directory.

>>> +
>>> +16 - 19:backing_filename_size
>>> +Length of the backing file name in bytes. 
>>> Undefined if the
>>> +image doesn't have a backing file.
>>
>> Better to require 0 if backing_filename_offset is 0, than to leave this
>> field undefined; also if backing_filename_offset is non-zero, then this
>> must be non-zero.  Must be less than 4096-36 to fit in the reserved part
>> of the header.
>>
> 
> Okay.

Does an add-cow image without a backing file even make sense?

>>> +28 - 35:features
>>> +Currently only 2 feature bits are used:
>>> +Feature bits:
>>> +The image uses a backing file:
>>> +* ADD_COW_F_BACKING_FILE = 0x01.
>>> +The backing file's format is raw:
>>> +* ADD_COW_F_BACKING_FORMAT_NO_PROBE = 0x02.
>>
>> Should this follow the qcow2v3 proposal of splitting into mandatory vs.
>> optional feature bits?
>>
>> I agree that ADD_COW_F_BACKING_FORMAT_NO_PROBE is sufficient to avoid
>> security implications, but do we want the extra flexibility of
>> specifying the backing format file format rather than just requiring
>> probes on all but raw?
> 
> Kevin, or Stefan, can you give some comments for this? thanks.

I tend to agree that a format name is better than relying on probing.

Also, I think we need the same thing for image_file. add-cow is not only
useful for raw images, but also for other image format types for which
we don't support backing files.

>>> +== Reserved ==
>>> +
>>> +Byte36 - 4095:  Reserved field:
>>> +It is used to make sure COW bitmap field starts at 
>>> the
>>> +4096th byte, backing_file name and image_file name 
>>> will
>>> +be stored here.
>>
>> Do we want to keep a fixed-size header, or should we be planning on the
>> possibility of future extensions requiring enough other header
>> extensions that a variable-sized header would be wiser?  That is, I'm
>> fine with requiring that the header be a multiple of 4k, but maybe it
>> would make sense to have a mandatory header field that states how many
>> header pages are present before the COW bitmap begins.  In the first
>> round of implementation, this header field can be required to be 1 (that
>> is, for now, we require exactly 4k header), but having the field would
>> let us change in the future to a design with an 8k header to hold more
>> metadata as needed.

I have the impression that this simple add-cow hack is starting to get
seriously overengineered... :-)

Kevin

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Kevin Wolf

Am 13.06.2012 16:36, schrieb Dong Xu Wang:
> add-cow file can't live alone, must together will image_file and backing_file.
> If we implement qemu-img convert operation for add-cow file format, we must
> create image_file and backing_file manually, that will be confused for users,
> so we just ignore add-cow format while doing converting.
> 
> Signed-off-by: Dong Xu Wang 

NACK.

This stupid "let's drop the feature, it might confuse users" attitude is
known from Gnome, but not from qemu.

There's no technical reason to forbid it and anyone who manages to
create a valid add-cow image will also be able to specify the very same
options to a convert command. Also, having image format specific code in
qemu-img is evil.

Kevin

Re: [Qemu-devel] [PATCH 1/7] Add spent time for migration

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> We add time spent for migration to the output of "info migrate"
> command.  'total_time' means time since the start fo migration if
> migration is 'active', and total time of migration if migration is
> completed.  As we are also interested in transferred ram when
> migration completes, adding all ram statistics
> 
> Signed-off-by: Juan Quintela 
> ---
>  hmp.c|2 ++
>  migration.c  |   11 +++
>  migration.h  |1 +
>  qapi-schema.json |   12 +---
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index bb0952e..25a128b 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -142,6 +142,8 @@ void hmp_info_migrate(Monitor *mon)
> info->ram->remaining >> 10);
>  monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n",
> info->ram->total >> 10);
> +monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
> +   info->ram->total_time);
>  }
> 
>  if (info->has_disk) {
> diff --git a/migration.c b/migration.c
> index 3f485d3..599bb6c 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -131,6 +131,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>  info->ram->transferred = ram_bytes_transferred();
>  info->ram->remaining = ram_bytes_remaining();
>  info->ram->total = ram_bytes_total();
> +info->ram->total_time = qemu_get_clock_ms(rt_clock)
> +- s->total_time;
> 
>  if (blk_mig_active()) {
>  info->has_disk = true;
> @@ -143,6 +145,13 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>  case MIG_STATE_COMPLETED:
>  info->has_status = true;
>  info->status = g_strdup("completed");
> +
> +info->has_ram = true;
> +info->ram = g_malloc0(sizeof(*info->ram));
> +info->ram->transferred = ram_bytes_transferred();
> +info->ram->remaining = 0;
> +info->ram->total = ram_bytes_total();
> +info->ram->total_time = s->total_time;
>  break;
>  case MIG_STATE_ERROR:
>  info->has_status = true;
> @@ -260,6 +269,7 @@ static void migrate_fd_put_ready(void *opaque)
>  } else {
>  migrate_fd_completed(s);
>  }
> +s->total_time = qemu_get_clock_ms(rt_clock) - s->total_time;
>  if (s->state != MIG_STATE_COMPLETED) {
>  if (old_vm_running) {
>  vm_start();
> @@ -373,6 +383,7 @@ static MigrationState *migrate_init(int blk, int inc)
> 
>  s->bandwidth_limit = bandwidth_limit;
>  s->state = MIG_STATE_SETUP;
> +s->total_time = qemu_get_clock_ms(rt_clock);
> 
>  return s;
>  }
> diff --git a/migration.h b/migration.h
> index 2e9ca2e..165b27b 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -33,6 +33,7 @@ struct MigrationState
>  void *opaque;
>  int blk;
>  int shared;
> +int64_t total_time;
>  };
> 
>  void process_incoming_migration(QEMUFile *f);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 2ca7195..c5a2e99 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -238,10 +238,15 @@
>  #
>  # @total: total amount of bytes involved in the migration process
>  #
> +# @total_time: tota0l amount of ms since migration started.  If
> +#migration has ended, it returns the total migration
> +#time. (since 1.2)
> +#
>  # Since: 0.14.0.
>  ##
>  { 'type': 'MigrationStats',
> -  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
> +  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
> +   'total_time': 'int' } }
> 
>  ##
>  # @MigrationInfo
> @@ -253,8 +258,9 @@
>  #  'cancelled'. If this field is not returned, no migration process
>  #  has been initiated
>  #
> -# @ram: #optional @MigrationStats containing detailed migration status,
> -#   only returned if status is 'active'
> +# @ram: #optional @MigrationStats containing detailed migration
> +#   status, only returned if status is 'active' or
> +#   'completed'. 'comppleted' (since 1.2)
>  #
>  # @disk: #optional @MigrationStats containing detailed disk migration
>  #status, only returned if status is 'active' and it is a block

Reviewed-by: Orit Wasserman

Re: [Qemu-devel] [PATCH 5/6] add-cow: support snapshot_blkdev

2012-06-14 Thread Kevin Wolf

Am 13.06.2012 16:36, schrieb Dong Xu Wang:
> add-cow will let raw file support snapshot_blkdev indirectly.
> 
> Signed-off-by: Dong Xu Wang 

Paolo, what do you think about this magic?

I think I can see its use especially for HMP because it's quite
convenient, but on the other hand it assumes a fixed image path for the
new raw image. This isn't going to work for block devices, for example.

If we don't do it this way, we need to allow passing image creation
options to the snapshotting command so that you can pass a precreated
image file.

Kevin

> ---
>  blockdev.c  |   31 +++
>  docs/live-block-ops.txt |   10 +-
>  2 files changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 622ecba..2d89e5e 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -783,15 +783,38 @@ void qmp_transaction(BlockdevActionList *dev_list, 
> Error **errp)
>  
>  /* create new image w/backing file */
>  if (mode != NEW_IMAGE_MODE_EXISTING) {
> -ret = bdrv_img_create(new_image_file, format,
> +if (strcmp(format, "add-cow")) {
> +ret = bdrv_img_create(new_image_file, format,
>states->old_bs->filename,
>states->old_bs->drv->format_name,
>NULL, -1, flags);
> -if (ret) {
> -error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
> -goto delete_and_fail;
> +} else {
> +char image_file[1024];
> +char option[1024];
> +uint64_t size;
> +
> +bdrv_get_geometry(states->old_bs, &size);
> +size *= BDRV_SECTOR_SIZE;
> +
> +sprintf(image_file, "%s.raw", new_image_file);
> +
> +ret = bdrv_img_create(image_file, "raw", NULL,
> +  NULL, NULL, size, flags);
> +if (ret) {
> +error_set(errp, QERR_UNDEFINED_ERROR);
> +return;
> +}
> +sprintf(option, "image_file=%s.raw", new_image_file);
> +ret = bdrv_img_create(new_image_file, format,
> +  states->old_bs->filename,
> +  states->old_bs->drv->format_name,
> +  option, -1, flags);
>  }
>  }
> +if (ret) {
> +error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
> +goto delete_and_fail;
> +}
>  
>  /* We will manually add the backing_hd field to the bs later */
>  states->new_bs = bdrv_new("");
> diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
> index a257087..c97344b 100644
> --- a/docs/live-block-ops.txt
> +++ b/docs/live-block-ops.txt
> @@ -2,7 +2,8 @@ LIVE BLOCK OPERATIONS
>  =
>  
>  High level description of live block operations. Note these are not
> -supported for use with the raw format at the moment.
> +supported for use with the raw format at the moment, but we can use
> +add-cow as metadata to suport raw format.
>  
>  Snapshot live merge
>  ===
> @@ -56,3 +57,10 @@ into that image. Example:
>  (qemu) block_stream ide0-hd0
>  
>  
> +
> +Raw is not supported, but we can use add-cow in the 1st step:
> +
> +(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img add-cow
> +
> +It will create a raw file named disk.img.raw, with the same virtual size of
> +ide0-hd0 first, and then create disk.img.

Re: [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> This allows to know how long each section takes to save.
> 
> An awk script like this tells us sections that takes more that 10ms
> 
> $1 ~ /savevm_state_iterate_end/ {
>   /* Print savevm_section_end line when > 10ms duration */
>   if ($2 > 1) {
>   printf("%s times_missing=%u\n", $0, times_missing++);
>   }
> }
> 
> Signed-off-by: Juan Quintela 
> ---
>  savevm.c |8 
>  trace-events |6 ++
>  2 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index 2d18bab..779bd22 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -85,6 +85,7 @@
>  #include "cpus.h"
>  #include "memory.h"
>  #include "qmp-commands.h"
> +#include "trace.h"
> 
>  #define SELF_ANNOUNCE_ROUNDS 5
> 
> @@ -1624,11 +1625,14 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>  if (se->save_live_state == NULL)
>  continue;
> 
> +trace_savevm_section_start();
>  /* Section type */
>  qemu_put_byte(f, QEMU_VM_SECTION_PART);
>  qemu_put_be32(f, se->section_id);
> 
>  ret = se->save_live_state(f, QEMU_VM_SECTION_PART, se->opaque);
> +trace_savevm_section_end(se->section_id);
> +
>  if (ret <= 0) {
>  /* Do not proceed to the next vmstate before this one reported
> completion of the current stage. This serializes the migration
> @@ -1658,11 +1662,13 @@ int qemu_savevm_state_complete(QEMUFile *f)
>  if (se->save_live_state == NULL)
>  continue;
> 
> +trace_savevm_section_start();
>  /* Section type */
>  qemu_put_byte(f, QEMU_VM_SECTION_END);
>  qemu_put_be32(f, se->section_id);
> 
>  ret = se->save_live_state(f, QEMU_VM_SECTION_END, se->opaque);
> +trace_savevm_section_end(se->section_id);
>  if (ret < 0) {
>  return ret;
>  }
> @@ -1674,6 +1680,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
>   if (se->save_state == NULL && se->vmsd == NULL)
>   continue;
> 
> +trace_savevm_section_start();
>  /* Section type */
>  qemu_put_byte(f, QEMU_VM_SECTION_FULL);
>  qemu_put_be32(f, se->section_id);
> @@ -1687,6 +1694,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
>  qemu_put_be32(f, se->version_id);
> 
>  vmstate_save(f, se);
> +trace_savevm_section_end(se->section_id);
>  }
> 
>  qemu_put_byte(f, QEMU_VM_EOF);
> diff --git a/trace-events b/trace-events
> index 87cb96c..73d5f81 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -741,6 +741,11 @@ displaysurface_resize(void *display_state, void 
> *display_surface, int width, int
>  # vga.c
>  ppm_save(const char *filename, void *display_surface) "%s surface=%p"
> 
> +# savevm.c
> +
> +savevm_section_start(void) ""
> +savevm_section_end(unsigned int section_id) "section_id %u"
> +
>  # hw/qxl.c
>  disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>  disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t 
> val) "%d %s addr=%u val=%u"
> @@ -805,3 +810,4 @@ qxl_render_blit_guest_primary_initialized(void) ""
>  qxl_render_blit(int32_t stride, int32_t left, int32_t right, int32_t top, 
> int32_t bottom) "stride=%d [%d, %d, %d, %d]"
>  qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t 
> stride, int32_t bytes_pp, int32_t bits_pp) "%dx%d, stride %d, bpp %d, depth 
> %d"
>  qxl_render_update_area_done(void *cookie) "%p"
> +
extra ws

Reviewed-by: Orit Wasserman

Re: [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> If buffers are full, don't iterate, just exit.
> 
> Signed-off-by: Juan Quintela 
> ---
>  savevm.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index 779bd22..6bc71b1 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1625,6 +1625,9 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>  if (se->save_live_state == NULL)
>  continue;
> 
> +if (qemu_file_rate_limit(f)) {
> +return 0;
> +}
>  trace_savevm_section_start();
>  /* Section type */
>  qemu_put_byte(f, QEMU_VM_SECTION_PART);

Reviewed-by: Orit Wasserman

Re: [Qemu-devel] [PATCH] qcow2: fix autoclear image header update

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 12:42, schrieb Stefan Hajnoczi:
> The autoclear feature bits can be used for qcow2 file format features
> that are safe to "drop" by old programs that do not understand the
> feature.  Upon opening the image file unknown autoclear feature bits are
> cleared and the image file header is rewritten, but this was happening
> too early in the code when critical header fields were not yet loaded.
> 
> Process autoclear feature bits after all necessary header information
> has been loaded.
> 
> Signed-off-by: Stefan Hajnoczi 

Ouch...

You don't have a qemu-iotests case to offer for the feature bit
handling, by any chance?

Kevin

Re: [Qemu-devel] [PATCH 3/6] add-cow file format

2012-06-14 Thread Paolo Bonzini

I just took a quick look at the flush code.

Il 13/06/2012 16:36, Dong Xu Wang ha scritto:
> 
> +bool add_cow_cache_set_writethrough(BlockDriverState *bs, AddCowCache *c,
> +bool enable)
> +{
> +bool old = c->writethrough;
> +
> +if (!old && enable) {
> +add_cow_cache_flush(bs, c);
> +}
> +
> +c->writethrough = enable;
> +return old;
> +}

Writethrough mode should not be needed anymore in 1.2 if the new
implementation of writethrough is added.

And anyway...

> +if (changed) {
> +ret = add_cow_cache_flush(bs, s->bitmap_cache);
> +}

... here you're treating the cache essentially as writethrough.  Is this
flush necessary?

> +qemu_co_mutex_unlock(&s->lock);
> +qemu_iovec_destroy(&hd_qiov);
> +return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +BDRVAddCowState *s = bs->opaque;
> +int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +int ret;
> +int64_t bitmap_size =
> +(size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +
> +ret = bdrv_truncate(bs->file,
> +ADD_COW_BITMAP_POS + bitmap_size);
> +if (ret < 0) {
> +return ret;
> +}
> +bdrv_truncate(s->image_hd, size);
> +return 0;
> +}
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +BDRVAddCowState *s = bs->opaque;
> +int ret;
> +
> +qemu_co_mutex_lock(&s->lock);
> +ret = add_cow_cache_flush(bs, s->bitmap_cache);
> +qemu_co_mutex_unlock(&s->lock);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +return bdrv_co_flush(bs->file);

Flushing bs->file here is not needed anymore...

> +}
> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> +{
> +.name = BLOCK_OPT_SIZE,
> +.type = OPT_SIZE,
> +.help = "Virtual disk size"
> +},
> +{
> +.name = BLOCK_OPT_BACKING_FILE,
> +.type = OPT_STRING,
> +.help = "File name of a base image"
> +},
> +{
> +.name = BLOCK_OPT_IMAGE_FILE,
> +.type = OPT_STRING,
> +.help = "File name of a image file"
> +},
> +{
> +.name = BLOCK_OPT_BACKING_FMT,
> +.type = OPT_STRING,
> +.help = "Image format of the base image"
> +},
> +{ NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +.format_name= "add-cow",
> +.instance_size  = sizeof(BDRVAddCowState),
> +.bdrv_probe = add_cow_probe,
> +.bdrv_open  = add_cow_open,
> +.bdrv_close = add_cow_close,
> +.bdrv_create= add_cow_create,
> +.bdrv_co_readv  = add_cow_co_readv,
> +.bdrv_co_writev = add_cow_co_writev,
> +.bdrv_truncate  = bdrv_add_cow_truncate,
> +.bdrv_co_is_allocated   = add_cow_is_allocated,
> +
> +.create_options = add_cow_create_options,
> +.bdrv_co_flush_to_disk  = add_cow_co_flush,

... and this should be bdrv_co_flush_to_os.

Paolo

Re: [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> Refactor the code that is only needed for tcg to an static function.
> Call that only when tcg is enabled.  We can't refactor to a dummy
> function in the kvm case, as qemu can be compiled at the same time
> with tcg and kvm.
> 
> Signed-off-by: Juan Quintela 
> ---
>  exec.c |   31 +--
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index a0494c7..b6c7675 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1943,11 +1943,29 @@ void tb_flush_jmp_cache(CPUArchState *env, 
> target_ulong addr)
>  TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
>  }
> 
> +static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end,
> +  uintptr_t length)
> +{
> +uintptr_t start1;
> +
> +/* we modify the TLB cache so that the dirty bit will be set again
> +   when accessing the range */
> +start1 = (uintptr_t)qemu_safe_ram_ptr(start);
> +/* Check that we don't span multiple blocks - this breaks the
> +   address comparisons below.  */
> +if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
> +!= (end - 1) - start) {
> +abort();
> +}
> +cpu_tlb_reset_dirty_all(start1, length);
> +
> +}
> +
>  /* Note: start and end must be within the same ram block.  */
>  void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
>   int dirty_flags)
>  {
> -uintptr_t length, start1;
> +uintptr_t length;
> 
>  start &= TARGET_PAGE_MASK;
>  end = TARGET_PAGE_ALIGN(end);
> @@ -1957,16 +1975,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, 
> ram_addr_t end,
>  return;
>  cpu_physical_memory_mask_dirty_range(start, length, dirty_flags);
> 
> -/* we modify the TLB cache so that the dirty bit will be set again
> -   when accessing the range */
> -start1 = (uintptr_t)qemu_safe_ram_ptr(start);
> -/* Check that we don't span multiple blocks - this breaks the
> -   address comparisons below.  */
> -if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
> -!= (end - 1) - start) {
> -abort();
> +if (tcg_enabled()) {
> +tlb_reset_dirty_range_all(start, end, length);
>  }
> -cpu_tlb_reset_dirty_all(start1, length);
>  }
> 
>  int cpu_physical_memory_set_dirty_tracking(int enable)

Reviewed-by: Orit Wasserman

Re: [Qemu-devel] [PATCH] qcow2: fix autoclear image header update

2012-06-14 Thread Stefan Hajnoczi

On Thu, Jun 14, 2012 at 12:06 PM, Kevin Wolf  wrote:
> Am 14.06.2012 12:42, schrieb Stefan Hajnoczi:
>> The autoclear feature bits can be used for qcow2 file format features
>> that are safe to "drop" by old programs that do not understand the
>> feature.  Upon opening the image file unknown autoclear feature bits are
>> cleared and the image file header is rewritten, but this was happening
>> too early in the code when critical header fields were not yet loaded.
>>
>> Process autoclear feature bits after all necessary header information
>> has been loaded.
>>
>> Signed-off-by: Stefan Hajnoczi 
>
> Ouch...
>
> You don't have a qemu-iotests case to offer for the feature bit
> handling, by any chance?

Nope.  I'll cook one up, should be easy with your qcow2.py module.

Stefan

Re: [Qemu-devel] [PATCH 5/6] add-cow: support snapshot_blkdev

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 12:59, Kevin Wolf ha scritto:
> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>> add-cow will let raw file support snapshot_blkdev indirectly.
>>
>> Signed-off-by: Dong Xu Wang 
> 
> Paolo, what do you think about this magic?

Besides the obvious layering violation (it would be better to add a new
method bdrv_ext_snapshot_create perhaps) I don't see very much a problem
in it.  Passing image creation options sounds like a good idea that we
can add in the future as an extension.

But honestly, I don't really see the point of add-cow in general...  The
raw image is anyway not usable without a pass through qemu-io convert,
and mirroring will also allow collapsing an image to raw (with the
persistent dirty bitmap playing the role of the add-cow metadata).

> I think I can see its use especially for HMP because it's quite
> convenient, but on the other hand it assumes a fixed image path for the
> new raw image. This isn't going to work for block devices, for example.

True, but then probably you would use mode='existing', because you need
to pre-create the logical volume.

> If we don't do it this way, we need to allow passing image creation
> options to the snapshotting command so that you can pass a precreated
> image file.

This sounds like a useful extension anyway, except that passing an
unstructured string for image creation options is ugly...  Perhaps we
can base a better implementation of options on Laszlo's QemuOpts visitor.

Paolo
> 
> Kevin
> 
>> ---
>>  blockdev.c  |   31 +++
>>  docs/live-block-ops.txt |   10 +-
>>  2 files changed, 36 insertions(+), 5 deletions(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 622ecba..2d89e5e 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -783,15 +783,38 @@ void qmp_transaction(BlockdevActionList *dev_list, 
>> Error **errp)
>>  
>>  /* create new image w/backing file */
>>  if (mode != NEW_IMAGE_MODE_EXISTING) {
>> -ret = bdrv_img_create(new_image_file, format,
>> +if (strcmp(format, "add-cow")) {
>> +ret = bdrv_img_create(new_image_file, format,
>>states->old_bs->filename,
>>states->old_bs->drv->format_name,
>>NULL, -1, flags);
>> -if (ret) {
>> -error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
>> -goto delete_and_fail;
>> +} else {
>> +char image_file[1024];
>> +char option[1024];
>> +uint64_t size;
>> +
>> +bdrv_get_geometry(states->old_bs, &size);
>> +size *= BDRV_SECTOR_SIZE;
>> +
>> +sprintf(image_file, "%s.raw", new_image_file);
>> +
>> +ret = bdrv_img_create(image_file, "raw", NULL,
>> +  NULL, NULL, size, flags);
>> +if (ret) {
>> +error_set(errp, QERR_UNDEFINED_ERROR);
>> +return;
>> +}
>> +sprintf(option, "image_file=%s.raw", new_image_file);
>> +ret = bdrv_img_create(new_image_file, format,
>> +  states->old_bs->filename,
>> +  states->old_bs->drv->format_name,
>> +  option, -1, flags);
>>  }
>>  }
>> +if (ret) {
>> +error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
>> +goto delete_and_fail;
>> +}
>>  
>>  /* We will manually add the backing_hd field to the bs later */
>>  states->new_bs = bdrv_new("");
>> diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
>> index a257087..c97344b 100644
>> --- a/docs/live-block-ops.txt
>> +++ b/docs/live-block-ops.txt
>> @@ -2,7 +2,8 @@ LIVE BLOCK OPERATIONS
>>  =
>>  
>>  High level description of live block operations. Note these are not
>> -supported for use with the raw format at the moment.
>> +supported for use with the raw format at the moment, but we can use
>> +add-cow as metadata to suport raw format.
>>  
>>  Snapshot live merge
>>  ===
>> @@ -56,3 +57,10 @@ into that image. Example:
>>  (qemu) block_stream ide0-hd0
>>  
>>  
>> +
>> +Raw is not supported, but we can use add-cow in the 1st step:
>> +
>> +(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img add-cow
>> +
>> +It will create a raw file named disk.img.raw, with the same virtual size of
>> +ide0-hd0 first, and then create disk.img.
> 
> 
>

Re: [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> ram_save_remaining() is an expensive operation when there is a lot of memory.
> So we only call the function when we need it.
> 
> Signed-off-by: Juan Quintela 
> ---
>  arch_init.c |   10 ++
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 988adca..76a3d4e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -295,7 +295,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>  ram_addr_t addr;
>  uint64_t bytes_transferred_last;
>  double bwidth = 0;
> -uint64_t expected_time = 0;
>  int ret;
> 
>  if (stage < 0) {
> @@ -372,9 +371,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
> 
>  qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> 
> -expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
> -
> -return (stage == 2) && (expected_time <= migrate_max_downtime());
> +if (stage == 2) {
> +uint64_t expected_time;
> +expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
> +return expected_time <= migrate_max_downtime();
> +}
> +return 0;
>  }
> 
>  static inline void *host_from_stream_offset(QEMUFile *f,

Reviewed-by: Orit Wasserman

Re: [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> cheking each 64 pages is a random magic number as good as any other.
s/cheking/checking
> We don't want to test too many times, but on the other hand,
> qemu_get_clock_ns() is not so expensive either.  We want to be sure
> that we spent less than 50ms (half of buffered_file timer), if we
> spent more than 100ms, all the accounting got wrong.
> 
> Signed-off-by: Juan Quintela 
> ---
>  arch_init.c |   15 +++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 76a3d4e..2aa77ff 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -296,6 +296,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>  uint64_t bytes_transferred_last;
>  double bwidth = 0;
>  int ret;
> +int i;
> 
>  if (stage < 0) {
>  memory_global_dirty_log_stop();
> @@ -335,6 +336,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>  bytes_transferred_last = bytes_transferred;
>  bwidth = qemu_get_clock_ns(rt_clock);
> 
> +i = 0;
>  while ((ret = qemu_file_rate_limit(f)) == 0) {
>  int bytes_sent;
> 
> @@ -343,6 +345,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>  if (bytes_sent == 0) { /* no more blocks */
>  break;
>  }
> +/* we want to check in the 1st loop, just in case it was the 1st time
> +   and we had to sync the dirty bitmap.
> +   qemu_get_clock_ns() is a bit expensive, so we only check each some
> +   iterations
> +*/
> +if ((i & 63) == 0) {
> +uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 100;
> +if (t1 > 50) { /* 50ms, half buffered_file limit */
can't we use a constant ?
> +printf("big delay %ld milliseconds, %d iterations\n", t1, i);
printf ? 

Orit
> +break;
> +}
> +}
> +i++;
>  }
> 
>  if (ret < 0) {

Re: [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages

2012-06-14 Thread Orit Wasserman

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> Calculate the number of dirty pages takes a lot on hosts with lots
> of memory.  Just maintain how many pages are dirty.  Only sync bitmaps
> if number is small enough.
> 
> Signed-off-by: Juan Quintela 
> ---
>  arch_init.c |   15 +--
>  cpu-all.h   |1 +
>  exec-obsolete.h |8 
>  exec.c  |2 ++
>  4 files changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 2aa77ff..77c3580 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -224,20 +224,7 @@ static uint64_t bytes_transferred;
> 
>  static ram_addr_t ram_save_remaining(void)
>  {
> -RAMBlock *block;
> -ram_addr_t count = 0;
> -
> -QLIST_FOREACH(block, &ram_list.blocks, next) {
> -ram_addr_t addr;
> -for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
> -if (memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
> -DIRTY_MEMORY_MIGRATION)) {
> -count++;
> -}
> -}
> -}
> -
> -return count;
> +return ram_list.dirty_pages;
>  }
> 
>  uint64_t ram_bytes_remaining(void)
> diff --git a/cpu-all.h b/cpu-all.h
> index 028528f..897210f 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -502,6 +502,7 @@ typedef struct RAMBlock {
>  typedef struct RAMList {
>  uint8_t *phys_dirty;
>  QLIST_HEAD(, RAMBlock) blocks;
> +uint64_t dirty_pages;
>  } RAMList;
>  extern RAMList ram_list;
> 
> diff --git a/exec-obsolete.h b/exec-obsolete.h
> index 792c831..2e54ac9 100644
> --- a/exec-obsolete.h
> +++ b/exec-obsolete.h
> @@ -75,6 +75,10 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t 
> start,
> 
>  static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
>  {
> +if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
> +   MIGRATION_DIRTY_FLAG)) {
> +ram_list.dirty_pages++;
> +}
>  ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
>  }
> 
> @@ -112,6 +116,10 @@ static inline void 
> cpu_physical_memory_mask_dirty_range(ram_addr_t start,
>  mask = ~dirty_flags;
>  p = ram_list.phys_dirty + (start >> TARGET_PAGE_BITS);
>  for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
> +if (cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
> +  MIGRATION_DIRTY_FLAG & 
> dirty_flags)) {
> +ram_list.dirty_pages--;
> +}
>  *p++ &= mask;
>  }
>  }
> diff --git a/exec.c b/exec.c
> index b6c7675..be4865c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2687,6 +2687,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
> void *host,
>  memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
> 0xff, size >> TARGET_PAGE_BITS);
> 
> +ram_list.dirty_pages += size >> TARGET_PAGE_BITS;
> +
>  if (kvm_enabled())
>  kvm_setup_guest_memory(new_block->host, size);
> 
Juan,
Don't we need to update cpu_physical_memory_set_dirty_range ?

Orit

Re: [Qemu-devel] [PATCH 5/6] add-cow: support snapshot_blkdev

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 13:18, schrieb Paolo Bonzini:
> Il 14/06/2012 12:59, Kevin Wolf ha scritto:
>> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>>> add-cow will let raw file support snapshot_blkdev indirectly.
>>>
>>> Signed-off-by: Dong Xu Wang 
>>
>> Paolo, what do you think about this magic?
> 
> Besides the obvious layering violation (it would be better to add a new
> method bdrv_ext_snapshot_create perhaps) I don't see very much a problem
> in it.  Passing image creation options sounds like a good idea that we
> can add in the future as an extension.
> 
> But honestly, I don't really see the point of add-cow in general...  The
> raw image is anyway not usable without a pass through qemu-io convert,
> and mirroring will also allow collapsing an image to raw (with the
> persistent dirty bitmap playing the role of the add-cow metadata).

Well, the idea was that you stream into add-cow and once the streaming
has completed, you can just drop the bitmap.

There might be some overlap with mirroring, though when we discussed
introducing add-cow, mirrors were not yet on the table. One difference
that remains is that with streaming the guest only writes to the target
image and can't have any problem with convergence.

>> I think I can see its use especially for HMP because it's quite
>> convenient, but on the other hand it assumes a fixed image path for the
>> new raw image. This isn't going to work for block devices, for example.
> 
> True, but then probably you would use mode='existing', because you need
> to pre-create the logical volume.

I think it might be convenient to have the raw volume precreated (you
need to do that anyway), but create the COW bitmap only during the
snapshot command. But yeah, not really important.

>> If we don't do it this way, we need to allow passing image creation
>> options to the snapshotting command so that you can pass a precreated
>> image file.
> 
> This sounds like a useful extension anyway, except that passing an
> unstructured string for image creation options is ugly...  Perhaps we
> can base a better implementation of options on Laszlo's QemuOpts visitor.

Yes, I wouldn't want to use a string here, we should use something
structured. Image creation still uses the old-style options instead of
QemuOpts, but maybe this is the opportunity to convert it.

Kevin

Re: [Qemu-devel] IO performance test on the tcm-vhost scsi

2012-06-14 Thread Stefan Hajnoczi

On Thu, Jun 14, 2012 at 05:45:22PM +0800, Cong Meng wrote:
> On Thu, 2012-06-14 at 09:30 +0100, Stefan Hajnoczi wrote:
> > On Wed, Jun 13, 2012 at 11:13 AM, mengcong  wrote:
> > >seq-readseq-write   rand-read 
> > > rand-write
> > >8k 256k 8k 256k 8k   256k 8k   256k
> > > 
> > > bare-metal  67951  6980267064  670751758 292841969 
> > > 26360
> > > tcm-vhost-iblock61501  6657551775  678721011 225331851 
> > > 28216
> > > tcm-vhost-pscsi 66479  6819150873  675471008 225231818 
> > > 28304
> > > virtio-blk  26284  6673723373  657351724 289621805 
> > > 27774
> > > scsi-disk   36013  6028946222  625271663 129921804 
> > > 27670
> > 
> > >
> > > unit: KB/s
> > > seq-read/write = sequential read/write
> > > rand-read/write = random read/write
> > > 8k,256k are blocksize of the IO
> > 
> > What strikes me is how virtio-blk performs significantly worse than
> > bare metal and tcm_vhost for seq-read/seq-write 8k.  The good
> > tcm_vhost results suggest that the overhead is not the virtio
> > interface itself, since tcm_vhost implements virtio-scsi.
> > 
> > To drill down on the tcm_vhost vs userspace performance gap we need
> > virtio-scsi userspace results.  QEMU needs to use the same block
> > device as the tcm-vhost-iblock benchmark.
> > 
> > Cong: Is it possible to collect the virtio-scsi userspace results
> > using the same block device as tcm-vhost-iblock and -drive
> > format=raw,aio=native,cache=none?
> > 
> 
> virtio-scsi-raw 43065  6972952052  673781757 294192024 28135
> 
> qemu \
> -drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
> -device virtio-scsi-pci,id=mcbus \
> -device scsi-disk,drive=sdb
> 
> there is only one scsi HBA. 
> /dev/sdb is the disk on which all tests have been done. 
> 
> Is this what you want?

Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk
here.  That's unexpected since they both use the QEMU block layer.  If
anything, I would have expected virtio-blk to be faster!

I wonder if the request patterns being sent through virtio-blk and
virtio-scsi are different.  Asias discovered that the guest I/O
scheduler and request merging makes a big difference between QEMU and
native KVM tool performance.  It could be the same thing here which
causes virtio-blk and virtio-scsi userspace to produce quite different
results.

The second question is why is tcm_vhost faster than virtio-scsi
userspace.

Stefan

Re: [Qemu-devel] [PATCH v2] qapi: converted commit

2012-06-14 Thread Eric Blake

On 06/14/2012 01:35 AM, Pavel Hrdina wrote:
> Signed-off-by: Pavel Hrdina 
> ---

> +++ b/qapi-schema.json
> @@ -1169,6 +1169,21 @@
>  { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
>  
>  ##
> +# @commit
> +#
> +# Commit changes to the disk images (if -snapshot is used) or backing files.
> +#
> +# @device: the name of the device or the "all" to commit all devices
> +#
> +# Returns: nothing on success
> +#  If @device is not a valid block device, DeviceNotFound
> +#  If a long-running operation is using the device, DeviceInUse
> +#
> +# Since: 1.2
> +##
> +{ 'command': 'commit', 'data': { 'device': 'str' }}

Should we use this as an opportunity to make the command more powerful?
 For example, integrating this with the 'transaction' command or a block
job queried by 'query-block-jobs' to track its progress would be useful.
 Also, suppose I have A <- B <- C.  Does 'commit' only do one layer (C
into B), or all layers (B and C into A)?  That argues that we need an
optional parameter that says how deep to commit (committing C into B
only to repeat and commit B into A is more time-consuming than directly
committing both B and C into A to start with).  When a commit is
complete, which file is backing the device - is it still C (which
continues to diverge, but now from the point of the commit) or does qemu
pivot things to have the device now backed by B (and C can be discarded,
particularly true if changes are now going into B which invalidate C).

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] IO performance test on the tcm-vhost scsi

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 14:07, Stefan Hajnoczi ha scritto:
> Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk
> here.  That's unexpected since they both use the QEMU block layer.  If
> anything, I would have expected virtio-blk to be faster!

Yes, I would have expected something similar.  A blktrace would be
useful here because Asias measured the opposite---virtio-scsi being much
slower than virtio-blk.

> The second question is why is tcm_vhost faster than virtio-scsi
> userspace.

I would expect a difference on more high-end benchmarks (i.e. lots of
I/O to lots of disks), similar to vhost-blk.  In this simple case I
wonder how much it is due to the vagaries of the I/O scheduler, or even
statistical noise.

Paolo

[Qemu-devel] [PATCH v2 0/3] qcow2: fix autoclear image header update

2012-06-14 Thread Stefan Hajnoczi

The code path to clear unknown autoclear feature bits when opening a qcow2
image file was broken.  This series includes the fix along with the qemu-iotest
to verify the issue and prevent regressions in the future.

v2:
 * Add qemu-iotest [Kevin]

Stefan Hajnoczi (3):
  qcow2: fix autoclear image header update
  qemu-iotests: add qcow2.py set-feature-bit command
  qemu-iotests: add 036 autoclear feature bit test

 block/qcow2.c   |   17 ++-
 tests/qemu-iotests/036  |   68 +++
 tests/qemu-iotests/036.out  |   52 +
 tests/qemu-iotests/group|1 +
 tests/qemu-iotests/qcow2.py |   23 +++
 5 files changed, 153 insertions(+), 8 deletions(-)
 create mode 100755 tests/qemu-iotests/036
 create mode 100644 tests/qemu-iotests/036.out

-- 
1.7.10

[Qemu-devel] [PATCH v2 1/3] qcow2: fix autoclear image header update

2012-06-14 Thread Stefan Hajnoczi

The autoclear feature bits can be used for qcow2 file format features
that are safe to "drop" by old programs that do not understand the
feature.  Upon opening the image file unknown autoclear feature bits are
cleared and the image file header is rewritten, but this was happening
too early in the code when critical header fields were not yet loaded.

Process autoclear feature bits after all necessary header information
has been loaded.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d66de58..aa3f176 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -298,14 +298,6 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 goto fail;
 }
 
-if (!bs->read_only && s->autoclear_features != 0) {
-s->autoclear_features = 0;
-ret = qcow2_update_header(bs);
-if (ret < 0) {
-goto fail;
-}
-}
-
 /* Check support for various header values */
 if (header.refcount_order != 4) {
 report_unsupported(bs, "%d bit reference counts",
@@ -411,6 +403,15 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 goto fail;
 }
 
+/* Clear unknown autoclear feature bits */
+if (!bs->read_only && s->autoclear_features != 0) {
+s->autoclear_features = 0;
+ret = qcow2_update_header(bs);
+if (ret < 0) {
+goto fail;
+}
+}
+
 /* Initialise locks */
 qemu_co_mutex_init(&s->lock);
 
-- 
1.7.10

[Qemu-devel] [PATCH v2 3/3] qemu-iotests: add 036 autoclear feature bit test

2012-06-14 Thread Stefan Hajnoczi

This new test validates the autoclear feature bit behavior.  When QEMU
opens a qcow2v3 image file with an unknown autoclear feature bit the bit
should be cleared in the image file header.

Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/036 |   68 
 tests/qemu-iotests/036.out |   52 +
 tests/qemu-iotests/group   |1 +
 3 files changed, 121 insertions(+)
 create mode 100755 tests/qemu-iotests/036
 create mode 100644 tests/qemu-iotests/036.out

diff --git a/tests/qemu-iotests/036 b/tests/qemu-iotests/036
new file mode 100755
index 000..329533e
--- /dev/null
+++ b/tests/qemu-iotests/036
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Test that qcow2 unknown autoclear feature bits are cleared
+#
+# Copyright (C) 2011 Red Hat, Inc.
+# Copyright IBM, Corp. 2010
+#
+# Based on test 031.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=stefa...@linux.vnet.ibm.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto generic
+_supported_os Linux
+
+# Only qcow2v3 and later supports feature bits
+IMGOPTS="compat=1.1"
+
+echo === Create image with unknown autoclear feature bit ===
+echo
+_make_test_img 64M
+./qcow2.py $TEST_IMG set-feature-bit autoclear 63
+./qcow2.py $TEST_IMG dump-header
+
+echo
+echo === Repair image ===
+echo
+$QEMU_IMG check -r all $TEST_IMG
+./qcow2.py $TEST_IMG dump-header
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
new file mode 100644
index 000..6953e37
--- /dev/null
+++ b/tests/qemu-iotests/036.out
@@ -0,0 +1,52 @@
+QA output created by 036
+=== Create image with unknown autoclear feature bit ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+magic 0x514649fb
+version   3
+backing_file_offset   0x0
+backing_file_size 0x0
+cluster_bits  16
+size  67108864
+crypt_method  0
+l1_size   1
+l1_table_offset   0x3
+refcount_table_offset 0x1
+refcount_table_clusters   1
+nb_snapshots  0
+snapshot_offset   0x0
+incompatible_features 0x0
+compatible_features   0x0
+autoclear_features0x8000
+refcount_order4
+header_length 104
+
+
+=== Repair image ===
+
+No errors were found on the image.
+magic 0x514649fb
+version   3
+backing_file_offset   0x0
+backing_file_size 0x0
+cluster_bits  16
+size  67108864
+crypt_method  0
+l1_size   1
+l1_table_offset   0x3
+refcount_table_offset 0x1
+refcount_table_clusters   1
+nb_snapshots  0
+snapshot_offset   0x0
+incompatible_features 0x0
+compatible_features   0x0
+autoclear_features0x0
+refcount_order4
+header_length 104
+
+Header extension:
+magic 0x6803f857
+length0
+data  ''
+
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 36ebf1a..f9b0f5b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -42,3 +42,4 @@
 033 rw auto
 034 rw auto backing
 035 rw auto quick
+036 rw auto quick
-- 
1.7.10

[Qemu-devel] [PATCH v2 2/3] qemu-iotests: add qcow2.py set-feature-bit command

2012-06-14 Thread Stefan Hajnoczi

This new command sets feature bits in the image file header:

  qcow2.py set-feature-bit incompatible|compatible|autoclear 

The bit number must be in the range [0, 64).

Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/qcow2.py |   23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index e27196a..97f3770 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -181,10 +181,33 @@ def cmd_del_header_ext(fd, magic):
 
 h.update(fd)
 
+def cmd_set_feature_bit(fd, group, bit):
+try:
+bit = int(bit, 0)
+if bit < 0 or bit >= 64:
+raise ValueError
+except:
+print "'%s' is not a valid bit number in range [0, 64)" % bit
+sys.exit(1)
+
+h = QcowHeader(fd)
+if group == 'incompatible':
+h.incompatible_features |= 1 << bit
+elif group == 'compatible':
+h.compatible_features |= 1 << bit
+elif group == 'autoclear':
+h.autoclear_features |= 1 << bit
+else:
+print "'%s' is not a valid group, try 'incompatible', 'compatible', or 
'autoclear'" % group
+sys.exit(1)
+
+h.update(fd)
+
 cmds = [
 [ 'dump-header',cmd_dump_header,0, 'Dump image header and header 
extensions' ],
 [ 'add-header-ext', cmd_add_header_ext, 2, 'Add a header extension' ],
 [ 'del-header-ext', cmd_del_header_ext, 1, 'Delete a header extension' ],
+[ 'set-feature-bit', cmd_set_feature_bit, 2, 'Set a feature bit'],
 ]
 
 def main(filename, cmd, args):
-- 
1.7.10

Re: [Qemu-devel] [PATCH 1/3] tests: fix dependency inclusion

2012-06-14 Thread Andreas Färber

Am 13.06.2012 22:55, schrieb Anthony Liguori:
> Signed-off-by: Anthony Liguori 
> ---
>  Makefile |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 32550cb..93046bf 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -406,4 +406,4 @@ Makefile: $(GENERATED_HEADERS)
>  
>  # Include automatically generated dependency files
>  -include $(wildcard *.d audio/*.d slirp/*.d block/*.d net/*.d ui/*.d 
> qapi/*.d)
> --include $(wildcard qga/*.d hw/*.d hw/usb/*.d)
> +-include $(wildcard qga/*.d hw/*.d hw/usb/*.d tests/*.d)

The equivalent fix for qom/*.d was denied me just recently...

I don't see that solved through Paolo's Makefile redesign though, my
local patch still applies.

Andreas
-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 1/3] tests: fix dependency inclusion

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 15:08, Andreas Färber ha scritto:
>> >  # Include automatically generated dependency files
>> >  -include $(wildcard *.d audio/*.d slirp/*.d block/*.d net/*.d ui/*.d 
>> > qapi/*.d)
>> > --include $(wildcard qga/*.d hw/*.d hw/usb/*.d)
>> > +-include $(wildcard qga/*.d hw/*.d hw/usb/*.d tests/*.d)
> The equivalent fix for qom/*.d was denied me just recently...
> 
> I don't see that solved through Paolo's Makefile redesign though, my
> local patch still applies.

Yeah, let's just do this for now (which applies to your patch as well,
Andreas).  We'll fix it right for 1.2, but that's no reason to inflict
pain upon ourselves until then.

Paolo

Re: [Qemu-devel] [PATCH] trace: added ability to comment out events in the list

2012-06-14 Thread Stefan Hajnoczi

On Thu, Jun 14, 2012 at 02:41:40PM +1000, Alexey Kardashevskiy wrote:
> It is convenient for debug to be able to switch on/off some events easily.
> The only possibility now is to remove event name from the file completely
> and type it again when we want it back.
> 
> The patch adds '#' symbol handling as a comment specifier.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  trace/control.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)

Thanks, applied to the tracing patches tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan

Re: [Qemu-devel] wiki account

2012-06-14 Thread Stefan Hajnoczi

On Thu, Jun 14, 2012 at 3:31 AM, Peter Crosthwaite
 wrote:
> I wish to upload my test images to the wiki, and I see the process for
> getting an account is to contact via email. Can someone create me an
> account?

Done.  I have emailed you the details.

For anyone else with an existing wiki account wondering how to do this:
Login in with your existing account, go to "Special pages", go to
"Login/create account".

Stefan

Re: [Qemu-devel] [PATCH v2 2/4] qapi: Add passfd QMP command

2012-06-14 Thread Luiz Capitulino

On Wed, 13 Jun 2012 18:07:43 -0400
Corey Bryant  wrote:

> 
> 
> On 06/13/2012 04:47 PM, Eric Blake wrote:
> > On 06/13/2012 02:25 PM, Corey Bryant wrote:
> >
> >>> Also, getfd automatically closes a fd if an existing fdname is passed
> >>> again.
> >>> I don't think this is a good behavior, I think pass-fd should fail
> >>> instead
> >>> (note that we can't fix getfd though).
> >>>
> >>
> >> I agree.  It makes sense to fail rather than blindly closing the
> >> existing fd.  It can be closed explicitly with closefd if the user wants
> >> it closed.
> >
> > Hmm - what happens if I do 'pass-fd name', learn that qemu is using fd
> > 42, then do 'getfd name'?  I silently wipe out fd 42 and replace it with
> > the new fd passed in by getfd.  Which means my use of /dev/fd/42 will
> > now be broken.
> >
> > Obviously that means that 'getfd' should NOT be used by any application
> > using 'pass-fd', and that libvirt should NOT be reusing names (I think
> > the latter is already true).  But I agree that for back-compat we can't
> > get rid of the current (evil) semantics of a duplicated 'getfd'.
> 
> Yes, users need to be careful and understand how the commands work.  I 
> don't think it's a hard rule that 'getfd' can't be used by an 
> application that uses 'pass-fd'.  If it were, we could put the fds on 
> separate lists:
> 
>   struct Monitor {
>   ...
>   QLIST_HEAD(,mon_fd_t) fds;
> +QLIST_HEAD(,mon_fd_t) pass_fds;
>   };

We'd a different closefd command if we do this.

> But I don't think this is necessary, so I'll plan on documenting them well.

Agreed, I don't think this is necessary.

[Qemu-devel] [PATCH v5 1/3] monitor: remove unused do_info_trace

2012-06-14 Thread Harsh Prateek Bora

Going forward with simpletrace v2 variable size trace records, we cannot
have a generic function to print trace event info and therefore this
interface becomes invalid.

As per Stefan Hajnoczi:

"This command is only available from the human monitor.  It's not very
useful because it historically hasn't been able to pretty-print events
or show them in the right order (we use a ringbuffer but it prints
them out from index 0).

Therefore, I don't think we're under any obligation to keep this
command around.  No one has complained about it's limitations - I
think this is a sign that no one has used it.  I'd be okay with a
patch that removes it."

Ref: http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01268.html

Signed-off-by: Harsh Prateek Bora 
---
 monitor.c  |   16 
 trace/simple.c |   18 --
 trace/simple.h |1 -
 3 files changed, 35 deletions(-)

diff --git a/monitor.c b/monitor.c
index a3bc2c7..1fb00ab 100644
--- a/monitor.c
+++ b/monitor.c
@@ -795,13 +795,6 @@ static void do_info_cpu_stats(Monitor *mon)
 }
 #endif
 
-#if defined(CONFIG_TRACE_SIMPLE)
-static void do_info_trace(Monitor *mon)
-{
-st_print_trace((FILE *)mon, &monitor_fprintf);
-}
-#endif
-
 static void do_trace_print_events(Monitor *mon)
 {
 trace_print_events((FILE *)mon, &monitor_fprintf);
@@ -2568,15 +2561,6 @@ static mon_cmd_t info_cmds[] = {
 .help   = "show roms",
 .mhandler.info = do_info_roms,
 },
-#if defined(CONFIG_TRACE_SIMPLE)
-{
-.name   = "trace",
-.args_type  = "",
-.params = "",
-.help   = "show current contents of trace buffer",
-.mhandler.info = do_info_trace,
-},
-#endif
 {
 .name   = "trace-events",
 .args_type  = "",
diff --git a/trace/simple.c b/trace/simple.c
index b4a3c6e..b64bcf4 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -291,24 +291,6 @@ void st_print_trace_file_status(FILE *stream, int 
(*stream_printf)(FILE *stream,
   trace_file_name, trace_fp ? "on" : "off");
 }
 
-void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const 
char *fmt, ...))
-{
-unsigned int i;
-
-for (i = 0; i < TRACE_BUF_LEN; i++) {
-TraceRecord record;
-
-if (!get_trace_record(i, &record)) {
-continue;
-}
-stream_printf(stream, "Event %" PRIu64 " : %" PRIx64 " %" PRIx64
-  " %" PRIx64 " %" PRIx64 " %" PRIx64 " %" PRIx64 "\n",
-  record.event, record.x1, record.x2,
-  record.x3, record.x4, record.x5,
-  record.x6);
-}
-}
-
 void st_flush_trace_buffer(void)
 {
 flush_trace_file(true);
diff --git a/trace/simple.h b/trace/simple.h
index 466e75b..6b5358c 100644
--- a/trace/simple.h
+++ b/trace/simple.h
@@ -29,7 +29,6 @@ void trace3(TraceEventID event, uint64_t x1, uint64_t x2, 
uint64_t x3);
 void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4);
 void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4, uint64_t x5);
 void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4, uint64_t x5, uint64_t x6);
-void st_print_trace(FILE *stream, fprintf_function stream_printf);
 void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
 void st_set_trace_file_enabled(bool enable);
 bool st_set_trace_file(const char *file);
-- 
1.7.10.2

[Qemu-devel] [PATCH v5 3/3] Update simpletrace.py for new log format

2012-06-14 Thread Harsh Prateek Bora

Support new tracelog format for multiple arguments and strings.

Signed-off-by: Harsh Prateek Bora 
---
 scripts/simpletrace.py |  116 +++-
 1 file changed, 75 insertions(+), 41 deletions(-)

diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py
index f55e5e6..9b4419f 100755
--- a/scripts/simpletrace.py
+++ b/scripts/simpletrace.py
@@ -12,53 +12,69 @@
 import struct
 import re
 import inspect
+from tracetool import _read_events, Event
+from tracetool.backend.simple import is_string
 
 header_event_id = 0x
 header_magic= 0xf2b177cb0aa429b4
-header_version  = 0
 dropped_event_id = 0xfffe
 
-trace_fmt = '='
-trace_len = struct.calcsize(trace_fmt)
-event_re  = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\).*')
+log_header_fmt = '=QQQ'
+rec_header_fmt = '=QQII'
 
-def parse_events(fobj):
-"""Parse a trace-events file into {event_num: (name, arg1, ...)}."""
-
-def get_argnames(args):
-"""Extract argument names from a parameter list."""
-return tuple(arg.split()[-1].lstrip('*') for arg in args.split(','))
-
-events = {dropped_event_id: ('dropped', 'count')}
-event_num = 0
-for line in fobj:
-m = event_re.match(line.strip())
-if m is None:
-continue
-
-disable, name, args = m.groups()
-events[event_num] = (name,) + get_argnames(args)
-event_num += 1
-return events
+def read_header(fobj, hfmt):
+'''Read a trace record header'''
+hlen = struct.calcsize(hfmt)
+hdr = fobj.read(hlen)
+if len(hdr) != hlen:
+return None
+return struct.unpack(hfmt, hdr)
 
-def read_record(fobj):
+def get_record(edict, rechdr, fobj):
 """Deserialize a trace record from a file into a tuple (event_num, 
timestamp, arg1, ..., arg6)."""
-s = fobj.read(trace_len)
-if len(s) != trace_len:
+if rechdr is None:
 return None
-return struct.unpack(trace_fmt, s)
+rec = (rechdr[0], rechdr[1])
+if rechdr[0] != dropped_event_id:
+event_id = rechdr[0]
+event = edict[event_id]
+for type, name in event.args:
+if is_string(type):
+l = fobj.read(4)
+(len,) = struct.unpack('=L', l)
+s = fobj.read(len)
+rec = rec + (s,)
+else:
+(value,) = struct.unpack('=Q', fobj.read(8))
+rec = rec + (value,)
+else:
+(value,) = struct.unpack('=Q', fobj.read(8))
+rec = rec + (value,)
+return rec
+
+
+def read_record(edict, fobj):
+"""Deserialize a trace record from a file into a tuple (event_num, 
timestamp, arg1, ..., arg6)."""
+rechdr = read_header(fobj, rec_header_fmt)
+return get_record(edict, rechdr, fobj) # return tuple of record elements
 
-def read_trace_file(fobj):
+def read_trace_file(edict, fobj):
 """Deserialize trace records from a file, yielding record tuples 
(event_num, timestamp, arg1, ..., arg6)."""
-header = read_record(fobj)
+header = read_header(fobj, log_header_fmt)
 if header is None or \
header[0] != header_event_id or \
-   header[1] != header_magic or \
-   header[2] != header_version:
-raise ValueError('not a trace file or incompatible version')
+   header[1] != header_magic:
+raise ValueError('Not a valid trace file!')
+if header[2] != 0 and \
+   header[2] != 2:
+raise ValueError('Unknown version of tracelog format!')
+
+log_version = header[2]
+if log_version == 0:
+raise ValueError('Older log format, not supported with this Qemu 
release!')
 
 while True:
-rec = read_record(fobj)
+rec = read_record(edict, fobj)
 if rec is None:
 break
 
@@ -89,16 +105,29 @@ class Analyzer(object):
 def process(events, log, analyzer):
 """Invoke an analyzer on each event in a log."""
 if isinstance(events, str):
-events = parse_events(open(events, 'r'))
+events = _read_events(open(events, 'r'))
 if isinstance(log, str):
 log = open(log, 'rb')
 
+enabled_events = []
+dropped_event = Event.build("Dropped_Event(uint64_t num_events_dropped)")
+edict = {dropped_event_id: dropped_event}
+
+for e in events:
+if 'disable' not in e.properties:
+enabled_events.append(e)
+for num, event in enumerate(enabled_events):
+edict[num] = event
+
 def build_fn(analyzer, event):
-fn = getattr(analyzer, event[0], None)
+if isinstance(event, str):
+return analyzer.catchall
+
+fn = getattr(analyzer, event.name, None)
 if fn is None:
 return analyzer.catchall
 
-event_argcount = len(event) - 1
+event_argcount = len(event.args)
 fn_argcount = len(inspect.getargspec(fn)[0]) - 1
 if fn_argcount == event_argcount + 1:
 # Include timestamp as first argument
@@ -10

[Qemu-devel] [PATCH v5 0/3] Simpletrace v2: Support multiple args, strings.

2012-06-14 Thread Harsh Prateek Bora

Existing simpletrace backend allows to trace at max 6 args and does not
support strings. This newer tracelog format gets rid of fixed size records
and therefore allows to trace variable number of args including strings.

Sample trace:
v9fs_version 0.000 tag=0x id=0x64 msize=0x2000 version=9P2000.L
v9fs_version_return 6.705 tag=0x id=0x64 msize=0x2000 version=9P2000.L
v9fs_attach 174.467 tag=0x1 id=0x68 fid=0x0 afid=0x
uname=nobody aname=
v9fs_attach_return 4720.454 tag=0x1 id=0x68 type=0xff80
version=0x4f2a4dd0  path=0x220ea6

v5:
- Addressed Stefan's review comments on v4 series.
- Inroduced code to handle corrupted records due to racing tracers.

v4:
- removed unused safe_strlen interface (missed in v3).

v3:
- Addressed Stefan's review comments on v2 series

v2:
- moved elimination of st_print_trace to 1/3 patch
- use sizeof(TraceRecord) for #define ST_REC_HDR_LEN

v1:
- Simpletrace v2 for the new (pythonized) tracetool


Harsh Prateek Bora (3):
  monitor: remove unused do_info_trace
  Simpletrace v2: Support multiple arguments, strings.
  Update simpletrace.py for new log format

 monitor.c   |   16 --
 scripts/simpletrace.py  |  116 ++-
 scripts/tracetool/backend/simple.py |   84 ---
 trace/simple.c  |  280 +--
 trace/simple.h  |   38 -
 5 files changed, 338 insertions(+), 196 deletions(-)

-- 
1.7.10.2

[Qemu-devel] [PATCH v5 2/3] Simpletrace v2: Support multiple arguments, strings.

2012-06-14 Thread Harsh Prateek Bora

Existing simpletrace backend allows to trace at max 6 args and does not
support strings. This newer tracelog format gets rid of fixed size records
and therefore allows to trace variable number of args including strings.

Sample trace with strings:
v9fs_version 0.000 tag=0x id=0x64 msize=0x2000 version=9P2000.L
v9fs_version_return 6.705 tag=0x id=0x64 msize=0x2000 version=9P2000.L

Signed-off-by: Harsh Prateek Bora 
---
 scripts/tracetool/backend/simple.py |   84 ---
 trace/simple.c  |  262 ++-
 trace/simple.h  |   37 -
 3 files changed, 263 insertions(+), 120 deletions(-)

diff --git a/scripts/tracetool/backend/simple.py 
b/scripts/tracetool/backend/simple.py
index fbb5717..24e02ac 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -15,9 +15,16 @@ __email__  = "stefa...@linux.vnet.ibm.com"
 
 from tracetool import out
 
+def is_string(arg):
+strtype = ('const char*', 'char*', 'const char *', 'char *')
+if arg.lstrip().startswith(strtype):
+return True
+else:
+return False
 
 def c(events):
 out('#include "trace.h"',
+'#include "trace/simple.h"',
 '',
 'TraceEvent trace_list[] = {')
 
@@ -26,30 +33,69 @@ def c(events):
 name = e.name,
 )
 
-out('};')
-
-def h(events):
-out('#include "trace/simple.h"',
+out('};',
 '')
 
-for num, e in enumerate(events):
-if len(e.args):
-argstr = e.args.names()
-arg_prefix = ', (uint64_t)(uintptr_t)'
-cast_args = arg_prefix + arg_prefix.join(argstr)
-simple_args = (str(num) + cast_args)
-else:
-simple_args = str(num)
+for num, event in enumerate(events):
+sizes = []
+for type_, name in event.args:
+if is_string(type_):
+sizes.append("4 + (" + name + " ? strlen(" + name + ") : 0)")
+else:
+sizes.append("8")
+sizestr = " + ".join(sizes)
+if len(event.args) == 0:
+sizestr = '0'
 
-out('static inline void trace_%(name)s(%(args)s)',
+out('void trace_%(name)s(%(args)s)',
 '{',
-'trace%(argc)d(%(trace_args)s);',
-'}',
-name = e.name,
-args = e.args,
-argc = len(e.args),
-trace_args = simple_args,
+'TraceBufferRecord rec;',
+'',
+'if (!trace_list[%(event_id)s].state) {',
+'return;',
+'}',
+'',
+'if (trace_record_start(&rec, %(event_id)s, %(size_str)s)) {',
+'return; /* Trace Buffer Full, Event Dropped ! */',
+'}',
+name = event.name,
+args = event.args,
+event_id = num,
+size_str = sizestr,
 )
 
+if len(event.args) > 0:
+for type_, name in event.args:
+# string
+if is_string(type_):
+out('trace_record_write_str(&rec, %(name)s);',
+name = name,
+   )
+# pointer var (not string)
+elif type_.endswith('*'):
+out('trace_record_write_u64(&rec, (uint64_t)(uint64_t 
*)%(name)s);',
+name = name,
+   )
+# primitive data type
+else:
+out('trace_record_write_u64(&rec, 
(uint64_t)%(name)s);',
+   name = name,
+   )
+
+out('trace_record_finish(&rec);',
+'}',
+'')
+
+
+def h(events):
+out('#include "trace/simple.h"',
+'')
+
+for event in events:
+out('void trace_%(name)s(%(args)s);',
+name = event.name,
+args = event.args,
+)
+out('')
 out('#define NR_TRACE_EVENTS %d' % len(events))
 out('extern TraceEvent trace_list[NR_TRACE_EVENTS];')
diff --git a/trace/simple.c b/trace/simple.c
index b64bcf4..4bf90f3 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -27,7 +27,7 @@
 #define HEADER_MAGIC 0xf2b177cb0aa429b4ULL
 
 /** Trace file version number, bump if format changes */
-#define HEADER_VERSION 0
+#define HEADER_VERSION 2
 
 /** Records were dropped event ID */
 #define DROPPED_EVENT_ID (~(uint64_t)0 - 1)
@@ -35,23 +35,6 @@
 /** Trace record is valid */
 #define TRACE_RECORD_VALID ((uint64_t)1 << 63)
 
-/** Trace buffer entry */
-typedef struct {
-uint64_t event;
-uint64_t timestamp_ns;
-uint64_t x1;
-uint64_t x2;
-uint64_t x3;
-uint64_t x4;
-uint64_t x5;
-uint64_t x6;
-} TraceRecord;
-
-enum {
-TRACE_BUF_LEN = 4096,
-TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
-};
-
 /*
  * Trace records are written out by a dedicated thr

Re: [Qemu-devel] Compiling static

2012-06-14 Thread Andreas Färber

Am 14.06.2012 12:03, schrieb Davide Ferraretto:
> I want compile qemu with --static:
> ./configure --static --target-list=i386-linux-user,arm-linux-user
> --python=/usr/bin/python2.7 --prefix=/install_qemu
> 
> Qemu returns:
> /usr/bin/ld: cannot find -lssl3
> /usr/bin/ld: cannot find -lsmime3
> /usr/bin/ld: cannot find -lnss3
> /usr/bin/ld: cannot find -lnssutil3
> collect2: error: ld returned 1 exit status
> 
> How resolve??

I reported that as part of our v1.1 packaging: Someone needs to provide
a patch to configure to only test the libraries actually used for
linux-user / bsd-user.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] block: Prevent /dev/fd/X filename from being detected as floppy

2012-06-14 Thread Corey Bryant




On 06/14/2012 05:24 AM, Paolo Bonzini wrote:

Il 13/06/2012 16:30, root ha scritto:

From: Corey Bryant 

Reported-by: Kevin Wolf 
Signed-off-by: Corey Bryant 
---
  block/raw-posix.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d8eff2f..68886cd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -946,9 +946,11 @@ static int floppy_probe_device(const char *filename)
  int prio = 0;
  struct floppy_struct fdparam;
  struct stat st;
+const char *p;

-if (strstart(filename, "/dev/fd", NULL))
+if (strstart(filename, "/dev/fd", &p) && p[0] != '/') {
  prio = 50;
+}

  fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
  if (fd < 0) {



Reviewed-by: Paolo Bonzini 



Thanks!

--
Regards,
Corey

Re: [Qemu-devel] [PATCH 08/10] qxl: add vgamem_size_mb and vgamem_size

2012-06-14 Thread Alon Levy

On Tue, Jun 12, 2012 at 09:51:12AM +0200, Gerd Hoffmann wrote:
> From: Alon Levy 
> 

Please don't push this, it breaks both drivers, leaving n_modes 0 in the
rom. I'll send a fixed version.

> In preperation for supporting a larger framebuffer for multiple monitors
> on a single card, add a property to qxl vgamem_size_mb, and corresponding
> byte sized vgamem_size, and use instead of VGA_RAM_SIZE.
> 
> [ kraxel: simplify property handling, add sanity checks ]
> [ kraxel: fix mode copying ]
> 
> Signed-off-by: Alon Levy 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/qxl.c |   72 -
>  hw/qxl.h |2 +
>  2 files changed, 44 insertions(+), 30 deletions(-)
> 
> diff --git a/hw/qxl.c b/hw/qxl.c
> index c40cf55..c9028dd 100644
> --- a/hw/qxl.c
> +++ b/hw/qxl.c
> @@ -27,8 +27,6 @@
>  
>  #include "qxl.h"
>  
> -#define VGA_RAM_SIZE (8192 * 1024)
> -
>  /*
>   * NOTE: SPICE_RING_PROD_ITEM accesses memory on the pci bar and as
>   * such can be changed by the guest, so to avoid a guest trigerrable
> @@ -116,20 +114,16 @@ static QXLMode qxl_modes[] = {
>  QXL_MODE_EX(1600, 1200),
>  QXL_MODE_EX(1680, 1050),
>  QXL_MODE_EX(1920, 1080),
> -#if VGA_RAM_SIZE >= (16 * 1024 * 1024)
>  /* these modes need more than 8 MB video memory */
>  QXL_MODE_EX(1920, 1200),
>  QXL_MODE_EX(1920, 1440),
>  QXL_MODE_EX(2048, 1536),
>  QXL_MODE_EX(2560, 1440),
>  QXL_MODE_EX(2560, 1600),
> -#endif
> -#if VGA_RAM_SIZE >= (32 * 1024 * 1024)
>  /* these modes need more than 16 MB video memory */
>  QXL_MODE_EX(2560, 2048),
>  QXL_MODE_EX(2800, 2100),
>  QXL_MODE_EX(3200, 2400),
> -#endif
>  };
>  
>  static PCIQXLDevice *qxl0;
> @@ -286,6 +280,7 @@ static inline uint32_t msb_mask(uint32_t val)
>  static ram_addr_t qxl_rom_size(void)
>  {
>  uint32_t rom_size = sizeof(QXLRom) + sizeof(QXLModes) + 
> sizeof(qxl_modes);
> +
>  rom_size = MAX(rom_size, TARGET_PAGE_SIZE);
>  rom_size = msb_mask(rom_size * 2 - 1);
>  return rom_size;
> @@ -298,8 +293,8 @@ static void init_qxl_rom(PCIQXLDevice *d)
>  uint32_t ram_header_size;
>  uint32_t surface0_area_size;
>  uint32_t num_pages;
> -uint32_t fb, maxfb = 0;
> -int i;
> +uint32_t fb;
> +int i, n;
>  
>  memset(rom, 0, d->rom_size);
>  
> @@ -314,26 +309,25 @@ static void init_qxl_rom(PCIQXLDevice *d)
>  rom->slots_end = NUM_MEMSLOTS - 1;
>  rom->n_surfaces= cpu_to_le32(NUM_SURFACES);
>  
> -modes->n_modes = cpu_to_le32(ARRAY_SIZE(qxl_modes));
> -for (i = 0; i < modes->n_modes; i++) {
> +for (i = 0, n = 0; i < modes->n_modes; i++) {
>  fb = qxl_modes[i].y_res * qxl_modes[i].stride;
> -if (maxfb < fb) {
> -maxfb = fb;
> +if (fb > d->vgamem_size) {
> +continue;
>  }
> -modes->modes[i].id  = cpu_to_le32(i);
> -modes->modes[i].x_res   = cpu_to_le32(qxl_modes[i].x_res);
> -modes->modes[i].y_res   = cpu_to_le32(qxl_modes[i].y_res);
> -modes->modes[i].bits= cpu_to_le32(qxl_modes[i].bits);
> -modes->modes[i].stride  = cpu_to_le32(qxl_modes[i].stride);
> -modes->modes[i].x_mili  = cpu_to_le32(qxl_modes[i].x_mili);
> -modes->modes[i].y_mili  = cpu_to_le32(qxl_modes[i].y_mili);
> -modes->modes[i].orientation = cpu_to_le32(qxl_modes[i].orientation);
> -}
> -if (maxfb < VGA_RAM_SIZE && d->id == 0)
> -maxfb = VGA_RAM_SIZE;
> +modes->modes[n].id  = cpu_to_le32(i);
> +modes->modes[n].x_res   = cpu_to_le32(qxl_modes[i].x_res);
> +modes->modes[n].y_res   = cpu_to_le32(qxl_modes[i].y_res);
> +modes->modes[n].bits= cpu_to_le32(qxl_modes[i].bits);
> +modes->modes[n].stride  = cpu_to_le32(qxl_modes[i].stride);
> +modes->modes[n].x_mili  = cpu_to_le32(qxl_modes[i].x_mili);
> +modes->modes[n].y_mili  = cpu_to_le32(qxl_modes[i].y_mili);
> +modes->modes[n].orientation = cpu_to_le32(qxl_modes[i].orientation);
> +n++;
> +}
> +modes->n_modes = cpu_to_le32(n);
>  
>  ram_header_size= ALIGN(sizeof(QXLRam), 4096);
> -surface0_area_size = ALIGN(maxfb, 4096);
> +surface0_area_size = ALIGN(d->vgamem_size, 4096);
>  num_pages  = d->vga.vram_size;
>  num_pages -= ram_header_size;
>  num_pages -= surface0_area_size;
> @@ -1205,6 +1199,16 @@ static void qxl_create_guest_primary(PCIQXLDevice 
> *qxl, int loadvm,
>  {
>  QXLDevSurfaceCreate surface;
>  QXLSurfaceCreate *sc = &qxl->guest_primary.surface;
> +int size;
> +int requested_height = le32_to_cpu(sc->height);
> +int requested_stride = le32_to_cpu(sc->stride);
> +
> +size = abs(requested_stride) * requested_height;
> +if (size > qxl->vgamem_size) {
> +qxl_set_guest_bug(qxl, "%s: requested primary larger then 
> framebuffer"

[Qemu-devel] libguestfs now uses virtio-scsi, supports large numbers of disks

2012-06-14 Thread Richard W.M. Jones

I switched libguestfs over to using virtio-scsi.  One immediate
benefit is support for large numbers of disks: up to 255 because we're
using 1 target / disk and we reserve one disk for the appliance, in
theory more could be supported if we used LUNs.

This email just contains some notes that may be useful for others
trying to use large numbers of disks, and/or virtio-scsi, and/or qemu,
and/or Linux guests, all together.

* The command line syntax is convoluted.  You need to first of all
  define the virtio-scsi bus once:

-device virtio-scsi-pci,id=scsi

  and then, for each disk, you need these two parameters:

-drive file=disk.img,cache=off,format=raw,id=hd0,if=none
-device scsi-hd,drive=hd0

  (adjust 'id' and 'drive' to be a unique name for each disk)

* Linux probes the SCSI bus asynchronously.

  This was a problem for us because our initramfs doesn't contain
  udev; it originally insmod'd the virtio-scsi.ko module and expected
  the disks to be present (note that this works for virtio-blk).

  Possible solutions include: wait for the disks to become present, or
  use udev, or do something with devtmpfs, or read out the events from
  the kernel like udev does.  We modified our initramfs to wait for
  the disk to be seen under /sys/block, that being the easiest change
  to make at the moment.

  Also scsi_wait_scan is apparently deprecated.

* This describes the naming scheme for > 26 disks under Linux:
  
https://rwmj.wordpress.com/2011/01/09/how-are-linux-drives-named-beyond-drive-26-devsdz/

* Devices are named /dev/sdX instead of /dev/vdX (largely an
  improvement).

* With 256 disks, on fast hardware, with KVM, it takes a Linux guest
  about 3 seconds to probe the disks, and udev approx another 3
  seconds to process its event queue.

* With 256 disks, on fast hardware, but using TCG, it takes a Linux
  guest over 7 minutes to probe the disks.  This causes udev to
  timeout while processing the coldstart events, unless you increase
  the `udevadm settle --timeout' in the startup scripts.

  I don't understand why TCG is so much slower, since TCG itself isn't
  100 times slower normally.

* The overall libguestfs startup time (ie. with two disks) is down
  fractionally to 3.6 seconds, which is nice.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

[Qemu-devel] [PATCH] qxl: fix broken n_modes

2012-06-14 Thread Alon Levy

Signed-off-by: Alon Levy 
---
Hi Gerd,

 Unfortunately I didn't test my previous patch "qxl: add vgamem_size_mb and 
vgamem_size" with the current drivers. This patch is not merged since I didn't 
know if it will reach before/after anothony pull. If before, then please merge 
it into the bad patch.

 Thanks in advance,
Alon

 hw/qxl.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 4c68769..458b9fc 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -322,7 +322,7 @@ static void init_qxl_rom(PCIQXLDevice *d)
 rom->slots_end = NUM_MEMSLOTS - 1;
 rom->n_surfaces= cpu_to_le32(NUM_SURFACES);
 
-for (i = 0; i < modes->n_modes; i++) {
+for (i = 0; i < ARRAY_SIZE(qxl_modes); i++) {
 fb = qxl_modes[i].y_res * qxl_modes[i].stride;
 if (fb <= d->vgamem_size) {
 n_modes++;
-- 
1.7.10.1

Re: [Qemu-devel] [PATCH] block: Prevent /dev/fd/X filename from being detected as floppy

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 15:39, schrieb Corey Bryant:
> 
> 
> On 06/14/2012 05:24 AM, Paolo Bonzini wrote:
>> Il 13/06/2012 16:30, root ha scritto:
>>> From: Corey Bryant 
>>>
>>> Reported-by: Kevin Wolf 
>>> Signed-off-by: Corey Bryant 
>>> ---
>>>   block/raw-posix.c |4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index d8eff2f..68886cd 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -946,9 +946,11 @@ static int floppy_probe_device(const char *filename)
>>>   int prio = 0;
>>>   struct floppy_struct fdparam;
>>>   struct stat st;
>>> +const char *p;
>>>
>>> -if (strstart(filename, "/dev/fd", NULL))
>>> +if (strstart(filename, "/dev/fd", &p) && p[0] != '/') {
>>>   prio = 50;
>>> +}
>>>
>>>   fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
>>>   if (fd < 0) {
>>>
>>
>> Reviewed-by: Paolo Bonzini 
>>
> 
> Thanks!

I tried to apply this earlier today, but it depends on the other patch
series, so I couldn't. Can you just include it as patch 5 when you send
the next version?

Oh, and you'll probably want to fix your mailing setup to put into From:
something different than root@localhost.localdomain...

Kevin

Re: [Qemu-devel] [PATCH v5 2/3] Simpletrace v2: Support multiple arguments, strings.

2012-06-14 Thread Harsh Bora


Hi Stefan,
Please see comment below:

On 06/14/2012 07:02 PM, Harsh Prateek Bora wrote:

[...]


-static bool get_trace_record(unsigned int idx, TraceRecord *record)
+static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
  {
-if (!(trace_buf[idx].event&  TRACE_RECORD_VALID)) {
+uint8_t rec_hdr[sizeof(TraceRecord)];
+uint64_t event_flag = 0;
+TraceRecord *record = (TraceRecord *) rec_hdr;
+/* read the event flag to see if its a valid record */
+read_from_buffer(idx, rec_hdr, sizeof(event_flag));
+
+if (!(record->event&  TRACE_RECORD_VALID)) {
  return false;
  }

  __sync_synchronize(); /* read memory barrier before accessing record */
-
-*record = trace_buf[idx];
-record->event&= ~TRACE_RECORD_VALID;
-return true;
+/* read the record header to know record length */
+read_from_buffer(idx, rec_hdr, sizeof(TraceRecord));
+*recordptr = g_malloc(record->length);
+/* make a copy of record to avoid being overwritten */
+read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
+__sync_synchronize(); /* memory barrier before clearing valid flag */
+(*recordptr)->event&= ~TRACE_RECORD_VALID;
+/* reset record event validity flag on global trace buffer */
+write_to_buffer(idx,&event_flag, sizeof(event_flag));
+if ((*recordptr)->event<  NR_TRACE_EVENTS) {
+return true;
+} else {
+/* XXX: Ideally this should not happen, but its possible !
+ * See comments in trace_record_start for more details.
+ * We dont know where to start next and therefore resetting
+ * writeout_idx to trace_idx. Also, as we dont know how many
+ * events dropped in between, at least increment the count by one.
+ */
+g_atomic_int_inc((gint *)&dropped_events);
+g_free(*recordptr);
+writeout_idx = trace_idx;
+return false;
+}
  }



[...]



-void trace1(TraceEventID event, uint64_t x1)
+int trace_record_start(TraceBufferRecord *rec, TraceEventID event, size_t 
datasize)
  {
-trace(event, x1, 0, 0, 0, 0, 0);
-}
+unsigned int idx, rec_off, old_idx, new_idx;
+uint32_t rec_len = sizeof(TraceRecord) + datasize;
+uint64_t timestamp_ns = get_clock();
+
+while (1) {
+old_idx = trace_idx;
+smp_rmb();
+new_idx = old_idx + rec_len;
+
+if (new_idx - writeout_idx>  TRACE_BUF_LEN) {
+/* Trace Buffer Full, Event dropped ! */
+g_atomic_int_inc((gint *)&dropped_events);
+return -ENOSPC;
+}

-void trace2(TraceEventID event, uint64_t x1, uint64_t x2)
-{
-trace(event, x1, x2, 0, 0, 0, 0);
-}
+/* XXX: If threads race here, take care in get_trace_record */



I see that this comment is invalid, but there is something corrupting 
the trace buffer. If I remove the check for event_id < NR_TRACE_EVENTS 
in get_trace_record, it is finding corrupted records after about 10K 
records being traced, strangely with valid flag set set.

I do not see any obvious reason for this, do you ?

regards,
Harsh



-void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3)
-{
-trace(event, x1, x2, x3, 0, 0, 0);
+if (g_atomic_int_compare_and_exchange((gint *)&trace_idx,
+  old_idx, new_idx)) {
+break;
+}
+}
+
+idx = old_idx % TRACE_BUF_LEN;
+/*  To check later if threshold crossed */
+rec->next_tbuf_idx = new_idx % TRACE_BUF_LEN;
+
+rec_off = idx;
+rec_off = write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));
+rec_off = write_to_buffer(rec_off, (uint8_t*)×tamp_ns, 
sizeof(timestamp_ns));
+rec_off = write_to_buffer(rec_off, (uint8_t*)&rec_len, sizeof(rec_len));
+
+rec->tbuf_idx = idx;
+rec->rec_off  = (idx + sizeof(TraceRecord)) % TRACE_BUF_LEN;
+return 0;
  }

-void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4)
+static void read_from_buffer(unsigned int idx, void *dataptr, size_t size)
  {
-trace(event, x1, x2, x3, x4, 0, 0);
+uint8_t *data_ptr = dataptr;
+uint32_t x = 0;
+while (x<  size) {
+if (idx>= TRACE_BUF_LEN) {
+idx = idx % TRACE_BUF_LEN;
+}
+data_ptr[x++] = trace_buf[idx++];
+}
  }

-void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4, uint64_t x5)
+static unsigned int write_to_buffer(unsigned int idx, void *dataptr, size_t 
size)
  {
-trace(event, x1, x2, x3, x4, x5, 0);
+uint8_t *data_ptr = dataptr;
+uint32_t x = 0;
+while (x<  size) {
+if (idx>= TRACE_BUF_LEN) {
+idx = idx % TRACE_BUF_LEN;
+}
+trace_buf[idx++] = data_ptr[x++];
+}
+return idx; /* most callers wants to know where to write next */
  }

-void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, 
uint64_t x4, uint64_t x5, uint64_t x6)
+void trace_record_finish(TraceBufferRec

Re: [Qemu-devel] [PATCH v2 2/6] ivshmem: Convert to msix_init_exclusive_bar() interface

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 08:01 +0200, Jan Kiszka wrote:
> On 2012-06-14 06:51, Alex Williamson wrote:
> > Trivial conversion, failed to have an uninit before and after.
> 
> Need not be in this series, but we should fix that trivial bug nevertheless.

Given the exit(1) on failure to setup msix, I'm guessing ivshmem isn't
well tuned of hotplug, so I'm not sure how trivial it is.  Thanks,

Alex

> > 
> > Signed-off-by: Alex Williamson 
> > ---
> > 
> >  hw/ivshmem.c |   10 +++---
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> > index 05559b6..8b49eee 100644
> > --- a/hw/ivshmem.c
> > +++ b/hw/ivshmem.c
> > @@ -70,7 +70,6 @@ typedef struct IVShmemState {
> >   */
> >  MemoryRegion bar;
> >  MemoryRegion ivshmem;
> > -MemoryRegion msix_bar;
> >  uint64_t ivshmem_size; /* size of shared memory region */
> >  int shm_fd; /* shared memory file descriptor */
> >  
> > @@ -563,16 +562,13 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
> >  
> >  static void ivshmem_setup_msi(IVShmemState * s)
> >  {
> > -memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
> > -if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
> > -pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
> > - &s->msix_bar);
> > -IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
> > -} else {
> > +if (msix_init_exclusive_bar(&s->dev, s->vectors, 1)) {
> >  IVSHMEM_DPRINTF("msix initialization failed\n");
> >  exit(1);
> >  }
> >  
> > +IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
> > +
> >  /* allocate QEMU char devices for receiving interrupts */
> >  s->eventfd_table = g_malloc0(s->vectors * sizeof(EventfdEntry));
> >  
> > 
>

[Qemu-devel] [PATCH 3/5] scsi: Add basic support for SCSI media changer commands.

2012-06-14 Thread Christian Borntraeger

From: Christian Hoff 

This adds basic support for SCSI media changer commands.
Not all commands are supported as of now, but enough to cover
basic functionality.

Signed-off-by: Christian Hoff 
Signed-off-by: Christian Borntraeger 
---
 hw/scsi-bus.c  |   59 ++-
 hw/scsi-defs.h |4 +++
 2 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 0e484d2..a93d3da 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -724,7 +724,6 @@ static int scsi_req_length(SCSICommand *cmd, SCSIDevice 
*dev, uint8_t *buf)
 case SET_CD_SPEED:
 case SET_LIMITS:
 case WRITE_LONG_10:
-case MOVE_MEDIUM:
 case UPDATE_BLOCK:
 case RESERVE_TRACK:
 case SET_READ_AHEAD:
@@ -852,6 +851,43 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 return 0;
 }
 
+static int scsi_req_medium_changer_length(SCSICommand *cmd, SCSIDevice *dev, 
uint8_t *buf)
+{
+switch (buf[0]) {
+/* medium changer commands */
+case EXCHANGE_MEDIUM:
+cmd->xfer = 0;
+cmd->len = 12;
+break;
+case INITIALIZE_ELEMENT_STATUS:
+cmd->xfer = 0;
+cmd->len = 6;
+break;
+case INITIALIZE_ELEMENT_STATUS_WITH_RANGE:
+cmd->xfer = 0;
+cmd->len = 10;
+break;
+case MOVE_MEDIUM:
+cmd->xfer = 0;
+cmd->len = 12;
+break;
+case POSITION_TO_ELEMENT:
+cmd->xfer = 0;
+cmd->len = 10;
+break;
+case READ_ELEMENT_STATUS:
+cmd->xfer = buf[9] | (buf[8] << 8) | (buf[7] << 16);
+cmd->len = 12;
+break;
+
+/* generic commands */
+default:
+return scsi_req_length(cmd, dev, buf);
+}
+return 0;
+}
+
+
 static void scsi_cmd_xfer_mode(SCSICommand *cmd)
 {
 switch (cmd->buf[0]) {
@@ -928,11 +964,18 @@ int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, 
uint8_t *buf)
 {
 int rc;
 
-if (dev->type == TYPE_TAPE) {
+switch (dev->type) {
+case TYPE_TAPE:
 rc = scsi_req_stream_length(cmd, dev, buf);
-} else {
+break;
+case TYPE_MEDIUM_CHANGER:
+rc = scsi_req_medium_changer_length(cmd, dev, buf);
+break;
+default:
 rc = scsi_req_length(cmd, dev, buf);
+break;
 }
+
 if (rc != 0)
 return rc;
 
@@ -1110,7 +1153,8 @@ static const char *scsi_command_name(uint8_t cmd)
 [ REQUEST_SENSE] = "REQUEST_SENSE",
 [ FORMAT_UNIT  ] = "FORMAT_UNIT",
 [ READ_BLOCK_LIMITS] = "READ_BLOCK_LIMITS",
-[ REASSIGN_BLOCKS  ] = "REASSIGN_BLOCKS",
+[ REASSIGN_BLOCKS  ] = "REASSIGN_BLOCKS/INITIALIZE ELEMENT 
STATUS",
+/* LOAD_UNLOAD and INITIALIZE_ELEMENT_STATUS use the same operation 
code */
 [ READ_6   ] = "READ_6",
 [ WRITE_6  ] = "WRITE_6",
 [ SET_CAPACITY ] = "SET_CAPACITY",
@@ -1135,7 +1179,8 @@ static const char *scsi_command_name(uint8_t cmd)
 [ READ_CAPACITY_10 ] = "READ_CAPACITY_10",
 [ READ_10  ] = "READ_10",
 [ WRITE_10 ] = "WRITE_10",
-[ SEEK_10  ] = "SEEK_10",
+[ SEEK_10  ] = "SEEK_10/POSITION_TO_ELEMENT",
+/* SEEK_10 and POSITION_TO_ELEMENT use the same operation code */
 [ WRITE_VERIFY_10  ] = "WRITE_VERIFY_10",
 [ VERIFY_10] = "VERIFY_10",
 [ SEARCH_HIGH  ] = "SEARCH_HIGH",
@@ -1146,7 +1191,8 @@ static const char *scsi_command_name(uint8_t cmd)
 /* READ_POSITION and PRE_FETCH use the same operation code */
 [ SYNCHRONIZE_CACHE] = "SYNCHRONIZE_CACHE",
 [ LOCK_UNLOCK_CACHE] = "LOCK_UNLOCK_CACHE",
-[ READ_DEFECT_DATA ] = "READ_DEFECT_DATA",
+[ READ_DEFECT_DATA ] = 
"READ_DEFECT_DATA/INITIALIZE_ELEMENT_STATUS_WITH_RANGE",
+/* READ_DEFECT_DATA and INITIALIZE_ELEMENT_STATUS_WITH_RANGE use the 
same operation code */
 [ MEDIUM_SCAN  ] = "MEDIUM_SCAN",
 [ COMPARE  ] = "COMPARE",
 [ COPY_VERIFY  ] = "COPY_VERIFY",
@@ -1190,6 +1236,7 @@ static const char *scsi_command_name(uint8_t cmd)
 [ REPORT_LUNS  ] = "REPORT_LUNS",
 [ BLANK] = "BLANK",
 [ MOVE_MEDIUM  ] = "MOVE_MEDIUM",
+[ EXCHANGE_MEDIUM  ] = "EXCHANGE MEDIUM",
 [ LOAD_UNLOAD  ] = "LOAD_UNLOAD",
 [ READ_12  ] = "READ_12",
 [ WRITE_12 ] = "WRITE_12",
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 2b0db4b..2c40855 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -29,6 +29,7 @@
 #define REQUEST_SENSE 0x03
 #define FORMAT_UNIT   0x04
 #define READ_BLOCK_LIMITS 0x05
+#define INI

[Qemu-devel] [PATCH 4/5] scsi: Fix transfer length for READ POSITION commands.

2012-06-14 Thread Christian Borntraeger

From: Christian Hoff 

The transfer length depends on the specific service action
code, as defined in the SCSI stream commands spec section 7.7.
Up to now only the extended form was supported.

Signed-off-by: Christian Hoff 
Signed-off-by: Christian Borntraeger 
---
 hw/scsi-bus.c  |   16 +++-
 hw/scsi-defs.h |8 
 2 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index a93d3da..9854321 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -839,7 +839,21 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 cmd->xfer = buf[13] | (buf[12] << 8);
 break;
 case READ_POSITION:
-cmd->xfer = buf[8] | (buf[7] << 8);
+switch (buf[1] & 0x1f) /* operation code */ {
+case SHORT_FORM_BLOCK_ID:
+case SHORT_FORM_VENDOR_SPECIFIC:
+cmd->xfer = 20;
+break;
+case LONG_FORM:
+cmd->xfer = 32;
+break;
+case EXTENDED_FORM:
+cmd->xfer = buf[8] | (buf[7] << 8);
+break;
+default:
+return -1;
+}
+
 break;
 case FORMAT_UNIT:
 cmd->xfer = buf[4] | (buf[3] << 8);
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 2c40855..57d0866 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -144,6 +144,14 @@
 #define SAI_READ_CAPACITY_16  0x10
 
 /*
+ * READ POSITION service action codes
+ */
+#define SHORT_FORM_BLOCK_ID  0x00
+#define SHORT_FORM_VENDOR_SPECIFIC 0x01
+#define LONG_FORM0x06
+#define EXTENDED_FORM0x08
+
+/*
  *  SAM Status codes
  */
 
-- 
1.7.0.4

[Qemu-devel] [PATCH 1/5] scsi: Fix data length == SCSI_SENSE_BUF_SIZE

2012-06-14 Thread Christian Borntraeger

From: Christian Hoff 

Fix the edge case where the sense data length is exactly the same
as SCSI_SENSE_BUF_SIZE.
This makes SCSI requests work that use all of the available 95 byte
sense data.

Signed-off-by: Christian Hoff 
Signed-off-by: Christian Borntraeger 
---
 hw/scsi-bus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 64e709e..d1779a2 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1295,7 +1295,7 @@ void scsi_req_complete(SCSIRequest *req, int status)
 assert(req->status == -1);
 req->status = status;
 
-assert(req->sense_len < sizeof(req->sense));
+assert(req->sense_len <= sizeof(req->sense));
 if (status == GOOD) {
 req->sense_len = 0;
 }
-- 
1.7.0.4

[Qemu-devel] [PATCH 2/5] scsi: Fix LOAD_UNLOAD

2012-06-14 Thread Christian Borntraeger

From: Christian Hoff 

Change operation code of LOAD_UNLOAD command to 0x1b as described in
section 7.3 of the SCSI Stream Commands spec.

Signed-off-by: Christian Hoff 
Signed-off-by: Christian Borntraeger 
---
 hw/scsi-bus.c  |6 +++---
 hw/scsi-defs.h |2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index d1779a2..0e484d2 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -721,7 +721,6 @@ static int scsi_req_length(SCSICommand *cmd, SCSIDevice 
*dev, uint8_t *buf)
 case SYNCHRONIZE_CACHE_16:
 case LOCATE_16:
 case LOCK_UNLOCK_CACHE:
-case LOAD_UNLOAD:
 case SET_CD_SPEED:
 case SET_LIMITS:
 case WRITE_LONG_10:
@@ -833,7 +832,7 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 }
 break;
 case REWIND:
-case START_STOP:
+case LOAD_UNLOAD:
 cmd->len = 6;
 cmd->xfer = 0;
 break;
@@ -1128,7 +1127,8 @@ static const char *scsi_command_name(uint8_t cmd)
 [ COPY ] = "COPY",
 [ ERASE] = "ERASE",
 [ MODE_SENSE   ] = "MODE_SENSE",
-[ START_STOP   ] = "START_STOP",
+[ START_STOP   ] = "START_STOP/LOAD_UNLOAD",
+/* LOAD_UNLOAD and START_STOP use the same operation code */
 [ RECEIVE_DIAGNOSTIC   ] = "RECEIVE_DIAGNOSTIC",
 [ SEND_DIAGNOSTIC  ] = "SEND_DIAGNOSTIC",
 [ ALLOW_MEDIUM_REMOVAL ] = "ALLOW_MEDIUM_REMOVAL",
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 354ed7b..2b0db4b 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -44,6 +44,7 @@
 #define COPY  0x18
 #define ERASE 0x19
 #define MODE_SENSE0x1a
+#define LOAD_UNLOAD   0x1b
 #define START_STOP0x1b
 #define RECEIVE_DIAGNOSTIC0x1c
 #define SEND_DIAGNOSTIC   0x1d
@@ -114,7 +115,6 @@
 #define MAINTENANCE_IN0xa3
 #define MAINTENANCE_OUT   0xa4
 #define MOVE_MEDIUM   0xa5
-#define LOAD_UNLOAD   0xa6
 #define SET_READ_AHEAD0xa7
 #define READ_12   0xa8
 #define WRITE_12  0xaa
-- 
1.7.0.4

Re: [Qemu-devel] [PATCH 0/5] scsi related fixes

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 15:55, Christian Borntraeger ha scritto:
> Paolo,
> 
> this patch series was created during bringup of virtio-scsi on s390x.
> We used a tape library as a test vehicle.

Awesome!

> so here is the set of patches that allowed us to use it via virtio-scsi.
> Any comments? Ok to apply?

All looks good, I'll check them more carefully against the spec in a
couple of weeks and include them in the scsi branch.

Paolo

Re: [Qemu-devel] [PATCH] block: Prevent /dev/fd/X filename from being detected as floppy

2012-06-14 Thread Corey Bryant




On 06/14/2012 09:51 AM, Kevin Wolf wrote:

Am 14.06.2012 15:39, schrieb Corey Bryant:



On 06/14/2012 05:24 AM, Paolo Bonzini wrote:

Il 13/06/2012 16:30, root ha scritto:

From: Corey Bryant 

Reported-by: Kevin Wolf 
Signed-off-by: Corey Bryant 
---
   block/raw-posix.c |4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d8eff2f..68886cd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -946,9 +946,11 @@ static int floppy_probe_device(const char *filename)
   int prio = 0;
   struct floppy_struct fdparam;
   struct stat st;
+const char *p;

-if (strstart(filename, "/dev/fd", NULL))
+if (strstart(filename, "/dev/fd", &p) && p[0] != '/') {
   prio = 50;
+}

   fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
   if (fd < 0) {



Reviewed-by: Paolo Bonzini 



Thanks!


I tried to apply this earlier today, but it depends on the other patch
series, so I couldn't. Can you just include it as patch 5 when you send
the next version?


Ok sure I'll do that.



Oh, and you'll probably want to fix your mailing setup to put into From:
something different than root@localhost.localdomain...


Ugh.. I'm sorry about that.



Kevin



--
Regards,
Corey

Re: [Qemu-devel] [PATCH v5 3/4] fdc: rewrite seek and DSKCHG bit handling

2012-06-14 Thread Kevin Wolf

Am 13.06.2012 15:43, schrieb Pavel Hrdina:
> This bit is cleared on every successful seek to a different track (cylinder).
> The seek is also called on revalidate or on read/write/format commands which
> also clear the DSKCHG bit.
> 
> Signed-off-by: Pavel Hrdina 
> ---
>  hw/fdc.c |   68 
> +++---
>  1 files changed, 34 insertions(+), 34 deletions(-)

Nice cleanup! Looks good generally, but this and patch 4 need test cases
as well before I can apply them.

> @@ -1004,30 +998,39 @@ static int fdctrl_seek_to_next_sect(FDCtrl *fdctrl, 
> FDrive *cur_drv)
> fd_sector(cur_drv));
>  /* XXX: cur_drv->sect >= cur_drv->last_sect should be an
> error in fact */
> -if (cur_drv->sect >= cur_drv->last_sect ||
> -cur_drv->sect == fdctrl->eot) {
> -cur_drv->sect = 1;
> +uint8_t new_head = cur_drv->head;
> +uint8_t new_track = cur_drv->track;
> +uint8_t new_sect = cur_drv->sect;
> +
> +int ret = 1;
> +
> +if (new_sect >= cur_drv->last_sect ||
> +new_sect == fdctrl->eot) {
> +new_sect = 1;
>  if (FD_MULTI_TRACK(fdctrl->data_state)) {
> -if (cur_drv->head == 0 &&
> +if (new_head == 0 &&
>  (cur_drv->flags & FDISK_DBL_SIDES) != 0) {
> -cur_drv->head = 1;
> +new_head = 1;
>  } else {
> -cur_drv->head = 0;
> -cur_drv->track++;
> +new_head = 0;
> +new_track++;
>  if ((cur_drv->flags & FDISK_DBL_SIDES) == 0)
> -return 0;
> +ret = 0;

Please use the chance to add braces here.

Can you also add a small header comment to the function that explains
what the return value means? Initially I read 0 as error, but in fact it
seems only to mean "request completed".

Kevin

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Dong Xu Wang

On Thu, Jun 14, 2012 at 6:51 PM, Kevin Wolf  wrote:
> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>> add-cow file can't live alone, must together will image_file and 
>> backing_file.
>> If we implement qemu-img convert operation for add-cow file format, we must
>> create image_file and backing_file manually, that will be confused for users,
>> so we just ignore add-cow format while doing converting.
>>
>> Signed-off-by: Dong Xu Wang 
>
> NACK.
>
> This stupid "let's drop the feature, it might confuse users" attitude is
> known from Gnome, but not from qemu.
>
> There's no technical reason to forbid it and anyone who manages to
> create a valid add-cow image will also be able to specify the very same
> options to a convert command. Also, having image format specific code in
> qemu-img is evil.
>

If I implement add-cow convert command, I am wondering which method should
I use:
1) create add-cow, and its backing_file, and its image_file,  then we
need 3 files.
2) create add-cow(with all bitmap marked to allocated), and its
image_file, then we
need 2 files.

2) will be easier, I should let .add-cow file can live only with
image_file, without backing_file.

I think both 1) and 2) need add code to qemu-img.c. Or I will have to create
image_file automaticly in add_cow_create function.

Can you give some comments on how to implement convert? Thanks.

> Kevin
>

Re: [Qemu-devel] [PATCH v5 0/4] fdc: fix/rewrite seek, media_changed and interrupt handling

2012-06-14 Thread Kevin Wolf

Am 13.06.2012 15:43, schrieb Pavel Hrdina:
> The fd_seek will return 'track is invalid' if there is no media in drive.
> 
> Implied seek should have the same behavior as normal seek. We will use the 
> fd_seek
> function instead of a direct modification of head, track and sector values.
> The result value is used only while read/write commands were issued.
> 
> Fixed sense interrupt status command.
> 
> v5:
> - correction of misspelled words
> 
> v4:
> - fixed coding style
> - better bit clearing in 'fdctrl_handle_sense_interrupt_status'
> 
> v3:
> - added fdc-test
> - rewrited seek and media_changed handling
> - fixed interrupt status handling
> 
> v2:
> - better way to detect media missing
> 
> Pavel Hrdina (4):
>   fdc: fix implied seek while there is no media in drive
>   fdc-test: introduced qtest read_without_media
>   fdc: rewrite seek and DSKCHG bit handling
>   fdc: fix interrupt handling
> 
>  hw/fdc.c |  106 
> ++
>  tests/fdc-test.c |   66 +
>  2 files changed, 125 insertions(+), 47 deletions(-)
> 

Thanks, I applied patches 1 and 2. For the rest I'm waiting for a v6
that includes test cases.

Kevin

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 16:06, schrieb Dong Xu Wang:
> On Thu, Jun 14, 2012 at 6:51 PM, Kevin Wolf  wrote:
>> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>>> add-cow file can't live alone, must together will image_file and 
>>> backing_file.
>>> If we implement qemu-img convert operation for add-cow file format, we must
>>> create image_file and backing_file manually, that will be confused for 
>>> users,
>>> so we just ignore add-cow format while doing converting.
>>>
>>> Signed-off-by: Dong Xu Wang 
>>
>> NACK.
>>
>> This stupid "let's drop the feature, it might confuse users" attitude is
>> known from Gnome, but not from qemu.
>>
>> There's no technical reason to forbid it and anyone who manages to
>> create a valid add-cow image will also be able to specify the very same
>> options to a convert command. Also, having image format specific code in
>> qemu-img is evil.
>>
> 
> If I implement add-cow convert command, I am wondering which method should
> I use:
> 1) create add-cow, and its backing_file, and its image_file,  then we
> need 3 files.
> 2) create add-cow(with all bitmap marked to allocated), and its
> image_file, then we
> need 2 files.
> 
> 2) will be easier, I should let .add-cow file can live only with
> image_file, without backing_file.
> 
> I think both 1) and 2) need add code to qemu-img.c. Or I will have to create
> image_file automaticly in add_cow_create function.
> 
> Can you give some comments on how to implement convert? Thanks.

Just leave it alone and it will work.

qemu-img convert takes the same options as qemu-img create. So like for
any other image you specify the backing file with -b or -o backing_file,
and for add-cow images to be successfully created you also need to
specify -o image_file.

Kevin

Re: [Qemu-devel] [PATCH v2 5/6] msix: Allow full specification of MSIX layout

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 08:17 +0200, Jan Kiszka wrote:
> On 2012-06-14 06:51, Alex Williamson wrote:
>   
> > -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size 
> > is
> > - * modified, it should be retrieved with msix_bar_size. */
> > +/* Add MSI-X capability to the config space for the device. */
> > +static int msix_add_config(struct PCIDevice *dev, unsigned short nentries,
> > +   uint8_t table_bar, unsigned table_offset,
> > +   uint8_t pba_bar, unsigned pba_offset, uint8_t 
> > pos)
> 
> Why not fold msix_add_config into msix_init and move sanity checks to
> the beginning, i.e. before resource allocations? Then you also do not
> need to call msix_uninit from msix_init. Likely a matter of taste, so
> not a must-have.

Yep, that works nicely.  fixed.

> > +{
> > +int config_offset;
> > +uint8_t *config;
> > +
> > +if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> > +return -EINVAL;
> > +}
> > +
> > +config_offset = pci_add_capability(dev, PCI_CAP_ID_MSIX,
> > +   pos, MSIX_CAP_LENGTH);
> > +if (config_offset < 0) {
> > +return config_offset;
> > +}
> > +
> > +config = dev->config + config_offset;
> > +
> > +pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
> > +pci_set_long(config + PCI_MSIX_TABLE, table_offset | table_bar);
> > +pci_set_long(config + PCI_MSIX_PBA, pba_offset | pba_bar);
> > +
> > +dev->msix_cap = config_offset;
> > +
> > +/* Make flags bit writable. */
> > +dev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
> > +   MSIX_MASKALL_MASK;
> > +
> > +dev->msix_function_masked = true;
> > +
> > +return 0;
> > +}
> > +
> > +/* Clean up resources for the device. */
> > +void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion 
> > *pba_bar)
> > +{
> > +if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
> 
> msix_present()

Fixed.

> > +return;
> > +}
> > +
> > +pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
> > +dev->msix_cap = 0;
> > +dev->msix_entries_nr = 0;
> > +
> > +memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
> > +memory_region_destroy(&dev->msix_pba_mmio);
> > +g_free(dev->msix_pba);
> > +dev->msix_pba = NULL;
> > +
> > +memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
> > +memory_region_destroy(&dev->msix_table_mmio);
> > +g_free(dev->msix_table);
> > +dev->msix_table = NULL;
> > +
> > +g_free(dev->msix_entry_used);
> > +dev->msix_entry_used = NULL;
> > +dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
> > +}
> > +
> > +/* Initialize the MSI-X structures */
> >  int msix_init(struct PCIDevice *dev, unsigned short nentries,
> > -  MemoryRegion *bar,
> > -  unsigned bar_nr, unsigned bar_size)
> > +  MemoryRegion *table_bar, uint8_t table_bar_nr,
> > +  unsigned table_offset, MemoryRegion *pba_bar,
> > +  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos)
> >  {
> >  int ret;
> >  unsigned table_size, pba_size;
> > @@ -281,43 +279,41 @@ int msix_init(struct PCIDevice *dev, unsigned short 
> > nentries,
> >  if (!msi_supported) {
> >  return -ENOTSUP;
> >  }
> > -if (nentries > MSIX_MAX_ENTRIES)
> > -return -EINVAL;
> >  
> >  table_size = nentries * PCI_MSIX_ENTRY_SIZE;
> >  pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
> >  
> > -dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
> > -sizeof *dev->msix_entry_used);
> > +/* Sanity test: table & pba don't overlap, fit within BARs, min 
> > aligned */
> > +if ((table_bar_nr == pba_bar_nr &&
> > + ranges_overlap(table_offset, table_size, pba_offset, pba_size)) ||
> > +table_offset + table_size > memory_region_size(table_bar) ||
> > +pba_offset + pba_size > memory_region_size(pba_bar) ||
> > +(table_offset | pba_offset) & PCI_MSIX_FLAGS_BIRMASK) {
> > +return -EINVAL;
> > +}
> >  
> >  dev->msix_table = g_malloc0(table_size);
> >  dev->msix_pba = g_malloc0(pba_size);
> > +dev->msix_entry_used = g_malloc0(nentries * sizeof 
> > *dev->msix_entry_used);
> > +dev->msix_entries_nr = nentries;
> > +dev->cap_present |= QEMU_PCI_CAP_MSIX;
> > +
> >  msix_mask_all(dev, nentries);
> >  
> >  memory_region_init_io(&dev->msix_table_mmio, &msix_table_mmio_ops, dev,
> >"msix", table_size);
> > +memory_region_add_subregion(table_bar, table_offset, 
> > &dev->msix_table_mmio);
> > +
> >  memory_region_init_io(&dev->msix_pba_mmio, &msix_pba_mmio_ops, dev,
> >"msix-pba", pba_size);
> > +memory_region_add_subregion(pba_bar, pba_offset, &dev->msix_pba_mmio);
> >  
> > -dev->msix_entr

Re: [Qemu-devel] [PATCH v2 5/6] msix: Allow full specification of MSIX layout

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 11:10 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 10:51:56PM -0600, Alex Williamson wrote:
> > Finally, complete the fully specified interface.  msix_add_config()
> > gets moved to be closer to the setup functions where it's actually
> > used.  msix_mmio_setup() gets folded into msix_init().  And
> > msix_uninit() gets reworked a bit so we can call it as cleanup
> > from msix_init().
> > 
> > Signed-off-by: Alex Williamson 
> 
> Some review comments seem to have been ignored: this patch -
> uses hardcoded constants, still has duplicated parameters,
> move functions in the same patch as adding functionality.
> Do you intend to send another revision?

The comments weren't ignored, I replied to them.  Please check your
mail.  In the interim, I sent a new series.  Thanks,

Alex

> > ---
> > 
> >  hw/msix.c |  217 
> > +++--
> >  hw/msix.h |   10 ++-
> >  2 files changed, 101 insertions(+), 126 deletions(-)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index d476d07..047646a 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -27,14 +27,6 @@
> >  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
> >  #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
> >  
> > -/* How much space does an MSIX table need. */
> > -/* The spec requires giving the table structure
> > - * a 4K aligned region all by itself. */
> > -#define MSIX_PAGE_SIZE 0x1000
> > -/* Reserve second half of the page for pending bits */
> > -#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
> > -#define MSIX_MAX_ENTRIES 32
> > -
> >  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
> >  {
> >  uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
> > @@ -45,47 +37,6 @@ static MSIMessage msix_get_message(PCIDevice *dev, 
> > unsigned vector)
> >  return msg;
> >  }
> >  
> > -/* Add MSI-X capability to the config space for the device. */
> > -/* Given a bar and its size, add MSI-X table on top of it
> > - * and fill MSI-X capability in the config space.
> > - * Original bar size must be a power of 2 or 0.
> > - * New bar size is returned. */
> > -static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> > -   unsigned bar_nr, unsigned bar_size)
> > -{
> > -int config_offset;
> > -uint8_t *config;
> > -
> > -if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
> > -return -EINVAL;
> > -if (bar_size > 0x8000)
> > -return -ENOSPC;
> > -
> > -/* Require aligned offset for MSI-X structures */
> > -if (bar_size & ~(MSIX_PAGE_SIZE - 1)) {
> > -return -EINVAL;
> > -}
> > -
> > -config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> > -   0, MSIX_CAP_LENGTH);
> > -if (config_offset < 0)
> > -return config_offset;
> > -config = pdev->config + config_offset;
> > -
> > -pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
> > -/* Table on top of BAR */
> > -pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
> > -/* Pending bits on top of that */
> > -pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
> > - bar_nr);
> > -pdev->msix_cap = config_offset;
> > -/* Make flags bit writable. */
> > -pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
> > -   MSIX_MASKALL_MASK;
> > -pdev->msix_function_masked = true;
> > -return 0;
> > -}
> > -
> >  static uint8_t msix_pending_mask(int vector)
> >  {
> >  return 1 << (vector % 8);
> > @@ -240,20 +191,6 @@ static const MemoryRegionOps msix_pba_mmio_ops = {
> >  },
> >  };
> >  
> > -static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
> > -{
> > -uint8_t *config = d->config + d->msix_cap;
> > -uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> > -uint32_t table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> > -uint32_t pba = pci_get_long(config + PCI_MSIX_PBA);
> > -uint32_t pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> > -/* TODO: for assigned devices, we'll want to make it possible to map
> > - * pending bits separately in case they are in a separate bar. */
> > -
> > -memory_region_add_subregion(bar, table_offset, &d->msix_table_mmio);
> > -memory_region_add_subregion(bar, pba_offset, &d->msix_pba_mmio);
> > -}
> > -
> >  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
> >  {
> >  int vector;
> > @@ -268,11 +205,72 @@ static void msix_mask_all(struct PCIDevice *dev, 
> > unsigned nentries)
> >  }
> >  }
> >  
> > -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size 
> > is
> > - * modified, it should be retrieved with msix_bar_size. */
> > +/* Add MSI-X capability to the config space for the device. */
> > +static int msix_add_config(struct PCIDevice *dev, unsigned short nentries,
> > +

[Qemu-devel] [PATCH 5/5] scsi: Ensure command and transfer lengths are set for all SCSI stream commands.

2012-06-14 Thread Christian Borntraeger

From: Christian Hoff 

scsi-generic relies on those values to be correct, so it is important that
those values are initialized properly prior to submitting the SCSI command.

Signed-off-by: Christian Hoff 
Signed-off-by: Christian Borntraeger 
---
 hw/scsi-bus.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 9854321..3c3f190 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -817,7 +817,11 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 switch (buf[0]) {
 /* stream commands */
 case ERASE_12:
+cmd->len = 12;
+cmd->xfer = 0;
+break;
 case ERASE_16:
+cmd->len = 16;
 cmd->xfer = 0;
 break;
 case READ_6:
@@ -836,9 +840,12 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 cmd->xfer = 0;
 break;
 case SPACE_16:
+cmd->len = 16;
 cmd->xfer = buf[13] | (buf[12] << 8);
 break;
 case READ_POSITION:
+cmd->len = 10;
+
 switch (buf[1] & 0x1f) /* operation code */ {
 case SHORT_FORM_BLOCK_ID:
 case SHORT_FORM_VENDOR_SPECIFIC:
@@ -856,6 +863,7 @@ static int scsi_req_stream_length(SCSICommand *cmd, 
SCSIDevice *dev, uint8_t *bu
 
 break;
 case FORMAT_UNIT:
+cmd->len = 6;
 cmd->xfer = buf[4] | (buf[3] << 8);
 break;
 /* generic commands */
-- 
1.7.0.4

Re: [Qemu-devel] [PATCH v2 6/6] msix: Fix last PCIDevice naming inconsitency

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 11:13 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 10:52:06PM -0600, Alex Williamson wrote:
> > The previous patches fixed almost all the inconsistent names used
> > for PCIDevice in msix.c, fix the one remaining transgression.
> > 
> > Signed-off-by: Alex Williamson 
> 
> Except, would be better to first rename parameters and
> do other mechanical changes, and then apply your diff
> on top.
> 
> As it is the patch looks like it rewrites everything while
> it could be a very small one.

You're welcome for trying to add consistency, sorry I pointed it out.

> > ---
> > 
> >  hw/msix.h |3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/hw/msix.h b/hw/msix.h
> > index 14b1a2e..1786e27 100644
> > --- a/hw/msix.h
> > +++ b/hw/msix.h
> > @@ -11,8 +11,7 @@ int msix_init(PCIDevice *dev, unsigned short nentries,
> >  int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> >  uint8_t bar_nr);
> >  
> > -void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> > -   uint32_t val, int len);
> > +void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int 
> > len);
> >  
> >  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
> >   MemoryRegion *pba_bar);

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Dong Xu Wang

On Thu, Jun 14, 2012 at 10:11 PM, Kevin Wolf  wrote:
> Am 14.06.2012 16:06, schrieb Dong Xu Wang:
>> On Thu, Jun 14, 2012 at 6:51 PM, Kevin Wolf  wrote:
>>> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
 add-cow file can't live alone, must together will image_file and 
 backing_file.
 If we implement qemu-img convert operation for add-cow file format, we must
 create image_file and backing_file manually, that will be confused for 
 users,
 so we just ignore add-cow format while doing converting.

 Signed-off-by: Dong Xu Wang 
>>>
>>> NACK.
>>>
>>> This stupid "let's drop the feature, it might confuse users" attitude is
>>> known from Gnome, but not from qemu.
>>>
>>> There's no technical reason to forbid it and anyone who manages to
>>> create a valid add-cow image will also be able to specify the very same
>>> options to a convert command. Also, having image format specific code in
>>> qemu-img is evil.
>>>
>>
>> If I implement add-cow convert command, I am wondering which method should
>> I use:
>> 1) create add-cow, and its backing_file, and its image_file,  then we
>> need 3 files.
>> 2) create add-cow(with all bitmap marked to allocated), and its
>> image_file, then we
>>     need 2 files.
>>
>> 2) will be easier, I should let .add-cow file can live only with
>> image_file, without backing_file.
>>
>> I think both 1) and 2) need add code to qemu-img.c. Or I will have to create
>> image_file automaticly in add_cow_create function.
>>
>> Can you give some comments on how to implement convert? Thanks.
>
> Just leave it alone and it will work.
>
> qemu-img convert takes the same options as qemu-img create. So like for
> any other image you specify the backing file with -b or -o backing_file,
> and for add-cow images to be successfully created you also need to
> specify -o image_file.
>
"-o image_file", the image_file should be precreated? Or I should created it
manually?
> Kevin
>

[Qemu-devel] [PATCH 0/5] scsi related fixes

2012-06-14 Thread Christian Borntraeger

Paolo,

this patch series was created during bringup of virtio-scsi on s390x.
We used a tape library as a test vehicle.

so here is the set of patches that allowed us to use it via virtio-scsi.
Any comments? Ok to apply?

Christian


Christian Hoff (5):
  scsi: Fix data length == SCSI_SENSE_BUF_SIZE
  scsi: Fix LOAD_UNLOAD
  scsi: Add basic support for SCSI media changer commands.
  scsi: Fix transfer length for READ POSITION commands.
  scsi: Ensure command and transfer lengths are set for all SCSI stream
commands.

 hw/scsi-bus.c  |   91 +---
 hw/scsi-defs.h |   14 -
 2 files changed, 93 insertions(+), 12 deletions(-)

Re: [Qemu-devel] [PATCH v2 6/6] msix: Fix last PCIDevice naming inconsitency

2012-06-14 Thread Michael S. Tsirkin

On Thu, Jun 14, 2012 at 08:18:10AM -0600, Alex Williamson wrote:
> On Thu, 2012-06-14 at 11:13 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 13, 2012 at 10:52:06PM -0600, Alex Williamson wrote:
> > > The previous patches fixed almost all the inconsistent names used
> > > for PCIDevice in msix.c, fix the one remaining transgression.
> > > 
> > > Signed-off-by: Alex Williamson 
> > 
> > Except, would be better to first rename parameters and
> > do other mechanical changes, and then apply your diff
> > on top.
> > 
> > As it is the patch looks like it rewrites everything while
> > it could be a very small one.
> 
> You're welcome for trying to add consistency, sorry I pointed it out.

No problem with this patch itself though.

> > > ---
> > > 
> > >  hw/msix.h |3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/msix.h b/hw/msix.h
> > > index 14b1a2e..1786e27 100644
> > > --- a/hw/msix.h
> > > +++ b/hw/msix.h
> > > @@ -11,8 +11,7 @@ int msix_init(PCIDevice *dev, unsigned short nentries,
> > >  int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> > >  uint8_t bar_nr);
> > >  
> > > -void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> > > -   uint32_t val, int len);
> > > +void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, 
> > > int len);
> > >  
> > >  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
> > >   MemoryRegion *pba_bar);
> 
>

Re: [Qemu-devel] [PATCH v2 4/6] msix: Split PBA into it's own MemoryRegion

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 13:24 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 10:51:47PM -0600, Alex Williamson wrote:
> > These don't have to be contiguous.  Size them to only what
> > they need and use separate MemoryRegions for the vector
> > table and PBA.
> > 
> > Signed-off-by: Alex Williamson 
> 
> Why is this still using NATIVE?

Because the bug already exists, this patch doesn't make it worse, so at
best it's a tangentially related additional fix.  It may seem like a
s/NATIVE/LITTLE/ to you, but to me it's asking to completely scrub
msix.c for endian correctness.  Is this going to be the carrot you hold
out to accept the rest of the series?

Alex

> > ---
> > 
> >  hw/msix.c |  116 
> > ++---
> >  hw/pci.h  |   15 ++--
> >  2 files changed, 83 insertions(+), 48 deletions(-)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index a4cdfb0..d476d07 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -37,7 +37,7 @@
> >  
> >  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
> >  {
> > -uint8_t *table_entry = dev->msix_table_page + vector * 
> > PCI_MSIX_ENTRY_SIZE;
> > +uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
> >  MSIMessage msg;
> >  
> >  msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
> > @@ -86,16 +86,6 @@ static int msix_add_config(struct PCIDevice *pdev, 
> > unsigned short nentries,
> >  return 0;
> >  }
> >  
> > -static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > -   unsigned size)
> > -{
> > -PCIDevice *dev = opaque;
> > -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> > -void *page = dev->msix_table_page;
> > -
> > -return pci_get_long(page + offset);
> > -}
> > -
> >  static uint8_t msix_pending_mask(int vector)
> >  {
> >  return 1 << (vector % 8);
> > @@ -103,7 +93,7 @@ static uint8_t msix_pending_mask(int vector)
> >  
> >  static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
> >  {
> > -return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
> > +return dev->msix_pba + vector / 8;
> >  }
> >  
> >  static int msix_is_pending(PCIDevice *dev, int vector)
> > @@ -124,7 +114,7 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
> >  static bool msix_vector_masked(PCIDevice *dev, int vector, bool fmask)
> >  {
> >  unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + 
> > PCI_MSIX_ENTRY_VECTOR_CTRL;
> > -return fmask || dev->msix_table_page[offset] & 
> > PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > +return fmask || dev->msix_table[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >  }
> >  
> >  static bool msix_is_masked(PCIDevice *dev, int vector)
> > @@ -203,27 +193,46 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
> >  }
> >  }
> >  
> > -static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > -uint64_t val, unsigned size)
> > +static uint64_t msix_table_mmio_read(void *opaque, target_phys_addr_t addr,
> > + unsigned size)
> >  {
> >  PCIDevice *dev = opaque;
> > -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> > -int vector = offset / PCI_MSIX_ENTRY_SIZE;
> > -bool was_masked;
> >  
> > -/* MSI-X page includes a read-only PBA and a writeable Vector Control. 
> > */
> > -if (vector >= dev->msix_entries_nr) {
> > -return;
> > -}
> > +return pci_get_long(dev->msix_table + addr);
> > +}
> > +
> > +static void msix_table_mmio_write(void *opaque, target_phys_addr_t addr,
> > +  uint64_t val, unsigned size)
> > +{
> > +PCIDevice *dev = opaque;
> > +int vector = addr / PCI_MSIX_ENTRY_SIZE;
> > +bool was_masked;
> >  
> >  was_masked = msix_is_masked(dev, vector);
> > -pci_set_long(dev->msix_table_page + offset, val);
> > +pci_set_long(dev->msix_table + addr, val);
> >  msix_handle_mask_update(dev, vector, was_masked);
> >  }
> >  
> > -static const MemoryRegionOps msix_mmio_ops = {
> > -.read = msix_mmio_read,
> > -.write = msix_mmio_write,
> > +static const MemoryRegionOps msix_table_mmio_ops = {
> > +.read = msix_table_mmio_read,
> > +.write = msix_table_mmio_write,
> > +.endianness = DEVICE_NATIVE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 4,
> > +.max_access_size = 4,
> > +},
> > +};
> > +
> > +static uint64_t msix_pba_mmio_read(void *opaque, target_phys_addr_t addr,
> > +   unsigned size)
> > +{
> > +PCIDevice *dev = opaque;
> > +
> > +return pci_get_long(dev->msix_pba + addr);
> > +}
> > +
> > +static const MemoryRegionOps msix_pba_mmio_ops = {
> > +.read = msix_pba_mmio_read,
> >  .endianness = DEVICE_NATIVE_ENDIAN,
> >  .valid = {
> >  .min_access_size = 4,
> > @@ -235,11 +244,14 @@ static void msix_mmio_setup(PCIDevice

Re: [Qemu-devel] [PATCH 5/5] scsi: Ensure command and transfer lengths are set for all SCSI stream commands.

2012-06-14 Thread Paolo Bonzini

Il 14/06/2012 15:55, Christian Borntraeger ha scritto:
> From: Christian Hoff 
> 
> scsi-generic relies on those values to be correct, so it is important that
> those values are initialized properly prior to submitting the SCSI command.

This and the similar code in patch 3/5 can be replaced by the attached
patch.  You can test the result at git://github.com/bonzini/qemu.git
branch scsi-devel.

Paolo
>From 387da72123720635aeb27b3b67ee3f060b926f3b Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Thu, 14 Jun 2012 16:13:49 +0200
Subject: [PATCH] scsi: Ensure command and transfer lengths are set for all
 SCSI devices

scsi-generic relies on those values to be correct, so it is important that
those values are initialized properly for all device types.

Reported-by: Christian Hoff 
Reported-by: Christian Borntraeger 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi-bus.c |   25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 99e37b5..7ad6538 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -723,20 +723,16 @@ static int scsi_req_length(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
 switch (buf[0] >> 5) {
 case 0:
 cmd->xfer = buf[4];
-cmd->len = 6;
 break;
 case 1:
 case 2:
 cmd->xfer = lduw_be_p(&buf[7]);
-cmd->len = 10;
 break;
 case 4:
 cmd->xfer = ldl_be_p(&buf[10]) & 0xULL;
-cmd->len = 16;
 break;
 case 5:
 cmd->xfer = ldl_be_p(&buf[6]) & 0xULL;
-cmd->len = 12;
 break;
 default:
 return -1;
@@ -873,7 +869,6 @@ static int scsi_req_stream_length(SCSICommand *cmd, SCSIDevice *dev, uint8_t *bu
 case READ_REVERSE:
 case RECOVER_BUFFERED_DATA:
 case WRITE_6:
-cmd->len = 6;
 cmd->xfer = buf[4] | (buf[3] << 8) | (buf[2] << 16);
 if (buf[1] & 0x01) { /* fixed */
 cmd->xfer *= dev->blocksize;
@@ -883,7 +878,6 @@ static int scsi_req_stream_length(SCSICommand *cmd, SCSIDevice *dev, uint8_t *bu
 case READ_REVERSE_16:
 case VERIFY_16:
 case WRITE_16:
-cmd->len = 16;
 cmd->xfer = buf[14] | (buf[13] << 8) | (buf[12] << 16);
 if (buf[1] & 0x01) { /* fixed */
 cmd->xfer *= dev->blocksize;
@@ -891,7 +885,6 @@ static int scsi_req_stream_length(SCSICommand *cmd, SCSIDevice *dev, uint8_t *bu
 break;
 case REWIND:
 case LOAD_UNLOAD:
-cmd->len = 6;
 cmd->xfer = 0;
 break;
 case SPACE_16:
@@ -989,6 +982,24 @@ int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
 {
 int rc;
 
+switch (buf[0] >> 5) {
+case 0:
+cmd->len = 6;
+break;
+case 1:
+case 2:
+cmd->len = 10;
+break;
+case 4:
+cmd->len = 16;
+break;
+case 5:
+cmd->len = 12;
+break;
+default:
+return -1;
+}
+
 if (dev->type == TYPE_TAPE) {
 rc = scsi_req_stream_length(cmd, dev, buf);
 } else {
-- 
1.7.10.2

Re: [Qemu-devel] [PATCH v2 4/6] msix: Split PBA into it's own MemoryRegion

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 08:13 +0200, Jan Kiszka wrote:
> On 2012-06-14 06:51, Alex Williamson wrote:
> >  
> > -memory_region_init_io(&dev->msix_mmio, &msix_mmio_ops, dev,
> > -  "msix", MSIX_PAGE_SIZE);
> > +memory_region_init_io(&dev->msix_table_mmio, &msix_table_mmio_ops, dev,
> > +  "msix", table_size);
> 
> "msix-table" (vs. "msix-pba")? Only if you have to repost anyway.

Ok.  Thanks,

Alex

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Kevin Wolf

Am 14.06.2012 16:17, schrieb Dong Xu Wang:
> On Thu, Jun 14, 2012 at 10:11 PM, Kevin Wolf  wrote:
>> Am 14.06.2012 16:06, schrieb Dong Xu Wang:
>>> On Thu, Jun 14, 2012 at 6:51 PM, Kevin Wolf  wrote:
 Am 13.06.2012 16:36, schrieb Dong Xu Wang:
> add-cow file can't live alone, must together will image_file and 
> backing_file.
> If we implement qemu-img convert operation for add-cow file format, we 
> must
> create image_file and backing_file manually, that will be confused for 
> users,
> so we just ignore add-cow format while doing converting.
>
> Signed-off-by: Dong Xu Wang 

 NACK.

 This stupid "let's drop the feature, it might confuse users" attitude is
 known from Gnome, but not from qemu.

 There's no technical reason to forbid it and anyone who manages to
 create a valid add-cow image will also be able to specify the very same
 options to a convert command. Also, having image format specific code in
 qemu-img is evil.

>>>
>>> If I implement add-cow convert command, I am wondering which method should
>>> I use:
>>> 1) create add-cow, and its backing_file, and its image_file,  then we
>>> need 3 files.
>>> 2) create add-cow(with all bitmap marked to allocated), and its
>>> image_file, then we
>>> need 2 files.
>>>
>>> 2) will be easier, I should let .add-cow file can live only with
>>> image_file, without backing_file.
>>>
>>> I think both 1) and 2) need add code to qemu-img.c. Or I will have to create
>>> image_file automaticly in add_cow_create function.
>>>
>>> Can you give some comments on how to implement convert? Thanks.
>>
>> Just leave it alone and it will work.
>>
>> qemu-img convert takes the same options as qemu-img create. So like for
>> any other image you specify the backing file with -b or -o backing_file,
>> and for add-cow images to be successfully created you also need to
>> specify -o image_file.
>>
> "-o image_file", the image_file should be precreated? Or I should created it
> manually?

Yes, it must already exist. Just like with qemu-img create. To convert
an existing qcow2 image that has a backing file 'base.img' to an add-cow
image, you would use something like:

$ qemu-img create -f raw target.img 4G
$ qemu-img convert -O add-cow -o image_file=target.img -B base.img
source.qcow2 target.add-cow

Kevin

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support

2012-06-14 Thread Zhi Hui Li


On 2012年06月11日 23:37, Jeff Cody wrote:

On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:

On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf  wrote:

Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:

On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody  wrote:

On 06/08/2012 12:11 PM, Kevin Wolf wrote:

Am 08.06.2012 16:32, schrieb Jeff Cody:

On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:

On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody  wrote:

On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:

Let's figure out how to specify block-commit so we're all happy, that
way we can avoid duplicating work.  Any comments on my notes above?



I think we are almost completely on the same page - devil is in the
details, of course (for instance, on how to convert the destination base
from r/o to r/w).


Great.  The atomic r/o ->  r/w transition and the commit coroutine can
be implemented on in parallel.  Are you happy to implement the atomic
r/o ->  r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
assume that part already works and focus on implementing the commit
coroutine in the meantime.  I'm just suggesting a way to split up the
work, please let me know if you think this is good.


I am happy to do it that way.  I'll shift my focus to the atomic image
reopen in r/w mode.  I'll go ahead and post my diagrams and other info
for block-commit on the wiki, because I don't think it conflicts with we
discussed above (although I will modify my diagrams to not show commit
from the top-level image).  Of course, feel free to change it as
necessary.


I may have mentioned it before, but just in case: I think Supriya's
bdrv_reopen() patches are a good starting point. I don't know why they
were never completed, but I think we all agreed on the general design,
so it should be possible to pick that up.

Though if you have already started with your own work on it, Jeff, I
expect that it won't be much different because it's basically the same
transactional approach that you know and that we already used for group
snapshots.



I will definitely use parts of Supriya's as it makes sense - what I
started work on is similar to bdrv_append() and the current transaction
approach, so there will be plenty in common to reuse, even with some
differences.


I have CCed Supriya who has been working on the reopen patch series.
She is close to posting a new version.


It's just a bit disappointing that it takes several months between each
two versions of the patch series. We'd like to have this in qemu 1.2,
not only in qemu 1.14.

I can understand if someone can't find the time, but then allow at least
someone else to pick it up.


Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
she can join the discussion and see how much overlap there is with
what she's doing.  We all contribute to QEMU from different angles and
with different priorities.  If there is a time critical dependency on
the reopen code then discuss it here and the result will be that
someone officially drives the feature on.



I am more than happy to take the previous reopen() patches, and drive
those forward, and also do whatever else is needed for live block
commit.

It sounds like Zhi Hui is working on live block commit patches, and
Supriya is working on the bdrv_reopen() portion - I don't want to
duplicate any effort, but if there is anything I can do to help with
either of those areas, just let me know.

Thanks,
Jeff




Jeff please go ahead with block-commit, I
am finishing up my fdc async conversion patch series first.  I will
reply when I'm ready to work on block-commit.

Thank you very much!

Re: [Qemu-devel] [PATCH v2 1/6] msix: Add simple BAR allocation MSIX setup functions

2012-06-14 Thread Alex Williamson

On Thu, 2012-06-14 at 13:29 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 10:51:19PM -0600, Alex Williamson wrote:
> > msi_init() takes over a BAR without really specifying or allowing
> > specification of how it does so.  Instead, let's split it into
> > two interfaces, one fully specified, and one trivially easy.  This
> > implements the latter.  msix_init_exclusive_bar() takes over
> > allocating and filling a PCI BAR _exclusively_ for the use of MSIX.
> > When used, the matching msi_uninit_exclusive_bar() should be used
> > to tear it down.
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> > 
> >  hw/msix.c |   43 +++
> >  hw/msix.h |3 +++
> >  hw/pci.h  |2 ++
> >  3 files changed, 48 insertions(+)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index b64f109..a4cdfb0 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -299,6 +299,41 @@ err_config:
> >  return ret;
> >  }
> >  
> > +int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> > +uint8_t bar_nr)
> > +{
> > +int ret;
> > +char *name;
> > +
> > +/*
> > + * Migration compatibility dictates that this remains a 4k
> > + * BAR with the vector table in the lower half and PBA in
> > + * the upper half.
> > + */
> > +if (nentries * PCI_MSIX_ENTRY_SIZE > 2048) {
> > +return -EINVAL;
> > +}
> > +
> > +if (asprintf(&name, "%s MSIX BAR", dev->name) == -1) {
> > +return -ENOMEM;
> > +}
> > +
> 
> I think we should avoid spaces in names.
> Also let's use lower case. And BAR is confusing IMO:
> it's not a base address register, it's the region.

It's a region backing a base address register.

> So I would prefer simply: "%s-msix".

Ok, fixed.  Thanks,

Alex

> > +memory_region_init(&dev->msix_exclusive_bar, name, 4096);
> > +
> > +free(name);
> > +
> > +ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr, 4096);
> > +if (ret) {
> > +memory_region_destroy(&dev->msix_exclusive_bar);
> > +return ret;
> > +}
> > +
> > +pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
> > + &dev->msix_exclusive_bar);
> > +
> > +return 0;
> > +}
> > +
> >  static void msix_free_irq_entries(PCIDevice *dev)
> >  {
> >  int vector;
> > @@ -329,6 +364,14 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
> >  return 0;
> >  }
> >  
> > +void msix_uninit_exclusive_bar(PCIDevice *dev)
> > +{
> > +if (msix_present(dev)) {
> > +msix_uninit(dev, &dev->msix_exclusive_bar);
> > +memory_region_destroy(&dev->msix_exclusive_bar);
> > +}
> > +}
> > +
> >  void msix_save(PCIDevice *dev, QEMUFile *f)
> >  {
> >  unsigned n = dev->msix_entries_nr;
> > diff --git a/hw/msix.h b/hw/msix.h
> > index e5a488d..bed6bfb 100644
> > --- a/hw/msix.h
> > +++ b/hw/msix.h
> > @@ -7,11 +7,14 @@
> >  int msix_init(PCIDevice *pdev, unsigned short nentries,
> >MemoryRegion *bar,
> >unsigned bar_nr, unsigned bar_size);
> > +int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> > +uint8_t bar_nr);
> >  
> >  void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> > uint32_t val, int len);
> >  
> >  int msix_uninit(PCIDevice *d, MemoryRegion *bar);
> > +void msix_uninit_exclusive_bar(PCIDevice *dev);
> >  
> >  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
> >  
> > diff --git a/hw/pci.h b/hw/pci.h
> > index 4c96268..d517a54 100644
> > --- a/hw/pci.h
> > +++ b/hw/pci.h
> > @@ -226,6 +226,8 @@ struct PCIDevice {
> >  
> >  /* Space to store MSIX table */
> >  uint8_t *msix_table_page;
> > +/* MemoryRegion container for msix exclusive BAR setup */
> > +MemoryRegion msix_exclusive_bar;
> >  /* MMIO index used to map MSIX table and pending bit entries. */
> >  MemoryRegion msix_mmio;
> >  /* Reference-count for entries actually in use by driver. */
>

Re: [Qemu-devel] [PATCH 1/1] Add usb option in machine options.

2012-06-14 Thread Andreas Färber

Am 14.06.2012 07:17, schrieb zhlci...@gmail.com:
> From: Li Zhang 
> 
> For pseries machine, it needs to enable usb
> to add kbd or usb mouse. -usb option won't
> be used in the future, and machine options
> is a better way to enable usb.
> 
> So this patch is to add usb option to machine
> options (-machine type=psereis,usb=on/off)
> to enable/disable usb controller.
> 
> In this patch, usb_opt is an global option
> which can be checked by machines. For example,
> on pseries, it will check if usb_opt is on, if
> it is on, it will create one usb ohci controller.
> As the following:
> if (usb_opts && strcmp(usb_opts, "on") == 0)
>  pci_create_simple(bus, -1, "pci-ohci");
> 
> In this patch, usb is on by default.
> So, for -nodefault, usb should be set off in the
> command line as the following:
>  -machine type=pseries,usb=off.
> 
> Signed-off-by: Li Zhang 
> reviewed-by:   Anthony Liguori 
> reviewed-by:   Benjamin Herrenschmidt 
> ---
>  qemu-config.c |4 
>  sysemu.h  |1 +
>  vl.c  |   12 
>  3 files changed, 17 insertions(+)
> 
> diff --git a/qemu-config.c b/qemu-config.c
> index bb3bff4..258712a 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -583,6 +583,10 @@ static QemuOptsList qemu_machine_opts = {
>  .name = "dtb",
>  .type = QEMU_OPT_STRING,
>  .help = "Linux kernel device tree file",
> +}, {
> +.name = "usb",
> +.type = QEMU_OPT_BOOL,
> +.help = "Set on/off to enable/disable usb",
>  },
>  { /* End of list */ }
>  },
> diff --git a/sysemu.h b/sysemu.h
> index bc2c788..c5ea10d 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -13,6 +13,7 @@
>  /* vl.c */
>  
>  extern const char *bios_name;
> +extern const char *usb_opt;
>  
>  extern const char *qemu_name;
>  extern uint8_t qemu_uuid[];
> diff --git a/vl.c b/vl.c
> index 204d85b..10f8e4c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -171,6 +171,7 @@ int main(int argc, char **argv)
>  
>  static const char *data_dir;
>  const char *bios_name = NULL;
> +const char *usb_opt = NULL;
>  enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
>  DisplayType display_type = DT_DEFAULT;
>  int display_remote = 0;

I'd be very surprised if Anthony has actually reviewed this...

The point of using machine options is so that you can use the QemuOpts
infrastructure to inquire this value, not to save more global state.
Especially not a string when all you want is a boolean value.

Further, in this patch it's only being assigned, not used anywhere.

Andreas

> @@ -758,6 +759,15 @@ static int bt_parse(const char *opt)
>  return 1;
>  }
>  
> +static int default_enable_usb(QemuOpts *opts)
> +{
> +if (NULL == qemu_opt_get(opts, "usb")) {
> +qemu_opt_set(opts, "usb", "on");
> +}
> +
> +return 0;
> +}
> +
>  /***/
>  /* QEMU Block devices */
>  
> @@ -3356,6 +3366,8 @@ int main(int argc, char **argv, char **envp)
>  kernel_filename = qemu_opt_get(machine_opts, "kernel");
>  initrd_filename = qemu_opt_get(machine_opts, "initrd");
>  kernel_cmdline = qemu_opt_get(machine_opts, "append");
> +default_enable_usb(machine_opts);
> +usb_opt = qemu_opt_get(machine_opts, "usb");
>  } else {
>  kernel_filename = initrd_filename = kernel_cmdline = NULL;
>  }


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 4/6] qemu-img: add-cow will not support convert

2012-06-14 Thread Dong Xu Wang

On Thu, Jun 14, 2012 at 10:24 PM, Kevin Wolf  wrote:
> Am 14.06.2012 16:17, schrieb Dong Xu Wang:
>> On Thu, Jun 14, 2012 at 10:11 PM, Kevin Wolf  wrote:
>>> Am 14.06.2012 16:06, schrieb Dong Xu Wang:
 On Thu, Jun 14, 2012 at 6:51 PM, Kevin Wolf  wrote:
> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>> add-cow file can't live alone, must together will image_file and 
>> backing_file.
>> If we implement qemu-img convert operation for add-cow file format, we 
>> must
>> create image_file and backing_file manually, that will be confused for 
>> users,
>> so we just ignore add-cow format while doing converting.
>>
>> Signed-off-by: Dong Xu Wang 
>
> NACK.
>
> This stupid "let's drop the feature, it might confuse users" attitude is
> known from Gnome, but not from qemu.
>
> There's no technical reason to forbid it and anyone who manages to
> create a valid add-cow image will also be able to specify the very same
> options to a convert command. Also, having image format specific code in
> qemu-img is evil.
>

 If I implement add-cow convert command, I am wondering which method should
 I use:
 1) create add-cow, and its backing_file, and its image_file,  then we
 need 3 files.
 2) create add-cow(with all bitmap marked to allocated), and its
 image_file, then we
     need 2 files.

 2) will be easier, I should let .add-cow file can live only with
 image_file, without backing_file.

 I think both 1) and 2) need add code to qemu-img.c. Or I will have to 
 create
 image_file automaticly in add_cow_create function.

 Can you give some comments on how to implement convert? Thanks.
>>>
>>> Just leave it alone and it will work.
>>>
>>> qemu-img convert takes the same options as qemu-img create. So like for
>>> any other image you specify the backing file with -b or -o backing_file,
>>> and for add-cow images to be successfully created you also need to
>>> specify -o image_file.
>>>
>> "-o image_file", the image_file should be precreated? Or I should created it
>> manually?
>
> Yes, it must already exist. Just like with qemu-img create. To convert
> an existing qcow2 image that has a backing file 'base.img' to an add-cow
> image, you would use something like:
>
> $ qemu-img create -f raw target.img 4G
> $ qemu-img convert -O add-cow -o image_file=target.img -B base.img
> source.qcow2 target.add-cow
>
> Kevin
>
Got it, thanks

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support

2012-06-14 Thread Jeff Cody

On 06/14/2012 10:23 AM, Zhi Hui Li wrote:
> On 2012年06月11日 23:37, Jeff Cody wrote:
>> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf  wrote:
 Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody  wrote:
>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
 On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody  wrote:
>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>> Let's figure out how to specify block-commit so we're all happy, 
>>> that
>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>
>>
>> I think we are almost completely on the same page - devil is in the
>> details, of course (for instance, on how to convert the destination 
>> base
>> from r/o to r/w).
>
> Great.  The atomic r/o ->  r/w transition and the commit coroutine can
> be implemented on in parallel.  Are you happy to implement the atomic
> r/o ->  r/w transition since you wrote bdrv_append()?  Then Zhi Hui 
> can
> assume that part already works and focus on implementing the commit
> coroutine in the meantime.  I'm just suggesting a way to split up the
> work, please let me know if you think this is good.

 I am happy to do it that way.  I'll shift my focus to the atomic image
 reopen in r/w mode.  I'll go ahead and post my diagrams and other info
 for block-commit on the wiki, because I don't think it conflicts with 
 we
 discussed above (although I will modify my diagrams to not show commit
 from the top-level image).  Of course, feel free to change it as
 necessary.
>>>
>>> I may have mentioned it before, but just in case: I think Supriya's
>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>> were never completed, but I think we all agreed on the general design,
>>> so it should be possible to pick that up.
>>>
>>> Though if you have already started with your own work on it, Jeff, I
>>> expect that it won't be much different because it's basically the same
>>> transactional approach that you know and that we already used for group
>>> snapshots.
>>>
>>
>> I will definitely use parts of Supriya's as it makes sense - what I
>> started work on is similar to bdrv_append() and the current transaction
>> approach, so there will be plenty in common to reuse, even with some
>> differences.
>
> I have CCed Supriya who has been working on the reopen patch series.
> She is close to posting a new version.

 It's just a bit disappointing that it takes several months between each
 two versions of the patch series. We'd like to have this in qemu 1.2,
 not only in qemu 1.14.

 I can understand if someone can't find the time, but then allow at least
 someone else to pick it up.
>>>
>>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>>> she can join the discussion and see how much overlap there is with
>>> what she's doing.  We all contribute to QEMU from different angles and
>>> with different priorities.  If there is a time critical dependency on
>>> the reopen code then discuss it here and the result will be that
>>> someone officially drives the feature on.
>>>
>>
>> I am more than happy to take the previous reopen() patches, and drive
>> those forward, and also do whatever else is needed for live block
>> commit.
>>
>> It sounds like Zhi Hui is working on live block commit patches, and
>> Supriya is working on the bdrv_reopen() portion - I don't want to
>> duplicate any effort, but if there is anything I can do to help with
>> either of those areas, just let me know.
>>
>> Thanks,
>> Jeff
>>
>>
>>
> Jeff please go ahead with block-commit, I
> am finishing up my fdc async conversion patch series first.  I will
> reply when I'm ready to work on block-commit.
> 
> Thank you very much!
> 

Hi Zhi,

I will do that.  Thanks!

Jeff

[Qemu-devel] [PATCH] cris: Add break support for v10

2012-06-14 Thread Edgar E. Iglesias

I seem to have missed to push this upstream...

commit f756c7a723faa3a21dcb6bb6806e77f1628019f5
Author: Edgar E. Iglesias 
Date:   Tue Jul 5 12:56:41 2011 +0200

cris: Add break support for v10.

Still no retb

Signed-off-by: Edgar E. Iglesias 

diff --git a/target-cris/cpu.h b/target-cris/cpu.h
index a760367..73004af 100644
--- a/target-cris/cpu.h
+++ b/target-cris/cpu.h
@@ -64,6 +64,7 @@
 #define PR_NRP 12
 #define PR_CCS 13
 #define PR_USP 14
+#define PRV10_BRP 14
 #define PR_SPC 15
 
 /* CPU flags.  */
diff --git a/target-cris/helper.c b/target-cris/helper.c
index 8680f43..dcc19ef 100644
--- a/target-cris/helper.c
+++ b/target-cris/helper.c
@@ -121,14 +121,14 @@ static void do_interruptv10(CPUCRISState *env)
/* These exceptions are genereated by the core itself.
   ERP should point to the insn following the brk.  */
ex_vec = env->trap_vector;
-   env->pregs[PR_ERP] = env->pc;
+   env->pregs[PRV10_BRP] = env->pc;
break;
 
case EXCP_NMI:
/* NMI is hardwired to vector zero.  */
ex_vec = 0;
env->pregs[PR_CCS] &= ~M_FLAG;
-   env->pregs[PR_NRP] = env->pc;
+   env->pregs[PRV10_BRP] = env->pc;
break;
 
case EXCP_BUSFAULT:
diff --git a/target-cris/translate_v10.c b/target-cris/translate_v10.c
index 4ada3ed..3629629 100644
--- a/target-cris/translate_v10.c
+++ b/target-cris/translate_v10.c
@@ -1132,6 +1132,7 @@ static unsigned int dec10_ind(DisasContext *dc)
 LOG_DIS("break %d\n", dc->src);
 cris_evaluate_flags(dc);
 tcg_gen_movi_tl(env_pc, dc->pc + 2);
+t_gen_mov_env_TN(trap_vector, tcg_const_tl(dc->src + 2));
 t_gen_raise_exception(EXCP_BREAK);
 dc->is_jmp = DISAS_UPDATE;
 return insn_len;

Re: [Qemu-devel] [PATCH V12 5/9] Revert "pci: don't export an internal function"

2012-06-14 Thread Anthony PERARD


On 13/06/12 13:48, Jan Kiszka wrote:

>>  Will you be able to use an address parser via some device property?

>
>  Yes. Actually, the address is already a device property.
>

Great. Then we should try to come up with a common pci-host-devaddr
property everyone can use.


Is this patch would be a good common property?

It's probably close to the patch you send earlier Jan.

If it's good, I'll resend my passthrough patch series with this patch.


diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index b7b5597..9c7a271 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -928,6 +928,104 @@ PropertyInfo qdev_prop_blocksize = {
 .max   = 65024,
 };

+/* --- pci host address --- */
+
+static void get_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
+char buffer[4 + 2 + 2 + 2 + 1];
+char *p = buffer;
+
+snprintf(buffer, sizeof(buffer), "%04x:%02x:%02x.%d",
+ addr->domain, addr->bus, addr->slot, addr->function);
+
+visit_type_str(v, &p, name, errp);
+}
+
+/*
+ * Parse [:]:.
+ *   if  is not supplied, it's assumed to be 0.
+ */
+static void set_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
+Error *local_err = NULL;
+char *str, *p;
+char *e;
+unsigned long val;
+unsigned long dom = 0, bus = 0;
+unsigned int slot = 0, func = 0;
+
+if (dev->state != DEV_STATE_CREATED) {
+error_set(errp, QERR_PERMISSION_DENIED);
+return;
+}
+
+visit_type_str(v, &str, name, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+p = str;
+val = strtoul(p, &e, 16);
+if (e == p || *e != ':')
+goto inval;
+bus = val;
+
+p = e + 1;
+val = strtoul(p, &e, 16);
+if (e == p)
+goto inval;
+if (*e == ':') {
+dom = bus;
+bus = val;
+p = e + 1;
+val = strtoul(p, &e, 16);
+if (e == p)
+goto inval;
+}
+slot = val;
+
+if (*e != '.')
+goto inval;
+p = e + 1;
+val = strtoul(p, &e, 10);
+if (e == p)
+goto inval;
+func = val;
+
+if (dom > 0x || bus > 0xff || slot > 0x1f || func > 7)
+goto inval;
+
+if (*e)
+goto inval;
+
+addr->domain = dom;
+addr->bus = bus;
+addr->slot = slot;
+addr->function = func;
+
+g_free(str);
+return;
+
+inval:
+error_set_from_qdev_prop_error(errp, EINVAL, dev, prop, str);
+g_free(str);
+}
+
+PropertyInfo qdev_prop_pci_host_devaddr = {
+.name = "pci-host-devaddr",
+.get = get_pci_host_devaddr,
+.set = set_pci_host_devaddr,
+};
+
 /* --- public helpers --- */

 static Property *qdev_prop_walk(Property *props, const char *name)
diff --git a/hw/qdev.h b/hw/qdev.h
index 4e90119..9a54ebe 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -225,6 +225,7 @@ extern PropertyInfo qdev_prop_netdev;
 extern PropertyInfo qdev_prop_vlan;
 extern PropertyInfo qdev_prop_pci_devfn;
 extern PropertyInfo qdev_prop_blocksize;
+extern PropertyInfo qdev_prop_pci_host_devaddr;

 #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
 .name  = (_name),\
@@ -288,6 +289,8 @@ extern PropertyInfo qdev_prop_blocksize;
 LostTickPolicy)
 #define DEFINE_PROP_BLOCKSIZE(_n, _s, _f, _d) \
 DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_blocksize, uint16_t)
+#define DEFINE_PROP_PCI_HOST_DEVADDR(_n, _s, _f) \
+DEFINE_PROP(_n, _s, _f, qdev_prop_pci_host_devaddr, 
PCIHostDeviceAddress)


 #define DEFINE_PROP_END_OF_LIST()   \
 {}
diff --git a/qemu-common.h b/qemu-common.h
index 91e0562..0d6e51c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -274,6 +274,13 @@ typedef enum LostTickPolicy {
 LOST_TICK_MAX
 } LostTickPolicy;

+typedef struct PCIHostDeviceAddress {
+unsigned int domain;
+unsigned int bus;
+unsigned int slot;
+unsigned int function;
+} PCIHostDeviceAddress;
+
 void tcg_exec_init(unsigned long tb_size);
 bool tcg_enabled(void);


--
Anthony PERARD

[Qemu-devel] [PATCH] configure: properly check if -lrt and -lm is needed

2012-06-14 Thread Natanael Copa

Fixes build against uClibc.

uClibc provides 2 versions of clock_gettime(), one with realtime
support and one without (this is so you can avoid linking in -lrt
unless actually needed). This means that the clock_gettime() don't
need -lrt. We still need it for timer_create() so we check for this
function in addition.

We also need check if -lm is needed for isnan().

Signed-off-by: Natanael Copa 
---
 Makefile|4 ++--
 Makefile.target |4 +---
 configure   |   33 +++--
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 32550cb..9dfa01a 100644
--- a/Makefile
+++ b/Makefile
@@ -35,7 +35,7 @@ configure: ;
 
 $(call set-vpath, $(SRC_PATH))
 
-LIBS+=-lz $(LIBS_TOOLS)
+LIBS+=-lz $(LIBS_TOOLS) $(LIBM) $(LIBRT)
 
 HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
 
@@ -172,7 +172,7 @@ qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN  
 $@")
 
 qapi-dir := $(BUILD_DIR)/qapi-generated
-qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
+qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBRT) $(LIBM)
 qemu-ga$(EXESUF): QEMU_CFLAGS += -I $(qapi-dir)
 
 gen-out-type = $(subst .,-,$(suffix $@))
diff --git a/Makefile.target b/Makefile.target
index 2907aad..d214d2c 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -34,9 +34,7 @@ PROGS+=$(QEMU_PROGW)
 endif
 STPFILES=
 
-ifndef CONFIG_HAIKU
-LIBS+=-lm
-endif
+LIBS+=$(LIBM) $(LIBRT)
 
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
diff --git a/configure b/configure
index c2366ee..f925973 100755
--- a/configure
+++ b/configure
@@ -102,6 +102,8 @@ audio_win_int=""
 cc_i386=i386-pc-linux-gnu-gcc
 libs_qga=""
 debug_info="yes"
+libm=""
+librt=""
 
 target_list=""
 
@@ -2568,17 +2570,42 @@ fi
 
 
 ##
+# Do we need libm
+cat > $TMPC << EOF
+#include 
+int main(void) { return isnan(0.0); }
+EOF
+if compile_prog "" "" ; then
+  libm=
+elif compile_prog "" "-lm" ; then
+  libm="-lm"
+else
+  echo
+  echo "Error: libm check failed"
+  echo
+  exit 1
+fi
+
+##
 # Do we need librt
 cat > $TMPC <
 #include 
-int main(void) { return clock_gettime(CLOCK_REALTIME, NULL); }
+int main(void) {
+  timer_create(CLOCK_REALTIME, NULL, NULL);
+  return clock_gettime(CLOCK_REALTIME, NULL);
+}
 EOF
 
 if compile_prog "" "" ; then
   :
 elif compile_prog "" "-lrt" ; then
-  LIBS="-lrt $LIBS"
+  librt="-lrt"
+else
+  echo
+  echo "Error: librt check failed"
+  echo
+  exit 1
 fi
 
 if test "$darwin" != "yes" -a "$mingw32" != "yes" -a "$solaris" != yes -a \
@@ -3442,6 +3469,8 @@ echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
 echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
 echo "POD2MAN=$POD2MAN" >> $config_host_mak
+echo "LIBM=$libm" >> $config_host_mak
+echo "LIBRT=$librt" >> $config_host_mak
 
 # generate list of library paths for linker script
 
-- 
1.7.10.4

Re: [Qemu-devel] Does kvm_fd mean KVM or VCPU fd?

2012-06-14 Thread Andreas Färber

Am 14.06.2012 12:16, schrieb 陳韋任 (Wei-Ren Chen):
>>>   I think KVM_CREATE_VCPU should return VCPU fd, right? AFAIK, in KVM
>>> world, kvm_fd usually means the fd we get after opening "/dev/kvm".
>>> Just want to make sure I understand the code correcly. Thanks.
>>
>> This is the kvm_fd inside struct CPUState, so it's per-CPU.
> 
>   Okay, then it's VCPU fd. Don't know rename it or add a comment to indicate
> it's a VCPU fd is a good idea. ;)

Not a good idea, those fields need to move from CPU*State to CPUState.
We can rename them then if someone has a good suggestion.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2 4/6] msix: Split PBA into it's own MemoryRegion

2012-06-14 Thread Michael S. Tsirkin

On Thu, Jun 14, 2012 at 08:21:39AM -0600, Alex Williamson wrote:
> On Thu, 2012-06-14 at 13:24 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 13, 2012 at 10:51:47PM -0600, Alex Williamson wrote:
> > > These don't have to be contiguous.  Size them to only what
> > > they need and use separate MemoryRegions for the vector
> > > table and PBA.
> > > 
> > > Signed-off-by: Alex Williamson 
> > 
> > Why is this still using NATIVE?
> 
> Because the bug already exists,

We have lots of broken code. The way progress happens here is
such code is in a kind of freeze until fixed. This way whoever needs new
features gets to fix the bugs too.

> this patch doesn't make it worse, so at best it's a tangentially related 
> additional fix.
> It may seem like a s/NATIVE/LITTLE/ to you, but to me it's asking to 
> completely scrub
> msix.c for endian correctness.  Is this going to be the carrot you hold
> out to accept the rest of the series?
> 
> Alex

Unfortunately no promises yet, and that is because you basically decided
to rewrite lots of code in your preferred style while also adding new 
functionality.
If changes were done in small steps, then I could apply things we can
agree on and defer the ones we don't.  Sometimes it's hard, but clearly
not in this case.

> > > ---
> > > 
> > >  hw/msix.c |  116 
> > > ++---
> > >  hw/pci.h  |   15 ++--
> > >  2 files changed, 83 insertions(+), 48 deletions(-)
> > > 
> > > diff --git a/hw/msix.c b/hw/msix.c
> > > index a4cdfb0..d476d07 100644
> > > --- a/hw/msix.c
> > > +++ b/hw/msix.c
> > > @@ -37,7 +37,7 @@
> > >  
> > >  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
> > >  {
> > > -uint8_t *table_entry = dev->msix_table_page + vector * 
> > > PCI_MSIX_ENTRY_SIZE;
> > > +uint8_t *table_entry = dev->msix_table + vector * 
> > > PCI_MSIX_ENTRY_SIZE;
> > >  MSIMessage msg;
> > >  
> > >  msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
> > > @@ -86,16 +86,6 @@ static int msix_add_config(struct PCIDevice *pdev, 
> > > unsigned short nentries,
> > >  return 0;
> > >  }
> > >  
> > > -static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > > -   unsigned size)
> > > -{
> > > -PCIDevice *dev = opaque;
> > > -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> > > -void *page = dev->msix_table_page;
> > > -
> > > -return pci_get_long(page + offset);
> > > -}
> > > -
> > >  static uint8_t msix_pending_mask(int vector)
> > >  {
> > >  return 1 << (vector % 8);
> > > @@ -103,7 +93,7 @@ static uint8_t msix_pending_mask(int vector)
> > >  
> > >  static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
> > >  {
> > > -return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
> > > +return dev->msix_pba + vector / 8;
> > >  }
> > >  
> > >  static int msix_is_pending(PCIDevice *dev, int vector)
> > > @@ -124,7 +114,7 @@ static void msix_clr_pending(PCIDevice *dev, int 
> > > vector)
> > >  static bool msix_vector_masked(PCIDevice *dev, int vector, bool fmask)
> > >  {
> > >  unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + 
> > > PCI_MSIX_ENTRY_VECTOR_CTRL;
> > > -return fmask || dev->msix_table_page[offset] & 
> > > PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > +return fmask || dev->msix_table[offset] & 
> > > PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > >  }
> > >  
> > >  static bool msix_is_masked(PCIDevice *dev, int vector)
> > > @@ -203,27 +193,46 @@ void msix_write_config(PCIDevice *dev, uint32_t 
> > > addr,
> > >  }
> > >  }
> > >  
> > > -static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > > -uint64_t val, unsigned size)
> > > +static uint64_t msix_table_mmio_read(void *opaque, target_phys_addr_t 
> > > addr,
> > > + unsigned size)
> > >  {
> > >  PCIDevice *dev = opaque;
> > > -unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> > > -int vector = offset / PCI_MSIX_ENTRY_SIZE;
> > > -bool was_masked;
> > >  
> > > -/* MSI-X page includes a read-only PBA and a writeable Vector 
> > > Control. */
> > > -if (vector >= dev->msix_entries_nr) {
> > > -return;
> > > -}
> > > +return pci_get_long(dev->msix_table + addr);
> > > +}
> > > +
> > > +static void msix_table_mmio_write(void *opaque, target_phys_addr_t addr,
> > > +  uint64_t val, unsigned size)
> > > +{
> > > +PCIDevice *dev = opaque;
> > > +int vector = addr / PCI_MSIX_ENTRY_SIZE;
> > > +bool was_masked;
> > >  
> > >  was_masked = msix_is_masked(dev, vector);
> > > -pci_set_long(dev->msix_table_page + offset, val);
> > > +pci_set_long(dev->msix_table + addr, val);
> > >  msix_handle_mask_update(dev, vector, was_masked);
> > >  }
> > >  
> > > -static const MemoryRegionOps msix_mmio_ops = {
> > > -.read = msix_mmio_read,
> > > -.wr

[Qemu-devel] [PATCH 0/2] introduce bdrv_swap, implement bdrv_append on top

2012-06-14 Thread Paolo Bonzini

Yet another tiny bit extracted from block mirroring, looks like it
should be useful for block commit too.

Paolo Bonzini (2):
  block: copy over job and dirty bitmap fields in bdrv_append
  block: introduce bdrv_swap, implement bdrv_append on top of it

 block.c |  175 +--
 block.h |1 +
 2 files changed, 103 insertions(+), 73 deletions(-)

-- 
1.7.10.2

[Qemu-devel] [PATCH 2/2] block: introduce bdrv_swap, implement bdrv_append on top of it

2012-06-14 Thread Paolo Bonzini

The new function can be made a bit nicer than bdrv_append.  It swaps the
whole contents, and then swaps back (using the usual t=a;a=b;b=t idiom)
the fields that need to stay on top.  Thus, it does not need explicit
bdrv_detach_dev, bdrv_iostatus_disable, etc.

Signed-off-by: Paolo Bonzini 
---
 block.c |  184 ++-
 block.h |1 +
 2 files changed, 100 insertions(+), 85 deletions(-)

diff --git a/block.c b/block.c
index 702821d..0991ded 100644
--- a/block.c
+++ b/block.c
@@ -971,116 +971,130 @@ static void bdrv_rebind(BlockDriverState *bs)
 }
 }
 
-/*
- * Add new bs contents at the top of an image chain while the chain is
- * live, while keeping required fields on the top layer.
- *
- * This will modify the BlockDriverState fields, and swap contents
- * between bs_new and bs_top. Both bs_new and bs_top are modified.
- *
- * bs_new is required to be anonymous.
- *
- * This function does not create any image files.
- */
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
+static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
+ BlockDriverState *bs_src)
 {
-BlockDriverState tmp;
-
-/* bs_new must be anonymous */
-assert(bs_new->device_name[0] == '\0');
-
-tmp = *bs_new;
-
-/* there are some fields that need to stay on the top layer: */
-tmp.open_flags= bs_top->open_flags;
+/* move some fields that need to stay attached to the device */
+bs_dest->open_flags = bs_src->open_flags;
 
 /* dev info */
-tmp.dev_ops   = bs_top->dev_ops;
-tmp.dev_opaque= bs_top->dev_opaque;
-tmp.dev   = bs_top->dev;
-tmp.buffer_alignment  = bs_top->buffer_alignment;
-tmp.copy_on_read  = bs_top->copy_on_read;
+bs_dest->dev_ops= bs_src->dev_ops;
+bs_dest->dev_opaque = bs_src->dev_opaque;
+bs_dest->dev= bs_src->dev;
+bs_dest->buffer_alignment   = bs_src->buffer_alignment;
+bs_dest->copy_on_read   = bs_src->copy_on_read;
 
-tmp.enable_write_cache = bs_top->enable_write_cache;
+bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
 /* i/o timing parameters */
-tmp.slice_time= bs_top->slice_time;
-tmp.slice_start   = bs_top->slice_start;
-tmp.slice_end = bs_top->slice_end;
-tmp.io_limits = bs_top->io_limits;
-tmp.io_base   = bs_top->io_base;
-tmp.throttled_reqs= bs_top->throttled_reqs;
-tmp.block_timer   = bs_top->block_timer;
-tmp.io_limits_enabled = bs_top->io_limits_enabled;
+bs_dest->slice_time = bs_src->slice_time;
+bs_dest->slice_start= bs_src->slice_start;
+bs_dest->slice_end  = bs_src->slice_end;
+bs_dest->io_limits  = bs_src->io_limits;
+bs_dest->io_base= bs_src->io_base;
+bs_dest->throttled_reqs = bs_src->throttled_reqs;
+bs_dest->block_timer= bs_src->block_timer;
+bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
 
 /* geometry */
-tmp.cyls  = bs_top->cyls;
-tmp.heads = bs_top->heads;
-tmp.secs  = bs_top->secs;
-tmp.translation   = bs_top->translation;
+bs_dest->cyls   = bs_src->cyls;
+bs_dest->heads  = bs_src->heads;
+bs_dest->secs   = bs_src->secs;
+bs_dest->translation= bs_src->translation;
 
 /* r/w error */
-tmp.on_read_error = bs_top->on_read_error;
-tmp.on_write_error= bs_top->on_write_error;
+bs_dest->on_read_error  = bs_src->on_read_error;
+bs_dest->on_write_error = bs_src->on_write_error;
 
 /* i/o status */
-tmp.iostatus_enabled  = bs_top->iostatus_enabled;
-tmp.iostatus  = bs_top->iostatus;
+bs_dest->iostatus_enabled   = bs_src->iostatus_enabled;
+bs_dest->iostatus   = bs_src->iostatus;
 
 /* dirty bitmap */
-tmp.dirty_count   = bs_top->dirty_count;
-tmp.dirty_bitmap  = bs_top->dirty_bitmap;
-assert(bs_new->dirty_bitmap == NULL);
+bs_dest->dirty_count= bs_src->dirty_count;
+bs_dest->dirty_bitmap   = bs_src->dirty_bitmap;
 
 /* job */
-tmp.in_use= bs_top->in_use;
-tmp.job   = bs_top->job;
-assert(bs_new->job == NULL);
+bs_dest->in_use = bs_src->in_use;
+bs_dest->job= bs_src->job;
 
 /* keep the same entry in bdrv_states */
-pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
-tmp.list = bs_top->list;
+pstrcpy(bs_dest->device_name, sizeof(bs_dest->device_name),
+bs_src->device_name);
+bs_dest->list = bs_src->list;
+}
 
-/* The contents of 'tmp' will become bs_top, as we are
- * swapping bs_new and bs_top contents. */
-tmp.backing_hd = bs_new;
-pstrcpy(tmp.backing_file, sizeof

[Qemu-devel] [PATCH 1/2] block: copy over job and dirty bitmap fields in bdrv_append

2012-06-14 Thread Paolo Bonzini

While these should not be in use at the time a transaction is started,
a command in the prepare phase of a transaction might have added them,
so they need to be brought over.

Signed-off-by: Paolo Bonzini 
---
 block.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/block.c b/block.c
index 0acdcac..702821d 100644
--- a/block.c
+++ b/block.c
@@ -1027,6 +1027,16 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 tmp.iostatus_enabled  = bs_top->iostatus_enabled;
 tmp.iostatus  = bs_top->iostatus;
 
+/* dirty bitmap */
+tmp.dirty_count   = bs_top->dirty_count;
+tmp.dirty_bitmap  = bs_top->dirty_bitmap;
+assert(bs_new->dirty_bitmap == NULL);
+
+/* job */
+tmp.in_use= bs_top->in_use;
+tmp.job   = bs_top->job;
+assert(bs_new->job == NULL);
+
 /* keep the same entry in bdrv_states */
 pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
 tmp.list = bs_top->list;
@@ -1051,6 +1061,11 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 /* clear the copied fields in the new backing file */
 bdrv_detach_dev(bs_new, bs_new->dev);
 
+bs_new->job= NULL;
+bs_new->in_use = 0;
+bs_new->dirty_bitmap   = NULL;
+bs_new->dirty_count= 0;
+
 qemu_co_queue_init(&bs_new->throttled_reqs);
 memset(&bs_new->io_base,   0, sizeof(bs_new->io_base));
 memset(&bs_new->io_limits, 0, sizeof(bs_new->io_limits));
-- 
1.7.10.2

Re: [Qemu-devel] [PATCH v2] qapi: converted commit

2012-06-14 Thread Pavel Hrdina


On 06/14/2012 02:18 PM, Eric Blake wrote:

On 06/14/2012 01:35 AM, Pavel Hrdina wrote:

Signed-off-by: Pavel Hrdina
---
+++ b/qapi-schema.json
@@ -1169,6 +1169,21 @@
  { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}

  ##
+# @commit
+#
+# Commit changes to the disk images (if -snapshot is used) or backing files.
+#
+# @device: the name of the device or the "all" to commit all devices
+#
+# Returns: nothing on success
+#  If @device is not a valid block device, DeviceNotFound
+#  If a long-running operation is using the device, DeviceInUse
+#
+# Since: 1.2
+##
+{ 'command': 'commit', 'data': { 'device': 'str' }}

Should we use this as an opportunity to make the command more powerful?
  For example, integrating this with the 'transaction' command or a block
job queried by 'query-block-jobs' to track its progress would be useful.
  Also, suppose I have A<- B<- C.  Does 'commit' only do one layer (C
into B), or all layers (B and C into A)?  That argues that we need an
optional parameter that says how deep to commit (committing C into B
only to repeat and commit B into A is more time-consuming than directly
committing both B and C into A to start with).  When a commit is
complete, which file is backing the device - is it still C (which
continues to diverge, but now from the point of the commit) or does qemu
pivot things to have the device now backed by B (and C can be discarded,
particularly true if changes are now going into B which invalidate C).


What i find out is that 'commit' will commit changes only from C to B 
and qemu continues with C from the new commit point. I couldn't find a 
way to commit changes and go back to backing file. This should be 
supported by parameter and also as you mention that commit all changes 
through all snapshots should be supported by another parameter.

The 'transaction' feature would be nice to have too.

Pavel

1 2 3 >

1 - 100 of 214 matches

Mail list logo