date:20191015

[Bug 1846451] Re: K800 keyboard no longer works when attached to a VM

2019-10-15 Thread Gerd Hoffmann

https://patchwork.ozlabs.org/patch/1176777/

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1846451

Title:
  K800 keyboard no longer works when attached to a VM

Status in QEMU:
  New

Bug description:
  I use a Logitech K800 keyboard, which is connected to a PC through a
  Logitech Unifying receiver. In order to control my Windows VM, I attach
  the Unifying receiver USB device to the VM using
  "virsh attach-device VM-Name ./device.xml". The device ID as seen in
  lsusb is 046d:c52b.

  As of v4.1.0, the keyboard no longer works when attached to a Windows VM.
  When attached, the receiver is still at least partially functional: the
  Logitech pairing utility properly displays the paired keyboard, and
  pressing buttons on the keyboard changes the indicator icon in the
  pairing utility. Pairing and unpairing work. Pressing keys, however,
  fails to register any key presses.

  Downgrading to v4.0.0 fixes the issue.

  device.xml used to attach USB device:
  ```
  
  
  
  
  
  

  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1846451/+subscriptions

[Bug 1846451] Re: K800 keyboard no longer works when attached to a VM

2019-10-15 Thread Rokas Kupstys

Could you please clarify how this config is supposed to be used? I would
test your patch.


Re: [PATCH v4 1/8] target/mips: Clean up helper.c

2019-10-15 Thread Markus Armbruster

Aleksandar Markovic  writes:

> On Monday, October 14, 2019, Markus Armbruster  wrote:
>
>> Aleksandar Markovic  writes:
>>
>> > From: Aleksandar Markovic 
>> >
>> > Mostly fix errors and warnings reported by 'checkpatch.pl -f'.
>> >
>> > Signed-off-by: Aleksandar Markovic 
>> > ---
>> >  target/mips/helper.c | 128 ++
>> >  1 file changed, 78 insertions(+), 50 deletions(-)
>> >
>> > diff --git a/target/mips/helper.c b/target/mips/helper.c
>> > index a2b6459..2411a2c 100644
>> > --- a/target/mips/helper.c
>> > +++ b/target/mips/helper.c
[...]
>> > @@ -130,8 +133,11 @@ static int is_seg_am_mapped(unsigned int am, bool eu, int mmu_idx)
>> >  int32_t adetlb_mask;
>> >
>> >  switch (mmu_idx) {
>> > -case 3 /* ERL */:
>> > -/* If EU is set, always unmapped */
>> > +case 3:
>> > +/*
>> > + * ERL
>> > + * If EU is set, always unmapped
>> > + */
>> >  if (eu) {
>> >  return 0;
>> >  }
>>
>> This changes from the usual way we format switch case comments to an
>> unusual way.
>>
>> If you want to pursue this change, please put it in a separate patch,
>> so this one is really about fixing "errors and warnings reported by
>> 'checkpatch.pl -f'", as your commit message promises.
>>
>>
>
> Hi, Markus. Thank you for your response.
>
> There must be some misunderstanding here:
>
> The line:
>
>case 3 /* ERL */:
>
> generates a checkpatch warning. I don't know why I would put it in a
> separate patch, if this patch is about fixing checkpatch warnings. Please
> explain.

You're right; I misread the line you patch as

 case 3: /* ERL */

> Secondly, I don't see that this is a usual way we format switch statement.
> I found just several cases in the whole QEMU code base (and you claimed in
> previous comments that there are thousands).
>
> I am just guessing that you somehow mixed this line with the line:
>
>case 3: /* ERL */
>
> that would have not generated checkpatch warning.

You guessed correctly.  Telling me right away that my remark doesn't
make sense to you would've helped :)

The pattern

case VALUE: /* comment on VALUE */

is common: >8000 instances.

The pattern

case VALUE /* comment on VALUE */:

is uncommon: <20 instances.  I agree with cleaning it up.

However, I find the common pattern applied here

    case 3: /* ERL */
        /* If EU is set, always unmapped */
        if (eu) {
            return 0;
        }

more readable than the unusual (to my eyes)

    case 3:
        /*
         * ERL
         * If EU is set, always unmapped
         */
        if (eu) {
            return 0;
        }

The first line of the comment applies to the value preceding it, the
second to the code following it.  Making these connections doesn't
exactly take genius, but neither is it effortless.

Nice and consistent coding style is all about reducing the effort of
reading code.

For what it's worth, the pattern

case VALUE: /* comment on VALUE */
/* comment on CODE */
CODE

occurs almost 300 times.

> I don't see any reason to change this patch. Please let me know if you
> still think I should do something else. And you are welcome to analyse any
> patches of mine.

Please consider keeping two separate comments, i.e. just move the colon
to its usual place.

Thanks!

[PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Peter Xu

It was "int" and used as 32bits fields (see save_section_header()).
It's unsafe already because sizeof(int) could be 2 on i386, I think.
So at least uint32_t would suit it better.  While it also uses "-1" as a
placeholder of "we want to generate the instance ID automatically".
Hence a more proper value should be int64_t.

This will start to be useful after next patch in which we can start to
convert a real uint32_t value as instance ID.

Signed-off-by: Peter Xu 
---
 include/migration/register.h |  2 +-
 include/migration/vmstate.h  |  4 ++--
 migration/savevm.c   | 10 +-
 stubs/vmstate.c  |  2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index a13359a08d..54f42c7413 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -69,7 +69,7 @@ typedef struct SaveVMHandlers {
 } SaveVMHandlers;
 
 int register_savevm_live(const char *idstr,
- int instance_id,
+ int64_t instance_id,
  int version_id,
  const SaveVMHandlers *ops,
  void *opaque);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1fbfd099dd..6a7498463c 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1114,14 +1114,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
 bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
 
 /* Returns: 0 on success, -1 on failure */
-int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
+int vmstate_register_with_alias_id(DeviceState *dev, int64_t instance_id,
const VMStateDescription *vmsd,
void *base, int alias_id,
int required_for_version,
Error **errp);
 
 /* Returns: 0 on success, -1 on failure */
-static inline int vmstate_register(DeviceState *dev, int instance_id,
+static inline int vmstate_register(DeviceState *dev, int64_t instance_id,
const VMStateDescription *vmsd,
void *opaque)
 {
diff --git a/migration/savevm.c b/migration/savevm.c
index bb9462a54d..dc9281c897 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -233,7 +233,7 @@ typedef struct CompatEntry {
 typedef struct SaveStateEntry {
 QTAILQ_ENTRY(SaveStateEntry) entry;
 char idstr[256];
-int instance_id;
+int64_t instance_id;
 int alias_id;
 int version_id;
 /* version id read from the stream */
@@ -668,7 +668,7 @@ void dump_vmstate_json_to_file(FILE *out_file)
 static int calculate_new_instance_id(const char *idstr)
 {
 SaveStateEntry *se;
-int instance_id = 0;
+int64_t instance_id = 0;
 
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (strcmp(idstr, se->idstr) == 0
@@ -730,7 +730,7 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
Meanwhile pass -1 as instance_id if you do not already have a clearly
distinguishing id for all instances of your device class. */
 int register_savevm_live(const char *idstr,
- int instance_id,
+ int64_t instance_id,
  int version_id,
  const SaveVMHandlers *ops,
  void *opaque)
@@ -784,7 +784,7 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 }
 }
 
-int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
+int vmstate_register_with_alias_id(DeviceState *dev, int64_t instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
int required_for_version,
@@ -1566,7 +1566,7 @@ int qemu_save_device_state(QEMUFile *f)
 return qemu_file_get_error(f);
 }
 
-static SaveStateEntry *find_se(const char *idstr, int instance_id)
+static SaveStateEntry *find_se(const char *idstr, int64_t instance_id)
 {
 SaveStateEntry *se;
 
diff --git a/stubs/vmstate.c b/stubs/vmstate.c
index e1e89b87f0..699003f3b0 100644
--- a/stubs/vmstate.c
+++ b/stubs/vmstate.c
@@ -4,7 +4,7 @@
 const VMStateDescription vmstate_dummy = {};
 
 int vmstate_register_with_alias_id(DeviceState *dev,
-   int instance_id,
+   int64_t instance_id,
const VMStateDescription *vmsd,
void *base, int alias_id,
int required_for_version,
-- 
2.21.0
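
The width argument in the commit message above can be illustrated with a
small standalone sketch. It is not QEMU code: INSTANCE_ID_AUTO and the two
store_* helpers are hypothetical names, and the sketch assumes the
placeholder is the value -1 as the commit message states. It shows that a
32-bit signed field cannot tell the largest uint32_t ID apart from the
placeholder, while an int64_t field can.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the "-1 means auto-generate" placeholder. */
#define INSTANCE_ID_AUTO (-1)

/*
 * Store a uint32_t ID in a 32-bit signed field. The two's-complement
 * wrap is written out explicitly so the conversion is well defined.
 */
static int32_t store_in_int32(uint32_t id)
{
    int64_t wide = id;

    if (wide > INT32_MAX) {
        wide -= 4294967296LL;
    }
    return (int32_t)wide;
}

/* Store the same ID in a 64-bit signed field: no wrap is possible. */
static int64_t store_in_int64(uint32_t id)
{
    return (int64_t)id;
}

/* Only the 32-bit field confuses a real ID with the placeholder. */
static void instance_id_width_demo(void)
{
    assert(store_in_int32(UINT32_MAX) == INSTANCE_ID_AUTO);
    assert(store_in_int64(UINT32_MAX) != INSTANCE_ID_AUTO);
    assert(store_in_int64(288) == 288);
}
```

Whether IDs that large ever occur in practice is a separate question; the
sketch only shows why a signed 64-bit type leaves room for both the full
uint32_t range and the placeholder.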

[PATCH 0/2] apic: Fix migration breakage of >255 vcpus

2019-10-15 Thread Peter Xu

I'm not very certain, but... it seems to be broken starting from when
x2apic was introduced in QEMU, until now.

Please review, thanks.

Peter Xu (2):
  migration: Boost SaveStateEntry.instance_id to 64 bits
  apic: Use 32bit APIC ID for migration instance ID

 hw/intc/apic_common.c|  2 +-
 include/migration/register.h |  2 +-
 include/migration/vmstate.h  |  4 ++--
 migration/savevm.c   | 10 +-
 stubs/vmstate.c  |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

-- 
2.21.0

[PATCH 2/2] apic: Use 32bit APIC ID for migration instance ID

2019-10-15 Thread Peter Xu

Migration is silently broken now with x2apic config like this:

 -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
 -device intel-iommu,intremap=on,eim=on

After migration, the guest kernel could hang at anything, due to
x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
any operations related to x2apic could be broken then (e.g., RDMSR on
x2apic MSRs could fail because KVM would think that the vcpu hasn't
enabled x2apic at all).

The issue is that the x2apic bit was never applied correctly for vcpus
whose ID > 255 when migrate completes, and that's because when we
migrate APIC we use the APICCommonState.id as instance ID of the
migration stream, while that's too short for x2apic.

Let's use the newly introduced initial_apic_id for that.

Signed-off-by: Peter Xu 
---
 hw/intc/apic_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index aafd8e0e33..6024a3e06a 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 static DeviceState *vapic;
-int instance_id = s->id;
+int64_t instance_id = s->initial_apic_id;
 
 info = APIC_COMMON_GET_CLASS(s);
 info->realize(dev, errp);
-- 
2.21.0
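
The collision described in the commit message above can be sketched in a
few lines of standalone C. This is an illustration only: it assumes the
legacy ID field is 8 bits wide (the commit message only says it is "too
short for x2apic"), and legacy_apic_id() and ids_collide() are
hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the legacy APIC ID field keeps only the low 8 bits. */
static uint8_t legacy_apic_id(uint32_t initial_apic_id)
{
    return (uint8_t)initial_apic_id;
}

/* Two distinct vcpus collide if their truncated IDs are equal. */
static int ids_collide(uint32_t a, uint32_t b)
{
    return a != b && legacy_apic_id(a) == legacy_apic_id(b);
}

/* With 288 possible vcpus, IDs 256..287 alias IDs 0..31. */
static void apic_id_collision_demo(void)
{
    assert(legacy_apic_id(256) == 0);
    assert(ids_collide(0, 256));
    assert(ids_collide(31, 287));
    assert(!ids_collide(0, 255));
}
```

Under that assumption, two APIC instances would be registered with the
same instance ID in the migration stream, which matches the symptom that
only vcpus with ID > 255 are affected.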

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Daniel P . Berrangé

On Sat, Oct 05, 2019 at 02:33:34PM +0100, Peter Maydell wrote:
> On Sat, 5 Oct 2019 at 11:21, Lucien Murray-Pitts
>  wrote:
> > Whilst working on an m68k patch I noticed that the capstone in use
> > today (3.0) doesn't support the M68K and thus a hand turned disasm
> > function is used.
> >
> > The newer capstone (5.0) appears to support a few more CPUs, incl. m68k.
> >
> > Why don't we move to this newer capstone?
> 
> Moving to a newer capstone sounds like a good idea. The only
> reason we haven't moved forward as far as I'm aware is that
> nobody has done the work to send a patch to do that move
> forward to the newer version. Richard Henderson would
> probably know if there was any other blocker.

Bearing in mind our distro support policy, we need to continue to
support 3.0 series of capstone for a while yet based on what I
see in various distros. eg Ubuntu 18.04 LTS has 3.0.4, as does
Fedora 29.  Version 4.0 is only in a few very new distros:

   https://repology.org/project/capstone/versions

We can of course use features from newer capstone, *provided* we correctly
do conditional compilation so that we can still build against 3.0 series
on distros that have that version.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
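
The conditional compilation mentioned above could look roughly like the
following sketch. It is self-contained on purpose: a real build would
take CS_API_MAJOR from <capstone.h>, whereas here a stand-in value is
defined when the header is absent. HAVE_CAPSTONE_M68K and
m68k_disas_backend() are hypothetical names, and the assumption that
M68K support requires API major version 4 or later should be checked
against the capstone release notes.

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in for the version macro normally provided by <capstone.h>;
 * defined here only so the sketch compiles on its own.
 */
#ifndef CS_API_MAJOR
#define CS_API_MAJOR 3
#endif

/* Assumption: M68K disassembly needs capstone API major version 4+. */
#if CS_API_MAJOR >= 4
#define HAVE_CAPSTONE_M68K 1
#else
#define HAVE_CAPSTONE_M68K 0
#endif

/* Pick the disassembler backend at compile time. */
static const char *m68k_disas_backend(void)
{
#if HAVE_CAPSTONE_M68K
    return "capstone";
#else
    return "builtin";   /* fall back to the hand-written disassembler */
#endif
}

/* The chosen backend always matches the feature macro. */
static void m68k_backend_demo(void)
{
    const char *expected = HAVE_CAPSTONE_M68K ? "capstone" : "builtin";

    assert(strcmp(m68k_disas_backend(), expected) == 0);
}
```

With this shape, a distro that still ships the 3.0 series keeps building
and simply uses the existing fallback.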

Re: [PATCH 2/2] apic: Use 32bit APIC ID for migration instance ID

2019-10-15 Thread Juan Quintela

Peter Xu  wrote:
> Migration is silently broken now with x2apic config like this:
>
>  -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>  -device intel-iommu,intremap=on,eim=on
>
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
>
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
>
> Let's use the newly introduced initial_apic_id for that.
>
> Signed-off-by: Peter Xu 
> ---
>  hw/intc/apic_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index aafd8e0e33..6024a3e06a 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
>  APICCommonState *s = APIC_COMMON(dev);
>  APICCommonClass *info;
>  static DeviceState *vapic;
> -int instance_id = s->id;
> +int64_t instance_id = s->initial_apic_id;

int is ok here.

But damn thing, initial_apic_id is uint32_t.  Sniff.

Later, Juan.

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Juan Quintela

Peter Xu  wrote:
> It was "int" and used as 32bits fields (see save_section_header()).
> It's unsafe already because sizeof(int) could be 2 on i386,

i386 is 32bits, so int is 32bits O:-)
I really hope that we would never, ever, need a 64bits instance id.
It would mean that we have more than 2.000.000.000 objects of the same
type, no?

I am pretty sure that on 16-bit platforms we have other problems than
instance_id (namely that we don't have enough memory).

>I think.
> So at least uint32_t would suit it better.  While it also uses "-1" as a
> placeholder of "we want to generate the instance ID automatically".
> Hence a more proper value should be int64_t.
>
> This will start to be useful after next patch in which we can start to
> convert a real uint32_t value as instance ID.

Later, Juan.

[PATCH v3 0/2] RTC support for QEMU RISC-V virt machine

2019-10-15 Thread Anup Patel

This series adds an RTC device to the QEMU RISC-V virt machine. We have
selected the Goldfish RTC device model for this. It's a pretty simple
synthetic device with a few MMIO registers and no dependency on an
external clock. The driver for Goldfish RTC is already available in
Linux, so we just need to enable it in Kconfig for RISCV and also update
the Linux defconfigs.

We have tested this series with Linux-5.4-rc1 plus defconfig changes
available in 'goldfish_rtc_v2' branch of:
https://github.com/avpatel/linux.git

Changes since v2:
 - Rebased patches on recent RTC refactoring by Philippe Mathieu-Daudé
   (Refer, https://patchew.org/QEMU/20191003230404.19384-1-phi...@redhat.com/)

Changes since v1:
 - Implemented VMState save/restore for Goldfish RTC

Anup Patel (2):
  hw: timer: Add Goldfish RTC device
  riscv: virt: Use Goldfish RTC device

 hw/riscv/Kconfig|   1 +
 hw/riscv/virt.c |  15 ++
 hw/rtc/Kconfig  |   3 +
 hw/rtc/Makefile.objs|   1 +
 hw/rtc/goldfish_rtc.c   | 278 
 include/hw/riscv/virt.h |   2 +
 include/hw/timer/goldfish_rtc.h |  46 ++
 7 files changed, 346 insertions(+)
 create mode 100644 hw/rtc/goldfish_rtc.c
 create mode 100644 include/hw/timer/goldfish_rtc.h

--
2.17.1

[PATCH v3 1/2] hw: timer: Add Goldfish RTC device

2019-10-15 Thread Anup Patel

This patch adds a model for the Google Goldfish virtual platform RTC device.

We will be adding the Goldfish RTC device to the QEMU RISC-V virt machine
to provide real date-time to Guest Linux. The corresponding Linux
driver for the Goldfish RTC device is already available in upstream Linux.

For now, VM migration support is available but untested for the Goldfish
RTC device. It will be hardened in the future when we implement VM
migration for KVM RISC-V.

Signed-off-by: Anup Patel 
---
 hw/rtc/Kconfig  |   3 +
 hw/rtc/Makefile.objs|   1 +
 hw/rtc/goldfish_rtc.c   | 278 
 include/hw/timer/goldfish_rtc.h |  46 ++
 4 files changed, 328 insertions(+)
 create mode 100644 hw/rtc/goldfish_rtc.c
 create mode 100644 include/hw/timer/goldfish_rtc.h

diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index 45daa8d655..bafe6ac2c9 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -21,3 +21,6 @@ config MC146818RTC
 
 config SUN4V_RTC
 bool
+
+config GOLDFISH_RTC
+bool
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 8dc9fcd3a9..aa208d0d10 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -11,3 +11,4 @@ common-obj-$(CONFIG_EXYNOS4) += exynos4210_rtc.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_rtc.o
+common-obj-$(CONFIG_GOLDFISH_RTC) += goldfish_rtc.o
diff --git a/hw/rtc/goldfish_rtc.c b/hw/rtc/goldfish_rtc.c
new file mode 100644
index 0000000000..223616ed75
--- /dev/null
+++ b/hw/rtc/goldfish_rtc.c
@@ -0,0 +1,278 @@
+/*
+ * Goldfish virtual platform RTC
+ *
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * For more details on Google Goldfish virtual platform refer:
+ * https://android.googlesource.com/platform/external/qemu/+/master/docs/GOLDFISH-VIRTUAL-HARDWARE.TXT
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "hw/timer/goldfish_rtc.h"
+#include "migration/vmstate.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qemu/timer.h"
+#include "sysemu/sysemu.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+
+#define RTC_TIME_LOW        0x00
+#define RTC_TIME_HIGH       0x04
+#define RTC_ALARM_LOW       0x08
+#define RTC_ALARM_HIGH      0x0c
+#define RTC_IRQ_ENABLED     0x10
+#define RTC_CLEAR_ALARM     0x14
+#define RTC_ALARM_STATUS    0x18
+#define RTC_CLEAR_INTERRUPT 0x1c
+
+static void goldfish_rtc_update(GoldfishRTCState *s)
+{
+qemu_set_irq(s->irq, (s->irq_pending & s->irq_enabled) ? 1 : 0);
+}
+
+static void goldfish_rtc_interrupt(void *opaque)
+{
+GoldfishRTCState *s = (GoldfishRTCState *)opaque;
+
+s->alarm_running = 0;
+s->irq_pending = 1;
+goldfish_rtc_update(s);
+}
+
+static uint64_t goldfish_rtc_get_count(GoldfishRTCState *s)
+{
+return s->tick_offset + (uint64_t)qemu_clock_get_ns(rtc_clock);
+}
+
+static void goldfish_rtc_clear_alarm(GoldfishRTCState *s)
+{
+timer_del(s->timer);
+s->alarm_running = 0;
+}
+
+static void goldfish_rtc_set_alarm(GoldfishRTCState *s)
+{
+uint64_t ticks = goldfish_rtc_get_count(s);
+uint64_t event = s->alarm_next;
+
+if (event <= ticks) {
+goldfish_rtc_clear_alarm(s);
+goldfish_rtc_interrupt(s);
+} else {
+int64_t now = qemu_clock_get_ns(rtc_clock);
+timer_mod(s->timer, now + (event - ticks));
+s->alarm_running = 1;
+}
+}
+
+static uint64_t goldfish_rtc_read(void *opaque, hwaddr offset,
+  unsigned size)
+{
+GoldfishRTCState *s = (GoldfishRTCState *)opaque;
+uint64_t r;
+
+switch (offset) {
+case RTC_TIME_LOW:
+r = goldfish_rtc_get_count(s) & 0xffffffff;
+break;
+case RTC_TIME_HIGH:
+r = goldfish_rtc_get_count(s) >> 32;
+break;
+case RTC_ALARM_LOW:
+r = s->alarm_next & 0xffffffff;
+break;
+case RTC_ALARM_HIGH:
+r = s->alarm_next >> 32;
+break;
+case RTC_IRQ_ENABLED:
+r = s->irq_enabled;
+break;
+case RTC_ALARM_STATUS:
+r = s->alarm_running;
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "goldfish_rtc_read: Bad offset 0x%x\n", (int)offset);
+r = 0;
+break;
+}
+
+return
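
The register layout used by the read handler above exposes a 64-bit
nanosecond count as two 32-bit halves (RTC_TIME_LOW and RTC_TIME_HIGH).
A minimal standalone sketch of that split, and of how a guest could
recombine the halves, follows. The helper names are hypothetical and the
code does not touch any QEMU API.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the counter, as read from the TIME_LOW register. */
static uint32_t rtc_time_low(uint64_t count_ns)
{
    return (uint32_t)(count_ns & 0xffffffffu);
}

/* High 32 bits of the counter, as read from the TIME_HIGH register. */
static uint32_t rtc_time_high(uint64_t count_ns)
{
    return (uint32_t)(count_ns >> 32);
}

/* Guest-side recombination of the two halves into one 64-bit value. */
static uint64_t rtc_combine(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Splitting and recombining must round-trip any 64-bit count. */
static void rtc_split_demo(void)
{
    uint64_t count = 0x0000000123456789ULL;

    assert(rtc_time_low(count) == 0x23456789u);
    assert(rtc_time_high(count) == 1u);
    assert(rtc_combine(rtc_time_low(count), rtc_time_high(count)) == count);
}
```

Note that the sketch splits a single snapshot of the counter; a guest
that reads the two registers separately has to consider that the count
can advance between the two reads.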

[PATCH v3 2/2] riscv: virt: Use Goldfish RTC device

2019-10-15 Thread Anup Patel

We extend the QEMU RISC-V virt machine by adding a Goldfish RTC device
to it. This will allow Guest Linux to sync its local date/time
with the Host date/time via the RTC device.

Signed-off-by: Anup Patel 
---
 hw/riscv/Kconfig|  1 +
 hw/riscv/virt.c | 15 +++
 include/hw/riscv/virt.h |  2 ++
 3 files changed, 18 insertions(+)

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index fb19b2df3a..b33753c780 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -34,6 +34,7 @@ config RISCV_VIRT
 select PCI
 select HART
 select SERIAL
+select GOLDFISH_RTC
 select VIRTIO_MMIO
 select PCI_EXPRESS_GENERIC_BRIDGE
 select SIFIVE
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index d36f5625ec..95c42ab993 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -57,6 +57,7 @@ static const struct MemmapEntry {
 [VIRT_DEBUG] =       {        0x0,         0x100 },
 [VIRT_MROM] =        {     0x1000,       0x11000 },
 [VIRT_TEST] =        {   0x100000,        0x1000 },
+[VIRT_RTC] =         {   0x101000,        0x1000 },
 [VIRT_CLINT] =       {  0x2000000,       0x10000 },
 [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
 [VIRT_UART0] =       { 0x10000000,         0x100 },
@@ -310,6 +311,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
 
+nodename = g_strdup_printf("/rtc@%lx",
+(long)memmap[VIRT_RTC].base);
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_string(fdt, nodename, "compatible",
+"google,goldfish-rtc");
+qemu_fdt_setprop_cells(fdt, nodename, "reg",
+0x0, memmap[VIRT_RTC].base,
+0x0, memmap[VIRT_RTC].size);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
+
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
 if (cmdline) {
@@ -496,6 +508,9 @@ static void riscv_virt_board_init(MachineState *machine)
 0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
 serial_hd(0), DEVICE_LITTLE_ENDIAN);
 
+sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
+qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
+
 g_free(plic_hart_config);
 }
 
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index 6e5fbe5d3b..e6423258d3 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -37,6 +37,7 @@ enum {
 VIRT_DEBUG,
 VIRT_MROM,
 VIRT_TEST,
+VIRT_RTC,
 VIRT_CLINT,
 VIRT_PLIC,
 VIRT_UART0,
@@ -49,6 +50,7 @@ enum {
 
 enum {
 UART0_IRQ = 10,
+RTC_IRQ = 11,
 VIRTIO_IRQ = 1, /* 1 to 8 */
 VIRTIO_COUNT = 8,
 PCIE_IRQ = 0x20, /* 32 to 35 */
-- 
2.17.1

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Thomas Huth

On 15/10/2019 10.27, Daniel P. Berrangé wrote:
> On Sat, Oct 05, 2019 at 02:33:34PM +0100, Peter Maydell wrote:
>> On Sat, 5 Oct 2019 at 11:21, Lucien Murray-Pitts
>>  wrote:
>>> Whilst working on an m68k patch I noticed that the capstone in use
>>> today (3.0) doesn't support the M68K and thus a hand turned disasm
>>> function is used.
>>>
>>> The newer capstone (5.0) appears to support a few more CPUs, incl. m68k.
>>>
>>> Why don't we move to this newer capstone?
>>
>> Moving to a newer capstone sounds like a good idea. The only
>> reason we haven't moved forward as far as I'm aware is that
>> nobody has done the work to send a patch to do that move
>> forward to the newer version. Richard Henderson would
>> probably know if there was any other blocker.
> 
> Bearing in mind our distro support policy, we need to continue to
> support 3.0 series of capstone for a while yet based on what I
> see in various distros. eg Ubuntu 18.04 LTS has 3.0.4, as does
> Fedora 29.  Version 4.0 is only in a few very new distros:
> 
>https://repology.org/project/capstone/versions
> 
> We can of course use features from newer capstone, *provided* we correctly
> do conditional compilation so that we can still build against 3.0 series
> on distros that have that version.

We're embedding the capstone submodule in the release tarballs, so I
think we're independent from the distro release, aren't we? So this
should not be an issue, as far as I can see.

 Thomas

Re: [PATCH v4 2/3] target/riscv: Expose "priv" register for GDB for reads

2019-10-15 Thread Bin Meng

On Mon, Oct 14, 2019 at 11:53 PM Jonathan Behrens  wrote:
>
> This patch enables a debugger to read the current privilege level via a
> virtual "priv" register. When compiled with CONFIG_USER_ONLY the register
> is still visible but always reports the value zero.
>
> Signed-off-by: Jonathan Behrens 
> ---
>  configure   |  4 ++--
>  gdb-xml/riscv-32bit-virtual.xml | 11 +++
>  gdb-xml/riscv-64bit-virtual.xml | 11 +++
>  target/riscv/gdbstub.c  | 23 +++
>  4 files changed, 47 insertions(+), 2 deletions(-)
>  create mode 100644 gdb-xml/riscv-32bit-virtual.xml
>  create mode 100644 gdb-xml/riscv-64bit-virtual.xml
>

Reviewed-by: Bin Meng 
Tested-by: Bin Meng

Re: [PULL 01/19] util/hbitmap: strict hbitmap_reset

2019-10-15 Thread Kevin Wolf

Am 14.10.2019 um 20:10 hat John Snow geschrieben:
> 
> 
> On 10/11/19 7:18 PM, John Snow wrote:
> > 
> > 
> > On 10/11/19 5:48 PM, Eric Blake wrote:
> >> On 10/11/19 4:25 PM, John Snow wrote:
> >>> From: Vladimir Sementsov-Ogievskiy 
> >>>
> >>> hbitmap_reset has an unobvious property: it rounds requested region up.
> >>> It may provoke bugs, like in recently fixed write-blocking mode of
> >>> mirror: user calls reset on unaligned region, not keeping in mind that
> >>> there are possible unrelated dirty bytes, covered by rounded-up region
> >>> and information of this unrelated "dirtiness" will be lost.
> >>>
> >>> Make hbitmap_reset strict: assert that arguments are aligned, allowing
> >>> only one exception when @start + @count == hb->orig_size. It's needed
> >>> to comfort users of hbitmap_next_dirty_area, which cares about
> >>> hb->orig_size.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>> Reviewed-by: Max Reitz 
> >>> Message-Id: <20190806152611.280389-1-vsement...@virtuozzo.com>
> >>> [Maintainer edit: Max's suggestions from on-list. --js]
> >>> Signed-off-by: John Snow 
> >>> ---
> >>>   include/qemu/hbitmap.h | 5 +
> >>>   tests/test-hbitmap.c   | 2 +-
> >>>   util/hbitmap.c | 4 
> >>>   3 files changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>
> >>> +++ b/util/hbitmap.c
> >>> @@ -476,6 +476,10 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
> >>>   /* Compute range in the last layer.  */
> >>>   uint64_t first;
> >>>   uint64_t last = start + count - 1;
> >>> +    uint64_t gran = 1ULL << hb->granularity;
> >>> +
> >>> +    assert(!(start & (gran - 1)));
> >>> +    assert(!(count & (gran - 1)) || (start + count == hb->orig_size));
> >>
> >> I know I'm replying a bit late (since this is now a pull request), but
> >> would it be worth using the dedicated macro:
> >>
> >> assert(QEMU_IS_ALIGNED(start, gran));
> >> assert(QEMU_IS_ALIGNED(count, gran) || start + count == hb->orig_size);
> >>
> >> instead of open-coding it?  (I would also drop the extra () around the
> >> right half of ||). If we want it, that would now be a followup patch.
> 
> I've noticed that seasoned C programmers hate extra parentheses a lot.
> I've noticed that I cannot remember operator precedence enough to ever
> feel like this is actually an improvement.
> 
> Something about a nice weighted tree of ((expr1) || (expr2)) feels
> soothing to my weary eyes. So, if it's not terribly important, I'd
> prefer to leave it as-is.

I don't mind the parentheses, but I do prefer QEMU_IS_ALIGNED() to the
open-coded version. Would that be a viable compromise?

Kevin
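
For readers comparing the two spellings discussed above, here is a
standalone sketch of the argument check in both forms. IS_ALIGNED is a
stand-in for QEMU_IS_ALIGNED and is assumed to be a plain remainder
test; reset_args_ok() and reset_args_ok_masked() are hypothetical
helpers that return the condition instead of asserting it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU_IS_ALIGNED; assumed to be a plain remainder test. */
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Macro-based form of the check suggested in the review. */
static int reset_args_ok(uint64_t start, uint64_t count,
                         uint64_t orig_size, unsigned granularity)
{
    uint64_t gran = 1ULL << granularity;

    return IS_ALIGNED(start, gran) &&
           (IS_ALIGNED(count, gran) || (start + count == orig_size));
}

/* Open-coded form from the patch, using a power-of-two mask. */
static int reset_args_ok_masked(uint64_t start, uint64_t count,
                                uint64_t orig_size, unsigned granularity)
{
    uint64_t gran = 1ULL << granularity;

    return !(start & (gran - 1)) &&
           (!(count & (gran - 1)) || (start + count == orig_size));
}

/* Both forms agree, including on the unaligned-tail exception. */
static void hbitmap_reset_check_demo(void)
{
    /* granularity 4 means 16-byte chunks; bitmap covers 100 bytes */
    assert(reset_args_ok(16, 32, 100, 4));
    assert(!reset_args_ok(8, 32, 100, 4));
    assert(reset_args_ok(96, 4, 100, 4));    /* tail ends at orig_size */
    assert(!reset_args_ok(96, 3, 100, 4));
    assert(reset_args_ok_masked(16, 32, 100, 4));
    assert(!reset_args_ok_masked(8, 32, 100, 4));
    assert(reset_args_ok_masked(96, 4, 100, 4));
    assert(!reset_args_ok_masked(96, 3, 100, 4));
}
```

Because the granularity is always a power of two here, the mask test and
the remainder test accept exactly the same arguments; the difference is
purely one of readability.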

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Juan Quintela

Peter Xu  wrote:
> It was "int" and used as 32bits fields (see save_section_header()).
> It's unsafe already because sizeof(int) could be 2 on i386, I think.
> So at least uint32_t would suit it better.  While it also uses "-1" as a
> placeholder of "we want to generate the instance ID automatically".
> Hence a more proper value should be int64_t.
>
> This will start to be useful after next patch in which we can start to
> convert a real uint32_t value as instance ID.
>
> Signed-off-by: Peter Xu 

Hi

Being more helpful,  I think that it is better to just:

* change instance_id to be an uint32_t (notice that for all architectures
  that we support, it is actually int32_t).

* export calculate_new_instance_id() and adjust callers that use -1.

or

* export a new function that just use the calculate_new_instance_id()

A fast search shows:

10 callers of vmstate_register() with -1
1 caller of vmstate_register_with_alias_id with -1 (but it is the one
  that sets all qdev devices).
1 caller of vmstate_register_with_alias_id in apic, where it can be -1.
1 caller of register_savevm_live() with -1 (spapr)

And call it a day?

What do you think, Juan.
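
One possible shape of the suggestion above, as a standalone sketch
rather than QEMU code: the instance ID becomes a plain uint32_t, and
callers that used to pass -1 call a separate "auto" entry point instead.
All names below are hypothetical, the fixed-size table stands in for the
real handler list, and the "highest existing ID plus one" rule is an
assumption about what calculate_new_instance_id() does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

struct entry {
    const char *idstr;
    uint32_t instance_id;
};

static struct entry entries[MAX_ENTRIES];
static int n_entries;

/* Smallest ID above every existing entry with the same idstr. */
static uint32_t next_instance_id(const char *idstr)
{
    uint32_t id = 0;

    for (int i = 0; i < n_entries; i++) {
        if (strcmp(idstr, entries[i].idstr) == 0 &&
            id <= entries[i].instance_id) {
            id = entries[i].instance_id + 1;
        }
    }
    return id;
}

/* Register with an explicit ID; no placeholder value is needed. */
static void register_entry(const char *idstr, uint32_t instance_id)
{
    assert(n_entries < MAX_ENTRIES);
    entries[n_entries].idstr = idstr;
    entries[n_entries].instance_id = instance_id;
    n_entries++;
}

/* Register with an automatically chosen ID and return that ID. */
static uint32_t register_entry_auto(const char *idstr)
{
    uint32_t id = next_instance_id(idstr);

    register_entry(idstr, id);
    return id;
}
```

The sketch does not handle the case where an existing entry already uses
UINT32_MAX; the addition would wrap to 0 there, so a real implementation
would need to reject or special-case it.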

> ---
>  include/migration/register.h |  2 +-
>  include/migration/vmstate.h  |  4 ++--
>  migration/savevm.c   | 10 +-
>  stubs/vmstate.c  |  2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/include/migration/register.h b/include/migration/register.h
> index a13359a08d..54f42c7413 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -69,7 +69,7 @@ typedef struct SaveVMHandlers {
>  } SaveVMHandlers;
>  
>  int register_savevm_live(const char *idstr,
> - int instance_id,
> + int64_t instance_id,
>   int version_id,
>   const SaveVMHandlers *ops,
>   void *opaque);
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1fbfd099dd..6a7498463c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1114,14 +1114,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>  bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
>  
>  /* Returns: 0 on success, -1 on failure */
> -int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> +int vmstate_register_with_alias_id(DeviceState *dev, int64_t instance_id,
> const VMStateDescription *vmsd,
> void *base, int alias_id,
> int required_for_version,
> Error **errp);
>  
>  /* Returns: 0 on success, -1 on failure */
> -static inline int vmstate_register(DeviceState *dev, int instance_id,
> +static inline int vmstate_register(DeviceState *dev, int64_t instance_id,
> const VMStateDescription *vmsd,
> void *opaque)
>  {
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bb9462a54d..dc9281c897 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -233,7 +233,7 @@ typedef struct CompatEntry {
>  typedef struct SaveStateEntry {
>  QTAILQ_ENTRY(SaveStateEntry) entry;
>  char idstr[256];
> -int instance_id;
> +int64_t instance_id;
>  int alias_id;
>  int version_id;
>  /* version id read from the stream */
> @@ -668,7 +668,7 @@ void dump_vmstate_json_to_file(FILE *out_file)
>  static int calculate_new_instance_id(const char *idstr)
>  {
>  SaveStateEntry *se;
> -int instance_id = 0;
> +int64_t instance_id = 0;
>  
>  QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>  if (strcmp(idstr, se->idstr) == 0
> @@ -730,7 +730,7 @@ static void savevm_state_handler_insert(SaveStateEntry 
> *nse)
> Meanwhile pass -1 as instance_id if you do not already have a clearly
> distinguishing id for all instances of your device class. */
>  int register_savevm_live(const char *idstr,
> - int instance_id,
> + int64_t instance_id,
>   int version_id,
>   const SaveVMHandlers *ops,
>   void *opaque)
> @@ -784,7 +784,7 @@ void unregister_savevm(DeviceState *dev, const char 
> *idstr, void *opaque)
>  }
>  }
>  
> -int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> +int vmstate_register_with_alias_id(DeviceState *dev, int64_t instance_id,
> const VMStateDescription *vmsd,
> void *opaque, int alias_id,
> int required_for_version,
> @@ -1566,7 +1566,7 @@ int qemu_save_device_state(QEMUFile *f)
>  return qemu_file_get_error(f);
>  }
>  
> -static SaveStateEntry *find_se(const char *idstr, int instance_id)
>

Re: [PULL 1/2] trace: add --group=all to tracing.txt

2019-10-15 Thread Stefan Hajnoczi

On Mon, Oct 14, 2019 at 11:08:25AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 10/14/19 10:57 AM, Stefan Hajnoczi wrote:
> > tracetool needs to know the group name ("all", "root", or a specific
> > subdirectory).  Also remove the stdin redirection because tracetool.py
> > needs the path to the trace-events file.  Update the documentation.
> > 
> > Fixes: 2098c56a9bc5901e145fa5d4759f075808811685
> > ("trace: move setting of group name into Makefiles")
> > Launchpad: https://bugs.launchpad.net/bugs/1844814
> 
> Sorry I didn't notice that earlier, but on 
> https://wiki.qemu.org/Contribute/SubmitAPatch#Write_a_meaningful_commit_message
> we recommend using the 'Buglink' tag.
> Not sure it's worth resending another pull request...

Sure, it hasn't been merged yet so I can send a v2.

Stefan



Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Daniel P . Berrangé

On Tue, Oct 15, 2019 at 10:36:40AM +0200, Thomas Huth wrote:
> On 15/10/2019 10.27, Daniel P. Berrangé wrote:
> > On Sat, Oct 05, 2019 at 02:33:34PM +0100, Peter Maydell wrote:
> >> On Sat, 5 Oct 2019 at 11:21, Lucien Murray-Pitts
> >>  wrote:
> >>> Whilst working on an m68k patch I noticed that the capstone in use
> >>> today (3.0) doesn't support M68K and thus a hand-turned disasm
> >>> function is used.
> >>>
> >>> The newer capstone (5.0) appears to support a few more CPUs, including m68k.
> >>>
> >>> Why don't we move to this newer capstone?
> >>
> >> Moving to a newer capstone sounds like a good idea. The only
> >> reason we haven't moved forward as far as I'm aware is that
> >> nobody has done the work to send a patch to do that move
> >> forward to the newer version. Richard Henderson would
> >> probably know if there was any other blocker.
> > 
> > Bearing in mind our distro support policy, we need to continue to
> > support 3.0 series of capstone for a while yet based on what I
> > see in various distros. eg Ubuntu 18.04 LTS has 3.0.4, as does
> > Fedora 29.  Version 4.0 is only in a few very new distros:
> > 
> >https://repology.org/project/capstone/versions
> > 
> > We can of course use features from newer capstone, *provided* we correctly
> > do conditional compilation so that we can still build against 3.0 series
> > on distros that have that version.
> 
> We're embedding the capstone submodule in the release tarballs, so I
> think we're independent from the distro release, aren't we? So this
> should not be an issue, as far as I can see.

It is an issue for people/distros who don't want to build with bundled
3rd party code.

I'd suggest it is probably time we could drop the capstone git submodule.
We originally added it because capstone wasn't widely present in distros
we care about. AFAICT, it is now present in all the distros, so could be
treated the same way as any other 3rd party library dep we have.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PULL v2 0/2] Tracing patches

2019-10-15 Thread Stefan Hajnoczi

The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2019-10-08 16:08:35 +0100)

are available in the Git repository at:

  https://github.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to 403e11edbfad5da2e6d5842adc9222f60e76ee43:

  trace: avoid "is" with a literal Python 3.8 warnings (2019-10-15 09:47:16 +0100)


Pull request

v2:
 * Replaced "Launchpad:" tag with "Buglink:" as documented on the SubmitAPatch wiki page [Philippe]



Stefan Hajnoczi (2):
  trace: add --group=all to tracing.txt
  trace: avoid "is" with a literal Python 3.8 warnings

 docs/devel/tracing.txt| 3 ++-
 scripts/tracetool/__init__.py | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

-- 
2.21.0

[PULL v2 2/2] trace: avoid "is" with a literal Python 3.8 warnings

2019-10-15 Thread Stefan Hajnoczi

The following statement produces a SyntaxWarning with Python 3.8:

  if len(format) is 0:
  scripts/tracetool/__init__.py:459: SyntaxWarning: "is" with a literal. Did you mean "=="?

Use the conventional len(x) == 0 syntax instead.

Reported-by: Daniel P. Berrangé 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20191010122154.10553-1-stefa...@redhat.com>
Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/__init__.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 04279fa62e..44c118bc2a 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -456,12 +456,12 @@ def generate(events, group, format, backends,
 import tracetool
 
 format = str(format)
-if len(format) is 0:
+if len(format) == 0:
 raise TracetoolError("format not set")
 if not tracetool.format.exists(format):
 raise TracetoolError("unknown format: %s" % format)
 
-if len(backends) is 0:
+if len(backends) == 0:
 raise TracetoolError("no backends specified")
 for backend in backends:
 if not tracetool.backend.exists(backend):
-- 
2.21.0
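
As a side note on why Python 3.8 warns here: "is" tests object identity, while "==" tests value equality. The following is only an illustrative sketch; check_format() is a hypothetical stand-in for the check in generate(), not the tracetool code itself.

```python
def check_format(fmt):
    """Hypothetical stand-in for the fixed check: reject an empty format."""
    fmt = str(fmt)
    if len(fmt) == 0:  # value comparison, as in the patch
        raise ValueError("format not set")
    return fmt


# Two distinct list objects that hold equal values.
first = []
second = []
values_equal = first == second   # compares contents
same_object = first is second    # compares object identity
```

Whether "len(x) is 0" happens to work depends on how the interpreter caches small integers, which is an implementation detail rather than a language guarantee, so the "==" form is the safer spelling.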

Re: [PULL 1/1] test-bdrv-drain: fix iothread_join() hang

2019-10-15 Thread Stefan Hajnoczi

On Mon, Oct 14, 2019 at 01:11:41PM +0200, Paolo Bonzini wrote:
> On 14/10/19 10:52, Stefan Hajnoczi wrote:
> > tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
> > 
> >   while (!atomic_read(&iothread->stopping)) {
> >   aio_poll(iothread->ctx, true);
> >   }
> > 
> > The iothread_join() function works as follows:
> > 
> >   void iothread_join(IOThread *iothread)
> >   {
> >   iothread->stopping = true;
> >   aio_notify(iothread->ctx);
> >   qemu_thread_join(&iothread->thread);
> > 
> > If iothread_run() checks iothread->stopping before the iothread_join()
> > thread sets stopping to true, then aio_notify() may be optimized away
> > and iothread_run() hangs forever in aio_poll().
> > 
> > The correct way to change iothread->stopping is from a BH that executes
> > within iothread_run().  This ensures that iothread->stopping is checked
> > after we set it to true.
> > 
> > This was already fixed for ./iothread.c (note this is a different source
> > file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> > fix iothread_stop() race condition"), but not for tests/iothread.c.
> 
> Aha, I did have some kind of dejavu when sending the patch I have just
> sent; let's see if this also fixes the test-aio-multithread assertion
> failure.
> 
> Note that with this change the atomic read of iothread->stopping can go
> away; I can send a separate patch later.

Yes, I thought about the atomic_read() later as well.

Stefan
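
For readers following the race described in the quoted commit message, here is a rough Python analogy, not QEMU code. It assumes a worker loop that blocks waiting for callbacks (standing in for aio_poll()) and shows the flag being set by a callback that runs inside that loop (standing in for a BH). LoopThread and all its method names are hypothetical.

```python
import queue
import threading


class LoopThread:
    """Hypothetical event-loop thread; only the loop thread touches the flag."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._stopping = False
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def _run(self):
        while not self._stopping:
            task = self._tasks.get()  # blocks until work arrives
            task()                    # the flag is re-checked after each task

    def schedule(self, fn):
        """Queue a callback to run inside the loop thread."""
        self._tasks.put(fn)

    def _stop_in_loop(self):
        self._stopping = True

    def join(self):
        # Ask the loop itself to set the flag, then wait for it to exit.
        self.schedule(self._stop_in_loop)
        self.thread.join()
```

Because the flag is only written from inside the loop, the loop always observes it on the next iteration and cannot go back to sleep after a missed wakeup. In this sketch the flag also no longer needs to be read atomically, which mirrors the remark about atomic_read() above.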



[PULL v2 1/2] trace: add --group=all to tracing.txt

2019-10-15 Thread Stefan Hajnoczi

tracetool needs to know the group name ("all", "root", or a specific
subdirectory).  Also remove the stdin redirection because tracetool.py
needs the path to the trace-events file.  Update the documentation.

Fixes: 2098c56a9bc5901e145fa5d4759f075808811685
   ("trace: move setting of group name into Makefiles")
Buglink: https://bugs.launchpad.net/bugs/1844814
Reported-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20191009135154.10970-1-stefa...@redhat.com>
---
 docs/devel/tracing.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/devel/tracing.txt b/docs/devel/tracing.txt
index 8231bbf5d1..8c0376fefa 100644
--- a/docs/devel/tracing.txt
+++ b/docs/devel/tracing.txt
@@ -317,7 +317,8 @@ probes:
  --binary path/to/qemu-binary \
  --target-type system \
  --target-name x86_64 \
- qemu.stp
+ --group=all \
+ trace-events-all >qemu.stp
 
 To facilitate simple usage of systemtap where there merely needs to be printf
 logging of certain probes, a helper script "qemu-trace-stap" is provided.
-- 
2.21.0

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Peter Xu  wrote:
> > It was "int" and used as a 32-bit field (see save_section_header()).
> > It's unsafe already because sizeof(int) could be 2 on i386, I think.
> > So at least uint32_t would suit it better.  But it also uses "-1" as a
> > placeholder meaning "we want to generate the instance ID automatically".
> > Hence a more proper type would be int64_t.
> >
> > This will start to be useful after next patch in which we can start to
> > convert a real uint32_t value as instance ID.
> >
> > Signed-off-by: Peter Xu 
> 
> Hi
> 
> Being more helpful,  I think that it is better to just:
> 
> * change instance_id to be an uint32_t (notice that for all architectures
>   that we support, it is actually int32_t).
> 
> * export calculate_new_instance_id() and adjust callers that use -1.
> 
> or
> 
> * export a new function that just uses calculate_new_instance_id()

Do you mean that we end up with two functions, one that does it
automatically, and one that takes an ID?

Dave
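
To make the "two functions" reading concrete, here is a small Python model of what such an interface might look like. It is a sketch under assumptions, not the QEMU API: Registry, register_with_id() and register_auto() are hypothetical names, and the "highest existing ID plus one" rule is my reading of calculate_new_instance_id().

```python
UINT32_MAX = 0xFFFFFFFF


class Registry:
    """Hypothetical model of the savevm handler list."""

    def __init__(self):
        self._entries = []  # (idstr, instance_id) pairs

    def next_instance_id(self, idstr):
        """Smallest ID above every ID already registered under idstr."""
        new_id = 0
        for name, instance_id in self._entries:
            if name == idstr and new_id <= instance_id:
                new_id = instance_id + 1
        return new_id

    def register_with_id(self, idstr, instance_id):
        """Caller supplies the ID; it must fit in an unsigned 32-bit field."""
        if not 0 <= instance_id <= UINT32_MAX:
            raise ValueError("instance_id out of uint32 range")
        self._entries.append((idstr, instance_id))
        return instance_id

    def register_auto(self, idstr):
        """No -1 sentinel needed: the ID is computed here."""
        return self.register_with_id(idstr, self.next_instance_id(idstr))
```

With this split the ID type can stay unsigned, because "pick one for me" is expressed by which function is called rather than by a magic value.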

> A fast search shows:
> 
> 10 callers of vmstate_register() with -1
> 1 caller of vmstate_register_with_alias_id with -1 (but it is the one
>   that sets all qdev devices).
> 1 caller of vmstate_register_with_alias_id in apic, where it can be -1.
> 1 caller of register_savevm_live() with -1 (spapr)
> 
> And call it a day?
> 
> What do you think?
>
> Juan.

> 
> > ---
> >  include/migration/register.h |  2 +-
> >  include/migration/vmstate.h  |  4 ++--
> >  migration/savevm.c   | 10 +-
> >  stubs/vmstate.c  |  2 +-
> >  4 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/migration/register.h b/include/migration/register.h
> > index a13359a08d..54f42c7413 100644
> > --- a/include/migration/register.h
> > +++ b/include/migration/register.h
> > @@ -69,7 +69,7 @@ typedef struct SaveVMHandlers {
> >  } SaveVMHandlers;
> >  
> >  int register_savevm_live(const char *idstr,
> > - int instance_id,
> > + int64_t instance_id,
> >   int version_id,
> >   const SaveVMHandlers *ops,
> >   void *opaque);
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 1fbfd099dd..6a7498463c 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -1114,14 +1114,14 @@ int vmstate_save_state_v(QEMUFile *f, const 
> > VMStateDescription *vmsd,
> >  bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
> >  
> >  /* Returns: 0 on success, -1 on failure */
> > -int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> > +int vmstate_register_with_alias_id(DeviceState *dev, int64_t instance_id,
> > const VMStateDescription *vmsd,
> > void *base, int alias_id,
> > int required_for_version,
> > Error **errp);
> >  
> >  /* Returns: 0 on success, -1 on failure */
> > -static inline int vmstate_register(DeviceState *dev, int instance_id,
> > +static inline int vmstate_register(DeviceState *dev, int64_t instance_id,
> > const VMStateDescription *vmsd,
> > void *opaque)
> >  {
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index bb9462a54d..dc9281c897 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -233,7 +233,7 @@ typedef struct CompatEntry {
> >  typedef struct SaveStateEntry {
> >  QTAILQ_ENTRY(SaveStateEntry) entry;
> >  char idstr[256];
> > -int instance_id;
> > +int64_t instance_id;
> >  int alias_id;
> >  int version_id;
> >  /* version id read from the stream */
> > @@ -668,7 +668,7 @@ void dump_vmstate_json_to_file(FILE *out_file)
> >  static int calculate_new_instance_id(const char *idstr)
> >  {
> >  SaveStateEntry *se;
> > -int instance_id = 0;
> > +int64_t instance_id = 0;
> >  
> >  QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> >  if (strcmp(idstr, se->idstr) == 0
> > @@ -730,7 +730,7 @@ static void savevm_state_handler_insert(SaveStateEntry 
> > *nse)
> > Meanwhile pass -1 as instance_id if you do not already have a clearly
> > distinguishing id for all instances of your device class. */
> >  int register_savevm_live(const char *idstr,
> > - int instance_id,
> > + int64_t instance_id,
> >   int version_id,
> >   const SaveVMHandlers *ops,
> >   void *opaque)
> > @@ -784,7 +784,7 @@ void unregister_savevm(DeviceState *dev, const char 
> > *idstr, void *opaque)
> >  }
> >  }
> >  
> > -int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> > +int vmstate_register_with_alias_id(DeviceState *dev, int64_t ins

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Marc-André Lureau

Hi

On Tue, Oct 15, 2019 at 10:48 AM Daniel P. Berrangé  wrote:
>
> On Tue, Oct 15, 2019 at 10:36:40AM +0200, Thomas Huth wrote:
> > On 15/10/2019 10.27, Daniel P. Berrangé wrote:
> > > On Sat, Oct 05, 2019 at 02:33:34PM +0100, Peter Maydell wrote:
> > >> On Sat, 5 Oct 2019 at 11:21, Lucien Murray-Pitts
> > >>  wrote:
> > > >>> Whilst working on an m68k patch I noticed that the capstone in use
> > > >>> today (3.0) doesn't support M68K and thus a hand-turned disasm
> > > >>> function is used.
> > > >>>
> > > >>> The newer capstone (5.0) appears to support a few more CPUs, including m68k.
> > > >>>
> > > >>> Why don't we move to this newer capstone?
> > >>
> > >> Moving to a newer capstone sounds like a good idea. The only
> > >> reason we haven't moved forward as far as I'm aware is that
> > >> nobody has done the work to send a patch to do that move
> > >> forward to the newer version. Richard Henderson would
> > >> probably know if there was any other blocker.
> > >
> > > Bearing in mind our distro support policy, we need to continue to
> > > support 3.0 series of capstone for a while yet based on what I
> > > see in various distros. eg Ubuntu 18.04 LTS has 3.0.4, as does
> > > Fedora 29.  Version 4.0 is only in a few very new distros:
> > >
> > >https://repology.org/project/capstone/versions
> > >
> > > We can of course use features from newer capstone, *provided* we correctly
> > > do conditional compilation so that we can still build against 3.0 series
> > > on distros that have that version.
> >
> > We're embedding the capstone submodule in the release tarballs, so I
> > think we're independent from the distro release, aren't we? So this
> > should not be an issue, as far as I can see.
>
> It is an issue for people/distros who don't want to build with bundled
> 3rd party code.
>
> I'd suggest it is probably time we could drop the capstone git submodule.
> We originally added it because capstone wasn't widely present in distros
> we care about. AFAICT, it is now present in all the distros, so could be
> treated the same way as any other 3rd party library dep we have.

I suppose the same applies to dtc (1.4.2 required by qemu, but xenial
has 1.4.0... so we have to wait until April 26, 2020? 18.04 LTS
release date + 2y).

libslirp will take even longer.

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Daniel P . Berrangé

On Tue, Oct 15, 2019 at 11:02:43AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Oct 15, 2019 at 10:48 AM Daniel P. Berrangé  
> wrote:
> >
> > On Tue, Oct 15, 2019 at 10:36:40AM +0200, Thomas Huth wrote:
> > > On 15/10/2019 10.27, Daniel P. Berrangé wrote:
> > > > On Sat, Oct 05, 2019 at 02:33:34PM +0100, Peter Maydell wrote:
> > > >> On Sat, 5 Oct 2019 at 11:21, Lucien Murray-Pitts
> > > >>  wrote:
> > > > >>> Whilst working on an m68k patch I noticed that the capstone in use
> > > > >>> today (3.0) doesn't support M68K and thus a hand-turned disasm
> > > > >>> function is used.
> > > > >>>
> > > > >>> The newer capstone (5.0) appears to support a few more CPUs, including m68k.
> > > > >>>
> > > > >>> Why don't we move to this newer capstone?
> > > >>
> > > >> Moving to a newer capstone sounds like a good idea. The only
> > > >> reason we haven't moved forward as far as I'm aware is that
> > > >> nobody has done the work to send a patch to do that move
> > > >> forward to the newer version. Richard Henderson would
> > > >> probably know if there was any other blocker.
> > > >
> > > > Bearing in mind our distro support policy, we need to continue to
> > > > support 3.0 series of capstone for a while yet based on what I
> > > > see in various distros. eg Ubuntu 18.04 LTS has 3.0.4, as does
> > > > Fedora 29.  Version 4.0 is only in a few very new distros:
> > > >
> > > >https://repology.org/project/capstone/versions
> > > >
> > > > We can of course use features from newer capstone, *provided* we 
> > > > correctly
> > > > do conditional compilation so that we can still build against 3.0 series
> > > > on distros that have that version.
> > >
> > > We're embedding the capstone submodule in the release tarballs, so I
> > > think we're independent from the distro release, aren't we? So this
> > > should not be an issue, as far as I can see.
> >
> > It is an issue for people/distros who don't want to build with bundled
> > 3rd party code.
> >
> > I'd suggest it is probably time we could drop the capstone git submodule.
> > We originally added it because capstone wasn't widely present in distros
> > we care about. AFAICT, it is now present in all the distros, so could be
> > treated the same way as any other 3rd party library dep we have.
> 
> I suppose the same applies to dtc (1.4.2 required by qemu, but xenial
> has 1.4.0... so we have to wait until April 26, 2020? 18.04 LTS
> release date + 2y).

Possibly - depends on scope of changes between 1.4.0 & 1.4.2 - maybe it
is easy to conditionally support 1.4.0 too.

> libslirp will take even longer.

This is reasonable as a git submodule for a while yet, since it only
existed as a separate project very recently, so isn't widely available
across distros / OS.

IMHO the key point is that submodules bundling 3rd party libraries [1]
should be viewed as something with a limited lifetime. A temporary
hack until distros have the library widely available, rather than
something which continues forever.

Regards,
Daniel

[1] We have other types of submodule.

The keycodemapdb which is not a library, rather a static database
from which we auto-generate code to statically link in.

The firmware submodules which developers don't actually build from
normally. Ideally these would go into a separate dist tarball but
we seem stalled on this idea despite discussing it many times.
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v7 0/3] 9p: Fix file ID collisions

2019-10-15 Thread Greg Kurz

On Tue, 08 Oct 2019 14:05:28 +0200
Christian Schoenebeck  wrote:

> 
> I wonder though whether virtio-fs suffers from the same file ID collisions 
> problem when sharing multiple file systems.
> 

I gave a try and it seems that virtio-fs might expose the inode numbers from
different devices in the host, unvirtualized AND with the same device in the
guest:

# mkdir -p /var/tmp/virtio-fs/proc
# mount --bind /proc /var/tmp/virtio-fs/proc
# virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/var/tmp/virtio-fs -o cache=always

and then started QEMU with:

-chardev socket,id=char0,path=/tmp/vhostqemu \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
-m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
-numa node,memdev=mem

In the host:

$ stat /var/tmp/virtio-fs
  File: /var/tmp/virtio-fs
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d    Inode: 787796      Links: 4
Access: (0775/drwxrwxr-x)  Uid: ( 1000/greg)   Gid: ( 1000/greg)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2019-10-15 11:08:52.070080922 +0200
Modify: 2019-10-15 11:02:09.887404446 +0200
Change: 2019-10-15 11:02:09.887404446 +0200
 Birth: 2019-10-13 19:13:04.009699354 +0200
[greg@bahia ~]$ stat /var/tmp/virtio-fs/FOO
  File: /var/tmp/virtio-fs/FOO
  Size: 0   Blocks: 0  IO Block: 4096   regular empty file
Device: fd00h/64768d    Inode: 790740      Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/greg)   Gid: ( 1000/greg)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2019-10-15 11:02:09.888404448 +0200
Modify: 2019-10-15 11:02:09.888404448 +0200
Change: 2019-10-15 11:02:09.888404448 +0200
 Birth: 2019-10-15 11:02:09.887404446 +0200
[greg@bahia ~]$ stat /var/tmp/virtio-fs/proc/fs
  File: /var/tmp/virtio-fs/proc/fs
  Size: 0   Blocks: 0  IO Block: 1024   directory
Device: 4h/4d   Inode: 4026531845  Links: 5
Access: (0555/dr-xr-xr-x)  Uid: (0/root)   Gid: (0/root)
Context: system_u:object_r:proc_t:s0
Access: 2019-10-01 14:50:09.223233901 +0200
Modify: 2019-10-01 14:50:09.223233901 +0200
Change: 2019-10-01 14:50:09.223233901 +0200
 Birth: -

In the guest:

[greg@localhost ~]$ stat /mnt
  File: /mnt
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: 2dh/45d Inode: 787796  Links: 4
Access: (0775/drwxrwxr-x)  Uid: ( 1000/greg)   Gid: ( 1000/greg)
Context: system_u:object_r:unlabeled_t:s0
Access: 2019-10-15 11:08:52.070080922 +0200
Modify: 2019-10-15 11:02:09.887404446 +0200
Change: 2019-10-15 11:02:09.887404446 +0200
 Birth: -
[greg@localhost ~]$ stat /mnt/FOO
  File: /mnt/FOO
  Size: 0   Blocks: 0  IO Block: 4096   regular empty file
Device: 2dh/45d Inode: 790740  Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/greg)   Gid: ( 1000/greg)
Context: system_u:object_r:unlabeled_t:s0
Access: 2019-10-15 11:02:09.888404448 +0200
Modify: 2019-10-15 11:02:09.888404448 +0200
Change: 2019-10-15 11:02:09.888404448 +0200
 Birth: -
[greg@localhost ~]$ stat /mnt/proc/fs
  File: /mnt/proc/fs
  Size: 0   Blocks: 0  IO Block: 1024   directory
Device: 2dh/45d Inode: 4026531845  Links: 5
Access: (0555/dr-xr-xr-x)  Uid: (0/root)   Gid: (0/root)
Context: system_u:object_r:unlabeled_t:s0
Access: 2019-10-01 14:50:09.223233901 +0200
Modify: 2019-10-01 14:50:09.223233901 +0200
Change: 2019-10-01 14:50:09.223233901 +0200
 Birth: -

Unless I'm missing something, it seems that "virtio-fs" has the same
issue we had on 9pfs before Christian's patches... :-\

--
Greg
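
To illustrate why this matters, here is a small Python sketch. A file is normally identified by its (st_dev, st_ino) pair, so collapsing all host devices into one guest device means uniqueness rests on the inode number alone. The first three entries come from the host stat output above; the fourth file is hypothetical and was not observed, it only shows how a collision could arise.

```python
from collections import defaultdict


def colliding_keys(files):
    """Map each (device, inode) key seen more than once to its paths."""
    by_key = defaultdict(list)
    for path, key in files.items():
        by_key[key].append(path)
    return {key: paths for key, paths in by_key.items() if len(paths) > 1}


def as_seen_in_guest(files, guest_dev):
    """Replace every host device number with the single guest device."""
    return {path: (guest_dev, ino) for path, (_dev, ino) in files.items()}


# (st_dev, st_ino) pairs taken from the host stat output.
host_files = {
    "/var/tmp/virtio-fs": (0xfd00, 787796),
    "/var/tmp/virtio-fs/FOO": (0xfd00, 790740),
    "/var/tmp/virtio-fs/proc/fs": (0x4, 4026531845),
}

# Hypothetical extra file: same inode number as FOO, on the other device.
host_files["/var/tmp/virtio-fs/proc/hypothetical"] = (0x4, 790740)

host_collisions = colliding_keys(host_files)
guest_collisions = colliding_keys(as_seen_in_guest(host_files, 0x2d))
```

On the host the two files stay distinct because their device numbers differ; once both are reported under guest device 2dh, they share one key.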

Re: [PATCH 2/2] apic: Use 32bit APIC ID for migration instance ID

2019-10-15 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Migration is silently broken now with x2apic config like this:
> 
>  -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>  -device intel-iommu,intremap=on,eim=on
> 
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
> 
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
> 
> Let's use the newly introduced initial_apic_id for that.

I'd like to understand a few things:
  a) Does this change the instance ID of existing APICs on the
     migration stream?
     a1) Even for <256 CPUs?
     a2) For >=256 CPUs?

[Because changing the ID breaks migration]

  b) Is the instance ID constant - I can see it's a property on the
     APIC, but I can't see who sets it

  c) In the case where it fails, did we end up registering two
 devices with the same name and instance ID?  If so, is it worth
 adding a check that would error if we tried?

Dave
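
As a rough illustration of question (c), the Python sketch below shows how IDs above 255 would alias lower ones if the instance ID were taken from an 8-bit field, and how a duplicate check could flag that. The 8-bit width and the contiguous IDs 0..287 are assumptions made for the example (288 matches maxcpus in the quoted command line); find_duplicate_ids() is a hypothetical helper, not an existing QEMU function.

```python
def find_duplicate_ids(apic_ids, width_bits):
    """Return (first_owner, later_owner) pairs whose truncated IDs clash."""
    mask = (1 << width_bits) - 1
    first_owner = {}
    duplicates = []
    for apic_id in apic_ids:
        instance_id = apic_id & mask  # what survives in a narrow field
        if instance_id in first_owner:
            duplicates.append((first_owner[instance_id], apic_id))
        else:
            first_owner[instance_id] = apic_id
    return duplicates


# Assumed contiguous APIC IDs for 288 possible vCPUs.
narrow = find_duplicate_ids(range(288), 8)
wide = find_duplicate_ids(range(288), 32)
```

Under these assumptions the narrow case reports clashes for every ID from 256 upwards, while the 32-bit case reports none, which is the kind of condition a registration-time check could turn into an error.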

> 
> Signed-off-by: Peter Xu 
> ---
>  hw/intc/apic_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index aafd8e0e33..6024a3e06a 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error 
> **errp)
>  APICCommonState *s = APIC_COMMON(dev);
>  APICCommonClass *info;
>  static DeviceState *vapic;
> -int instance_id = s->id;
> +int64_t instance_id = s->initial_apic_id;
>  
>  info = APIC_COMMON_GET_CLASS(s);
>  info->realize(dev, errp);
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PULL v2 0/2] Tracing patches

2019-10-15 Thread Philippe Mathieu-Daudé


On 10/15/19 10:49 AM, Stefan Hajnoczi wrote:

> The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2019-10-08 16:08:35 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to 403e11edbfad5da2e6d5842adc9222f60e76ee43:
>
>   trace: avoid "is" with a literal Python 3.8 warnings (2019-10-15 09:47:16 +0100)
>
> Pull request
>
> v2:
>  * Replaced "Launchpad:" tag with "Buglink:" as documented on the SubmitAPatch wiki page [Philippe]


Thanks Stefan for this updated pullreq!

Re: [PATCH v3 07/10] migration: add new migration state wait-unplug

2019-10-15 Thread Jens Freimann


On Fri, Oct 11, 2019 at 06:11:33PM +0100, Dr. David Alan Gilbert wrote:

> * Jens Freimann (jfreim...@redhat.com) wrote:
> > This patch adds a new migration state called wait-unplug.  It is entered
> > after the SETUP state and will transition into ACTIVE once all devices
> > were successfully unplugged from the guest.
> >
> > So if a guest doesn't respond or takes long to honor the unplug request
> > the user will see the migration state 'wait-unplug'.
> >
> > In the migration thread we query failover devices if they are still
> > pending the guest unplug. When all are unplugged the migration
> > continues. We give it a defined number of iterations including small
> > waiting periods before we proceed.
> >
> > Signed-off-by: Jens Freimann 

[..]

> > @@ -3260,6 +3271,27 @@ static void *migration_thread(void *opaque)
> >
> >      qemu_savevm_state_setup(s->to_dst_file);
> >
> > +    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > +                      MIGRATION_STATUS_WAIT_UNPLUG);
>
> I think I'd prefer if you only went into this state if you had any
> devices that were going to need unplugging.

Sure, that makes sense. I'll change it.

> > +    while (i < FAILOVER_UNPLUG_RETRIES &&
> > +           s->state == MIGRATION_STATUS_WAIT_UNPLUG) {
> > +        i++;
> > +        qemu_sem_timedwait(&s->wait_unplug_sem, FAILOVER_GUEST_UNPLUG_WAIT);
> > +        all_unplugged = qemu_savevm_state_guest_unplug_pending();
> > +        if (all_unplugged) {
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (all_unplugged) {
> > +        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > +                          MIGRATION_STATUS_ACTIVE);
> > +    } else {
> > +        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > +                          MIGRATION_STATUS_CANCELLING);
> > +    }
>
> I think you can get rid of both the timeout and the count and just make
> sure that migrate_cancel works at this point.

I see, I need to add the new state to migration_is_setup_or_active() or
a cancel won't work.

> This pushes the problem up a layer, which I think is fine.

Seems good to me. To be clear, you're saying I should just poll on
the device unplugged state? Like

while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
       !qemu_savevm_state_guest_unplug_pending()) {
    /* This block intentionally left blank */
}

regards,
Jens
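
One possible shape for such a wait, sketched in Python rather than QEMU C: poll the pending state, but sleep on a cancellation event between polls so the loop neither spins nor ignores a cancel. This is only an illustration under assumptions; wait_for_unplug(), unplug_pending and cancelled are hypothetical names, and unplug_pending is assumed to return True while devices are still waiting for the guest.

```python
import threading


def wait_for_unplug(unplug_pending, cancelled, poll_interval=0.01):
    """Return True once nothing is pending, False if cancelled first.

    unplug_pending: callable returning True while devices are still pending.
    cancelled: threading.Event set by whoever cancels the migration.
    """
    while unplug_pending():
        # Event.wait() returns True as soon as the event is set, so a
        # cancel request ends the wait without a retry counter.
        if cancelled.wait(poll_interval):
            return False
    return True


cancel_event = threading.Event()
```

There is no retry limit here: the wait ends either because the guest finished unplugging or because someone cancelled, which matches the idea of pushing the timeout decision up a layer.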

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-15 Thread Beata Michalska

On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
>
> Add support for the query-cpu-model-expansion QMP command to Arm. We
> do this selectively, only exposing CPU properties which represent
> optional CPU features which the user may want to enable/disable.
> Additionally we restrict the list of queryable cpu models to 'max',
> 'host', or the current type when KVM is in use. And, finally, we only
> implement expansion type 'full', as Arm does not yet have a "base"
> CPU type. More details and example queries are described in a new
> document (docs/arm-cpu-features.rst).
>
> Note, certainly more features may be added to the list of advertised
> features, e.g. 'vfp' and 'neon'. The only requirement is that we can
> detect invalid configurations and emit failures at QMP query time.
> For 'vfp' and 'neon' this will require some refactoring to share a
> validation function between the QMP query and the CPU realize
> functions.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 
> ---
>  docs/arm-cpu-features.rst | 137 +++
>  qapi/machine-target.json  |   6 +-
>  target/arm/monitor.c  | 145 ++
>  3 files changed, 285 insertions(+), 3 deletions(-)
>  create mode 100644 docs/arm-cpu-features.rst
>
> diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
> new file mode 100644
> index ..c79dcffb5556
> --- /dev/null
> +++ b/docs/arm-cpu-features.rst
> @@ -0,0 +1,137 @@
> +
> +ARM CPU Features
> +
> +
> +Examples of probing and using ARM CPU features
> +
> +Introduction
> +
> +
> +CPU features are optional features that a CPU of supporting type may
> +choose to implement or not.  In QEMU, optional CPU features have
> +corresponding boolean CPU properties that, when enabled, indicate
> +that the feature is implemented, and, conversely, when disabled,
> +indicate that it is not implemented. An example of an ARM CPU feature
> +is the Performance Monitoring Unit (PMU).  CPU types such as the
> +Cortex-A15 and the Cortex-A57, which respectively implement ARM
> +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally
> +implement PMUs.  For example, if a user wants to use a Cortex-A15 without
> +a PMU, then the `-cpu` parameter should contain `pmu=off` on the QEMU
> +command line, i.e. `-cpu cortex-a15,pmu=off`.
> +
> +As not all CPU types support all optional CPU features, then whether or
> +not a CPU property exists depends on the CPU type.  For example, CPUs
> +that implement the ARMv8-A architecture reference manual may optionally
> +support the AArch32 CPU feature, which may be enabled by disabling the
> +`aarch64` CPU property.  A CPU type such as the Cortex-A15, which does
> +not implement ARMv8-A, will not have the `aarch64` CPU property.
> +
> +QEMU's support may be limited for some CPU features, only partially
> +supporting the feature or only supporting the feature under certain
> +configurations.  For example, the `aarch64` CPU feature, which, when
> +disabled, enables the optional AArch32 CPU feature, is only supported
> +when using the KVM accelerator and when running on a host CPU type that
> +supports the feature.
> +
> +CPU Feature Probing
> +===
> +
> +Determining which CPU features are available and functional for a given
> +CPU type is possible with the `query-cpu-model-expansion` QMP command.
> +Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
> +block in the script for usage) is used to issue the QMP commands.
> +
> +(1) Determine which CPU features are available for the `max` CPU type
> +(Note, we started QEMU with qemu-system-aarch64, so `max` is
> + implementing the ARMv8-A reference manual in this case)::
> +
> +  (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": true, "aarch64": true
> +  
> +
> +We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
> +We also see that the CPU features are enabled, as they are all `true`.
> +
> +(2) Let's try to disable the PMU::
> +
> +  (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": false, "aarch64": true
> +  
> +
> +We see it worked, as `pmu` is now `false`.
> +
> +(3) Let's try to disable `aarch64`, which enables the AArch32 CPU feature::
> +
> +  (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"aarch64":false}}
> +  {"error": {
> +   "class": "GenericError", "desc":
> +   "'aarch64' feature cannot be disabled unless KVM is enabled and 32-bit EL1 is supported"
> +  }}
> +
> +It looks like this feature is limited to a configuration we do not
> +currently have.
> +
> +(4) Let's try probing CPU f

Re: [PULL 0/1] qemu-openbios queue 20191012

2019-10-15 Thread Peter Maydell

On Sat, 12 Oct 2019 at 11:24, Mark Cave-Ayland
 wrote:
>
> The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2019-10-08 16:08:35 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/mcayland/qemu.git tags/qemu-openbios-20191012
>
> for you to fetch changes up to 25bf1811cffc2772fedaa9345026cb5375ae11b4:
>
>   Update OpenBIOS images to f28e16f9 built from submodule. (2019-10-12 10:18:18 +0100)
>
> 
> qemu-openbios queue
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

Re: [PATCH] tcg/arm: Expand epilogue inline

2019-10-15 Thread Philippe Mathieu-Daudé


Hi Richard,

On 10/15/19 3:29 AM, Richard Henderson wrote:

It is, after all, just two instructions.

Profiling on a cortex-a15, using -d nochain to increase the number
of exit_tb that are executed, shows a minor improvement of 0.5%.

Signed-off-by: Richard Henderson 
---
  tcg/arm/tcg-target.inc.c | 32 +---
  1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 94d80d79d1..2a9ebfe25a 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1745,24 +1745,18 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
  #endif
  }
  
-static tcg_insn_unit *tb_ret_addr;

+static void tcg_out_epilogue(TCGContext *s);
  
-static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,

-const TCGArg *args, const int *const_args)
+static void tcg_out_op(TCGContext *s, TCGOpcode opc,
+   const TCGArg *args, const int *const_args)
  {
  TCGArg a0, a1, a2, a3, a4, a5;
  int c;
  
  switch (opc) {

  case INDEX_op_exit_tb:
-/* Reuse the zeroing that exists for goto_ptr.  */
-a0 = args[0];
-if (a0 == 0) {
-tcg_out_goto(s, COND_AL, s->code_gen_epilogue);
-} else {
-tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
-tcg_out_goto(s, COND_AL, tb_ret_addr);
-}
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, args[0]);
+tcg_out_epilogue(s);
  break;
  case INDEX_op_goto_tb:
  {
@@ -2284,19 +2278,17 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+ TCG_TARGET_STACK_ALIGN - 1) \
   & -TCG_TARGET_STACK_ALIGN)
  
+#define STACK_ADDEND  (FRAME_SIZE - PUSH_SIZE)

+
  static void tcg_target_qemu_prologue(TCGContext *s)
  {
-int stack_addend;
-
  /* Calling convention requires us to save r4-r11 and lr.  */
  /* stmdb sp!, { r4 - r11, lr } */
  tcg_out32(s, (COND_AL << 28) | 0x092d4ff0);
  
  /* Reserve callee argument and tcg temp space.  */

-stack_addend = FRAME_SIZE - PUSH_SIZE;
-
  tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
-   TCG_REG_CALL_STACK, stack_addend, 1);
+   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
  tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
CPU_TEMP_BUF_NLONGS * sizeof(long));
  
@@ -2310,11 +2302,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)

   */
  s->code_gen_epilogue = s->code_ptr;
  tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
+tcg_out_epilogue(s);
+}
  
-/* TB epilogue */

-tb_ret_addr = s->code_ptr;
+static void tcg_out_epilogue(TCGContext *s)


Do you mind splitting this patch in 2?
First use tcg_out_epilogue(), then optimize tcg_out_op().


+{
  tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
-   TCG_REG_CALL_STACK, stack_addend, 1);
+   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
  
  /* ldmia sp!, { r4 - r11, pc } */

  tcg_out32(s, (COND_AL << 28) | 0x08bd8ff0);

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Peter Maydell

On Tue, 15 Oct 2019 at 10:14, Daniel P. Berrangé  wrote:
>
> On Tue, Oct 15, 2019 at 11:02:43AM +0200, Marc-André Lureau wrote:
> > I suppose the same applies to dtc (1.4.2 required by qemu, but xenial
> > has 1.4.0... so we have to wait until April 26, 2020? 18.04 LTS
> > release date + 2y).
>
> Possibly - depends on scope of changes between 1.4.0 & 1.4.2 - maybe it
> is easy to conditionally support 1.4.0 too.

We need fdt_first_subnode() and fdt_next_subnode() which only
came in in 1.4.2.

thanks
-- PMM

Re: RFC: Why dont we move to newer capstone?

2019-10-15 Thread Daniel P . Berrangé

On Tue, Oct 15, 2019 at 10:57:44AM +0100, Peter Maydell wrote:
> On Tue, 15 Oct 2019 at 10:14, Daniel P. Berrangé  wrote:
> >
> > On Tue, Oct 15, 2019 at 11:02:43AM +0200, Marc-André Lureau wrote:
> > > I suppose the same applies to dtc (1.4.2 required by qemu, but xenial
> > > has 1.4.0... so we have to wait until April 26, 2020? 18.04 LTS
> > > release date + 2y).
> >
> > Possibly - depends on scope of changes between 1.4.0 & 1.4.2 - maybe it
> > is easy to conditionally support 1.4.0 too.
> 
> We need fdt_first_subnode() and fdt_next_subnode() which only
> came in in 1.4.2.

Looks like those are just shims around fdt_next_node() which existed
in previous releases already, just to make code a little cleaner:

  commit 4e76ec796c90d44d417f82d9db2d67cfe575f8ed
  Author: Simon Glass 
  Date:   Fri Apr 26 05:43:31 2013 -0700

libfdt: Add fdt_next_subnode() to permit easy subnode iteration

Iterating through subnodes with libfdt is a little painful to write as we
need something like this:

for (depth = 0, count = 0,
offset = fdt_next_node(fdt, parent_offset, &depth);
 (offset >= 0) && (depth > 0);
 offset = fdt_next_node(fdt, offset, &depth)) {
if (depth == 1) {
/* code body */
}
}

Using fdt_next_subnode() we can instead write this, which is shorter and
easier to get right:

for (offset = fdt_first_subnode(fdt, parent_offset);
 offset >= 0;
 offset = fdt_next_subnode(fdt, offset)) {
/* code body */
}

Also, it doesn't require two levels of indentation for the loop body.


so I think we could indeed do conditional compilation where we provide a
local impl of fdt_first|next_subnode if we see older dtc present.
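
A minimal sketch of what such a fallback could look like, following the
loop from the commit message quoted above. To keep it self-contained it
runs against a toy stand-in for fdt_next_node(); everything named toy_*
and TOY_ERR_NOTFOUND is made up for illustration and is not libfdt API.
A real shim would call fdt_next_node() and return -FDT_ERR_NOTFOUND.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ERR_NOTFOUND 1  /* stand-in for FDT_ERR_NOTFOUND */

/* Toy tree: nodes in pre-order, each entry is the node's absolute depth. */
struct toy_tree {
    const int *depths;
    int count;
};

/*
 * Stand-in for fdt_next_node(): step to the next node in pre-order and
 * adjust *depth by the change in nesting level.
 */
static int toy_next_node(const struct toy_tree *t, int offset, int *depth)
{
    int next = offset + 1;

    if (offset < 0 || next >= t->count) {
        return -TOY_ERR_NOTFOUND;
    }
    *depth += t->depths[next] - t->depths[offset];
    return next;
}

/* Fallback for fdt_first_subnode(): the next node must be one level down. */
static int toy_first_subnode(const struct toy_tree *t, int offset)
{
    int depth = 0;

    offset = toy_next_node(t, offset, &depth);
    if (offset < 0 || depth != 1) {
        return -TOY_ERR_NOTFOUND;
    }
    return offset;
}

/* Fallback for fdt_next_subnode(): skip descendants, stop at a sibling. */
static int toy_next_subnode(const struct toy_tree *t, int offset)
{
    int depth = 1;

    do {
        offset = toy_next_node(t, offset, &depth);
        if (offset < 0 || depth < 1) {
            return -TOY_ERR_NOTFOUND;
        }
    } while (depth > 1);
    return offset;
}

/* Count the direct children of the node at parent_offset. */
static int toy_count_subnodes(const struct toy_tree *t, int parent_offset)
{
    int count = 0;
    int offset;

    assert(t != NULL && t->depths != NULL);
    for (offset = toy_first_subnode(t, parent_offset);
         offset >= 0;
         offset = toy_next_subnode(t, offset)) {
        count++;
    }
    return count;
}
```

In QEMU the two fallbacks would presumably sit behind a configure-time
check for the libfdt version, so that newer dtc keeps using its own
implementation.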

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 2/2] apic: Use 32bit APIC ID for migration instance ID

2019-10-15 Thread Peter Xu

On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Migration is silently broken now with x2apic config like this:
> > 
> >  -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
> >  -device intel-iommu,intremap=on,eim=on
> > 
> > After migration, the guest kernel could hang at anything, due to
> > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> > any operations related to x2apic could be broken then (e.g., RDMSR on
> > x2apic MSRs could fail because KVM would think that the vcpu hasn't
> > enabled x2apic at all).
> > 
> > The issue is that the x2apic bit was never applied correctly for vcpus
> > whose ID > 255 when migrate completes, and that's because when we
> > migrate APIC we use the APICCommonState.id as instance ID of the
> > migration stream, while that's too short for x2apic.
> > 
> > Let's use the newly introduced initial_apic_id for that.
> 
> I'd like to understand a few things:
>a) Does this change the instance ID of existing APICs on the
> migration stream? 
>  a1) Ever for <256 CPUs?

No.

>  a2) For >=256 CPUs?

Yes.

> 
> [Because changing the ID breaks migration]

But if we don't change it, the stream is broken too. :)

Then the destination VM will receive e.g. two apic_id==0 instances (I
think the apic_id==256 instance will wrongly overwrite the apic_id==0
one), while the vcpu with apic_id==256 will use the initial apic
values.

So IMHO we should still fix this, even if it changes the migration
stream.  At least we start to make it right.
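
To illustrate the collision, here is a small standalone sketch. It
assumes the old instance ID is effectively the low 8 bits of the APIC
ID (my reading of "too short for x2apic"); the helper names are made up
for this mail and are not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old scheme (assumed): instance ID taken from an 8-bit id field. */
static uint32_t old_instance_id(uint32_t apic_id)
{
    return (uint8_t)apic_id;
}

/* New scheme: the full 32-bit initial APIC ID is used unchanged. */
static uint32_t new_instance_id(uint32_t initial_apic_id)
{
    return initial_apic_id;
}

/* Returns 1 if two vcpus would end up with the same instance ID. */
static int ids_collide(uint32_t (*scheme)(uint32_t), uint32_t a, uint32_t b)
{
    assert(scheme != NULL);
    return scheme(a) == scheme(b);
}
```

With these helpers, APIC IDs 0 and 256 collide under the old scheme and
not under the new one, while every ID below 256 maps to the same value
in both, which matches the a1/a2 answers above.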

> 
>   b) Is the instance ID constant - I can see it's a property on the
>  APIC, but I can't see who sets it

For each vcpu, I think yes it should be a constant as long as the
topology is the same.  This is how I understand it to be set:

(1) In pc_cpus_init(), we init these:

possible_cpus = mc->possible_cpu_arch_ids(ms);
for (i = 0; i < ms->smp.cpus; i++) {
pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
}

(2) In x86_cpu_apic_create(), we apply the apic_id to "id" property:

qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);

> 
>   c) In the case where it fails, did we end up registering two
>  devices with the same name and instance ID?  If so, is it worth
>  adding a check that would error if we tried?

Sounds doable.

Thanks,

-- 
Peter Xu

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Peter Xu

On Tue, Oct 15, 2019 at 10:45:53AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > It was "int" and used as 32bits fields (see save_section_header()).
> > It's unsafe already because sizeof(int) could be 2 on i386, I think.
> > So at least uint32_t would suit it better.  While it also uses "-1" as a
> > placeholder of "we want to generate the instance ID automatically".
> > Hence a more proper value should be int64_t.
> >
> > This will start to be useful after next patch in which we can start to
> > convert a real uint32_t value as instance ID.
> >
> > Signed-off-by: Peter Xu 
> 
> Hi
> 
> Being more helpful,  I think that it is better to just:
> 
> * change instance_id to be an uint32_t (notice that for all architectures
>   that we support, it is actually int32_t).
> 
> * export calculate_new_instance_id() and adjust callers that use -1.
> 
> or
> 
> * export a new function that just use the calculate_new_instance_id()
> 
> A fast search shows:
> 
> 10 callers of vmstate_register() with -1
> 1 caller of vmstate_register_with_alias_id with -1 (but it is the one
>   that sets all qdev devices).
> 1 caller of vmstate_register_with_alias_id in apic, where it can be -1.
> 1 caller of register_savevm_live() with -1 (spapr)
> 
> And call it a day?
> 
> What do you think, Juan.

Sure, I can switch instance_id to uint32_t and add a new flag to both
functions (register_savevm_live, vmstate_register_with_alias_id)

Regards,

-- 
Peter Xu

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Peter Xu

On Tue, Oct 15, 2019 at 10:34:57AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > It was "int" and used as 32bits fields (see save_section_header()).
> > It's unsafe already because sizeof(int) could be 2 on i386,
> 
> i386 is 32bits, so int is 32bits O:-)

Right, that would only be the case on 16-bit systems.  And yes, I don't
think we need to consider that! :)

-- 
Peter Xu

[PATCH v2 06/20] nvme: add support for the abort command

2019-10-15 Thread Klaus Jensen

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.1 ("Abort command").

The Abort command is a best effort command; for now, the device always
fails to abort the given command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index daa2367b0863..84e4f2ea7a15 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -741,6 +741,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
 }
 }
 
+static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0x;
+
+req->cqe.result = 1;
+if (nvme_check_sqid(n, sqid)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+return NVME_SUCCESS;
+}
+
 static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
 {
 trace_nvme_setfeat_timestamp(ts);
@@ -859,6 +871,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 trace_nvme_err_invalid_setfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
+
 return NVME_SUCCESS;
 }
 
@@ -875,6 +888,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return nvme_create_cq(n, cmd);
 case NVME_ADM_CMD_IDENTIFY:
 return nvme_identify(n, cmd);
+case NVME_ADM_CMD_ABORT:
+return nvme_abort(n, cmd, req);
 case NVME_ADM_CMD_SET_FEATURES:
 return nvme_set_feature(n, cmd, req);
 case NVME_ADM_CMD_GET_FEATURES:
@@ -1388,6 +1403,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 id->ieee[2] = 0xb3;
 id->ver = cpu_to_le32(0x00010201);
 id->oacs = cpu_to_le16(0);
+id->acl = 3;
 id->frmw = 7 << 1;
 id->lpa = 1 << 0;
 id->sqes = (0x6 << 4) | 0x6;
-- 
2.23.0
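
For reference, a sketch of how the Abort fields are laid out, going by
my reading of NVM Express 1.2.1, Section 5.1: CDW10 carries the
submission queue ID in bits 15:0 and the command ID in bits 31:16, and
bit 0 of completion dword 0 is set when the command was not aborted.
The struct and helper names below are hypothetical and not part of
hw/block/nvme.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the decoded Abort parameters. */
struct abort_params {
    uint16_t sqid;  /* CDW10 bits 15:0:  queue of the command to abort */
    uint16_t cid;   /* CDW10 bits 31:16: ID of the command to abort */
};

/* cdw10 is expected in host byte order (i.e. after le32_to_cpu()). */
static struct abort_params decode_abort_cdw10(uint32_t cdw10)
{
    struct abort_params p;

    p.sqid = (uint16_t)(cdw10 & 0xffff);
    p.cid = (uint16_t)(cdw10 >> 16);
    return p;
}

/* Completion dword 0 for a stub that never aborts anything. */
static uint32_t abort_result_not_aborted(void)
{
    return 1u;  /* bit 0 set: the command was not aborted */
}
```

For example, decode_abort_cdw10(0x00070003) yields sqid 3 and cid 7.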

[PATCH v2 01/20] nvme: remove superfluous breaks

2019-10-15 Thread Klaus Jensen

These break statements were left over when commit 3036a626e9ef ("nvme:
add Get/Set Feature Timestamp support") was merged.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 12d825425016..c06e3ca31905 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -788,7 +788,6 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 break;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, cmd);
-break;
 default:
 trace_nvme_err_invalid_getfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -832,11 +831,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 req->cqe.result =
 cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
 break;
-
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, cmd);
-break;
-
 default:
 trace_nvme_err_invalid_setfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
-- 
2.23.0

[PATCH v2 04/20] nvme: populate the mandatory subnqn and ver fields

2019-10-15 Thread Klaus Jensen

Required for compliance with NVMe revision 1.2.1 or later. See NVM
Express 1.2.1, Section 5.11 ("Identify command"), Figure 90 and Section
7.9 ("NVMe Qualified Names").

This also bumps the supported version to 1.2.1.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 277700fdcc58..16f0fba10b08 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -9,9 +9,9 @@
  */
 
 /**
- * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e
+ * Reference Specification: NVM Express 1.2.1
  *
- *  http://www.nvmexpress.org/resources/
+ *   https://nvmexpress.org/resources/specifications/
  */
 
 /**
@@ -1366,6 +1366,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
 id->ieee[2] = 0xb3;
+id->ver = cpu_to_le32(0x00010201);
 id->oacs = cpu_to_le16(0);
 id->frmw = 7 << 1;
 id->lpa = 1 << 0;
@@ -1373,6 +1374,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 id->cqes = (0x4 << 4) | 0x4;
 id->nn = cpu_to_le32(n->num_namespaces);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
+
+strcpy((char *) id->subnqn, "nqn.2019-08.org.qemu:");
+pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
+
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
@@ -1387,7 +1392,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 NVME_CAP_SET_CSS(n->bar.cap, 1);
 NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
-n->bar.vs = 0x00010200;
+n->bar.vs = 0x00010201;
 n->bar.intmc = n->bar.intms = 0;
 
 if (n->params.cmb_size_mb) {
-- 
2.23.0
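
As a sanity check on the constants in this patch, here is a small
sketch of how 0x00010201 decomposes, assuming the usual version layout
(major in bits 31:16, minor in bits 15:8, tertiary in bits 7:0), plus a
bounded way of building the subnqn string. The function names and
SUBNQN_LEN are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SUBNQN_LEN 256  /* size of the subnqn field in the identify data */

/* Pack major.minor.tertiary into the 32-bit version format. */
static uint32_t pack_nvme_version(uint16_t mjr, uint8_t mnr, uint8_t ter)
{
    return ((uint32_t)mjr << 16) | ((uint32_t)mnr << 8) | ter;
}

/*
 * Build "nqn.2019-08.org.qemu:<serial>" into dst.
 * Returns 0 on success, -1 if the result would not fit in len bytes.
 */
static int build_subnqn(char *dst, size_t len, const char *serial)
{
    int n;

    assert(dst != NULL && serial != NULL);
    n = snprintf(dst, len, "nqn.2019-08.org.qemu:%s", serial);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

pack_nvme_version(1, 2, 1) gives 0x00010201 and pack_nvme_version(1, 2, 0)
gives the previous 0x00010200.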

[PATCH v2 05/20] nvme: allow completion queues in the cmb

2019-10-15 Thread Klaus Jensen

Allow completion queues in the controller memory buffer.

This also inlines the nvme_addr_{read,write} functions and adds an
nvme_addr_is_cmb helper.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 16f0fba10b08..daa2367b0863 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -52,14 +52,34 @@
 
 static void nvme_process_sq(void *opaque);
 
-static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 {
-if (n->cmbsz && addr >= n->ctrl_mem.addr &&
-addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
-memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
-} else {
-pci_dma_read(&n->parent_obj, addr, buf, size);
+hwaddr low = n->ctrl_mem.addr;
+hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
+
+return addr >= low && addr < hi;
+}
+
+static inline void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf,
+int size)
+{
+if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
+memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
+return;
 }
+
+pci_dma_read(&n->parent_obj, addr, buf, size);
+}
+
+static inline void nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf,
+int size)
+{
+if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
+memcpy((void *) &n->cmbuf[addr - n->ctrl_mem.addr], buf, size);
+return;
+}
+
+pci_dma_write(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -281,6 +301,7 @@ static void nvme_post_cqes(void *opaque)
 
 QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
 NvmeSQueue *sq;
+NvmeCqe *cqe = &req->cqe;
 hwaddr addr;
 
 if (nvme_cq_full(cq)) {
@@ -294,8 +315,7 @@ static void nvme_post_cqes(void *opaque)
 req->cqe.sq_head = cpu_to_le16(sq->head);
 addr = cq->dma_addr + cq->tail * n->cqe_size;
 nvme_inc_cq_tail(cq);
-pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
-sizeof(req->cqe));
+nvme_addr_write(n, addr, (void *) cqe, sizeof(*cqe));
 QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
 }
 if (cq->tail != cq->head) {
@@ -1401,7 +1421,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
 
 NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
-NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
+NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 1);
 NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
 NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
 NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
-- 
2.23.0
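
The range test in nvme_addr_is_cmb() can also be written without ever
forming base + size. A minimal sketch, with hypothetical helper names
that are not part of the patch; the second helper additionally checks
that the whole access fits, which is an extension for illustration and
not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies in the half-open window [base, base + size). */
static bool addr_in_window(uint64_t addr, uint64_t base, uint64_t size)
{
    /* Subtracting first avoids any overflow in base + size. */
    return addr >= base && addr - base < size;
}

/* True if the whole access [addr, addr + len) fits inside the window. */
static bool access_in_window(uint64_t addr, uint64_t len,
                             uint64_t base, uint64_t size)
{
    return addr >= base && len <= size && addr - base <= size - len;
}
```

With a window at 0x1000 of size 0x100, address 0x10ff is inside and
0x1100 is not; a 16-byte access at 0x10f0 fits while one at 0x10f1 does
not.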

[PATCH v2 08/20] nvme: add support for the get log page command

2019-10-15 Thread Klaus Jensen

Add support for the Get Log Page command and basic implementations
of the mandatory Error Information, SMART/Health Information and
Firmware Slot Information log pages.

In violation of the specification, the SMART/Health Information log page
does not persist information over the lifetime of the controller because
the device has no place to store such persistent state.

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.10 ("Get Log Page command").

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 150 +-
 hw/block/nvme.h   |   9 ++-
 hw/block/trace-events |   2 +
 include/block/nvme.h  |   2 +-
 4 files changed, 160 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1fdb3b8655ed..4412a3bea3bc 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -44,6 +44,7 @@
 #include "nvme.h"
 
 #define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
+#define NVME_TEMPERATURE 0x143
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -577,6 +578,137 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
 return NVME_SUCCESS;
 }
 
+static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd,
+uint32_t buf_len, uint64_t off, NvmeRequest *req)
+{
+uint32_t trans_len;
+uint64_t prp1 = le64_to_cpu(cmd->prp1);
+uint64_t prp2 = le64_to_cpu(cmd->prp2);
+
+if (off > sizeof(*n->elpes) * (n->params.elpe + 1)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len);
+
+return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1,
+prp2);
+}
+
+static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
+uint64_t off, NvmeRequest *req)
+{
+uint64_t prp1 = le64_to_cpu(cmd->prp1);
+uint64_t prp2 = le64_to_cpu(cmd->prp2);
+uint32_t nsid = le32_to_cpu(cmd->nsid);
+
+uint32_t trans_len;
+time_t current_ms;
+uint64_t units_read = 0, units_written = 0, read_commands = 0,
+write_commands = 0;
+NvmeSmartLog smart;
+BlockAcctStats *s;
+
+if (!nsid || (nsid != 0x && nsid > n->num_namespaces)) {
+trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
+return NVME_INVALID_NSID | NVME_DNR;
+}
+
+s = blk_get_stats(n->conf.blk);
+
+units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
+units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
+read_commands = s->nr_ops[BLOCK_ACCT_READ];
+write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
+
+if (off > sizeof(smart)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+trans_len = MIN(sizeof(smart) - off, buf_len);
+
+memset(&smart, 0x0, sizeof(smart));
+
+smart.data_units_read[0] = cpu_to_le64(units_read / 1000);
+smart.data_units_written[0] = cpu_to_le64(units_written / 1000);
+smart.host_read_commands[0] = cpu_to_le64(read_commands);
+smart.host_write_commands[0] = cpu_to_le64(write_commands);
+
+smart.number_of_error_log_entries[0] = cpu_to_le64(0);
+smart.temperature[0] = n->temperature & 0xff;
+smart.temperature[1] = (n->temperature >> 8) & 0xff;
+
+if (n->features.temp_thresh <= n->temperature) {
+smart.critical_warning |= NVME_SMART_TEMPERATURE;
+}
+
+current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+smart.power_on_hours[0] = cpu_to_le64(
+(((current_ms - n->starttime_ms) / 1000) / 60) / 60);
+
+return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
+prp2);
+}
+
+static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
+uint64_t off, NvmeRequest *req)
+{
+uint32_t trans_len;
+uint64_t prp1 = le64_to_cpu(cmd->prp1);
+uint64_t prp2 = le64_to_cpu(cmd->prp2);
+NvmeFwSlotInfoLog fw_log;
+
+if (off > sizeof(fw_log)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+memset(&fw_log, 0, sizeof(NvmeFwSlotInfoLog));
+
+trans_len = MIN(sizeof(fw_log) - off, buf_len);
+
+return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
+prp2);
+}
+
+static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+uint32_t dw10 = le32_to_cpu(cmd->cdw10);
+uint32_t dw11 = le32_to_cpu(cmd->cdw11);
+uint32_t dw12 = le32_to_cpu(cmd->cdw12);
+uint32_t dw13 = le32_to_cpu(cmd->cdw13);
+uint16_t lid = dw10 & 0xff;
+uint8_t  rae = (dw10 >> 15) & 0x1;
+uint32_t numdl, numdu;
+uint64_t off, lpol, lpou;
+size_t   len;
+
+numdl = (dw10 >> 16);
+numdu = (dw11 & 0x);
+lpol = dw12;
+lpou = dw13;
+
+len = (((numdu << 16) | numdl) + 1) << 2;
+off = (lpou << 32ULL) | lpol;
+
+if (off & 0x3) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+trace_nvme_get_log(req->cid, lid, rae, len, off);
+
+switch (lid) {
+case NVME_LOG_ERROR_INFO:
+return nvme_error_info(n, cmd, len, off,

[PATCH v2 03/20] nvme: add missing fields in the identify controller data structure

2019-10-15 Thread Klaus Jensen

Not used by the device model but added for completeness. See NVM Express
1.2.1, Section 5.11 ("Identify command"), Figure 90.

Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3ec8efcc435e..1b0accd4fe2b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -543,7 +543,13 @@ typedef struct NvmeIdCtrl {
 uint8_t ieee[3];
 uint8_t cmic;
 uint8_t mdts;
-uint8_t rsvd255[178];
+uint16_tcntlid;
+uint32_tver;
+uint16_trtd3r;
+uint32_trtd3e;
+uint32_toaes;
+uint32_tctratt;
+uint8_t rsvd255[156];
 uint16_toacs;
 uint8_t acl;
 uint8_t aerl;
@@ -551,10 +557,22 @@ typedef struct NvmeIdCtrl {
 uint8_t lpa;
 uint8_t elpe;
 uint8_t npss;
-uint8_t rsvd511[248];
+uint8_t avscc;
+uint8_t apsta;
+uint16_twctemp;
+uint16_tcctemp;
+uint16_tmtfa;
+uint32_thmpre;
+uint32_thmmin;
+uint8_t tnvmcap[16];
+uint8_t unvmcap[16];
+uint32_trpmbs;
+uint8_t rsvd319[4];
+uint16_tkas;
+uint8_t rsvd511[190];
 uint8_t sqes;
 uint8_t cqes;
-uint16_trsvd515;
+uint16_tmaxcmd;
 uint32_tnn;
 uint16_toncs;
 uint16_tfuses;
@@ -562,8 +580,14 @@ typedef struct NvmeIdCtrl {
 uint8_t vwc;
 uint16_tawun;
 uint16_tawupf;
-uint8_t rsvd703[174];
-uint8_t rsvd2047[1344];
+uint8_t nvscc;
+uint8_t rsvd531;
+uint16_tacwu;
+uint16_trsvd535;
+uint32_tsgls;
+uint8_t rsvd767[228];
+uint8_t subnqn[256];
+uint8_t rsvd2047[1024];
 NvmePSD psd[32];
 uint8_t vs[1024];
 } NvmeIdCtrl;
-- 
2.23.0

[PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces

2019-10-15 Thread Klaus Jensen

Hi,

(Quick note to Fam): most of this series is irrelevant to you as the
maintainer of the nvme block driver, but patch "nvme: add support for
scatter gather lists" touches block/nvme.c due to changes in the shared
NvmeCmd struct.

Anyway, v2 comes with a good bunch of changes. Compared to v1[1], I have
squashed some commits in the beginning of the series and heavily
refactored "nvme: support multiple block requests per request" into the
new commit "nvme: allow multiple aios per command".

I have also removed the original implementation of the Abort command
(commit "nvme: add support for the abort command") as it is currently
too tricky to test reliably. It has been replaced by a stub that,
besides a trivial sanity check, just fails to abort the given command.
*Some* implementation of the Abort command is mandatory, but given the
"best effort" nature of the command this is acceptable for now. When the
device gains support for arbitration it should be less tricky to test.

The support for multiple namespaces is now backwards compatible. The
nvme device still accepts a 'drive' parameter, but for multiple
namespaces the use of 'nvme-ns' devices is required. I also integrated
some feedback from Paul so the device supports non-consecutive namespace
ids.

I have also added some new commits at the end:

  - "nvme: bump controller pci device id" makes sure the Linux kernel
doesn't apply any quirks to the controller that it no longer has.
  - "nvme: handle dma errors" won't actually do anything before this[2]
fix to include/hw/pci/pci.h is merged. With these two patches added,
the device reliably passes some additional nasty tests from blktests
(block/011 "disable PCI device while doing I/O" and block/019 "break
PCI link device while doing I/O"). Before this patch, block/011
would pass from time to time if you were lucky, but would at least
mess up the controller pretty badly, causing a reset in the best
case.


  [1]: https://patchwork.kernel.org/project/qemu-devel/list/?series=142383
  [2]: https://patchwork.kernel.org/patch/11184911/


Klaus Jensen (20):
  nvme: remove superfluous breaks
  nvme: move device parameters to separate struct
  nvme: add missing fields in the identify controller data structure
  nvme: populate the mandatory subnqn and ver fields
  nvme: allow completion queues in the cmb
  nvme: add support for the abort command
  nvme: refactor device realization
  nvme: add support for the get log page command
  nvme: add support for the asynchronous event request command
  nvme: add logging to error information log page
  nvme: add missing mandatory features
  nvme: bump supported specification version to 1.3
  nvme: refactor prp mapping
  nvme: allow multiple aios per command
  nvme: add support for scatter gather lists
  nvme: support multiple namespaces
  nvme: bump controller pci device id
  nvme: remove redundant NvmeCmd pointer parameter
  nvme: make lba data size configurable
  nvme: handle dma errors

 block/nvme.c   |   18 +-
 hw/block/Makefile.objs |2 +-
 hw/block/nvme-ns.c |  139 +++
 hw/block/nvme-ns.h |   60 ++
 hw/block/nvme.c| 1863 +---
 hw/block/nvme.h|  219 -
 hw/block/trace-events  |   37 +-
 include/block/nvme.h   |  132 ++-
 8 files changed, 2094 insertions(+), 376 deletions(-)
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

-- 
2.23.0

[PATCH v2 07/20] nvme: refactor device realization

2019-10-15 Thread Klaus Jensen

This patch splits up nvme_realize into multiple individual functions,
each initializing a different subset of the device.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 176 +++-
 hw/block/nvme.h |  22 ++
 2 files changed, 135 insertions(+), 63 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 84e4f2ea7a15..1fdb3b8655ed 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -43,6 +43,8 @@
 #include "trace.h"
 #include "nvme.h"
 
+#define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
+
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
 (trace_##trace)(__VA_ARGS__); \
@@ -1336,67 +1338,106 @@ static const MemoryRegionOps nvme_cmb_ops = {
 },
 };
 
-static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
 {
-NvmeCtrl *n = NVME(pci_dev);
-NvmeIdCtrl *id = &n->id_ctrl;
-
-int i;
-int64_t bs_size;
-uint8_t *pci_conf;
-
-if (!n->params.num_queues) {
-error_setg(errp, "num_queues can't be zero");
-return;
-}
+NvmeParams *params = &n->params;
 
 if (!n->conf.blk) {
-error_setg(errp, "drive property not set");
-return;
+error_setg(errp, "nvme: block backend not configured");
+return 1;
 }
 
-bs_size = blk_getlength(n->conf.blk);
-if (bs_size < 0) {
-error_setg(errp, "could not get backing file size");
-return;
+if (!params->serial) {
+error_setg(errp, "nvme: serial not configured");
+return 1;
 }
 
-if (!n->params.serial) {
-error_setg(errp, "serial property not set");
-return;
+if ((params->num_queues < 1 || params->num_queues > NVME_MAX_QS)) {
+error_setg(errp, "nvme: invalid queue configuration");
+return 1;
 }
+
+return 0;
+}
+
+static int nvme_init_blk(NvmeCtrl *n, Error **errp)
+{
 blkconf_blocksizes(&n->conf);
 if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-   false, errp)) {
-return;
+false, errp)) {
+return 1;
 }
 
-pci_conf = pci_dev->config;
-pci_conf[PCI_INTERRUPT_PIN] = 1;
-pci_config_set_prog_interface(pci_dev->config, 0x2);
-pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
-pcie_endpoint_cap_init(pci_dev, 0x80);
+return 0;
+}
 
+static void nvme_init_state(NvmeCtrl *n)
+{
 n->num_namespaces = 1;
 n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
-n->ns_size = bs_size / (uint64_t)n->num_namespaces;
-
 n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
 n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
 n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
+}
+
+static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
+NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+
+NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);
+NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+
+n->cmbloc = n->bar.cmbloc;
+n->cmbsz = n->bar.cmbsz;
+
+n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
+"nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
+PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
+PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+}
+
+static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+uint8_t *pci_conf = pci_dev->config;
 
-memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
-  "nvme", n->reg_size);
+pci_conf[PCI_INTERRUPT_PIN] = 1;
+pci_config_set_prog_interface(pci_conf, 0x2);
+pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
+pci_config_set_device_id(pci_conf, 0x5845);
+pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+pcie_endpoint_cap_init(pci_dev, 0x80);
+
+memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+n->reg_size);
 pci_register_bar(pci_dev, 0,
 PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
 &n->iomem);
 msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
 
+if (n->params.cmb_size_mb) {
+nvme_init_cmb(n, pci_dev);
+}
+}
+
+static void nvme_init_ctrl(NvmeCtrl *n)
+{
+NvmeIdCtrl *id = &n->id_ctrl;
+NvmeParams *params = &n->params;
+uint8_t *pci_conf = n->parent_obj.config;
+
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy(

[PATCH v2 09/20] nvme: add support for the asynchronous event request command

2019-10-15 Thread Klaus Jensen

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.2 ("Asynchronous Event Request command").

Mostly imported from Keith's qemu-nvme tree. Modified to not enqueue
events if something of the same type is already queued (but not cleared
by the host).
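
The "do not enqueue if the same type is already queued" rule can be sketched in isolation as below. This is a simplified illustration, not the patch's code: EventState, event_enqueue and event_release are hypothetical names, and the real device keeps separate masks and a tail queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller's event bookkeeping. */
typedef struct EventState {
    uint8_t queued_mask; /* bit N set: an event of type N is already queued */
    unsigned accepted;   /* number of events accepted so far */
} EventState;

/* Returns true if the event was queued, false if it was suppressed. */
static bool event_enqueue(EventState *s, uint8_t event_type)
{
    assert(event_type < 8); /* one mask bit per event type */
    uint8_t bit = (uint8_t)(1u << event_type);

    if (s->queued_mask & bit) {
        return false; /* same type still pending: drop, queue stays bounded */
    }
    s->queued_mask |= bit;
    s->accepted++;
    return true;
}

/* Assumed to run once the host has consumed the event of this type. */
static void event_release(EventState *s, uint8_t event_type)
{
    assert(event_type < 8);
    s->queued_mask &= (uint8_t)~(1u << event_type);
}
```

With one bit per event type, the number of pending events can never exceed the number of types, which is the bound the commit message relies on.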

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 180 --
 hw/block/nvme.h   |  13 ++-
 hw/block/trace-events |   8 ++
 include/block/nvme.h  |   4 +-
 4 files changed, 196 insertions(+), 9 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4412a3bea3bc..5cdee37582f9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -334,6 +334,46 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
 timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
 }
 
+static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
+uint8_t event_info, uint8_t log_page)
+{
+NvmeAsyncEvent *event;
+
+trace_nvme_enqueue_event(event_type, event_info, log_page);
+
+/*
+ * Do not enqueue the event if something of this type is already queued.
+ * This bounds the size of the event queue and makes sure it does not grow
+ * indefinitely when events are not processed by the host (i.e. does not
+ * issue any AERs).
+ */
+if (n->aer_mask_queued & (1 << event_type)) {
+trace_nvme_enqueue_event_masked(event_type);
+return;
+}
+n->aer_mask_queued |= (1 << event_type);
+
+event = g_new(NvmeAsyncEvent, 1);
+event->result = (NvmeAerResult) {
+.event_type = event_type,
+.event_info = event_info,
+.log_page   = log_page,
+};
+
+QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry);
+
+timer_mod(n->aer_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
+static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
+{
+n->aer_mask &= ~(1 << event_type);
+if (!QTAILQ_EMPTY(&n->aer_queue)) {
+timer_mod(n->aer_timer,
+qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
+}
+}
+
 static void nvme_rw_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
@@ -578,7 +618,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd,
+static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
 uint32_t buf_len, uint64_t off, NvmeRequest *req)
 {
 uint32_t trans_len;
@@ -591,12 +631,16 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd,
 
 trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len);
 
+if (!rae) {
+nvme_clear_events(n, NVME_AER_TYPE_ERROR);
+}
+
 return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1,
 prp2);
 }
 
-static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
-uint64_t off, NvmeRequest *req)
+static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
+uint32_t buf_len, uint64_t off, NvmeRequest *req)
 {
 uint64_t prp1 = le64_to_cpu(cmd->prp1);
 uint64_t prp2 = le64_to_cpu(cmd->prp2);
@@ -646,6 +690,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
 smart.power_on_hours[0] = cpu_to_le64(
 (((current_ms - n->starttime_ms) / 1000) / 60) / 60);
 
+if (!rae) {
+nvme_clear_events(n, NVME_AER_TYPE_SMART);
+}
+
 return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
 prp2);
 }
@@ -698,9 +746,9 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
 switch (lid) {
 case NVME_LOG_ERROR_INFO:
-return nvme_error_info(n, cmd, len, off, req);
+return nvme_error_info(n, cmd, rae, len, off, req);
 case NVME_LOG_SMART_INFO:
-return nvme_smart_info(n, cmd, len, off, req);
+return nvme_smart_info(n, cmd, rae, len, off, req);
 case NVME_LOG_FW_SLOT_INFO:
 return nvme_fw_log_info(n, cmd, len, off, req);
 default:
@@ -958,6 +1006,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 break;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, cmd);
+case NVME_ASYNCHRONOUS_EVENT_CONF:
+result = cpu_to_le32(n->features.async_config);
+break;
 default:
 trace_nvme_err_invalid_getfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -993,6 +1044,12 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 switch (dw10) {
 case NVME_TEMPERATURE_THRESHOLD:
 n->features.temp_thresh = dw11;
+
+if (n->features.temp_thresh <= n->temperature) {
+nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
+NVME_AER_INFO_SMART_TEMP_THRESH, NVME_LOG_SMART_INFO);
+}
+
 break;
 
 case NVME_VOLATILE_WRITE_CACHE:
@@ -1008,6 +1065,9 @@ static uint16_t nvme_set_feature(NvmeCtr

[PATCH v2 02/20] nvme: move device parameters to separate struct

2019-10-15 Thread Klaus Jensen

Move device configuration parameters to separate struct to make it
explicit what is configurable and what is set internally.
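
A minimal sketch of the split this patch describes, assuming a cut-down controller with only two user-settable values. The field names serial and num_queues follow the patch; the Params and Ctrl types and check_queue_id are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values the user may set on the command line. */
typedef struct Params {
    const char *serial;
    uint32_t num_queues;
} Params;

/* Device state: user parameters are grouped, derived values are not. */
typedef struct Ctrl {
    Params params;     /* configurable */
    uint32_t reg_size; /* computed internally from params */
} Ctrl;

/* Mirrors the shape of the queue id checks: 0 if in range, -1 otherwise. */
static int check_queue_id(const Ctrl *n, uint16_t qid)
{
    assert(n != NULL);
    return qid < n->params.num_queues ? 0 : -1;
}
```

Every access to a configurable value now reads n->params.<field>, so a reader can tell at the call site whether a value came from the user or was set by the device.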

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 44 ++--
 hw/block/nvme.h | 16 +---
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c06e3ca31905..277700fdcc58 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -64,12 +64,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -631,7 +631,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 trace_nvme_err_invalid_create_cq_addr(prp1);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
-if (unlikely(vector > n->num_queues)) {
+if (unlikely(vector > n->params.num_queues)) {
 trace_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -783,7 +783,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 trace_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+result = cpu_to_le32((n->params.num_queues - 2) |
+((n->params.num_queues - 2) << 16));
 trace_nvme_getfeat_numq(result);
 break;
 case NVME_TIMESTAMP:
@@ -827,9 +828,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 case NVME_NUMBER_OF_QUEUES:
 trace_nvme_setfeat_numq((dw11 & 0x) + 1,
 ((dw11 >> 16) & 0x) + 1,
-n->num_queues - 1, n->num_queues - 1);
-req->cqe.result =
-cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+n->params.num_queues - 1,
+n->params.num_queues - 1);
+req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
+((n->params.num_queues - 2) << 16));
 break;
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, cmd);
@@ -900,12 +902,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
 blk_drain(n->conf.blk);
 
-for (i = 0; i < n->num_queues; i++) {
+for (i = 0; i < n->params.num_queues; i++) {
 if (n->sq[i] != NULL) {
 nvme_free_sq(n->sq[i], n);
 }
 }
-for (i = 0; i < n->num_queues; i++) {
+for (i = 0; i < n->params.num_queues; i++) {
 if (n->cq[i] != NULL) {
 nvme_free_cq(n->cq[i], n);
 }
@@ -1308,7 +1310,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 int64_t bs_size;
 uint8_t *pci_conf;
 
-if (!n->num_queues) {
+if (!n->params.num_queues) {
 error_setg(errp, "num_queues can't be zero");
 return;
 }
@@ -1324,7 +1326,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
-if (!n->serial) {
+if (!n->params.serial) {
 error_setg(errp, "serial property not set");
 return;
 }
@@ -1341,25 +1343,25 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
 n->num_namespaces = 1;
-n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
+n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
 n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
 n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-n->sq = g_new0(NvmeSQueue *, n->num_queues);
-n->cq = g_new0(NvmeCQueue *, n->num_queues);
+n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
+n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
 
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
   "nvme", n->reg_size);
 pci_register_bar(pci_dev, 0,
 PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
 &n->iomem);
-msix_init_exclusive_bar(pci_dev, n->num_queues, 4, NULL);
+msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
-strpadcpy((char *)id->sn, sizeof(id->sn), n->serial, ' ');
+strpadcpy((char *)id->sn

[PATCH v2 19/20] nvme: make lba data size configurable

2019-10-15 Thread Klaus Jensen

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.c | 2 +-
 hw/block/nvme-ns.h | 4 +++-
 hw/block/nvme.c| 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index aa76bb63ef45..70ff622a5729 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = &ns->id_ns;
 
-id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+id_ns->lbaf[0].ds = ns->params.lbads;
 id_ns->nuse = id_ns->ncap = id_ns->nsze =
 cpu_to_le64(nvme_ns_nlbas(ns));
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 64dd054cf6a9..aa1c81d85cde 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -6,10 +6,12 @@
 OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
 
 #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
-DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
+DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
+DEFINE_PROP_UINT8("lbads", _state, _props.lbads, 9)
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+uint8_t  lbads;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 67f92bf5a3ac..d0103c16cfe9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2602,6 +2602,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 if (n->namespace.conf.blk) {
 ns = &n->namespace;
 ns->params.nsid = 1;
+ns->params.lbads = 9;
 
 if (nvme_ns_setup(n, ns, &local_err)) {
 error_propagate_prepend(errp, local_err, "nvme_ns_setup: ");
-- 
2.23.0

[PATCH v2 10/20] nvme: add logging to error information log page

2019-10-15 Thread Klaus Jensen

This adds the nvme_set_error_page function which allows errors to be
written to the error information log page. The functionality is largely
unused in the device, but with this in place we can at least try to push
new contributions to use it.

NOTE: In violation of the specification the Error Count field is *not*
retained across power off conditions because the device currently has no
place to store this kind of persistent state.
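
The error log page behaves like a small ring buffer with a running error counter. The sketch below illustrates that idea under stated assumptions: LOG_SLOTS, ErrorEntry, ErrorLog and error_log_add are hypothetical names, the capacity is fixed at compile time, and entries are numbered from 1 (a choice of this sketch, not necessarily of the patch).

```c
#include <assert.h>
#include <stdint.h>

#define LOG_SLOTS 4 /* hypothetical capacity of the log page */

typedef struct ErrorEntry {
    uint64_t error_count; /* running number of this error */
    uint16_t sqid;
    uint16_t cid;
    uint16_t status;
} ErrorEntry;

typedef struct ErrorLog {
    ErrorEntry slots[LOG_SLOTS];
    uint8_t next;         /* slot the next error is written to */
    uint64_t error_count; /* lost on power off, as the NOTE above says */
} ErrorLog;

static void error_log_add(ErrorLog *log, uint16_t sqid, uint16_t cid,
                          uint16_t status)
{
    assert(log->next < LOG_SLOTS);
    ErrorEntry *e = &log->slots[log->next];

    e->error_count = ++log->error_count;
    e->sqid = sqid;
    e->cid = cid;
    e->status = status;

    /* Wrap over the full slot count so every slot gets used. */
    log->next = (uint8_t)((log->next + 1) % LOG_SLOTS);
}
```

Taking the modulus over the number of allocated slots keeps the index in range and overwrites the oldest entry once the log is full.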

Cribbed from Keith's qemu-nvme tree.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 22 --
 hw/block/nvme.h |  2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5cdee37582f9..32381d7df655 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -161,6 +161,22 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 }
 }
 
+static void nvme_set_error_page(NvmeCtrl *n, uint16_t sqid, uint16_t cid,
+uint16_t status, uint16_t location, uint64_t lba, uint32_t nsid)
+{
+NvmeErrorLog *elp;
+
+elp = &n->elpes[n->elp_index];
+elp->error_count = n->error_count++;
+elp->sqid = sqid;
+elp->cid = cid;
+elp->status_field = status;
+elp->param_error_location = location;
+elp->lba = lba;
+elp->nsid = nsid;
+n->elp_index = (n->elp_index + 1) % n->params.elpe;
+}
+
 static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
  uint64_t prp2, uint32_t len, NvmeCtrl *n)
 {
@@ -386,7 +402,9 @@ static void nvme_rw_cb(void *opaque, int ret)
 req->status = NVME_SUCCESS;
 } else {
 block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
-req->status = NVME_INTERNAL_DEV_ERROR;
+nvme_set_error_page(n, sq->sqid, cpu_to_le16(req->cid),
+NVME_INTERNAL_DEV_ERROR, 0, 0, 1);
+req->status = NVME_INTERNAL_DEV_ERROR | NVME_MORE;
 }
 if (req->has_sg) {
 qemu_sglist_destroy(&req->qsg);
@@ -678,7 +696,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
 smart.host_read_commands[0] = cpu_to_le64(read_commands);
 smart.host_write_commands[0] = cpu_to_le64(write_commands);
 
-smart.number_of_error_log_entries[0] = cpu_to_le64(0);
+smart.number_of_error_log_entries[0] = cpu_to_le64(n->error_count);
 smart.temperature[0] = n->temperature & 0xff;
 smart.temperature[1] = (n->temperature >> 8) & 0xff;
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 3fc36f577b46..d74b0e0f9b2c 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -100,6 +100,8 @@ typedef struct NvmeCtrl {
 uint64_ttimestamp_set_qemu_clock_ms;/* QEMU clock time */
 uint64_tstarttime_ms;
 uint16_ttemperature;
+uint8_t elp_index;
+uint64_terror_count;
 
 QEMUTimer   *aer_timer;
 uint8_t aer_mask;
-- 
2.23.0

[PATCH v2 12/20] nvme: bump supported specification version to 1.3

2019-10-15 Thread Klaus Jensen

Add the new Namespace Identification Descriptor List (CNS 03h) and track
creation of queues to enable the controller to return Command Sequence
Error if Set Features is called for Number of Queues after any queues
have been created.
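
The queue tracking described above amounts to a counter that is bumped on queue creation and dropped on deletion. The following is a hedged sketch, not the patch's code: QueueTracker, the status constants and the choice to count only I/O queues are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical status values; the device uses NVMe status codes instead. */
enum { STATUS_SUCCESS = 0, STATUS_CMD_SEQ_ERROR = 1 };

typedef struct QueueTracker {
    unsigned qs_created; /* I/O queues currently in existence */
} QueueTracker;

static void queue_created(QueueTracker *t)
{
    t->qs_created++;
}

static void queue_deleted(QueueTracker *t)
{
    assert(t->qs_created > 0); /* deleting a queue that was never created */
    t->qs_created--;
}

/* Set Features (Number of Queues) is only valid before any queue exists. */
static int set_number_of_queues(const QueueTracker *t)
{
    return t->qs_created ? STATUS_CMD_SEQ_ERROR : STATUS_SUCCESS;
}
```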

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 82 +++
 hw/block/nvme.h   |  1 +
 hw/block/trace-events |  8 +++--
 include/block/nvme.h  | 30 +---
 4 files changed, 100 insertions(+), 21 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e7d46dcc6afe..1e2320b38b14 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -9,20 +9,22 @@
  */
 
 /**
- * Reference Specification: NVM Express 1.2.1
+ * Reference Specification: NVM Express 1.3d
  *
  *   https://nvmexpress.org/resources/specifications/
  */
 
 /**
  * Usage: add options:
- *  -drive file=,if=none,id=
- *  -device nvme,drive=,serial=,id=, \
- *  cmb_size_mb=, \
- *  num_queues=
+ * -drive file=,if=none,id=
+ * -device nvme,drive=,serial=,id=
  *
- * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
- * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ * Advanced optional options:
+ *
+ *   num_queues=  : Maximum number of IO Queues.
+ *  Default: 64
+ *   cmb_size_mb= : Size of Controller Memory Buffer in MBs.
+ *  Default: 0 (disabled)
  */
 
 #include "qemu/osdep.h"
@@ -345,6 +347,8 @@ static void nvme_post_cqes(void *opaque)
 static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
 {
 assert(cq->cqid == req->sq->cqid);
+
+trace_nvme_enqueue_req_completion(req->cid, cq->cqid, req->status);
 QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
 QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
 timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
@@ -530,6 +534,7 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
 if (sq->sqid) {
 g_free(sq);
 }
+n->qs_created--;
 }
 
 static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
@@ -596,6 +601,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
 cq = n->cq[cqid];
 QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
 n->sq[sqid] = sq;
+n->qs_created++;
 }
 
 static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
@@ -742,7 +748,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 uint32_t dw12 = le32_to_cpu(cmd->cdw12);
 uint32_t dw13 = le32_to_cpu(cmd->cdw13);
-uint16_t lid = dw10 & 0xff;
+uint8_t  lid = dw10 & 0xff;
+uint8_t  lsp = (dw10 >> 8) & 0xf;
 uint8_t  rae = (dw10 >> 15) & 0x1;
 uint32_t numdl, numdu;
 uint64_t off, lpol, lpou;
@@ -760,7 +767,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-trace_nvme_get_log(req->cid, lid, rae, len, off);
+trace_nvme_get_log(req->cid, lid, lsp, rae, len, off);
 
 switch (lid) {
 case NVME_LOG_ERROR_INFO:
@@ -784,6 +791,7 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 if (cq->cqid) {
 g_free(cq);
 }
+n->qs_created--;
 }
 
 static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd)
@@ -824,6 +832,7 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
 msix_vector_use(&n->parent_obj, cq->vector);
 n->cq[cqid] = cq;
 cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+n->qs_created++;
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
@@ -897,7 +906,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
 prp1, prp2);
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
+static uint16_t nvme_identify_ns_list(NvmeCtrl *n, NvmeIdentify *c)
 {
 static const int data_len = 4 * KiB;
 uint32_t min_nsid = le32_to_cpu(c->nsid);
@@ -907,7 +916,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
 uint16_t ret;
 int i, j = 0;
 
-trace_nvme_identify_nslist(min_nsid);
+trace_nvme_identify_ns_list(min_nsid);
 
 list = g_malloc0(data_len);
 for (i = 0; i < n->num_namespaces; i++) {
@@ -924,6 +933,41 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
 return ret;
 }
 
+static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeCmd *c)
+{
+static const int len = 4096;
+
+struct ns_descr {
+uint8_t nidt;
+uint8_t nidl;
+uint8_t rsvd2[2];
+uint8_t nid[16];
+};
+
+uint32_t nsid = le32_to_cpu(c->nsid);
+uint64_t prp1 = le64_to_cpu(c->prp1);
+uint64_t prp2 = le64_to_cpu(c->prp2);
+
+struct ns_descr *list;
+uint16_t ret;
+
+trace_nvme_identify_ns_descr_list(nsid);
+
+if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
+trace_nvme_err_in

[PATCH v2 17/20] nvme: bump controller pci device id

2019-10-15 Thread Klaus Jensen

Since commits 9d6459d21a6e ("nvme: fix write zeroes offset and count")
and c7fe50bcf1f1 ("nvme: support multiple namespaces") the controller
device no longer has the quirks that the Linux kernel thinks it has.

As the quirks are applied based on pci vendor and device id, bump the
device id to get rid of them.
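
To illustrate why changing the device id is enough, here is a sketch of an id-keyed quirk lookup of the kind a guest driver might use. The table contents, the QUIRK_* flag names and lookup_quirks are hypothetical; only the vendor id 0x8086 (Intel) and the device ids 0x5845 and 0x5846 come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUIRK_A (1u << 0) /* hypothetical workaround flag */
#define QUIRK_B (1u << 1) /* hypothetical workaround flag */

typedef struct QuirkEntry {
    uint16_t vendor;
    uint16_t device;
    unsigned quirks;
} QuirkEntry;

/* Quirks are keyed on the exact (vendor, device) pair. */
static const QuirkEntry quirk_table[] = {
    { 0x8086, 0x5845, QUIRK_A | QUIRK_B }, /* old emulated controller id */
};

static unsigned lookup_quirks(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(quirk_table) / sizeof(quirk_table[0]);

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (quirk_table[i].vendor == vendor &&
            quirk_table[i].device == device) {
            return quirk_table[i].quirks;
        }
    }
    return 0; /* unknown id: no workarounds applied */
}
```

Because the match is exact, a controller reporting 0x5846 falls through to the default and gets no workarounds.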

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a23e9bc4e5ef..bcd801c345b6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2500,7 +2500,7 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
 pci_conf[PCI_INTERRUPT_PIN] = 1;
 pci_config_set_prog_interface(pci_conf, 0x2);
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
-pci_config_set_device_id(pci_conf, 0x5845);
+pci_config_set_device_id(pci_conf, 0x5846);
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
@@ -2655,7 +2655,7 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
-pc->device_id = 0x5845;
+pc->device_id = 0x5846;
 pc->revision = 2;
 
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
-- 
2.23.0

Re: [PATCH v3 07/10] migration: add new migration state wait-unplug

2019-10-15 Thread Dr. David Alan Gilbert

* Jens Freimann (jfreim...@redhat.com) wrote:
> On Fri, Oct 11, 2019 at 06:11:33PM +0100, Dr. David Alan Gilbert wrote:
> > * Jens Freimann (jfreim...@redhat.com) wrote:
> > > This patch adds a new migration state called wait-unplug.  It is entered
> > > after the SETUP state and will transition into ACTIVE once all devices
> > > were successfully unplugged from the guest.
> > > 
> > > So if a guest doesn't respond or takes long to honor the unplug request
> > > the user will see the migration state 'wait-unplug'.
> > > 
> > > In the migration thread we query failover devices if they are still
> > > pending the guest unplug. When all are unplugged the migration
> > > continues. We give it a defined number of iterations including small
> > > waiting periods before we proceed.
> > > 
> > > Signed-off-by: Jens Freimann 
> [..]
> > > @@ -3260,6 +3271,27 @@ static void *migration_thread(void *opaque)
> > > 
> > >  qemu_savevm_state_setup(s->to_dst_file);
> > > 
> > > +migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > +  MIGRATION_STATUS_WAIT_UNPLUG);
> > 
> > I think I'd prefer if you only went into this state if you had any
> > devices that were going to need unplugging.
> 
> Sure, that makes sense. I'll change it.
> 
> > > +while (i < FAILOVER_UNPLUG_RETRIES &&
> > > +   s->state == MIGRATION_STATUS_WAIT_UNPLUG) {
> > > +i++;
> > > +qemu_sem_timedwait(&s->wait_unplug_sem, FAILOVER_GUEST_UNPLUG_WAIT);
> > > +all_unplugged = qemu_savevm_state_guest_unplug_pending();
> > > +if (all_unplugged) {
> > > +break;
> > > +}
> > > +}
> > > +
> > > +if (all_unplugged) {
> > > +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > > +MIGRATION_STATUS_ACTIVE);
> > > +} else {
> > > +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > > +  MIGRATION_STATUS_CANCELLING);
> > > +}
> > 
> > I think you can get rid of both the timeout and the count and just make
> > sure that migrate_cancel works at this point.
> 
> I see, I need to add the new state to migration_is_setup_or_active() or
> a cancel won't work.

You probably need to do that anyway given all the other places
is_setup_or_active is called.

> > This pushes the problem up a layer, which I think is fine.
> 
> Seems good to me. To be clear, you're saying I should just poll on
> the device unplugged state? Like
> 
> while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
>!qemu_savevm_state_guest_unplug_pending()) {
> _/* This block intentionally left blank */
> }

I'd keep the qemu_sem_timedwait in there, but with a short time out
(e.g. 250ms say); that way it doesn't eat cpu, but also the cancel still
happens quickly.

Dave

> 
> regards,
> Jens
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
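
The loop suggested in the reply above (poll while in wait-unplug, sleeping for a short timeout between polls) can be sketched independently of QEMU as follows. Everything here is hypothetical scaffolding: UnplugWaiter, its callbacks and the FakeGuest test double stand in for the migration state, qemu_sem_timedwait and the unplug-pending query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ST_WAIT_UNPLUG, ST_ACTIVE, ST_CANCELLING } MigState;

typedef struct UnplugWaiter {
    MigState state; /* may be moved to ST_CANCELLING by a cancel */
    bool (*unplug_pending)(void *opaque);
    void (*timed_wait)(void *opaque, int timeout_ms);
    void *opaque;
} UnplugWaiter;

static MigState wait_for_unplug(UnplugWaiter *w)
{
    assert(w != NULL && w->unplug_pending != NULL && w->timed_wait != NULL);

    /* No retry count: leave only when unplugged or when cancelled. */
    while (w->state == ST_WAIT_UNPLUG && w->unplug_pending(w->opaque)) {
        w->timed_wait(w->opaque, 250); /* short sleep, so cancel is quick */
    }
    if (w->state == ST_WAIT_UNPLUG) {
        w->state = ST_ACTIVE;
    }
    return w->state;
}

/* Test double: the guest finishes the unplug after a few polls. */
typedef struct FakeGuest {
    int polls_until_unplugged;
    int waits;
} FakeGuest;

static bool fake_unplug_pending(void *opaque)
{
    FakeGuest *g = opaque;
    return g->polls_until_unplugged > 0;
}

static void fake_timed_wait(void *opaque, int timeout_ms)
{
    FakeGuest *g = opaque;
    (void)timeout_ms; /* the fake does not actually sleep */
    g->waits++;
    g->polls_until_unplugged--;
}
```

The timeout only bounds how long a cancel can go unnoticed; it does not limit how long the guest may take to unplug.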

[PATCH v2 18/20] nvme: remove redundant NvmeCmd pointer parameter

2019-10-15 Thread Klaus Jensen

The command struct is available in the NvmeRequest that we generally
pass around anyway.
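
A reduced sketch of the refactoring: once the command is embedded in the request, helpers need only the request pointer. Cmd, Request and request_uses_sgl are hypothetical cut-down types; the flag test is a placeholder for NVME_CMD_FLAGS_PSDT, whose exact definition is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Cmd {
    uint8_t opcode;
    uint8_t flags;
    uint32_t nsid;
} Cmd;

typedef struct Request {
    Cmd cmd; /* copy of the submitted command, owned by the request */
    uint16_t cid;
} Request;

/* Before: helper(Ctrl *n, Cmd *cmd, Request *req). After: req alone. */
static bool request_uses_sgl(const Request *req)
{
    assert(req != NULL);
    return (req->cmd.flags & 0xc0) != 0; /* placeholder bit test */
}
```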

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 219 +++-
 1 file changed, 106 insertions(+), 113 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bcd801c345b6..67f92bf5a3ac 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -574,14 +574,14 @@ static uint16_t nvme_dma_write_sgl(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
 }
 
 static uint16_t nvme_dma_write(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-NvmeCmd *cmd, NvmeRequest *req)
+NvmeRequest *req)
 {
-if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-return nvme_dma_write_sgl(n, ptr, len, cmd->dptr.sgl, req);
+if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+return nvme_dma_write_sgl(n, ptr, len, req->cmd.dptr.sgl, req);
 }
 
-uint64_t prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-uint64_t prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+uint64_t prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+uint64_t prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
 return nvme_dma_write_prp(n, ptr, len, prp1, prp2, req);
 }
@@ -624,7 +624,7 @@ out:
 }
 
 static uint16_t nvme_dma_read_sgl(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-NvmeSglDescriptor sgl, NvmeCmd *cmd, NvmeRequest *req)
+NvmeSglDescriptor sgl, NvmeRequest *req)
 {
 QEMUSGList qsg;
 uint16_t err = NVME_SUCCESS;
@@ -662,29 +662,29 @@ out:
 }
 
 static uint16_t nvme_dma_read(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-NvmeCmd *cmd, NvmeRequest *req)
+NvmeRequest *req)
 {
-if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-return nvme_dma_read_sgl(n, ptr, len, cmd->dptr.sgl, cmd, req);
+if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+return nvme_dma_read_sgl(n, ptr, len, req->cmd.dptr.sgl, req);
 }
 
-uint64_t prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-uint64_t prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+uint64_t prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+uint64_t prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
 return nvme_dma_read_prp(n, ptr, len, prp1, prp2, req);
 }
 
-static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_map(NvmeCtrl *n, NvmeRequest *req)
 {
 uint32_t len = req->nlb << nvme_ns_lbads(req->ns);
 uint64_t prp1, prp2;
 
-if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-return nvme_map_sgl(n, &req->qsg, cmd->dptr.sgl, len, req);
+if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+return nvme_map_sgl(n, &req->qsg, req->cmd.dptr.sgl, len, req);
 }
 
-prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
 return nvme_map_prp(n, &req->qsg, prp1, prp2, len, req);
 }
@@ -1045,7 +1045,7 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns = req->ns;
 
@@ -1057,12 +1057,12 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeAIO *aio;
 
 NvmeNamespace *ns = req->ns;
-NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
 
 int64_t offset;
 size_t count;
@@ -1092,9 +1092,9 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 {
-NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
 NvmeNamespace *ns = req->ns;
 int status;
 
@@ -1114,7 +1114,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return status;
 }
 
-status = nvme_map(n, cmd, req);
+status = nvme_map(n, req);
 if (status) {
 block_acct_invalid(blk_get_stats(ns->conf.blk), acct);
 return status;
@@ -1126,11 +1126,12 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
-uint32_t nsid = le32_to_cpu(cmd->nsid);
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
-trace_nvme_io_cmd(req->cid, nsid, le16_to_cpu(req->sq->sqid), cmd->opcode);
+trace_nvme_io_cmd(req->cid, nsid, le16_to_cpu(req->sq->sqid),
+req->cmd.opcode);
 
 req->ns = nvme_ns(n, nsid);
 
@@ -1139,16 +1140,16 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_INVALID_NSI

[PATCH v2 11/20] nvme: add missing mandatory features

2019-10-15 Thread Klaus Jensen

Add support for returning a reasonable response to Get/Set Features of
mandatory features.
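
The Get/Set Features handling follows a common pattern: Get returns a stored value, while Set rejects features the device does not let the host change. A minimal sketch with two features is shown below; FeatureState, the FEAT_* status values and the function names are hypothetical, and the feature identifiers are given numerically for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FID_ARBITRATION = 0x01, FID_TEMPERATURE_THRESHOLD = 0x04 };

/* Hypothetical status values; the device returns NVMe status codes. */
enum { FEAT_OK = 0, FEAT_NOT_CHANGEABLE = 1, FEAT_INVALID_FIELD = 2 };

typedef struct FeatureState {
    uint32_t arbitration;
    uint32_t temp_thresh;
} FeatureState;

static int feature_get(const FeatureState *f, uint32_t fid, uint32_t *result)
{
    assert(f != NULL && result != NULL);
    switch (fid) {
    case FID_ARBITRATION:
        *result = f->arbitration;
        return FEAT_OK;
    case FID_TEMPERATURE_THRESHOLD:
        *result = f->temp_thresh;
        return FEAT_OK;
    default:
        return FEAT_INVALID_FIELD;
    }
}

static int feature_set(FeatureState *f, uint32_t fid, uint32_t value)
{
    assert(f != NULL);
    switch (fid) {
    case FID_TEMPERATURE_THRESHOLD:
        f->temp_thresh = value;
        return FEAT_OK;
    case FID_ARBITRATION:
        return FEAT_NOT_CHANGEABLE; /* readable, but fixed by the device */
    default:
        return FEAT_INVALID_FIELD;
    }
}
```

Keeping "known but fixed" separate from "unknown" lets the host distinguish an unsupported feature from one it simply cannot modify.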

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 51 ---
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  3 ++-
 3 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 32381d7df655..e7d46dcc6afe 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1007,12 +1007,24 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
 uint32_t dw10 = le32_to_cpu(cmd->cdw10);
+uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 uint32_t result;
 
+trace_nvme_getfeat(dw10);
+
 switch (dw10) {
+case NVME_ARBITRATION:
+result = cpu_to_le32(n->features.arbitration);
+break;
+case NVME_POWER_MANAGEMENT:
+result = cpu_to_le32(n->features.power_mgmt);
+break;
 case NVME_TEMPERATURE_THRESHOLD:
 result = cpu_to_le32(n->features.temp_thresh);
 break;
+case NVME_ERROR_RECOVERY:
+result = cpu_to_le32(n->features.err_rec);
+break;
 case NVME_VOLATILE_WRITE_CACHE:
 result = blk_enable_write_cache(n->conf.blk);
 trace_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
@@ -1024,6 +1036,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 break;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, cmd);
+case NVME_INTERRUPT_COALESCING:
+result = cpu_to_le32(n->features.int_coalescing);
+break;
+case NVME_INTERRUPT_VECTOR_CONF:
+if ((dw11 & 0x) > n->params.num_queues) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+result = cpu_to_le32(n->features.int_vector_config[dw11 & 0x]);
+break;
+case NVME_WRITE_ATOMICITY:
+result = cpu_to_le32(n->features.write_atomicity);
+break;
 case NVME_ASYNCHRONOUS_EVENT_CONF:
 result = cpu_to_le32(n->features.async_config);
 break;
@@ -1059,6 +1084,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 uint32_t dw10 = le32_to_cpu(cmd->cdw10);
 uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 
+trace_nvme_setfeat(dw10, dw11);
+
 switch (dw10) {
 case NVME_TEMPERATURE_THRESHOLD:
 n->features.temp_thresh = dw11;
@@ -1086,6 +1113,13 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 case NVME_ASYNCHRONOUS_EVENT_CONF:
 n->features.async_config = dw11;
 break;
+case NVME_ARBITRATION:
+case NVME_POWER_MANAGEMENT:
+case NVME_ERROR_RECOVERY:
+case NVME_INTERRUPT_COALESCING:
+case NVME_INTERRUPT_VECTOR_CONF:
+case NVME_WRITE_ATOMICITY:
+return NVME_FEAT_NOT_CHANGABLE | NVME_DNR;
 default:
 trace_nvme_err_invalid_setfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -1709,6 +1743,14 @@ static void nvme_init_state(NvmeCtrl *n)
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->temperature = NVME_TEMPERATURE;
 n->features.temp_thresh = 0x14d;
+n->features.int_vector_config = g_malloc0_n(n->params.num_queues,
+sizeof(*n->features.int_vector_config));
+
+/* disable coalescing (not supported) */
+for (int i = 0; i < n->params.num_queues; i++) {
+n->features.int_vector_config[i] = i | (1 << 16);
+}
+
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 
@@ -1786,15 +1828,17 @@ static void nvme_init_ctrl(NvmeCtrl *n)
 id->nn = cpu_to_le32(n->num_namespaces);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
 
+
+if (blk_enable_write_cache(n->conf.blk)) {
+id->vwc = 1;
+}
+
 strcpy((char *) id->subnqn, "nqn.2019-08.org.qemu:");
 pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
-if (blk_enable_write_cache(n->conf.blk)) {
-id->vwc = 1;
-}
 
 n->bar.cap = 0;
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
@@ -1866,6 +1910,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 g_free(n->sq);
 g_free(n->elpes);
 g_free(n->aer_reqs);
+g_free(n->features.int_vector_config);
 
 if (n->params.cmb_size_mb) {
 g_free(n->cmbuf);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 6ddb13d34061..a20a68d85d5a 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -41,6 +41,8 @@ nvme_del_cq(uint16_t cqid) "deleted completion queue, 
sqid=%"PRIu16""
 nvme_identify_ctrl(void) "identify controller"
 nvme_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
 nvme_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
+nvme_getfeat(uint32_t fid) "fid 0x%"PRIx32""
+nvme_se

[PATCH v2 14/20] nvme: allow multiple aios per command

2019-10-15 Thread Klaus Jensen

This refactors how the device issues asynchronous block backend
requests. The NvmeRequest now holds a queue of NvmeAIOs that are
associated with the command. This allows multiple aios to be issued for
a command. Only when all requests have been completed will the device
post a completion queue entry.

Because the device is currently guaranteed to only issue a single aio
request per command, the benefit is not immediately obvious. But this
functionality is required to support metadata.
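
The completion rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: it tracks outstanding aios with a plain counter instead of the queue of NvmeAIOs, and every name prefixed with sketch_ or Sketch is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the NvmeRequest from the patch. */
typedef struct SketchRequest {
    size_t outstanding;   /* aios issued but not yet completed */
    int status;           /* first error seen, 0 on success */
    bool cqe_posted;      /* set once the completion entry is posted */
} SketchRequest;

/* Account for one more aio issued on behalf of the command. */
static void sketch_aio_issue(SketchRequest *req)
{
    req->outstanding++;
}

/* Completion callback for one aio; returns true if it posted the CQE. */
static bool sketch_aio_complete(SketchRequest *req, int ret)
{
    assert(req->outstanding > 0);

    if (ret && !req->status) {
        req->status = ret;      /* remember the first failure only */
    }

    if (--req->outstanding > 0) {
        return false;           /* other aios are still in flight */
    }

    req->cqe_posted = true;     /* last aio finished: post the completion */
    return true;
}
```

With two aios issued, the first call to sketch_aio_complete() returns false and only the second one marks the completion entry as posted.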

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 455 +-
 hw/block/nvme.h   | 165 ---
 hw/block/trace-events |   8 +
 3 files changed, 511 insertions(+), 117 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cbc0b6a660b6..f4b9bd36a04e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,8 @@
  *  Default: 64
  *   cmb_size_mb= : Size of Controller Memory Buffer in MBs.
  *  Default: 0 (disabled)
+ *   mdts= : Maximum Data Transfer Size (power of two)
+ *  Default: 7
  */
 
 #include "qemu/osdep.h"
@@ -56,6 +58,7 @@
 } while (0)
 
 static void nvme_process_sq(void *opaque);
+static void nvme_aio_cb(void *opaque, int ret);
 
 static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 {
@@ -197,7 +200,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, 
uint64_t prp1,
 }
 
 if (nvme_addr_is_cmb(n, prp1)) {
-req->is_cmb = true;
+nvme_req_set_cmb(req);
 }
 
 pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
@@ -255,8 +258,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, 
uint64_t prp1,
 }
 
 addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
-if ((req->is_cmb && !addr_is_cmb) ||
-(!req->is_cmb && addr_is_cmb)) {
+if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
+(!nvme_req_is_cmb(req) && addr_is_cmb)) {
 status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
 goto unmap;
 }
@@ -269,8 +272,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, 
uint64_t prp1,
 }
 } else {
 bool addr_is_cmb = nvme_addr_is_cmb(n, prp2);
-if ((req->is_cmb && !addr_is_cmb) ||
-(!req->is_cmb && addr_is_cmb)) {
+if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
+(!nvme_req_is_cmb(req) && addr_is_cmb)) {
 status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
 goto unmap;
 }
@@ -312,7 +315,7 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
 return status;
 }
 
-if (req->is_cmb) {
+if (nvme_req_is_cmb(req)) {
 QEMUIOVector iov;
 
 qemu_iovec_init(&iov, qsg.nsg);
@@ -341,19 +344,18 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
 static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
 uint64_t prp1, uint64_t prp2, NvmeRequest *req)
 {
-QEMUSGList qsg;
 uint16_t status = NVME_SUCCESS;
 
-status = nvme_map_prp(n, &qsg, prp1, prp2, len, req);
+status = nvme_map_prp(n, &req->qsg, prp1, prp2, len, req);
 if (status) {
 return status;
 }
 
-if (req->is_cmb) {
+if (nvme_req_is_cmb(req)) {
 QEMUIOVector iov;
 
-qemu_iovec_init(&iov, qsg.nsg);
-dma_to_cmb(n, &qsg, &iov);
+qemu_iovec_init(&iov, req->qsg.nsg);
+dma_to_cmb(n, &req->qsg, &iov);
 
 if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
 trace_nvme_err_invalid_dma();
@@ -365,17 +367,137 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
 goto out;
 }
 
-if (unlikely(dma_buf_read(ptr, len, &qsg))) {
+if (unlikely(dma_buf_read(ptr, len, &req->qsg))) {
 trace_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
 
 out:
-qemu_sglist_destroy(&qsg);
+qemu_sglist_destroy(&req->qsg);
 
 return status;
 }
 
+static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+NvmeNamespace *ns = req->ns;
+
+uint32_t len = req->nlb << nvme_ns_lbads(ns);
+uint64_t prp1 = le64_to_cpu(cmd->prp1);
+uint64_t prp2 = le64_to_cpu(cmd->prp2);
+
+return nvme_map_prp(n, &req->qsg, prp1, prp2, len, req);
+}
+
+static void nvme_aio_destroy(NvmeAIO *aio)
+{
+if (aio->iov.nalloc) {
+qemu_iovec_destroy(&aio->iov);
+}
+
+g_free(aio);
+}
+
+static NvmeAIO *nvme_aio_new(BlockBackend *blk, int64_t offset,
+QEMUSGList *qsg, NvmeRequest *req, NvmeAIOCompletionFunc *cb)
+{
+NvmeAIO *aio = g_malloc0(sizeof(*aio));
+
+*aio = (NvmeAIO) {
+.blk = blk,
+.offset = offset,
+.req = req,
+.qs

[PATCH v2 20/20] nvme: handle dma errors

2019-10-15 Thread Klaus Jensen

Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.

With this patch the device passes the test by retrying "critical"
transfers (posting of completion entries and processing of submission
queue entries).

If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs or SGLs), the command is aborted with a
Data Transfer Error status code.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 63 +--
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  2 +-
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d0103c16cfe9..00c5b843295b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -71,26 +71,26 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr 
addr)
 return addr >= low && addr < hi;
 }
 
-static inline void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf,
+static inline int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf,
 int size)
 {
 if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
 memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
-return;
+return 0;
 }
 
-pci_dma_read(&n->parent_obj, addr, buf, size);
+return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
-static inline void nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf,
+static inline int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf,
 int size)
 {
 if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
 memcpy((void *) &n->cmbuf[addr - n->ctrl_mem.addr], buf, size);
-return;
+return 0;
 }
 
-pci_dma_write(&n->parent_obj, addr, buf, size);
+return pci_dma_write(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -228,7 +228,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, 
uint64_t prp1,
 
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
+if (nvme_addr_read(n, prp2, (void *) prp_list, prp_trans)) {
+trace_nvme_err_addr_read((void *) prp2);
+status = NVME_DATA_TRANSFER_ERROR;
+goto unmap;
+}
 while (len != 0) {
 bool addr_is_cmb;
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
@@ -250,7 +254,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, 
uint64_t prp1,
 i = 0;
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
+if (nvme_addr_read(n, prp_ent, (void *) prp_list, 
prp_trans)) {
+trace_nvme_err_addr_read((void *) prp_ent);
+status = NVME_DATA_TRANSFER_ERROR;
+goto unmap;
+}
 prp_ent = le64_to_cpu(prp_list[i]);
 }
 
@@ -402,7 +410,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
 
 /* read the segment in chunks of 256 descriptors (4k) */
 while (nsgld > MAX_NSGLD) {
-nvme_addr_read(n, addr, segment, sizeof(segment));
+if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
+trace_nvme_err_addr_read((void *) addr);
+status = NVME_DATA_TRANSFER_ERROR;
+goto unmap;
+}
 
 status = nvme_map_sgl_data(n, qsg, segment, MAX_NSGLD, &len, req);
 if (status) {
@@ -413,7 +425,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
 addr += MAX_NSGLD * sizeof(NvmeSglDescriptor);
 }
 
-nvme_addr_read(n, addr, segment, nsgld * sizeof(NvmeSglDescriptor));
+if (nvme_addr_read(n, addr, segment, nsgld * 
sizeof(NvmeSglDescriptor))) {
+trace_nvme_err_addr_read((void *) addr);
+status = NVME_DATA_TRANSFER_ERROR;
+goto unmap;
+}
 
 sgl = segment[nsgld - 1];
 addr = le64_to_cpu(sgl.addr);
@@ -458,7 +474,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
 nsgld = le64_to_cpu(sgl.len) / sizeof(NvmeSglDescriptor);
 
 while (nsgld > MAX_NSGLD) {
-nvme_addr_read(n, addr, segment, sizeof(segment));
+if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
+trace_nvme_err_addr_read((void *) addr);
+status = NVME_DATA_TRANSFER_ERROR;
+goto unmap;
+}
 
 status = nvme_map_sgl_data(n, qsg, segment, MAX_NSGLD, &len, req);
 if (status) {
@@ -469,7 +489,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *

[PATCH v2 13/20] nvme: refactor prp mapping

2019-10-15 Thread Klaus Jensen

Instead of handling both QSGs and IOVs in multiple places, simply use
QSGs everywhere by assuming that the request does not involve the
controller memory buffer (CMB). If the request is found to involve the
CMB, convert the QSG to an IOV and issue the I/O. The QSG is converted
to an IOV by the dma helpers anyway, so the CMB path is not unfairly
affected by this simplifying change.

As a side-effect, this patch also allows PRPs to be located in the CMB.
The logic ensures that if some of the PRP is in the CMB, all of it must
be located there, as per the specification.
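
The "all or nothing" rule for PRPs in the CMB can be illustrated with a small stand-alone check. This is only a sketch under the assumption that the CMB is a single contiguous address window; SketchCmb and the sketch_ helpers are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CMB window [base, base + size); not the NvmeCtrl layout. */
typedef struct SketchCmb {
    uint64_t base;
    uint64_t size;
} SketchCmb;

/* True if addr falls inside the CMB window. */
static bool sketch_addr_is_cmb(const SketchCmb *cmb, uint64_t addr)
{
    return addr >= cmb->base && addr - cmb->base < cmb->size;
}

/*
 * True if every entry lies in the CMB or every entry lies in host memory.
 * A mix of the two is what the patch rejects as an invalid use of the CMB.
 */
static bool sketch_prps_consistent(const SketchCmb *cmb,
                                   const uint64_t *prps, size_t n)
{
    if (n == 0) {
        return true;
    }

    bool first_in_cmb = sketch_addr_is_cmb(cmb, prps[0]);

    for (size_t i = 1; i < n; i++) {
        if (sketch_addr_is_cmb(cmb, prps[i]) != first_in_cmb) {
            return false;
        }
    }

    return true;
}
```

The patch performs the equivalent comparison incrementally while walking the PRP list, rather than over a pre-collected array as done here.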

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 255 --
 hw/block/nvme.h   |   4 +-
 hw/block/trace-events |   1 +
 include/block/nvme.h  |   1 +
 4 files changed, 174 insertions(+), 87 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1e2320b38b14..cbc0b6a660b6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -179,138 +179,200 @@ static void nvme_set_error_page(NvmeCtrl *n, uint16_t 
sqid, uint16_t cid,
 n->elp_index = (n->elp_index + 1) % n->params.elpe;
 }
 
-static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
- uint64_t prp2, uint32_t len, NvmeCtrl *n)
+static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
+uint64_t prp2, uint32_t len, NvmeRequest *req)
 {
 hwaddr trans_len = n->page_size - (prp1 % n->page_size);
 trans_len = MIN(len, trans_len);
 int num_prps = (len >> n->page_bits) + 1;
+uint16_t status = NVME_SUCCESS;
+bool prp_list_in_cmb = false;
+
+trace_nvme_map_prp(req->cid, req->cmd.opcode, trans_len, len, prp1, prp2,
+num_prps);
 
 if (unlikely(!prp1)) {
 trace_nvme_err_invalid_prp();
 return NVME_INVALID_FIELD | NVME_DNR;
-} else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
-   prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
-qsg->nsg = 0;
-qemu_iovec_init(iov, num_prps);
-qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], 
trans_len);
-} else {
-pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
-qemu_sglist_add(qsg, prp1, trans_len);
 }
+
+if (nvme_addr_is_cmb(n, prp1)) {
+req->is_cmb = true;
+}
+
+pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
+qemu_sglist_add(qsg, prp1, trans_len);
+
 len -= trans_len;
 if (len) {
 if (unlikely(!prp2)) {
 trace_nvme_err_invalid_prp2_missing();
+status = NVME_INVALID_FIELD | NVME_DNR;
 goto unmap;
 }
+
 if (len > n->page_size) {
 uint64_t prp_list[n->max_prp_ents];
 uint32_t nents, prp_trans;
 int i = 0;
 
+if (nvme_addr_is_cmb(n, prp2)) {
+prp_list_in_cmb = true;
+}
+
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
 while (len != 0) {
+bool addr_is_cmb;
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
 if (i == n->max_prp_ents - 1 && len > n->page_size) {
 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
 trace_nvme_err_invalid_prplist_ent(prp_ent);
+status = NVME_INVALID_FIELD | NVME_DNR;
+goto unmap;
+}
+
+addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
+if ((prp_list_in_cmb && !addr_is_cmb) ||
+(!prp_list_in_cmb && addr_is_cmb)) {
+status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
 goto unmap;
 }
 
 i = 0;
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp_ent, (void *)prp_list,
-prp_trans);
+nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
 prp_ent = le64_to_cpu(prp_list[i]);
 }
 
 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
 trace_nvme_err_invalid_prplist_ent(prp_ent);
+status = NVME_INVALID_FIELD | NVME_DNR;
 goto unmap;
 }
 
-trans_len = MIN(len, n->page_size);
-if (qsg->nsg){
-qemu_sglist_add(qsg, prp_ent, trans_len);
-} else {
-qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - 
n->ctrl_mem.addr], trans_len);
+addr_is_cmb = nvme_addr_is_cmb(n, prp_ent)

[PATCH v2 15/20] nvme: add support for scatter gather lists

2019-10-15 Thread Klaus Jensen

For now, support the Data Block, Segment and Last Segment descriptor
types.

See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").

Signed-off-by: Klaus Jensen 
---
 block/nvme.c  |  18 +-
 hw/block/nvme.c   | 380 --
 hw/block/trace-events |   3 +
 include/block/nvme.h  |  62 ++-
 4 files changed, 398 insertions(+), 65 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5be3a39b632e..8825c19c72c2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -440,7 +440,7 @@ static void nvme_identify(BlockDriverState *bs, int 
namespace, Error **errp)
 error_setg(errp, "Cannot map buffer for DMA");
 goto out;
 }
-cmd.prp1 = cpu_to_le64(iova);
+cmd.dptr.prp.prp1 = cpu_to_le64(iova);
 
 if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
 error_setg(errp, "Failed to identify controller");
@@ -529,7 +529,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
**errp)
 }
 cmd = (NvmeCmd) {
 .opcode = NVME_ADM_CMD_CREATE_CQ,
-.prp1 = cpu_to_le64(q->cq.iova),
+.dptr.prp.prp1 = cpu_to_le64(q->cq.iova),
 .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
 .cdw11 = cpu_to_le32(0x3),
 };
@@ -540,7 +540,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
**errp)
 }
 cmd = (NvmeCmd) {
 .opcode = NVME_ADM_CMD_CREATE_SQ,
-.prp1 = cpu_to_le64(q->sq.iova),
+.dptr.prp.prp1 = cpu_to_le64(q->sq.iova),
 .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
 .cdw11 = cpu_to_le32(0x1 | (n << 16)),
 };
@@ -889,16 +889,16 @@ try_map:
 case 0:
 abort();
 case 1:
-cmd->prp1 = pagelist[0];
-cmd->prp2 = 0;
+cmd->dptr.prp.prp1 = pagelist[0];
+cmd->dptr.prp.prp2 = 0;
 break;
 case 2:
-cmd->prp1 = pagelist[0];
-cmd->prp2 = pagelist[1];
+cmd->dptr.prp.prp1 = pagelist[0];
+cmd->dptr.prp.prp2 = pagelist[1];
 break;
 default:
-cmd->prp1 = pagelist[0];
-cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
+cmd->dptr.prp.prp1 = pagelist[0];
+cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + 
sizeof(uint64_t));
 break;
 }
 trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f4b9bd36a04e..0a5cd079df9a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -296,6 +296,198 @@ unmap:
 return status;
 }
 
+static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
+NvmeSglDescriptor *segment, uint64_t nsgld, uint32_t *len,
+NvmeRequest *req)
+{
+dma_addr_t addr, trans_len;
+
+for (int i = 0; i < nsgld; i++) {
+if (NVME_SGL_TYPE(segment[i].type) != SGL_DESCR_TYPE_DATA_BLOCK) {
+trace_nvme_err_invalid_sgl_descriptor(req->cid,
+NVME_SGL_TYPE(segment[i].type));
+return NVME_SGL_DESCRIPTOR_TYPE_INVALID | NVME_DNR;
+}
+
+if (*len == 0) {
+if (!NVME_CTRL_SGLS_EXCESS_LENGTH(n->id_ctrl.sgls)) {
+trace_nvme_err_invalid_sgl_excess_length(req->cid);
+return NVME_DATA_SGL_LENGTH_INVALID | NVME_DNR;
+}
+
+break;
+}
+
+addr = le64_to_cpu(segment[i].addr);
+trans_len = MIN(*len, le64_to_cpu(segment[i].len));
+
+if (nvme_addr_is_cmb(n, addr)) {
+/*
+ * All data and metadata, if any, associated with a particular
+ * command shall be located in either the CMB or host memory. Thus,
+ * if an address is found to be in the CMB and we have already
+ * mapped data that is in host memory, the use is invalid.
+ */
+if (!nvme_req_is_cmb(req) && qsg->size) {
+return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+}
+
+nvme_req_set_cmb(req);
+} else {
+/*
+ * Similarly, if the address does not reference the CMB, but we
+ * have already established that the request has data or metadata
+ * in the CMB, the use is invalid.
+ */
+if (nvme_req_is_cmb(req)) {
+return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+}
+}
+
+qemu_sglist_add(qsg, addr, trans_len);
+
+*len -= trans_len;
+}
+
+return NVME_SUCCESS;
+}
+
+static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
+NvmeSglDescriptor sgl, uint32_t len, NvmeRequest *req)
+{
+const int MAX_NSGLD = 256;
+
+NvmeSglDescriptor segment[MAX_NSGLD];
+uint64_t nsgld;
+uint16_t status;
+bool sgl_in_cmb = false;
+hwaddr addr = le64_to_cpu(sgl.addr);
+
+trace_nvme_map_sgl(req->cid, NVME_SGL_TYPE(sgl.type), req->nlb, len);
+
+pci_dma_sglist_init(qsg, &n->parent_obj, 1);
+
+/*
+ * If the entire transfer can be described with

[PATCH v2 16/20] nvme: support multiple namespaces

2019-10-15 Thread Klaus Jensen

This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named from the device name
('id'). The nvme-ns devices then connect to this bus and register
themselves with the nvme device.

This changes how an nvme device is created. Example with two namespaces:

  -drive file=nvme0n1.img,if=none,id=disk1
  -drive file=nvme0n2.img,if=none,id=disk2
  -device nvme,serial=deadbeef,id=nvme0
  -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
  -device nvme-ns,drive=disk2,bus=nvme0,nsid=2

The drive property is kept on the nvme device to keep the change
backward compatible, but the property is now optional. Specifying a
drive for the nvme device will always create the namespace with nsid 1.

Signed-off-by: Klaus Jensen 
---
 hw/block/Makefile.objs |   2 +-
 hw/block/nvme-ns.c | 139 +++
 hw/block/nvme-ns.h |  58 +++
 hw/block/nvme.c| 212 +
 hw/block/nvme.h|  51 +-
 hw/block/trace-events  |   5 +-
 6 files changed, 352 insertions(+), 115 deletions(-)
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index f5f643f0cc06..d44a2f4b780d 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
+common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
 
 obj-$(CONFIG_SH4) += tc58128.o
 
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
new file mode 100644
index ..aa76bb63ef45
--- /dev/null
+++ b/hw/block/nvme-ns.c
@@ -0,0 +1,139 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "hw/block/block.h"
+#include "hw/pci/msix.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+
+#include "nvme.h"
+#include "nvme-ns.h"
+
+static int nvme_ns_init(NvmeNamespace *ns)
+{
+NvmeIdNs *id_ns = &ns->id_ns;
+
+id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+id_ns->nuse = id_ns->ncap = id_ns->nsze =
+cpu_to_le64(nvme_ns_nlbas(ns));
+
+return 0;
+}
+
+static int nvme_ns_init_blk(NvmeNamespace *ns, NvmeIdCtrl *id, Error **errp)
+{
+blkconf_blocksizes(&ns->conf);
+
+if (!blkconf_apply_backend_options(&ns->conf,
+blk_is_read_only(ns->conf.blk), false, errp)) {
+return 1;
+}
+
+ns->size = blk_getlength(ns->conf.blk);
+if (ns->size < 0) {
+error_setg_errno(errp, -ns->size, "blk_getlength");
+return 1;
+}
+
+if (!blk_enable_write_cache(ns->conf.blk)) {
+id->vwc = 0;
+}
+
+return 0;
+}
+
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
+{
+if (!ns->conf.blk) {
+error_setg(errp, "block backend not configured");
+return 1;
+}
+
+return 0;
+}
+
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+Error *local_err = NULL;
+
+if (nvme_ns_check_constraints(ns, &local_err)) {
+error_propagate_prepend(errp, local_err,
+"nvme_ns_check_constraints: ");
+return 1;
+}
+
+if (nvme_ns_init_blk(ns, &n->id_ctrl, &local_err)) {
+error_propagate_prepend(errp, local_err, "nvme_ns_init_blk: ");
+return 1;
+}
+
+nvme_ns_init(ns);
+if (nvme_register_namespace(n, ns, &local_err)) {
+error_propagate_prepend(errp, local_err, "nvme_register_namespace: ");
+return 1;
+}
+
+return 0;
+}
+
+static void nvme_ns_realize(DeviceState *dev, Error **errp)
+{
+NvmeNamespace *ns = NVME_NS(dev);
+BusState *s = qdev_get_parent_bus(dev);
+NvmeCtrl *n = NVME(s->parent);
+Error *local_err = NULL;
+
+if (nvme_ns_setup(n, ns, &local_err)) {
+error_propagate_prepend(errp, local_err, "nvme_ns_setup: ");
+return;
+}
+}
+
+static Property nvme_ns_props[] = {
+DEFINE_BLOCK_PROPERTIES(NvmeNamespace, conf),
+DEFINE_NVME_NS_PROPERTIES(NvmeNamespace, params),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void nvme_ns_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->bus_type = TYPE_NVME_BUS;
+dc->realize = nvme_ns_realize;
+dc->props = nvme_ns_props;
+dc->desc = "virtual nvme namespace";
+}
+
+static void nvme_ns_instance_init(Object *obj)
+{
+NvmeNamespace *ns = NVME_NS(obj);
+char *bootindex = g_strdup_printf("/namespace@%d,0", ns->params.nsid);
+
+device_add_bootindex_property(obj, &ns->conf.bootindex, "bootindex",
+bootindex, DEVICE(obj), &error_abort);
+
+g_free(bootindex);
+}
+
+static const TypeI

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-15 Thread Andrew Jones

On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
> > +
> > +obj = object_new(object_class_get_name(oc));
> > +
> > +if (qdict_in) {
> > +Visitor *visitor;
> > +Error *err = NULL;
> > +
> > +visitor = qobject_input_visitor_new(model->props);
> > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > +if (err) {
> > +object_unref(obj);
> 
> Shouldn't we free the 'visitor' here as well ?

Yes. Good catch. So we also need to fix
target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
construction (the construction from which I derived this)

> 
> > +error_propagate(errp, err);
> > +return NULL;
> > +}
> > +

What about the rest of the patch? With that fixed for v6 can I
add your r-b?

Thanks,
drew

Re: [PULL 0/1] Block patches

2019-10-15 Thread Peter Maydell

On Mon, 14 Oct 2019 at 09:52, Stefan Hajnoczi  wrote:
>
> The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' 
> into staging (2019-10-08 16:08:35 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 69de48445a0d6169f1e2a6c5bfab994e1c810e33:
>
>   test-bdrv-drain: fix iothread_join() hang (2019-10-14 09:48:01 +0100)
>
> 
> Pull request
>
> 
>
> Stefan Hajnoczi (1):
>   test-bdrv-drain: fix iothread_join() hang
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

Re: [PATCH 2/2] apic: Use 32bit APIC ID for migration instance ID

2019-10-15 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > Migration is silently broken now with x2apic config like this:
> > > 
> > >  -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
> > >  -device intel-iommu,intremap=on,eim=on
> > > 
> > > After migration, the guest kernel could hang at anything, due to
> > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> > > any operations related to x2apic could be broken then (e.g., RDMSR on
> > > x2apic MSRs could fail because KVM would think that the vcpu hasn't
> > > enabled x2apic at all).
> > > 
> > > The issue is that the x2apic bit was never applied correctly for vcpus
> > > whose ID > 255 when migrate completes, and that's because when we
> > > migrate APIC we use the APICCommonState.id as instance ID of the
> > > migration stream, while that's too short for x2apic.
> > > 
> > > Let's use the newly introduced initial_apic_id for that.
> > 
> > I'd like to understand a few things:
> >a) Does this change the instance ID of existing APICs on the
> > migration stream? 
> >  a1) Even for <256 CPUs?
> 
> No.
> 
> >  a2) For >=256 CPUs?
> 
> Yes.
> 
> > 
> > [Because changing the ID breaks migration]
> 
> But if we don't change it, the stream is broken too. :)
> 
> Then the destination VM will receive e.g. two apic_id==0 instances (I
> think the apic_id==256 instance will wrongly overwrite the apic_id==0
> one), while the vcpu with apic_id==256 will use the initial apic
> values.
> 
> So IMHO we should still fix this, even if it changes the migration
> stream.  At least we start to make it right.

Yes, that makes sense.
It deserves a doc mention somewhere.
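
For the doc mention, the collision described above can be reproduced with a tiny sketch. It assumes the old instance ID was effectively limited to 8 bits, which is how the "too short for x2apic" remark reads; the helper names are hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: instance ID derived from an 8-bit ID field. */
static uint32_t sketch_instance_id_8bit(uint32_t apic_id)
{
    return (uint8_t)apic_id;    /* bits above bit 7 are dropped */
}

/* Hypothetical helper: instance ID derived from the full 32-bit APIC ID. */
static uint32_t sketch_instance_id_32bit(uint32_t apic_id)
{
    return apic_id;
}
```

Under that assumption, APIC IDs 0 and 256 map to the same 8-bit instance ID, while the 32-bit variant keeps them distinct, which matches the "two apic_id==0 instances" symptom.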

> > 
> >   b) Is the instance ID constant - I can see it's a property on the
> >  APIC, but I can't see who sets it
> 
> For each vcpu, I think yes it should be a constant as long as the
> topology is the same.  This is how I understand it to be set:
> 
> (1) In pc_cpus_init(), we init these:
> 
> possible_cpus = mc->possible_cpu_arch_ids(ms);
> for (i = 0; i < ms->smp.cpus; i++) {
> pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
> }
> 
> (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property:
> 
> qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);

OK, that's fine - as long as it's constant and not guest influenced.

> > 
> >   c) In the case where it fails, did we end up registering two
> >  devices with the same name and instance ID?  If so, is it worth
> >  adding a check that would error if we tried?
> 
> Sounds doable.
> 

Great,

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v5 02/55] trace: add mmu_index to mem_info

2019-10-15 Thread Alex Bennée



Richard Henderson  writes:

> On 10/14/19 3:48 AM, Alex Bennée wrote:
>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> index defc8d5929..1210d8f243 100644
>> --- a/accel/tcg/cputlb.c
>> +++ b/accel/tcg/cputlb.c
>> @@ -1811,6 +1811,7 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong 
>> addr, uint64_t val,
>>  #define ATOMIC_MMU_DECLS
>>  #define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, retaddr)
>>  #define ATOMIC_MMU_CLEANUP
>> +#define ATOMIC_MMU_IDX oi
>
> That is not the mmu_idx.  That's the whole mmu_idx + MemOp combo.
> Use get_mmuidx(oi).

Oops I missed that from last time. Fixing it for real now!
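
For reference, the difference between the combined value and the MMU index alone looks roughly like this. The sketch assumes a layout with the MMU index in the low four bits and the memory operation above it; the sketch_ names are hypothetical, and the real encoding behind get_mmuidx() should be checked in the tree.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchMemOpIdx;

/* Pack a memory-operation descriptor and an MMU index into one value. */
static SketchMemOpIdx sketch_make_memop_idx(unsigned memop, unsigned mmu_idx)
{
    assert(mmu_idx <= 15);      /* the index must fit in four bits */
    return (memop << 4) | mmu_idx;
}

/* Extract the memory-operation part of the combined value. */
static unsigned sketch_get_memop(SketchMemOpIdx oi)
{
    return oi >> 4;
}

/* Extract only the MMU index; this is what the trace point should see. */
static unsigned sketch_get_mmuidx(SketchMemOpIdx oi)
{
    return oi & 15;
}
```

Passing the combined value where only the index is expected would report a number that also encodes the memory operation, which is the problem Richard points out.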

>
>> --- a/accel/tcg/user-exec.c
>> +++ b/accel/tcg/user-exec.c
>> @@ -751,6 +751,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
>> target_ulong addr,
>>  #define ATOMIC_MMU_DECLS do {} while (0)
>>  #define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, GETPC())
>>  #define ATOMIC_MMU_CLEANUP do { clear_helper_retaddr(); } while (0)
>> +#define ATOMIC_MMU_IDX 0
>
> MMU_USER_IDX.  Best to be consistent, even if this is user-only and it isn't
> really used.
>
>> --- a/include/exec/cpu_ldst_useronly_template.h
>> +++ b/include/exec/cpu_ldst_useronly_template.h
>> @@ -73,7 +73,7 @@ glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, 
>> abi_ptr ptr)
>>  #else
>>  trace_guest_mem_before_exec(
>>  env_cpu(env), ptr,
>> -trace_mem_build_info(SHIFT, false, MO_TE, false));
>> +trace_mem_build_info(SHIFT, false, MO_TE, false, 0));
>
> Likewise for the other uses in this file.

Fixed.

>
>
> r~


--
Alex Bennée

[PATCH v9 01/15] hw/virtio: Factorize virtio-mmio headers

2019-10-15 Thread Sergio Lopez

Put QOM and main struct definition in a separate header file, so it
can be accessed from other components.

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio-mmio.h | 73 +
 hw/virtio/virtio-mmio.c | 48 +-
 2 files changed, 74 insertions(+), 47 deletions(-)
 create mode 100644 include/hw/virtio/virtio-mmio.h

diff --git a/include/hw/virtio/virtio-mmio.h b/include/hw/virtio/virtio-mmio.h
new file mode 100644
index 00..7dbfd03dcf
--- /dev/null
+++ b/include/hw/virtio/virtio-mmio.h
@@ -0,0 +1,73 @@
+/*
+ * Virtio MMIO bindings
+ *
+ * Copyright (c) 2011 Linaro Limited
+ *
+ * Author:
+ *  Peter Maydell 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#ifndef HW_VIRTIO_MMIO_H
+#define HW_VIRTIO_MMIO_H
+
+#include "hw/virtio/virtio-bus.h"
+
+/* QOM macros */
+/* virtio-mmio-bus */
+#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
+#define VIRTIO_MMIO_BUS(obj) \
+OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
+OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_CLASS(klass) \
+OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
+
+/* virtio-mmio */
+#define TYPE_VIRTIO_MMIO "virtio-mmio"
+#define VIRTIO_MMIO(obj) \
+OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
+
+#define VIRT_MAGIC 0x74726976 /* 'virt' */
+#define VIRT_VERSION 2
+#define VIRT_VERSION_LEGACY 1
+#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
+
+typedef struct VirtIOMMIOQueue {
+uint16_t num;
+bool enabled;
+uint32_t desc[2];
+uint32_t avail[2];
+uint32_t used[2];
+} VirtIOMMIOQueue;
+
+typedef struct {
+/* Generic */
+SysBusDevice parent_obj;
+MemoryRegion iomem;
+qemu_irq irq;
+bool legacy;
+/* Guest accessible state needing migration and reset */
+uint32_t host_features_sel;
+uint32_t guest_features_sel;
+uint32_t guest_page_shift;
+/* virtio-bus */
+VirtioBusState bus;
+bool format_transport_address;
+/* Fields only used for non-legacy (v2) devices */
+uint32_t guest_features[2];
+VirtIOMMIOQueue vqs[VIRTIO_QUEUE_MAX];
+} VirtIOMMIOProxy;
+
+#endif
diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 3d5ca0f667..94d934c44b 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -29,57 +29,11 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "sysemu/kvm.h"
-#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-mmio.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "trace.h"
 
-/* QOM macros */
-/* virtio-mmio-bus */
-#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
-#define VIRTIO_MMIO_BUS(obj) \
-OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
-OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_CLASS(klass) \
-OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
-
-/* virtio-mmio */
-#define TYPE_VIRTIO_MMIO "virtio-mmio"
-#define VIRTIO_MMIO(obj) \
-OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
-
-#define VIRT_MAGIC 0x74726976 /* 'virt' */
-#define VIRT_VERSION 2
-#define VIRT_VERSION_LEGACY 1
-#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
-
-typedef struct VirtIOMMIOQueue {
-uint16_t num;
-bool enabled;
-uint32_t desc[2];
-uint32_t avail[2];
-uint32_t used[2];
-} VirtIOMMIOQueue;
-
-typedef struct {
-/* Generic */
-SysBusDevice parent_obj;
-MemoryRegion iomem;
-qemu_irq irq;
-bool legacy;
-/* Guest accessible state needing migration and reset */
-uint32_t host_features_sel;
-uint32_t guest_features_sel;
-uint32_t guest_page_shift;
-/* virtio-bus */
-VirtioBusState bus;
-bool format_transport_address;
-/* Fields only used for non-legacy (v2) devices */
-uint32_t guest_features[2];
-VirtIOMMIOQueue vqs[VIRTIO_QUEUE_MAX];
-} VirtIOMMIOProxy;
-
 static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
 {
 return kvm_eventfds_enabled();
-- 
2.21.0
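
The VIRT_MAGIC and VIRT_VENDOR constants in the header above carry ASCII tags in their comments ('virt' and 'QEMU'). A minimal sketch of how those tags fall out of the 32-bit values, assuming the register is read least significant byte first; the helper name magic_to_ascii is hypothetical and not part of QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack a 32-bit value into four characters, least significant byte
 * first, and NUL-terminate the result. Hypothetical helper, used here
 * only to illustrate the comments next to VIRT_MAGIC and VIRT_VENDOR.
 */
static void magic_to_ascii(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((value >> (8 * i)) & 0xff);
    }
    out[4] = '\0';
}
```

Passing 0x74726976 yields the bytes 0x76, 0x69, 0x72, 0x74, which spell "virt"; 0x554D4551 yields "QEMU" in the same way.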

[PATCH v9 00/15] Introduce the microvm machine type

2019-10-15 Thread Sergio Lopez

Microvm is a machine type inspired by Firecracker and constructed
after its machine model.

It's a minimalist machine type without PCI or ACPI support, designed
for short-lived guests. Microvm also establishes a baseline for
benchmarking and optimizing both QEMU and guest operating systems,
since it is optimized for both boot time and footprint.

---

Changelog
v9:
 - Fix a typo in "[PATCH v9 05/15] hw/i386/pc: avoid an assignment in
   if condition in x86_load_linux()" (Philippe Mathieu-Daudé)
 - Replace qemu_strtol() with qemu_strtoui() to preserve the original
   type of video_mode (Philippe Mathieu-Daudé)

v8:
 - Split "[PATCH v7 03/12] hw/i386/pc: fix code style issues on
   functions that will be moved out" into four different patches
   (Philippe Mathieu-Daudé)

v7:
 - Fix code style issues on already present code touched by this patch
   series (Michael S. Tsirkin, Philippe Mathieu-Daudé)
 - Add new files to MAINTAINERS (Michael S. Tsirkin, Philippe
   Mathieu-Daudé)
 - Allow starting a microvm machine without a kernel image, fixing
   "qom-test" (Michael S. Tsirkin)
 - Change "bios-microvm.bin" mode to 0644 (Stefano Garzarella)
 - Remove unneeded "hw/i386/pc.h" include from x86.c (Stefano
   Garzarella)

v6:
 - Some style fixes (Philippe Mathieu-Daudé)
 - Fix a documentation bug stating that LAPIC was in userspace (Paolo
   Bonzini)
 - Update Xen HVM code after X86MachineState introduction (Philippe
   Mathieu-Daudé)
 - Rename header guard from QEMU_VIRTIO_MMIO_H to HW_VIRTIO_MMIO_H
   (Philippe Mathieu-Daudé)

v5:
 - Drop unneeded "[PATCH v4 2/8] hw/i386: Factorize e820 related
   functions" (Philippe Mathieu-Daudé)
 - Drop unneeded "[PATCH v4 1/8] hw/i386: Factorize PVH related
   functions" (Stefano Garzarella)
 - Split X86MachineState introduction into smaller patches (Philippe
   Mathieu-Daudé)
 - Change option-roms to x-option-roms and kernel-cmdline to
   auto-kernel-cmdline (Paolo Bonzini)
 - Make i8259 PIC and i8254 PIT optional (Paolo Bonzini)
 - Some fixes to the documentation (Paolo Bonzini)
 - Switch documentation format from txt to rst (Peter Maydell)
 - Move NMI interface to X86_MACHINE (Philippe Mathieu-Daudé, Paolo
   Bonzini)

v4:
 - This is a complete rewrite of the whole patchset, with a focus on
   reusing as much existing code as possible to ease the maintenance burden
   and making the machine type as compatible as possible by default. As
   a result, the number of lines dedicated specifically to microvm is
   383 (code lines measured by "cloc") and, with the default
   configuration, it's now able to boot both PVH ELF images and
   bzImages with either SeaBIOS or qboot.

v3:
  - Add initrd support (thanks Stefano).

v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).

---

Sergio Lopez (15):
  hw/virtio: Factorize virtio-mmio headers
  hw/i386/pc: rename functions shared with non-PC machines
  hw/i386/pc: fix code style issues on functions that will be moved out
  hw/i386/pc: replace use of strtol with qemu_strtol in x86_load_linux()
  hw/i386/pc: avoid an assignment in if condition in x86_load_linux()
  hw/i386/pc: remove commented out code from x86_load_linux()
  hw/i386/pc: move shared x86 functions to x86.c and export them
  hw/i386: split PCMachineState deriving X86MachineState from it
  hw/i386: make x86.c independent from PCMachineState
  fw_cfg: add "modify" functions for all types
  hw/intc/apic: reject pic ints if isa_pic == NULL
  roms: add microvm-bios (qboot) as binary and git submodule
  docs/microvm.rst: document the new microvm machine type
  hw/i386: Introduce the microvm machine type
  MAINTAINERS: add microvm related files

 docs/microvm.rst |  98 
 default-configs/i386-softmmu.mak |   1 +
 include/hw/i386/microvm.h|  83 
 include/hw/i386/pc.h |  28 +-
 include/hw/i386/x86.h|  96 
 include/hw/nvram/fw_cfg.h|  42 ++
 include/hw/virtio/virtio-mmio.h  |  73 +++
 hw/acpi/cpu_hotplug.c|  10 +-
 hw/i386/acpi-build.c |  29 +-
 hw/i386/amd_iommu.c  |   3 +-
 hw/i386/intel_iommu.c|   3 +-
 hw/i386/microvm.c| 572 ++
 hw/i386/pc.c | 781 +++---
 hw/i386/pc_piix.c|  46 +-
 hw/i386/pc_q35.c |  38 +-
 hw/i386/pc_sysfw.c   |  60 +--
 hw/i386/x86.c| 795 +++
 hw/i386/xen/xen-hvm.c|  28 +-
 hw/intc/apic.c   |   2 +-
 hw/intc/ioapic.c |   2 +-
 hw/nvram/fw_cfg.c|  29 ++
 hw/virtio/virtio-mmio.c  |  48 +-
 .gitmodules  |   3 +
 MAINTAINERS

[PATCH v9 03/15] hw/i386/pc: fix code style issues on functions that will be moved out

2019-10-15 Thread Sergio Lopez

Fix code style issues detected by checkpatch.pl on functions that will
be moved out to x86.c.

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/i386/pc.c | 53 
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fd08c6704b..77e86bfc3d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -866,7 +866,8 @@ static void handle_a20_line_change(void *opaque, int irq, 
int level)
 x86_cpu_set_a20(cpu, level);
 }
 
-/* Calculates initial APIC ID for a specific CPU index
+/*
+ * Calculates initial APIC ID for a specific CPU index
  *
  * Currently we need to be able to calculate the APIC ID from the CPU index
  * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces 
have
@@ -1039,7 +1040,7 @@ static void x86_load_linux(PCMachineState *pcms,
 const char *kernel_cmdline = machine->kernel_cmdline;
 
 /* Align to 16 bytes as a paranoia measure */
-cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
+cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
 
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
@@ -1055,8 +1056,8 @@ static void x86_load_linux(PCMachineState *pcms,
 #if 0
 fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
 #endif
-if (ldl_p(header+0x202) == 0x53726448) {
-protocol = lduw_p(header+0x206);
+if (ldl_p(header + 0x202) == 0x53726448) {
+protocol = lduw_p(header + 0x206);
 } else {
 /*
  * This could be a multiboot kernel. If it is, let's stop treating it
@@ -1158,7 +1159,7 @@ static void x86_load_linux(PCMachineState *pcms,
 
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
-lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
+lduw_p(header + 0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
 /*
  * Linux has supported initrd up to 4 GB for a very long time (2007,
  * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
@@ -1177,7 +1178,7 @@ static void x86_load_linux(PCMachineState *pcms,
  */
 initrd_max = UINT32_MAX;
 } else if (protocol >= 0x203) {
-initrd_max = ldl_p(header+0x22c);
+initrd_max = ldl_p(header + 0x22c);
 } else {
 initrd_max = 0x37ff;
 }
@@ -1187,14 +1188,14 @@ static void x86_load_linux(PCMachineState *pcms,
 }
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
-fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
+fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
 fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
 
 if (protocol >= 0x202) {
-stl_p(header+0x228, cmdline_addr);
+stl_p(header + 0x228, cmdline_addr);
 } else {
-stw_p(header+0x20, 0xA33F);
-stw_p(header+0x22, cmdline_addr-real_addr);
+stw_p(header + 0x20, 0xA33F);
+stw_p(header + 0x22, cmdline_addr - real_addr);
 }
 
 /* handle vga= parameter */
@@ -1212,20 +1213,22 @@ static void x86_load_linux(PCMachineState *pcms,
 } else {
 video_mode = strtol(vmode, NULL, 0);
 }
-stw_p(header+0x1fa, video_mode);
+stw_p(header + 0x1fa, video_mode);
 }
 
 /* loader type */
-/* High nybble = B reserved for QEMU; low nybble is revision number.
-   If this code is substantially changed, you may want to consider
-   incrementing the revision. */
+/*
+ * High nybble = B reserved for QEMU; low nybble is revision number.
+ * If this code is substantially changed, you may want to consider
+ * incrementing the revision.
+ */
 if (protocol >= 0x200) {
 header[0x210] = 0xB0;
 }
 /* heap */
 if (protocol >= 0x201) {
-header[0x211] |= 0x80; /* CAN_USE_HEAP */
-stw_p(header+0x224, cmdline_addr-real_addr-0x200);
+header[0x211] |= 0x80; /* CAN_USE_HEAP */
+stw_p(header + 0x224, cmdline_addr - real_addr - 0x200);
 }
 
 /* load initrd */
@@ -1257,14 +1260,14 @@ static void x86_load_linux(PCMachineState *pcms,
 exit(1);
 }
 
-initrd_addr = (initrd_max-initrd_size) & ~4095;
+initrd_addr = (initrd_max - initrd_size) & ~4095;
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
 
-stl_p(header+0x218, initrd_addr);
-stl_p(header+0x21c, initrd_size);
+stl_p(header + 0x218, initrd_addr);
+stl_p(header + 0x21c, initrd_size);
 }
 
 /* load kernel and setup */
@@ -1272,7 +1275,7 @@ static void x86_load_linux(PCMachineState *pcms,
 if (setup_size == 0) {
 setup_size = 4;
 }
-setup_size = (setup_size+1)*512;
+setup_size = (setup_size + 1) * 512;
 i
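
The reformatted expressions in this patch rely on two bit-mask alignment idioms: `(len + 16) & ~15` for the command line and `(max - size) & ~4095` for the initrd address. A small standalone sketch of what they compute; the function names are hypothetical and the code is not taken from x86_load_linux():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same arithmetic as "(strlen(kernel_cmdline) + 16) & ~15": the result
 * is a multiple of 16 that is strictly larger than the string length,
 * so there is always room for the terminating NUL (1 to 16 spare bytes).
 */
static size_t cmdline_alloc_size(size_t cmdline_len)
{
    return (cmdline_len + 16) & ~(size_t)15;
}

/*
 * Same arithmetic as "(initrd_max - initrd_size) & ~4095": the highest
 * 4 KiB-aligned start address such that the blob still ends at or
 * below initrd_max.
 */
static uint32_t initrd_load_addr(uint32_t initrd_max, uint32_t initrd_size)
{
    assert(initrd_size <= initrd_max);
    return (initrd_max - initrd_size) & ~(uint32_t)4095;
}
```

For example, a 15-byte command line gets 16 bytes and a 16-byte one gets 32, because clearing the low bits rounds down only after 16 has been added.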

[PATCH v9 06/15] hw/i386/pc: remove commented out code from x86_load_linux()

2019-10-15 Thread Sergio Lopez

Follow checkpatch.pl recommendation and remove commented out code from
x86_load_linux().

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/i386/pc.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 90c2263a33..612bfe9c95 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1061,9 +1061,6 @@ static void x86_load_linux(PCMachineState *pcms,
 }
 
 /* kernel protocol version */
-#if 0
-fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
-#endif
 if (ldl_p(header + 0x202) == 0x53726448) {
 protocol = lduw_p(header + 0x206);
 } else {
@@ -1155,16 +1152,6 @@ static void x86_load_linux(PCMachineState *pcms,
 prot_addr= 0x10;
 }
 
-#if 0
-fprintf(stderr,
-"qemu: real_addr = 0x" TARGET_FMT_plx "\n"
-"qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
-"qemu: prot_addr = 0x" TARGET_FMT_plx "\n",
-real_addr,
-cmdline_addr,
-prot_addr);
-#endif
-
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
 lduw_p(header + 0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
-- 
2.21.0

[PATCH v9 02/15] hw/i386/pc: rename functions shared with non-PC machines

2019-10-15 Thread Sergio Lopez

The following functions are named *pc* but are not specific to the PC
machine; they are generic to the x86 architecture. Rename them:

  load_linux                 -> x86_load_linux
  pc_new_cpu                 -> x86_cpu_new
  pc_cpus_init               -> x86_cpus_init
  pc_cpu_index_to_props      -> x86_cpu_index_to_props
  pc_get_default_cpu_node_id -> x86_get_default_cpu_node_id
  pc_possible_cpu_arch_ids   -> x86_possible_cpu_arch_ids
  old_pc_system_rom_init     -> x86_system_rom_init

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h |  2 +-
 hw/i386/pc.c | 28 ++--
 hw/i386/pc_piix.c|  2 +-
 hw/i386/pc_q35.c |  2 +-
 hw/i386/pc_sysfw.c   |  6 +++---
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 6df4f4b6fb..d12f42e9e5 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -195,7 +195,7 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms);
 void pc_register_ferr_irq(qemu_irq irq);
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 
-void pc_cpus_init(PCMachineState *pcms);
+void x86_cpus_init(PCMachineState *pcms);
 void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
 void pc_smp_parse(MachineState *ms, QemuOpts *opts);
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bcda50efcc..fd08c6704b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1019,8 +1019,8 @@ static bool load_elfboot(const char *kernel_filename,
 return true;
 }
 
-static void load_linux(PCMachineState *pcms,
-   FWCfgState *fw_cfg)
+static void x86_load_linux(PCMachineState *pcms,
+   FWCfgState *fw_cfg)
 {
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
@@ -1374,7 +1374,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 }
 }
 
-static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp)
+static void x86_cpu_new(PCMachineState *pcms, int64_t apic_id, Error **errp)
 {
 Object *cpu = NULL;
 Error *local_err = NULL;
@@ -1490,14 +1490,14 @@ void pc_hot_add_cpu(MachineState *ms, const int64_t id, 
Error **errp)
 return;
 }
 
-pc_new_cpu(PC_MACHINE(ms), apic_id, &local_err);
+x86_cpu_new(PC_MACHINE(ms), apic_id, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
 }
 
-void pc_cpus_init(PCMachineState *pcms)
+void x86_cpus_init(PCMachineState *pcms)
 {
 int i;
 const CPUArchIdList *possible_cpus;
@@ -1518,7 +1518,7 @@ void pc_cpus_init(PCMachineState *pcms)
  ms->smp.max_cpus - 1) + 1;
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 for (i = 0; i < ms->smp.cpus; i++) {
-pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
+x86_cpu_new(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
 }
 }
 
@@ -1621,7 +1621,7 @@ void xen_load_linux(PCMachineState *pcms)
 fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 rom_set_fw(fw_cfg);
 
-load_linux(pcms, fw_cfg);
+x86_load_linux(pcms, fw_cfg);
 for (i = 0; i < nb_option_roms; i++) {
 assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
!strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -1756,7 +1756,7 @@ void pc_memory_init(PCMachineState *pcms,
 }
 
 if (linux_boot) {
-load_linux(pcms, fw_cfg);
+x86_load_linux(pcms, fw_cfg);
 }
 
 for (i = 0; i < nb_option_roms; i++) {
@@ -2678,7 +2678,7 @@ static void pc_machine_wakeup(MachineState *machine)
 }
 
 static CpuInstanceProperties
-pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
@@ -2687,7 +2687,7 @@ pc_cpu_index_to_props(MachineState *ms, unsigned 
cpu_index)
 return possible_cpus->cpus[cpu_index].props;
 }
 
-static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
+static int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
X86CPUTopoInfo topo;
PCMachineState *pcms = PC_MACHINE(ms);
@@ -2699,7 +2699,7 @@ static int64_t pc_get_default_cpu_node_id(const 
MachineState *ms, int idx)
return topo.pkg_id % ms->numa_state->num_nodes;
 }
 
-static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
+static const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
 {
 PCMachineState *pcms = PC_MACHINE(ms);
 int i;
@@ -2801,9 +2801,9 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = pc_get_hotplug_handler;
 mc->hotplug_allowed = pc_hotplug_allowed;
-

[PATCH v9 04/15] hw/i386/pc: replace use of strtol with qemu_strtol in x86_load_linux()

2019-10-15 Thread Sergio Lopez

Follow checkpatch.pl recommendation and replace the use of strtol with
qemu_strtoui in x86_load_linux(), which preserves the unsigned type of
video_mode.

Signed-off-by: Sergio Lopez 
---
 hw/i386/pc.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 77e86bfc3d..c8608b8007 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -68,6 +68,7 @@
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qemu/option.h"
+#include "qemu/cutils.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/boards.h"
@@ -1202,6 +1203,7 @@ static void x86_load_linux(PCMachineState *pcms,
 vmode = strstr(kernel_cmdline, "vga=");
 if (vmode) {
 unsigned int video_mode;
+int ret;
 /* skip "vga=" */
 vmode += 4;
 if (!strncmp(vmode, "normal", 6)) {
@@ -1211,7 +1213,12 @@ static void x86_load_linux(PCMachineState *pcms,
 } else if (!strncmp(vmode, "ask", 3)) {
 video_mode = 0xfffd;
 } else {
-video_mode = strtol(vmode, NULL, 0);
+ret = qemu_strtoui(vmode, NULL, 0, &video_mode);
+if (ret != 0) {
+fprintf(stderr, "qemu: can't parse 'vga' parameter: %s\n",
+strerror(-ret));
+exit(1);
+}
 }
 stw_p(header + 0x1fa, video_mode);
 }
-- 
2.21.0
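
The point of the change above is that a bare strtol() call cannot report a parse failure, while a checked wrapper can. The following is a rough standalone sketch of that pattern using only the C standard library; parse_uint is a hypothetical helper and does not reproduce the exact semantics of qemu_strtoui() (in particular, it tolerates trailing characters, as the original strtol() call did):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse an unsigned integer with automatic base detection (base 0, so
 * "0x317" and "791" both work). Returns 0 on success or a negative
 * errno value on failure. Parsing stops at the first character that is
 * not part of the number.
 */
static int parse_uint(const char *s, unsigned int *result)
{
    char *end;
    unsigned long val;

    assert(result != NULL);
    if (s == NULL || *s == '\0') {
        return -EINVAL;
    }

    errno = 0;
    val = strtoul(s, &end, 0);
    if (end == s) {
        return -EINVAL;     /* no digits were consumed */
    }
    if (errno == ERANGE || val > UINT_MAX) {
        return -ERANGE;     /* value does not fit in unsigned int */
    }

    *result = (unsigned int)val;
    return 0;
}
```

A caller can then turn the negative return value into a message with strerror(-ret), the same way the patch does.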

[PATCH v9 15/15] MAINTAINERS: add microvm related files

2019-10-15 Thread Sergio Lopez

Add a new "Microvm" section under "X86 Machines" with the new files
related to this machine type.

Signed-off-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
---
 MAINTAINERS | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fe4dc51b08..9744f07727 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1275,6 +1275,16 @@ F: include/hw/timer/hpet.h
 F: include/hw/timer/i8254*
 F: include/hw/timer/mc146818rtc*
 
+Microvm
+M: Sergio Lopez 
+M: Paolo Bonzini 
+S: Maintained
+F: docs/microvm.rst
+F: hw/i386/microvm.c
+F: include/hw/i386/microvm.h
+F: roms/qboot
+F: pc-bios/bios-microvm.bin
+
 Machine core
 M: Eduardo Habkost 
 M: Marcel Apfelbaum 
-- 
2.21.0

[PATCH v9 07/15] hw/i386/pc: move shared x86 functions to x86.c and export them

2019-10-15 Thread Sergio Lopez

Move x86 functions that will be shared between PC and non-PC machine
types to x86.c, along with their helpers.

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h  |   1 -
 include/hw/i386/x86.h |  35 +++
 hw/i386/pc.c  | 587 +--
 hw/i386/pc_piix.c |   1 +
 hw/i386/pc_q35.c  |   1 +
 hw/i386/pc_sysfw.c|  56 +---
 hw/i386/x86.c | 690 ++
 hw/i386/Makefile.objs |   1 +
 8 files changed, 730 insertions(+), 642 deletions(-)
 create mode 100644 include/hw/i386/x86.h
 create mode 100644 hw/i386/x86.c

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d12f42e9e5..73e2847e87 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -195,7 +195,6 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms);
 void pc_register_ferr_irq(qemu_irq irq);
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 
-void x86_cpus_init(PCMachineState *pcms);
 void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
 void pc_smp_parse(MachineState *ms, QemuOpts *opts);
 
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
new file mode 100644
index 00..71e2b6985d
--- /dev/null
+++ b/include/hw/i386/x86.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef HW_I386_X86_H
+#define HW_I386_X86_H
+
+#include "hw/boards.h"
+
+uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
+unsigned int cpu_index);
+void x86_cpu_new(PCMachineState *pcms, int64_t apic_id, Error **errp);
+void x86_cpus_init(PCMachineState *pcms);
+CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
+ unsigned cpu_index);
+int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
+const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
+
+void x86_bios_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw);
+
+void x86_load_linux(PCMachineState *x86ms, FWCfgState *fw_cfg);
+
+#endif
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 612bfe9c95..05de536a2b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -24,6 +24,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/char/serial.h"
 #include "hw/char/parallel.h"
@@ -103,9 +104,6 @@
 
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
-/* Physical Address of PVH entry point read from kernel ELF NOTE */
-static size_t pvh_start_addr;
-
 GlobalProperty pc_compat_4_1[] = {};
 const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
 
@@ -867,481 +865,6 @@ static void handle_a20_line_change(void *opaque, int irq, 
int level)
 x86_cpu_set_a20(cpu, level);
 }
 
-/*
- * Calculates initial APIC ID for a specific CPU index
- *
- * Currently we need to be able to calculate the APIC ID from the CPU index
- * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces 
have
- * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
- * all CPUs up to max_cpus.
- */
-static uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
-   unsigned int cpu_index)
-{
-MachineState *ms = MACHINE(pcms);
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-uint32_t correct_id;
-static bool warned;
-
-correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores,
- ms->smp.threads, cpu_index);
-if (pcmc->compat_apic_id_mode) {
-if (cpu_index != correct_id && !warned && !qtest_enabled()) {
-error_report("APIC IDs set in compatibility mode, "
- "CPU topology won't match the configuration");
-warned = true;
-}
-return cpu_index;
-} else {
-return correct_id;
-}
-}
-
-static long get_file_size(FILE *f)
-{
-long where, size;
-
-/* XXX: on Unix systems, using fstat() probably makes more sense */
-
-where = ftell(f);
-fseek(f, 0, SEEK_END);
-size = ftell(f);
-fseek(f, where, SEEK_SET);
-
-return size;
-}
-
-struct setup_data {
-uint64_t next;
-uint32_t type;
-uint32_t len;
-

[PATCH v9 11/15] hw/intc/apic: reject pic ints if isa_pic == NULL

2019-10-15 Thread Sergio Lopez

In apic_accept_pic_intr(), reject PIC interrupts if an i8259 PIC has
not been instantiated (isa_pic == NULL).

Suggested-by: Paolo Bonzini 
Signed-off-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
---
 hw/intc/apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bce89911dc..2a74f7b4bf 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -610,7 +610,7 @@ int apic_accept_pic_intr(DeviceState *dev)
 
 if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) == 0 ||
 (lvt0 & APIC_LVT_MASKED) == 0)
-return 1;
+return isa_pic != NULL;
 
 return 0;
 }
-- 
2.21.0
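
To make the effect of this one-line change easier to see, here is a simplified model of the visible condition. It is only a sketch: the struct, the function name and the two bit positions (bit 11 for the APIC base enable flag, bit 16 for the LVT mask flag) are assumptions made for illustration, and the real function contains logic that is not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APICBASE_ENABLE (1u << 11)  /* assumed value of MSR_IA32_APICBASE_ENABLE */
#define LVT_MASKED      (1u << 16)  /* assumed value of APIC_LVT_MASKED */

/* Hypothetical, reduced stand-in for the APIC state used by the hunk. */
struct apic_model {
    uint32_t apicbase;
    uint32_t lvt0;
};

/*
 * Returns 1 if a PIC interrupt should be accepted, 0 otherwise.
 * Before the patch the first branch always returned 1; after it, the
 * result depends on whether a PIC exists at all.
 */
static int accept_pic_intr(const struct apic_model *s, bool have_pic)
{
    assert(s != NULL);
    if ((s->apicbase & APICBASE_ENABLE) == 0 ||
        (s->lvt0 & LVT_MASKED) == 0) {
        return have_pic;
    }
    return 0;
}
```

With have_pic set to false, which corresponds to isa_pic == NULL on a machine without an i8259, the function never reports a PIC interrupt as acceptable.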

[PATCH v9 05/15] hw/i386/pc: avoid an assignment in if condition in x86_load_linux()

2019-10-15 Thread Sergio Lopez

Follow checkpatch.pl recommendation and avoid an assignment in if
condition in x86_load_linux().

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/i386/pc.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c8608b8007..90c2263a33 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1045,7 +1045,14 @@ static void x86_load_linux(PCMachineState *pcms,
 
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
-if (!f || !(kernel_size = get_file_size(f)) ||
+if (!f) {
+fprintf(stderr, "qemu: could not open kernel file '%s': %s\n",
+kernel_filename, strerror(errno));
+exit(1);
+}
+
+kernel_size = get_file_size(f);
+if (!kernel_size ||
 fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
 MIN(ARRAY_SIZE(header), kernel_size)) {
 fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
-- 
2.21.0
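
Splitting the combined condition lets the "cannot open" case be reported separately from the "empty or unreadable" case. A standalone sketch of the same shape, assuming a hypothetical probe_kernel() helper and a file_size() function modeled on the get_file_size() helper that appears elsewhere in this series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Size of an open file in bytes; restores the original file position. */
static long file_size(FILE *f)
{
    long where = ftell(f);
    long size;

    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fseek(f, where, SEEK_SET);
    return size;
}

/*
 * Returns 0 and stores the size on success, -1 if the file cannot be
 * opened, -2 if it is empty or its size cannot be determined. Each
 * step is a separate statement, so no assignment hides inside an if
 * condition.
 */
static int probe_kernel(const char *path, long *size_out)
{
    FILE *f;
    long size;

    assert(path != NULL && size_out != NULL);
    f = fopen(path, "rb");
    if (!f) {
        return -1;
    }

    size = file_size(f);
    fclose(f);
    if (size <= 0) {
        return -2;
    }

    *size_out = size;
    return 0;
}
```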

[PATCH v9 08/15] hw/i386: split PCMachineState deriving X86MachineState from it

2019-10-15 Thread Sergio Lopez

Split up PCMachineState and PCMachineClass, moving their generic x86
fields into the new X86MachineState and X86MachineClass, which become
their parent types. This allows sharing code with non-PC x86 machine
types.

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h  |  27 +--
 include/hw/i386/x86.h |  58 +-
 hw/acpi/cpu_hotplug.c |  10 +--
 hw/i386/acpi-build.c  |  29 ---
 hw/i386/amd_iommu.c   |   3 +-
 hw/i386/intel_iommu.c |   3 +-
 hw/i386/pc.c  | 178 ++
 hw/i386/pc_piix.c |  43 +-
 hw/i386/pc_q35.c  |  35 +
 hw/i386/x86.c | 140 +
 hw/i386/xen/xen-hvm.c |  28 ---
 hw/intc/ioapic.c  |   2 +-
 12 files changed, 326 insertions(+), 230 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 73e2847e87..d2a690d05e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -8,6 +8,7 @@
 #include "hw/block/flash.h"
 #include "net/net.h"
 #include "hw/i386/ioapic.h"
+#include "hw/i386/x86.h"
 
 #include "qemu/range.h"
 #include "qemu/bitmap.h"
@@ -27,7 +28,7 @@
  */
 struct PCMachineState {
 /*< private >*/
-MachineState parent_obj;
+X86MachineState parent_obj;
 
 /*< public >*/
 
@@ -36,16 +37,11 @@ struct PCMachineState {
 
 /* Pointers to devices and objects: */
 HotplugHandler *acpi_dev;
-ISADevice *rtc;
 PCIBus *bus;
 I2CBus *smbus;
-FWCfgState *fw_cfg;
-qemu_irq *gsi;
 PFlashCFI01 *flash[2];
-GMappedFile *initrd_mapped_file;
 
 /* Configuration options: */
-uint64_t max_ram_below_4g;
 OnOffAuto vmport;
 OnOffAuto smm;
 
@@ -54,27 +50,13 @@ struct PCMachineState {
 bool sata_enabled;
 bool pit_enabled;
 
-/* RAM information (sizes, addresses, configuration): */
-ram_addr_t below_4g_mem_size, above_4g_mem_size;
-
-/* CPU and apic information: */
-bool apic_xrupt_override;
-unsigned apic_id_limit;
-uint16_t boot_cpus;
-unsigned smp_dies;
-
 /* NUMA information: */
 uint64_t numa_nodes;
 uint64_t *node_mem;
-
-/* Address space used by IOAPIC device. All IOAPIC interrupts
- * will be translated to MSI messages in the address space. */
-AddressSpace *ioapic_as;
 };
 
 #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
 #define PC_MACHINE_DEVMEM_REGION_SIZE "device-memory-region-size"
-#define PC_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
 #define PC_MACHINE_VMPORT   "vmport"
 #define PC_MACHINE_SMM  "smm"
 #define PC_MACHINE_SMBUS"smbus"
@@ -99,7 +81,7 @@ struct PCMachineState {
  */
 typedef struct PCMachineClass {
 /*< private >*/
-MachineClass parent_class;
+X86MachineClass parent_class;
 
 /*< public >*/
 
@@ -141,9 +123,6 @@ typedef struct PCMachineClass {
 
 /* use PVH to load kernels that support this feature */
 bool pvh_enabled;
-
-/* Enables contiguous-apic-ID mode */
-bool compat_apic_id_mode;
 } PCMachineClass;
 
 #define TYPE_PC_MACHINE "generic-pc-machine"
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 71e2b6985d..d15713e92e 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -17,7 +17,63 @@
 #ifndef HW_I386_X86_H
 #define HW_I386_X86_H
 
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
 #include "hw/boards.h"
+#include "hw/nmi.h"
+
+typedef struct {
+/*< private >*/
+MachineClass parent;
+
+/*< public >*/
+
+/* Enables contiguous-apic-ID mode */
+bool compat_apic_id_mode;
+} X86MachineClass;
+
+typedef struct {
+/*< private >*/
+MachineState parent;
+
+/*< public >*/
+
+/* Pointers to devices and objects: */
+ISADevice *rtc;
+FWCfgState *fw_cfg;
+qemu_irq *gsi;
+GMappedFile *initrd_mapped_file;
+
+/* Configuration options: */
+uint64_t max_ram_below_4g;
+
+/* RAM information (sizes, addresses, configuration): */
+ram_addr_t below_4g_mem_size, above_4g_mem_size;
+
+/* CPU and apic information: */
+bool apic_xrupt_override;
+unsigned apic_id_limit;
+uint16_t boot_cpus;
+unsigned smp_dies;
+
+/*
+ * Address space used by IOAPIC device. All IOAPIC interrupts
+ * will be translated to MSI messages in the address space.
+ */
+AddressSpace *ioapic_as;
+} X86MachineState;
+
+#define X86_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
+
+#define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
+#define X86_MACHINE(obj) \
+OBJECT_CHECK(X86MachineState, (obj), TYPE_X86_MACHINE)
+#define X86_MACHINE_GET_CLASS(obj) \
+OBJECT_GET_CLASS(X86MachineClass, obj, TYPE_X86_MACHINE)
+#define X86_MACHINE_CLASS(class) \
+OBJECT_CLASS_CHECK(X86MachineClass, class, TYPE_X86_MACHINE)
 
 uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
 unsigned int cpu_index);
@@ -30,6 +86,6 @@ const CPUArchIdList *x86_possible_c
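
The split above works because each derived state struct embeds its parent as its first member, so code written against the parent type can operate on any derived machine. A stripped-down sketch of that layout in plain C; all type and function names here are hypothetical stand-ins, and the runtime type checks performed by QOM's OBJECT_CHECK() are not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic machine state. */
typedef struct {
    int placeholder;
} BaseMachine;

/* Fields shared by every x86 machine live in the common parent. */
typedef struct {
    BaseMachine parent;
    uint64_t max_ram_below_4g;
    uint16_t boot_cpus;
} X86Machine;

/* A PC-like machine adds its own fields after the embedded parent. */
typedef struct {
    X86Machine parent_obj;
    int pci_enabled;
} PCLikeMachine;

/* A minimal machine reuses the parent without any PC-specific state. */
typedef struct {
    X86Machine parent_obj;
} MinimalMachine;

/* Shared helper: depends only on the common x86 state. */
static uint16_t x86_boot_cpu_count(const X86Machine *x86ms)
{
    assert(x86ms != NULL);
    return x86ms->boot_cpus;
}
```

Both derived structs can hand &m.parent_obj to x86_boot_cpu_count(), which is the kind of sharing the commit message describes for non-PC machine types.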

[PATCH v9 12/15] roms: add microvm-bios (qboot) as binary and git submodule

2019-10-15 Thread Sergio Lopez

qboot is a minimalist x86 firmware for booting Linux kernels. It does
the minimum amount of work required for the task, and it's able to
boot both PVH images and bzImages without relying on option roms.

These characteristics make it an ideal companion for the microvm
machine type.

Signed-off-by: Sergio Lopez 
Reviewed-by: Stefano Garzarella 
Reviewed-by: Michael S. Tsirkin 
---
 .gitmodules  |   3 +++
 pc-bios/bios-microvm.bin | Bin 0 -> 65536 bytes
 roms/Makefile|   6 ++
 roms/qboot   |   1 +
 4 files changed, 10 insertions(+)
 create mode 100644 pc-bios/bios-microvm.bin
 create mode 16 roms/qboot

diff --git a/.gitmodules b/.gitmodules
index c5c474169d..19792c9a11 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -58,3 +58,6 @@
 [submodule "roms/opensbi"]
path = roms/opensbi
url =   https://git.qemu.org/git/opensbi.git
+[submodule "roms/qboot"]
+   path = roms/qboot
+   url = https://github.com/bonzini/qboot
diff --git a/pc-bios/bios-microvm.bin b/pc-bios/bios-microvm.bin
new file mode 100644
index 
..45eabc516692e2d134bbb630d133c7c2dcc9a9b6
GIT binary patch
literal 65536
[base85 payload of the binary patch omitted; the data is truncated in this archive]

[PATCH v9 09/15] hw/i386: make x86.c independent from PCMachineState

2019-10-15 Thread Sergio Lopez

As a last step into splitting PCMachineState and deriving
X86MachineState from it, make the functions previously extracted from
pc.c to x86.c independent from PCMachineState, using X86MachineState
instead.

Signed-off-by: Sergio Lopez 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/i386/x86.h | 13 +++
 hw/i386/pc.c  | 14 
 hw/i386/pc_piix.c |  2 +-
 hw/i386/pc_q35.c  |  2 +-
 hw/i386/x86.c | 53 ---
 5 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index d15713e92e..82d09fd7d0 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -75,10 +75,11 @@ typedef struct {
 #define X86_MACHINE_CLASS(class) \
 OBJECT_CLASS_CHECK(X86MachineClass, class, TYPE_X86_MACHINE)
 
-uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
+uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
                                     unsigned int cpu_index);
-void x86_cpu_new(PCMachineState *pcms, int64_t apic_id, Error **errp);
-void x86_cpus_init(PCMachineState *pcms);
+
+void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
+void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
 CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
                                              unsigned cpu_index);
 int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
@@ -86,6 +87,10 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
 
 void x86_bios_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw);
 
-void x86_load_linux(PCMachineState *pcms, FWCfgState *fw_cfg);
+void x86_load_linux(X86MachineState *x86ms,
+                    FWCfgState *fw_cfg,
+                    int acpi_data_size,
+                    bool pvh_enabled,
+                    bool linuxboot_dma_enabled);
 
 #endif
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1457a45101..a4d3a284fb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -983,8 +983,8 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts)
 
 void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
 {
-    PCMachineState *pcms = PC_MACHINE(ms);
-    int64_t apic_id = x86_cpu_apic_id_from_index(pcms, id);
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    int64_t apic_id = x86_cpu_apic_id_from_index(x86ms, id);
     Error *local_err = NULL;
 
     if (id < 0) {
@@ -999,7 +999,8 @@ void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
         return;
     }
 
-    x86_cpu_new(PC_MACHINE(ms), apic_id, &local_err);
+
+    x86_cpu_new(X86_MACHINE(ms), apic_id, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -1100,6 +1101,7 @@ void xen_load_linux(PCMachineState *pcms)
 {
     int i;
     FWCfgState *fw_cfg;
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
     X86MachineState *x86ms = X86_MACHINE(pcms);
 
     assert(MACHINE(pcms)->kernel_filename != NULL);
@@ -1108,7 +1110,8 @@ void xen_load_linux(PCMachineState *pcms)
     fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
     rom_set_fw(fw_cfg);
 
-    x86_load_linux(pcms, fw_cfg);
+    x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
+                   pcmc->pvh_enabled, pcmc->linuxboot_dma_enabled);
     for (i = 0; i < nb_option_roms; i++) {
         assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
                !strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -1244,7 +1247,8 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     if (linux_boot) {
-        x86_load_linux(pcms, fw_cfg);
+        x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
+                       pcmc->pvh_enabled, pcmc->linuxboot_dma_enabled);
     }
 
     for (i = 0; i < nb_option_roms; i++) {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 0afa8fe6ea..a86317cdff 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -154,7 +154,7 @@ static void pc_init1(MachineState *machine,
         }
     }
 
-    x86_cpus_init(pcms);
+    x86_cpus_init(x86ms, pcmc->default_cpu_version);
 
     if (kvm_enabled() && pcmc->kvmclock_enabled) {
         kvmclock_create();
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 374ac6c068..75c8caf7c2 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -181,7 +181,7 @@ static void pc_q35_init(MachineState *machine)
         xen_hvm_init(pcms, &ram_memory);
     }
 
-    x86_cpus_init(pcms);
+    x86_cpus_init(x86ms, pcmc->default_cpu_version);
 
     kvmclock_create();
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index de4fed0164..fd84b23124 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -36,7 +36,6 @@
 #include "sysemu/sysemu.h"
 
 #include "hw/i386/x86.h"
-#include "hw/i386/pc.h"
 #include "target/i386/cpu.h"
 #include "hw/i386/topology.h"
 #include "hw/i386/fw_cfg.h"
@@ -61,11 +60,10 @@ static size_t pvh_s
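
As a rough illustration of the decoupling this patch describes, the sketch below uses plain struct embedding instead of QEMU's object model. Every `Toy*` and `toy_*` name is hypothetical; the only detail borrowed from the patch is that shared helpers take the base state and receive machine-specific options as explicit arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy base state: holds only what the shared x86 helpers need. */
typedef struct {
    unsigned int boot_cpus;
} ToyX86MachineState;

/* Toy "PC" machine: embeds the base state as its first member. */
typedef struct {
    ToyX86MachineState parent;
    bool pvh_enabled;   /* stand-in for an option owned by the PC layer */
} ToyPCMachineState;

/* A second toy machine can reuse the helpers without the PC layer. */
typedef struct {
    ToyX86MachineState parent;
    bool isa_serial;
} ToyMicrovmMachineState;

/* Shared helper: it sees the base state only. */
static void toy_x86_cpu_new(ToyX86MachineState *x86ms)
{
    x86ms->boot_cpus++;
}

/*
 * Shared helper: the machine-specific option arrives as an explicit
 * argument, so the helper never has to look inside the PC state.
 */
static bool toy_x86_load_linux(ToyX86MachineState *x86ms, bool pvh_enabled)
{
    return pvh_enabled && x86ms->boot_cpus > 0;
}
```

A caller that owns a `ToyPCMachineState pc` would pass `&pc.parent` and `pc.pvh_enabled`, which is the same shape as the `x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size, ...)` calls in the hunks above.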

Re: [PATCH v9 04/15] hw/i386/pc: replace use of strtol with qemu_strtol in x86_load_linux()

2019-10-15 Thread Philippe Mathieu-Daudé


Hi Sergio,

On 10/15/19 1:23 PM, Sergio Lopez wrote:

> Follow checkpatch.pl recommendation and replace the use of strtol with
> qemu_strtol in x86_load_linux().

"with qemu_strtoui"

> Signed-off-by: Sergio Lopez
> ---
>  hw/i386/pc.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 77e86bfc3d..c8608b8007 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -68,6 +68,7 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  #include "qemu/option.h"
> +#include "qemu/cutils.h"
>  #include "hw/acpi/acpi.h"
>  #include "hw/acpi/cpu_hotplug.h"
>  #include "hw/boards.h"
> @@ -1202,6 +1203,7 @@ static void x86_load_linux(PCMachineState *pcms,
>      vmode = strstr(kernel_cmdline, "vga=");
>      if (vmode) {
>          unsigned int video_mode;
> +        int ret;
>          /* skip "vga=" */
>          vmode += 4;
>          if (!strncmp(vmode, "normal", 6)) {
> @@ -1211,7 +1213,12 @@ static void x86_load_linux(PCMachineState *pcms,
>          } else if (!strncmp(vmode, "ask", 3)) {
>              video_mode = 0xfffd;
>          } else {
> -            video_mode = strtol(vmode, NULL, 0);
> +            ret = qemu_strtoui(vmode, NULL, 0, &video_mode);
> +            if (ret != 0) {
> +                fprintf(stderr, "qemu: can't parse 'vga' parameter: %s\n",
> +                        strerror(-ret));

(Cc'ing Markus/Daniel just in case)

I'm wondering whether using fprintf() is appropriate. When a machine is
instantiated via libvirt, is this error reported to the user?

I first thought about using error_report() instead:

    error_report("qemu: can't parse 'vga' parameter: %s",
                 strerror(-ret));

But this API is meaningful when used in console/monitor. We can't get
here from the monitor, so:

Reviewed-by: Philippe Mathieu-Daudé

> +                exit(1);
> +            }
>          }
>          stw_p(header + 0x1fa, video_mode);
>      }
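
The checked conversion discussed in this review can be sketched outside QEMU with standard C only. This is a minimal sketch, not the patch's code: `parse_vga_mode` is a hypothetical helper, libc `strtoul` stands in for `qemu_strtoui`, and the symbolic values for "normal" and "ext" are taken from the Linux x86 boot protocol rather than from the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the text that follows "vga=" on a kernel
 * command line. Returns 0 and stores the mode on success, or a negative
 * errno value on failure (the same return convention the patch relies on).
 */
static int parse_vga_mode(const char *vmode, unsigned int *video_mode)
{
    char *end = NULL;
    unsigned long val;

    /* Symbolic modes defined by the Linux x86 boot protocol. */
    if (!strncmp(vmode, "normal", 6)) {
        *video_mode = 0xffff;
        return 0;
    }
    if (!strncmp(vmode, "ext", 3)) {
        *video_mode = 0xfffe;
        return 0;
    }
    if (!strncmp(vmode, "ask", 3)) {
        *video_mode = 0xfffd;
        return 0;
    }

    errno = 0;
    val = strtoul(vmode, &end, 0);   /* base 0 accepts decimal, 0x.., 0.. */
    if (end == vmode) {
        return -EINVAL;              /* no digits were consumed */
    }
    if (errno == ERANGE || val > UINT_MAX) {
        return -ERANGE;              /* value does not fit the target type */
    }
    *video_mode = (unsigned int)val;
    return 0;
}
```

The sketch deliberately stops at the first character that is not part of the number, on the assumption that further options may follow `vga=` on the same command line.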

[PATCH v9 10/15] fw_cfg: add "modify" functions for all types

2019-10-15 Thread Sergio Lopez

This allows altering the contents of an already added item.

Signed-off-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
---
 include/hw/nvram/fw_cfg.h | 42 +++
 hw/nvram/fw_cfg.c | 29 +++
 2 files changed, 71 insertions(+)

diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 80e435d303..b5291eefad 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -98,6 +98,20 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void 
*data, size_t len);
  */
 void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value);
 
+/**
+ * fw_cfg_modify_string:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ * @value: NUL-terminated ascii string
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the provided string,
+ * including its NUL terminator. The data being replaced, assumed to have
+ * been dynamically allocated during an earlier call to either
+ * fw_cfg_add_string() or fw_cfg_modify_string(), is freed before returning.
+ */
+void fw_cfg_modify_string(FWCfgState *s, uint16_t key, const char *value);
+
 /**
  * fw_cfg_add_i16:
  * @s: fw_cfg device being modified
@@ -136,6 +150,20 @@ void fw_cfg_modify_i16(FWCfgState *s, uint16_t key, 
uint16_t value);
  */
 void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t value);
 
+/**
+ * fw_cfg_modify_i32:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ * @value: 32-bit integer
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the given 32-bit
+ * value, converted to little-endian representation. The data being replaced,
+ * assumed to have been dynamically allocated during an earlier call to
+ * either fw_cfg_add_i32() or fw_cfg_modify_i32(), is freed before returning.
+ */
+void fw_cfg_modify_i32(FWCfgState *s, uint16_t key, uint32_t value);
+
 /**
  * fw_cfg_add_i64:
  * @s: fw_cfg device being modified
@@ -148,6 +176,20 @@ void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t 
value);
  */
 void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value);
 
+/**
+ * fw_cfg_modify_i64:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ * @value: 64-bit integer
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the given 64-bit
+ * value, converted to little-endian representation. The data being replaced,
+ * assumed to have been dynamically allocated during an earlier call to
+ * either fw_cfg_add_i64() or fw_cfg_modify_i64(), is freed before returning.
+ */
+void fw_cfg_modify_i64(FWCfgState *s, uint16_t key, uint64_t value);
+
 /**
  * fw_cfg_add_file:
  * @s: fw_cfg device being modified
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 7dc3ac378e..aef1727250 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -690,6 +690,15 @@ void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
     fw_cfg_add_bytes(s, key, g_memdup(value, sz), sz);
 }
 
+void fw_cfg_modify_string(FWCfgState *s, uint16_t key, const char *value)
+{
+    size_t sz = strlen(value) + 1;
+    char *old;
+
+    old = fw_cfg_modify_bytes_read(s, key, g_memdup(value, sz), sz);
+    g_free(old);
+}
+
 void fw_cfg_add_i16(FWCfgState *s, uint16_t key, uint16_t value)
 {
     uint16_t *copy;
@@ -720,6 +729,16 @@ void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t value)
     fw_cfg_add_bytes(s, key, copy, sizeof(value));
 }
 
+void fw_cfg_modify_i32(FWCfgState *s, uint16_t key, uint32_t value)
+{
+    uint32_t *copy, *old;
+
+    copy = g_malloc(sizeof(value));
+    *copy = cpu_to_le32(value);
+    old = fw_cfg_modify_bytes_read(s, key, copy, sizeof(value));
+    g_free(old);
+}
+
 void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value)
 {
     uint64_t *copy;
@@ -730,6 +749,16 @@ void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value)
     fw_cfg_add_bytes(s, key, copy, sizeof(value));
 }
 
+void fw_cfg_modify_i64(FWCfgState *s, uint16_t key, uint64_t value)
+{
+    uint64_t *copy, *old;
+
+    copy = g_malloc(sizeof(value));
+    *copy = cpu_to_le64(value);
+    old = fw_cfg_modify_bytes_read(s, key, copy, sizeof(value));
+    g_free(old);
+}
+
 void fw_cfg_set_order_override(FWCfgState *s, int order)
 {
     assert(s->fw_cfg_order_override == 0);
-- 
2.21.0
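
The ownership rule documented in this patch (the new buffer is stored, the previous one is handed back and freed) can be tried out with a toy table. This is only a sketch under stated assumptions: `ToyCfg`, `toy_modify_bytes_read` and the other `toy_*` names are hypothetical stand-ins, not the fw_cfg API, and the little-endian encoding is done by hand instead of with `cpu_to_le32()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Toy stand-in for one configuration item. */
typedef struct {
    void *data;
    size_t len;
} ToyEntry;

/* Toy stand-in for the device state; not QEMU's FWCfgState. */
typedef struct {
    ToyEntry entries[TOY_MAX_ENTRIES];
} ToyCfg;

/* Store the new buffer under key and return the one it replaced. */
static void *toy_modify_bytes_read(ToyCfg *s, uint16_t key,
                                   void *data, size_t len)
{
    void *old;

    assert(key < TOY_MAX_ENTRIES);
    old = s->entries[key].data;
    s->entries[key].data = data;
    s->entries[key].len = len;
    return old;
}

/* Replace an item with a heap copy of value, encoded little-endian. */
static void toy_modify_i32(ToyCfg *s, uint16_t key, uint32_t value)
{
    uint8_t *copy = malloc(sizeof(value));

    assert(copy != NULL);
    copy[0] = (uint8_t)(value & 0xff);
    copy[1] = (uint8_t)((value >> 8) & 0xff);
    copy[2] = (uint8_t)((value >> 16) & 0xff);
    copy[3] = (uint8_t)((value >> 24) & 0xff);
    /* free(NULL) is a no-op, so the first call for a key is fine too. */
    free(toy_modify_bytes_read(s, key, copy, sizeof(value)));
}

/* Replace an item with a heap copy of the string, including its NUL. */
static void toy_modify_string(ToyCfg *s, uint16_t key, const char *value)
{
    size_t sz = strlen(value) + 1;
    char *copy = malloc(sz);

    assert(copy != NULL);
    memcpy(copy, value, sz);
    free(toy_modify_bytes_read(s, key, copy, sz));
}
```

Because the previous buffer is always freed, the sketch, like the documented functions, is only safe when every item under a key was allocated dynamically by one of these helpers.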

[PATCH v9 14/15] hw/i386: Introduce the microvm machine type

2019-10-15 Thread Sergio Lopez

Microvm is a machine type inspired by Firecracker and constructed
after the its machine model.

It's a minimalist machine type without PCI nor ACPI support, designed
for short-lived guests. Microvm also establishes a baseline for
benchmarking and optimizing both QEMU and guest operating systems,
since it is optimized for both boot time and footprint.

Signed-off-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
---
 default-configs/i386-softmmu.mak |   1 +
 include/hw/i386/microvm.h|  83 +
 hw/i386/microvm.c| 572 +++
 hw/i386/Kconfig  |   4 +
 hw/i386/Makefile.objs|   1 +
 5 files changed, 661 insertions(+)
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 hw/i386/microvm.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4229900f57..4cc64dafa2 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -28,3 +28,4 @@
 CONFIG_ISAPC=y
 CONFIG_I440FX=y
 CONFIG_Q35=y
+CONFIG_MICROVM=y
diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
new file mode 100644
index 0000000000..faaa2e60b8
--- /dev/null
+++ b/include/hw/i386/microvm.h
@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_MICROVM_H
+#define HW_I386_MICROVM_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+#include "hw/i386/x86.h"
+
+/* Microvm memory layout */
+#define PVH_START_INFO        0x6000
+#define MEMMAP_START          0x7000
+#define MODLIST_START         0x7800
+#define BOOT_STACK_POINTER    0x8ff0
+#define PML4_START            0x9000
+#define PDPTE_START           0xa000
+#define PDE_START             0xb000
+#define KERNEL_CMDLINE_START  0x20000
+#define EBDA_START            0x9fc00
+#define HIMEM_START           0x100000
+
+/* Platform virtio definitions */
+#define VIRTIO_MMIO_BASE      0xc0000000
+#define VIRTIO_IRQ_BASE       5
+#define VIRTIO_NUM_TRANSPORTS 8
+#define VIRTIO_CMDLINE_MAXLEN 64
+
+/* Machine type options */
+#define MICROVM_MACHINE_PIT "pit"
+#define MICROVM_MACHINE_PIC "pic"
+#define MICROVM_MACHINE_RTC "rtc"
+#define MICROVM_MACHINE_ISA_SERIAL  "isa-serial"
+#define MICROVM_MACHINE_OPTION_ROMS "x-option-roms"
+#define MICROVM_MACHINE_AUTO_KERNEL_CMDLINE "auto-kernel-cmdline"
+
+typedef struct {
+    X86MachineClass parent;
+    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
+                                            DeviceState *dev);
+} MicrovmMachineClass;
+
+typedef struct {
+    X86MachineState parent;
+
+    /* Machine type options */
+    OnOffAuto pic;
+    OnOffAuto pit;
+    OnOffAuto rtc;
+    bool isa_serial;
+    bool option_roms;
+    bool auto_kernel_cmdline;
+
+    /* Machine state */
+    bool kernel_cmdline_fixed;
+} MicrovmMachineState;
+
+#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
+#define MICROVM_MACHINE(obj) \
+    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
+
+#endif
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
new file mode 100644
index 0000000000..4fd933c001
--- /dev/null
+++ b/hw/i386/microvm.c
@@ -0,0 +1,572 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qapi/error.h"

[PATCH v9 13/15] docs/microvm.rst: document the new microvm machine type

2019-10-15 Thread Sergio Lopez

Document the new microvm machine type.

Signed-off-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
---
 docs/microvm.rst | 98 
 1 file changed, 98 insertions(+)
 create mode 100644 docs/microvm.rst

diff --git a/docs/microvm.rst b/docs/microvm.rst
new file mode 100644
index 0000000000..dc36ecf7c3
--- /dev/null
+++ b/docs/microvm.rst
@@ -0,0 +1,98 @@
+====================
+Microvm Machine Type
+====================
+
+Microvm is a machine type inspired by ``Firecracker`` and constructed
+after the its machine model.
+
+It's a minimalist machine type without ``PCI`` nor ``ACPI`` support,
+designed for short-lived guests. Microvm also establishes a baseline
+for benchmarking and optimizing both QEMU and guest operating systems,
+since it is optimized for both boot time and footprint.
+
+
+Supported devices
+-----------------
+
+The microvm machine type supports the following devices:
+
+- ISA bus
+- i8259 PIC (optional)
+- i8254 PIT (optional)
+- MC146818 RTC (optional)
+- One ISA serial port (optional)
+- LAPIC
+- IOAPIC (with kernel-irqchip=split by default)
+- kvmclock (if using KVM)
+- fw_cfg
+- Up to eight virtio-mmio devices (configured by the user)
+
+
+Using the microvm machine type
+------------------------------
+
+Machine-specific options
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+It supports the following machine-specific options:
+
+- microvm.x-option-roms=bool (Set off to disable loading option ROMs)
+- microvm.pit=OnOffAuto (Enable i8254 PIT)
+- microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
+- microvm.pic=OnOffAuto (Enable i8259 PIC)
+- microvm.rtc=OnOffAuto (Enable MC146818 RTC)
+- microvm.auto-kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
+
+
+Boot options
+~~~~~~~~~~~~
+
+By default, microvm uses ``qboot`` as its BIOS, to obtain better boot
+times, but it's also compatible with ``SeaBIOS``.
+
+As no current FW is able to boot from a block device using
+``virtio-mmio`` as its transport, a microvm-based VM needs to be run
+using a host-side kernel and, optionally, an initrd image.
+
+
+Running a microvm-based VM
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, microvm aims for maximum compatibility, enabling both
+legacy and non-legacy devices. In this example, a VM is created
+without passing any additional machine-specific option, using the
+legacy ``ISA serial`` device as console::
+
+  $ qemu-system-x86_64 -M microvm \
+ -enable-kvm -cpu host -m 512m -smp 2 \
+ -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
+ -nodefaults -no-user-config -nographic \
+ -serial stdio \
+ -drive id=test,file=test.img,format=raw,if=none \
+ -device virtio-blk-device,drive=test \
+ -netdev tap,id=tap0,script=no,downscript=no \
+ -device virtio-net-device,netdev=tap0
+
+While the example above works, you might be interested in reducing the
+footprint further by disabling some legacy devices. If you're using
+``KVM``, you can disable the ``RTC``, making the Guest rely on
+``kvmclock`` exclusively. Additionally, if your host's CPUs have the
+``TSC_DEADLINE`` feature, you can also disable both the i8259 PIC and
+the i8254 PIT (make sure you're also emulating a CPU with such feature
+in the guest).
+
+This is an example of a VM with all optional legacy features
+disabled::
+
+  $ qemu-system-x86_64 \
+ -M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off \
+ -enable-kvm -cpu host -m 512m -smp 2 \
+ -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
+ -nodefaults -no-user-config -nographic \
+ -chardev stdio,id=virtiocon0,server \
+ -device virtio-serial-device \
+ -device virtconsole,chardev=virtiocon0 \
+ -drive id=test,file=test.img,format=raw,if=none \
+ -device virtio-blk-device,drive=test \
+ -netdev tap,id=tap0,script=no,downscript=no \
+ -device virtio-net-device,netdev=tap0
-- 
2.21.0

Re: [PATCH v9 15/15] MAINTAINERS: add microvm related files

2019-10-15 Thread Philippe Mathieu-Daudé


On 10/15/19 1:23 PM, Sergio Lopez wrote:

> Add a new "Microvm" section under "X86 Machines" with the new files
> related to this machine type.
>
> Signed-off-by: Sergio Lopez
> Reviewed-by: Michael S. Tsirkin
> ---
>   MAINTAINERS | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fe4dc51b08..9744f07727 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1275,6 +1275,16 @@ F: include/hw/timer/hpet.h
>   F: include/hw/timer/i8254*
>   F: include/hw/timer/mc146818rtc*
>
> +Microvm
> +M: Sergio Lopez
> +M: Paolo Bonzini
> +S: Maintained
> +F: docs/microvm.rst
> +F: hw/i386/microvm.c
> +F: include/hw/i386/microvm.h
> +F: roms/qboot

This is a submodule; changes there won't be committed within QEMU.

Without the 'F: roms/qboot' line:
Reviewed-by: Philippe Mathieu-Daudé

> +F: pc-bios/bios-microvm.bin
> +
>   Machine core
>   M: Eduardo Habkost
>   M: Marcel Apfelbaum

Re: [PATCH v9 15/15] MAINTAINERS: add microvm related files

2019-10-15 Thread Philippe Mathieu-Daudé


On 10/15/19 1:23 PM, Sergio Lopez wrote:

> Add a new "Microvm" section under "X86 Machines" with the new files
> related to this machine type.
>
> Signed-off-by: Sergio Lopez
> Reviewed-by: Michael S. Tsirkin
> ---
>   MAINTAINERS | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fe4dc51b08..9744f07727 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1275,6 +1275,16 @@ F: include/hw/timer/hpet.h
>   F: include/hw/timer/i8254*
>   F: include/hw/timer/mc146818rtc*
>
> +Microvm

Since it is written "microvm" in the documentation, you can use that
form here too; it won't be an exception:


$ egrep '^[a-z]' MAINTAINERS
i.MX25 PDK
i.MX31 (kzm)
nSeries
milkymist
an5206
mcf5208
petalogix_s3adsp1800
petalogix_ml605
or1k-sim
e500
mpc8544ds
sPAPR
virtex_ml507
sam460ex
sim
ppc4xx
vfio-ccw
vfio-ap
vhost
virtio
virtio-9p
virtio-blk
virtio-ccw
virtio-input
virtio-serial
virtio-rng
virtio-crypto
nvme
megasas
e1000x
e1000e
eepro100
ramfb
virtio-gpu
vhost-user-gpu
qtest
elf2dmp
i386 TCG target
iSCSI
blklogwrites
blkverify
bochs
cloop
dmg
parallels
qed
raw
qcow2
qcow
blkdebug
vpc
vvfat


> +M: Sergio Lopez
> +M: Paolo Bonzini
> +S: Maintained
> +F: docs/microvm.rst
> +F: hw/i386/microvm.c
> +F: include/hw/i386/microvm.h
> +F: roms/qboot
> +F: pc-bios/bios-microvm.bin
> +
>   Machine core
>   M: Eduardo Habkost
>   M: Marcel Apfelbaum

Re: [PATCH 2/2] core: replace getpagesize() with qemu_real_host_page_size

2019-10-15 Thread Yuval Shaia

On Sun, Oct 13, 2019 at 10:11:45AM +0800, Wei Yang wrote:
> There are three page sizes in QEMU:
> 
>   real host page size
>   host page size
>   target page size
> 
> Each of them has a dedicated variable. For the last two, we use the
> same form throughout the QEMU project, while for the first one we use
> two forms: qemu_real_host_page_size and getpagesize().
> 
> qemu_real_host_page_size is defined to be a replacement for
> getpagesize(), so let it serve that role.
> 
> [Note] Not fully tested on some architectures or devices.
> 
> Signed-off-by: Wei Yang 
> ---
>  accel/kvm/kvm-all.c|  6 +++---
>  backends/hostmem.c |  2 +-
>  block.c|  4 ++--
>  block/file-posix.c |  9 +
>  block/io.c |  2 +-
>  block/parallels.c  |  2 +-
>  block/qcow2-cache.c|  2 +-
>  contrib/vhost-user-gpu/vugbm.c |  2 +-
>  exec.c |  6 +++---
>  hw/intc/s390_flic_kvm.c|  2 +-
>  hw/ppc/mac_newworld.c  |  2 +-
>  hw/ppc/spapr_pci.c |  2 +-
>  hw/rdma/vmw/pvrdma_main.c  |  2 +-

for pvrdma stuff:

Reviewed-by: Yuval Shaia 
Tested-by: Yuval Shaia 

>  hw/vfio/spapr.c|  7 ---
>  include/exec/ram_addr.h|  2 +-
>  include/qemu/osdep.h   |  4 ++--
>  migration/migration.c  |  2 +-
>  migration/postcopy-ram.c   |  4 ++--
>  monitor/misc.c |  2 +-
>  target/ppc/kvm.c   |  2 +-
>  tests/vhost-user-bridge.c  |  8 
>  util/mmap-alloc.c  | 10 +-
>  util/oslib-posix.c |  4 ++--
>  util/oslib-win32.c |  2 +-
>  util/vfio-helpers.c| 12 ++--
>  25 files changed, 52 insertions(+), 50 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d2d96d73e8..140b0bd8f6 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -52,7 +52,7 @@
>  /* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
>   * need to use the real host PAGE_SIZE, as that's what KVM will use.
>   */
> -#define PAGE_SIZE getpagesize()
> +#define PAGE_SIZE qemu_real_host_page_size
>  
>  //#define DEBUG_KVM
>  
> @@ -507,7 +507,7 @@ static int 
> kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>  {
>  ram_addr_t start = section->offset_within_region +
> memory_region_get_ram_addr(section->mr);
> -ram_addr_t pages = int128_get64(section->size) / getpagesize();
> +ram_addr_t pages = int128_get64(section->size) / 
> qemu_real_host_page_size;
>  
>  cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
>  return 0;
> @@ -1841,7 +1841,7 @@ static int kvm_init(MachineState *ms)
>   * even with KVM.  TARGET_PAGE_SIZE is assumed to be the minimum
>   * page size for the system though.
>   */
> -assert(TARGET_PAGE_SIZE <= getpagesize());
> +assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
>  
>  s->sigmask_len = 8;
>  
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 6d333dc23c..e773bdfa6e 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -304,7 +304,7 @@ size_t host_memory_backend_pagesize(HostMemoryBackend 
> *memdev)
>  #else
>  size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
>  {
> -return getpagesize();
> +return qemu_real_host_page_size;
>  }
>  #endif
>  
> diff --git a/block.c b/block.c
> index 5944124845..98f47e2902 100644
> --- a/block.c
> +++ b/block.c
> @@ -106,7 +106,7 @@ size_t bdrv_opt_mem_align(BlockDriverState *bs)
>  {
>  if (!bs || !bs->drv) {
>  /* page size or 4k (hdd sector size) should be on the safe side */
> -return MAX(4096, getpagesize());
> +return MAX(4096, qemu_real_host_page_size);
>  }
>  
>  return bs->bl.opt_mem_alignment;
> @@ -116,7 +116,7 @@ size_t bdrv_min_mem_align(BlockDriverState *bs)
>  {
>  if (!bs || !bs->drv) {
>  /* page size or 4k (hdd sector size) should be on the safe side */
> -return MAX(4096, getpagesize());
> +return MAX(4096, qemu_real_host_page_size);
>  }
>  
>  return bs->bl.min_mem_alignment;
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f12c06de2d..f60ac3f93f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -322,7 +322,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
> fd, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
>  char *buf;
> -size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
> +size_t max_align = MAX(MAX_BLOCKSIZE, qemu_real_host_page_size);
>  size_t alignments[] = {1, 512, 1024, 2048, 4096};
>  
>  /* For SCSI generic devices the alignment is not really used.
> @@ -1131,13 +1131,14 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  
>  ret = sg_get_max_segments(s->fd);
>  if (ret > 0) {
> -bs->bl.max_transfer = MIN
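
The idea behind the replacement quoted above, reading a cached variable instead of calling getpagesize() at each use, can be sketched with POSIX calls only. This is a minimal sketch: the `sketch_*` names are hypothetical and are not QEMU symbols, and sysconf(_SC_PAGESIZE) is used as the portable way to query the host page size.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a cached "real host page size" variable. */
static size_t sketch_real_host_page_size;

/* Query the host once; later users just read the variable. */
static void sketch_init_real_host_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0);
    sketch_real_host_page_size = (size_t)sz;
}

/*
 * Same shape as the MAX(4096, page size) fallback in the quoted block.c
 * hunks: never return less than 4 KiB.
 */
static size_t sketch_default_mem_align(void)
{
    return sketch_real_host_page_size > 4096 ? sketch_real_host_page_size
                                             : 4096;
}
```

The initializer has to run before the first reader, which is the main thing a variable-based form requires that a function call does not.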

[Bug 1846451] Re: K800 keyboard no longer works when attached to a VM

2019-10-15 Thread Gerd Hoffmann

qemu: -device usb-host,...,guest-resets-all=true

libvirt:

   
   

... assuming this is the only pass-through device.

If you pass through more devices you'll have hostdev0, hostdev1, ... and
have to pick the correct one.

(see also
http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html)


Re: [PATCH v9 13/15] docs/microvm.rst: document the new microvm machine type

2019-10-15 Thread Greg Kurz

On Tue, 15 Oct 2019 13:23:44 +0200
Sergio Lopez  wrote:

> Document the new microvm machine type.
> 
> Signed-off-by: Sergio Lopez 
> Reviewed-by: Michael S. Tsirkin 
> ---
>  docs/microvm.rst | 98 
>  1 file changed, 98 insertions(+)
>  create mode 100644 docs/microvm.rst
> 
> diff --git a/docs/microvm.rst b/docs/microvm.rst
> new file mode 100644
> index 0000000000..dc36ecf7c3
> --- /dev/null
> +++ b/docs/microvm.rst
> @@ -0,0 +1,98 @@
> +====================
> +Microvm Machine Type
> +====================
> +
> +Microvm is a machine type inspired by ``Firecracker`` and constructed
> +after the its machine model.
> +

Same typo as in the cover.

s/the //

> +It's a minimalist machine type without ``PCI`` nor ``ACPI`` support,
> +designed for short-lived guests. Microvm also establishes a baseline
> +for benchmarking and optimizing both QEMU and guest operating systems,
> +since it is optimized for both boot time and footprint.
> +
> +
> +Supported devices
> +-----------------
> +
> +The microvm machine type supports the following devices:
> +
> +- ISA bus
> +- i8259 PIC (optional)
> +- i8254 PIT (optional)
> +- MC146818 RTC (optional)
> +- One ISA serial port (optional)
> +- LAPIC
> +- IOAPIC (with kernel-irqchip=split by default)
> +- kvmclock (if using KVM)
> +- fw_cfg
> +- Up to eight virtio-mmio devices (configured by the user)
> +
> +
> +Using the microvm machine type
> +------------------------------
> +
> +Machine-specific options
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +It supports the following machine-specific options:
> +
> +- microvm.x-option-roms=bool (Set off to disable loading option ROMs)
> +- microvm.pit=OnOffAuto (Enable i8254 PIT)
> +- microvm.isa-serial=bool (Set off to disable the instantiation an ISA serial port)
> +- microvm.pic=OnOffAuto (Enable i8259 PIC)
> +- microvm.rtc=OnOffAuto (Enable MC146818 RTC)
> +- microvm.auto-kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
> +
> +
> +Boot options
> +~~~~~~~~~~~~
> +
> +By default, microvm uses ``qboot`` as its BIOS, to obtain better boot
> +times, but it's also compatible with ``SeaBIOS``.
> +
> +As no current FW is able to boot from a block device using
> +``virtio-mmio`` as its transport, a microvm-based VM needs to be run
> +using a host-side kernel and, optionally, an initrd image.
> +
> +
> +Running a microvm-based VM
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +By default, microvm aims for maximum compatibility, enabling both
> +legacy and non-legacy devices. In this example, a VM is created
> +without passing any additional machine-specific option, using the
> +legacy ``ISA serial`` device as console::
> +
> +  $ qemu-system-x86_64 -M microvm \
> + -enable-kvm -cpu host -m 512m -smp 2 \
> + -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
> + -nodefaults -no-user-config -nographic \
> + -serial stdio \
> + -drive id=test,file=test.img,format=raw,if=none \
> + -device virtio-blk-device,drive=test \
> + -netdev tap,id=tap0,script=no,downscript=no \
> + -device virtio-net-device,netdev=tap0
> +
> +While the example above works, you might be interested in reducing the
> +footprint further by disabling some legacy devices. If you're using
> +``KVM``, you can disable the ``RTC``, making the Guest rely on
> +``kvmclock`` exclusively. Additionally, if your host's CPUs have the
> +``TSC_DEADLINE`` feature, you can also disable both the i8259 PIC and
> +the i8254 PIT (make sure you're also emulating a CPU with such feature
> +in the guest).
> +
> +This is an example of a VM with all optional legacy features
> +disabled::
> +
> +  $ qemu-system-x86_64 \
> + -M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off \
> + -enable-kvm -cpu host -m 512m -smp 2 \
> + -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
> + -nodefaults -no-user-config -nographic \
> + -chardev stdio,id=virtiocon0,server \
> + -device virtio-serial-device \
> + -device virtconsole,chardev=virtiocon0 \
> + -drive id=test,file=test.img,format=raw,if=none \
> + -device virtio-blk-device,drive=test \
> + -netdev tap,id=tap0,script=no,downscript=no \
> + -device virtio-net-device,netdev=tap0

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-15 Thread Beata Michalska

On Tue, 15 Oct 2019 at 11:56, Andrew Jones  wrote:
>
> On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> > On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
> > > +
> > > +obj = object_new(object_class_get_name(oc));
> > > +
> > > +if (qdict_in) {
> > > +Visitor *visitor;
> > > +Error *err = NULL;
> > > +
> > > +visitor = qobject_input_visitor_new(model->props);
> > > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > > +if (err) {
> > > +object_unref(obj);
> >
> > Shouldn't we free the 'visitor' here as well ?
>
> Yes. Good catch. So we also need to fix
> target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
> construction (the construction from which I derived this)
>
> >
> > > +error_propagate(errp, err);
> > > +return NULL;
> > > +}
> > > +
>
> What about the rest of the patch? With that fixed for v6 can I
> add your r-b?
>

I still got this feeling that we could optimize that a bit - which I'm
currently on, so hopefully I'll be able to add more comments soon if
that proves to be the case.

BR
Beata

> Thanks,
> drew
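
The review point above is that the early-return path releases `obj` but not `visitor`. A minimal sketch of the leak-free shape follows. Everything in it is a toy stand-in (`toy_new`, `toy_free`, `expand` and the `live_objects` counter are hypothetical names, not QEMU APIs); in QEMU itself the visitor would presumably be released with `visit_free()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical leak counter: allocations made minus allocations released. */
static int live_objects;

static void *toy_new(void)
{
    void *p = malloc(1);
    if (p) {
        live_objects++;
    }
    return p;
}

static void toy_free(void *p)
{
    if (p) {
        assert(live_objects > 0);
        free(p);
        live_objects--;
    }
}

/*
 * Stand-in for the expansion function: allocates an object and a visitor,
 * then either fails (simulating the start-struct step reporting an error)
 * or succeeds. Both paths release the visitor; only the failure path
 * releases the object, because on success the caller owns it.
 */
static void *expand(bool start_struct_fails)
{
    void *obj = toy_new();
    void *visitor = toy_new();

    if (!obj || !visitor || start_struct_fails) {
        toy_free(visitor);   /* the release the review asks for */
        toy_free(obj);
        return NULL;
    }

    /* ... property visiting would happen here ... */

    toy_free(visitor);
    return obj;
}
```

With this shape, `live_objects` returns to zero after a failed call, which is an easy property to check in a unit test.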

Re: [PATCH v4 1/8] target/mips: Clean up helper.c

2019-10-15 Thread Aleksandar Markovic

Markus wrote:


> However, I find the common pattern applied here
>
> case 3: /* ERL */
> /* If EU is set, always unmapped */
> if (eu) {
> return 0;
> }
>
> more readable ...


 I am going to do it this way in v5.

Thanks,
Aleksandar



> ... than the unusual (to my eyes)
>
> case 3:
> /*
>  * ERL
>  * If EU is set, always unmapped
>  */
> if (eu) {
> return 0;
> }
>
> The first line of the comment applies to the value preceding it, the
> second to the code following it.  Making these connections doesn't
> exactly take genius, but neither is it effortless.
>
> Nice and consistent coding style is all about reducing the effort of
> reading code.
>
> For what it's worth, the pattern
>
> case VALUE: /* comment on VALUE */
> /* comment on CODE */
> CODE
>
> occurs almost 300 times.
>
> > I don't see any reason to change this patch. Please let me know if you
> > still think I should do something else. And you are welcome to analyse any
> > patches of mine.
>
> Please consider keeping two separate comments, i.e. just move the colon
> to its usual place.
>
> Thanks!
>

Re: [PATCH v26 00/21] Add RX archtecture support

2019-10-15 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20191014115757.51866-1-ys...@users.sourceforge.jp/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v26 00/21] Add RX archtecture support
Type: series
Message-id: 20191014115757.51866-1-ys...@users.sourceforge.jp

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20191014115757.51866-1-ys...@users.sourceforge.jp -> patchew/20191014115757.51866-1-ys...@users.sourceforge.jp
Switched to a new branch 'test'
3dc4ce3 BootLinuxConsoleTest: Test the RX-Virt machine
7db9f9b Add rx-softmmu
23cabcc hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
762fe7c hw/rx: Honor -accel qtest
d91f89f hw/rx: RX Target hardware definition
0c6ffe0 hw/char: RX62N serial communication interface (SCI)
e6a5a42 hw/timer: RX62N internal timer modules
38febf0 hw/intc: RX62N interrupt controller (ICUa)
0871fb9 target/rx: Dump bytes for each insn during disassembly
136dde2 target/rx: Collect all bytes during disassembly
d541b2f target/rx: Emit all disassembly in one prt()
56e42be target/rx: Use prt_ldmi for XCHG_mr disassembly
3a4e100 target/rx: Replace operand with prt_ldmi in disassembler
152bc63 target/rx: Disassemble rx_index_addr into a string
c54404e target/rx: RX disassembler
a3f9cb6 target/rx: CPU definition
3766354 target/rx: TCG helper
44c05b7 target/rx: TCG translation
89989a0 hw/registerfields.h: Add 8bit and 16bit register macros
5aa6e53 qemu/bitops.h: Add extract8 and extract16
fcbaac0 MAINTAINERS: Add RX

=== OUTPUT BEGIN ===
1/21 Checking commit fcbaac0c6ed3 (MAINTAINERS: Add RX)
2/21 Checking commit 5aa6e538fe97 (qemu/bitops.h: Add extract8 and extract16)
3/21 Checking commit 89989a01d9ae (hw/registerfields.h: Add 8bit and 16bit register macros)
Use of uninitialized value in concatenation (.) or string at ./scripts/checkpatch.pl line 2484.
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#27: FILE: include/hw/registerfields.h:25:
+#define REG8(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) };

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#31: FILE: include/hw/registerfields.h:29:
+#define REG16(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 2 };

total: 2 errors, 0 warnings, 56 lines checked

Patch 3/21 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/21 Checking commit 44c05b78c659 (target/rx: TCG translation)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

total: 0 errors, 1 warnings, 3065 lines checked

Patch 4/21 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/21 Checking commit 3766354ea070 (target/rx: TCG helper)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21: 
new file mode 100644

total: 0 errors, 1 warnings, 650 lines checked

Patch 5/21 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/21 Checking commit a3f9cb69bcd3 (target/rx: CPU definition)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#32: 
new file mode 100644

total: 0 errors, 1 warnings, 588 lines checked

Patch 6/21 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/21 Checking commit c54404e8c1f4 (target/rx: RX disassembler)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#38: 
new file mode 100644

total: 0 errors, 1 warnings, 1497 lines checked

Patch 7/21 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/21 Checking commit 152bc63f993c (target/rx: Disassemble rx_index_addr into a string)
9/21 Checking commit 3a4e100bebf4 (target/rx: Replace operand with prt_ldmi in disassembler)
10/21 Checking commit 56e42be2740a (target/rx: Use prt_ldmi for XCHG_mr disassembly)
11/21 Checking commit d541b2f36ec7 (target/rx: Emit all disassembly in one prt())
12/21 Checking commit 136dde281494 (target/rx: Collect all bytes during disassembly)
13/21 Checking commit 0871fb92b151 (target/rx: Dump bytes for each insn during disassembly)
14/21 Checking commit
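
The two errors reported for patch 3/21 concern macros that expand to declarations, not statements. The sketch below copies the macro shapes from the checkpatch output and uses them at file scope, where a `do { } while (0)` wrapper could not appear at all. That suggests these reports may be false positives of the kind the bot message mentions. The register names and addresses (`STATUS`, `COUNT`, `0x03`, `0x08`) are made up for illustration.

```c
#include <assert.h>

/* Macro shapes as quoted in the checkpatch output. */
#define REG8(reg, addr)                  \
    enum { A_ ## reg = (addr) };         \
    enum { R_ ## reg = (addr) };

#define REG16(reg, addr)                 \
    enum { A_ ## reg = (addr) };         \
    enum { R_ ## reg = (addr) / 2 };

/*
 * Hypothetical registers. Each use expands to two enum declarations at
 * file scope: A_<name> is the byte address, R_<name> is the index in
 * units of the register width.
 */
REG8(STATUS, 0x03)
REG16(COUNT, 0x08)

/* The generated constants are ordinary integer constant expressions. */
static int count_index(void)
{
    return R_COUNT;
}
```

Here `A_COUNT` is 8 and `R_COUNT` is 4, since a 16-bit register at byte address 8 is the fifth 16-bit slot.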

Re: [PULL 0/2] Tracing patches

2019-10-15 Thread Peter Maydell

On Mon, 14 Oct 2019 at 09:57, Stefan Hajnoczi  wrote:
>
> The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' 
> into staging (2019-10-08 16:08:35 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to a1f4fc951a277c49a25418cafb028ec5529707fa:
>
>   trace: avoid "is" with a literal Python 3.8 warnings (2019-10-14 09:54:46 
> +0100)
>
> 
> Pull request
>
> 
>
> Stefan Hajnoczi (2):
>   trace: add --group=all to tracing.txt
>   trace: avoid "is" with a literal Python 3.8 warnings
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

[PATCH v2] migration: Support QLIST migration

2019-10-15 Thread Eric Auger

Support QLIST migration using the same principle as QTAILQ:
94869d5c52 ("migration: migrate QTAILQ").

The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
The change mainly resides in QLIST_RAW_INSERT_TAIL implementation.

Tests also are provided.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- rebase on top of gtree addition
- add trace points
- add g_free on error
---
 include/migration/vmstate.h |  21 ++
 include/qemu/queue.h|  30 
 migration/trace-events  |   5 ++
 migration/vmstate-types.c   |  69 ++
 tests/test-vmstate.c| 135 
 5 files changed, 260 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ea2f1f4749 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -225,6 +225,7 @@ extern const VMStateInfo vmstate_info_tmp;
 extern const VMStateInfo vmstate_info_bitmap;
 extern const VMStateInfo vmstate_info_qtailq;
 extern const VMStateInfo vmstate_info_gtree;
+extern const VMStateInfo vmstate_info_qlist;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 /*
@@ -794,6 +795,26 @@ extern const VMStateInfo vmstate_info_gtree;
 .offset   = offsetof(_state, _field),  
\
 }
 
+/*
+ * For migrating a QLIST
+ * Target QLIST needs be properly initialized.
+ * _type: type of QLIST element
+ * _next: name of QLIST_ENTRY entry field in QLIST element
+ * _vmsd: VMSD for QLIST element
+ * size: size of QLIST element
+ * start: offset of QLIST_ENTRY in QTAILQ element
+ */
+#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qlist, \
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 73bf4a984d..e965b4d18d 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -491,4 +491,34 @@ union {
 \
 QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry); 
 \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_RAW_FIRST(head) \
+    field_at_offset(head, 0, void *)
+
+#define QLIST_RAW_NEXT(elm, entry) \
+    field_at_offset(elm, entry, void *)
+
+#define QLIST_RAW_PREVIOUS(elm, entry) \
+    field_at_offset(elm, entry + sizeof(void *), void *)
+
+#define QLIST_RAW_FOREACH(elm, head, entry) \
+    for ((elm) = *QLIST_RAW_FIRST(head); \
+         (elm); \
+         (elm) = *QLIST_RAW_NEXT(elm, entry))
+
+#define QLIST_RAW_INSERT_TAIL(head, elm, entry) do { \
+    void *iter, *last = NULL; \
+    *QLIST_RAW_NEXT(elm, entry) = NULL; \
+    if (!*QLIST_RAW_FIRST(head)) { \
+        *QLIST_RAW_FIRST(head) = elm; \
+        *QLIST_RAW_PREVIOUS(elm, entry) = head; \
+        break; \
+    } \
+    for (iter = *QLIST_RAW_FIRST(head); \
+         iter; last = iter, iter = *QLIST_RAW_NEXT(iter, entry)) \
+    { } \
+    *QLIST_RAW_NEXT(last, entry) = elm; \
+    *QLIST_RAW_PREVIOUS(elm, entry) = last; \
+} while (0)
+
+
 #endif /* QEMU_SYS_QUEUE_H */
diff --git a/migration/trace-events b/migration/trace-events
index 6dee7b5389..e0a33cffca 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -76,6 +76,11 @@ get_gtree_end(const char *field_name, const char 
*key_vmsd_name, const char *val
 put_gtree(const char *field_name, const char *key_vmsd_name, const char 
*val_vmsd_name, uint32_t nnodes) "%s(%s/%s) nnodes=%d"
 put_gtree_end(const char *field_name, co
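
The commit message says the change mainly resides in the raw tail insertion, which has to work on elements whose type is unknown and whose list entry is located only by a byte offset. A self-contained sketch of that idea follows. The names (`raw_entry`, `raw_head`, `item`, `next_slot`, `prev_slot`, `raw_insert_tail`) are hypothetical and this is not the patch's code. It differs from the posted macro in one respect: it stores the address of the predecessor's next pointer in the previous field, which is the usual BSD list convention as far as I understand it, whereas the posted macro stores `last` itself.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pointer layout of a BSD-style list entry: the next element, then
 * the address of the previous element's next pointer. */
struct raw_entry {
    void *next;
    void **prev;
};

struct raw_head {
    void *first;
};

/* Hypothetical element type; the helpers below never see its definition. */
struct item {
    int value;
    struct raw_entry link;
};

/* Address of the next pointer inside an element, given the entry offset. */
static void **next_slot(void *elm, size_t entry)
{
    return (void **)((char *)elm + entry);
}

/* Address of the prev pointer inside an element, given the entry offset. */
static void ***prev_slot(void *elm, size_t entry)
{
    return (void ***)((char *)elm + entry + offsetof(struct raw_entry, prev));
}

/* Walk to the last next pointer (O(n)) and hook the new element there. */
static void raw_insert_tail(struct raw_head *head, void *elm, size_t entry)
{
    void **slot = &head->first;

    assert(elm != NULL);
    while (*slot) {
        slot = next_slot(*slot, entry);
    }
    *next_slot(elm, entry) = NULL;
    *prev_slot(elm, entry) = slot;
    *slot = elm;
}
```

A caller passes `offsetof(struct item, link)` as `entry`, which corresponds to the `.start` field that `VMSTATE_QLIST_V` fills in with `offsetof(_type, _next)`.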

[PATCH] doc: Describe missing generic -blockdev options

2019-10-15 Thread Kevin Wolf

We added more generic options after introducing -blockdev and forgot to
update the documentation (man page and --help output) accordingly. Do
that now.

Signed-off-by: Kevin Wolf 
---
 qemu-options.hx | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 793d70ff93..9f6aa3dde3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -849,7 +849,8 @@ ETEXI
 DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
 "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
 "  [,cache.direct=on|off][,cache.no-flush=on|off]\n"
-"  [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
+"  [,read-only=on|off][,auto-read-only=on|off]\n"
+"  [,force-share=on|off][,detect-zeroes=on|off|unmap]\n"
 "  [,driver specific parameters...]\n"
 "configure a block backend\n", QEMU_ARCH_ALL)
 STEXI
@@ -885,6 +886,22 @@ name is not intended to be predictable and changes between QEMU invocations.
 For the top level, an explicit node name must be specified.
 @item read-only
 Open the node read-only. Guest write attempts will fail.
+
+Note that some block drivers support only read-only access, either generally or
+in certain configurations. In this case, the default value
+@option{read-only=off} does not work and the option must be specified
+explicitly.
+@item auto-read-only
+If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
+read-write even if @option{read-only=off} is requested, but fall back to
+read-only instead (and switch between the modes later), e.g. depending on
+whether the image file is writable or whether a writing user is attached to the
+node.
+@item force-share
+Override the image locking system of QEMU and force the node to allowing
+sharing all permissions with other uses.
+
+Enabling @option{force-share=on} requires @option{read-only=on}.
 @item cache.direct
 The host page cache can be avoided with @option{cache.direct=on}. This will
 attempt to do disk IO directly to the guest's memory. QEMU may still perform an
-- 
2.20.1

[PATCH] blockdev: Use error_report() in hmp_commit()

2019-10-15 Thread Kevin Wolf

Instead of using monitor_printf() to report errors, hmp_commit() should
use error_report() like other places do.

Signed-off-by: Kevin Wolf 
---
 blockdev.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index f89e48fc79..e2358966c3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1088,11 +1088,11 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 
 blk = blk_by_name(device);
 if (!blk) {
-monitor_printf(mon, "Device '%s' not found\n", device);
+error_report("Device '%s' not found", device);
 return;
 }
 if (!blk_is_available(blk)) {
-monitor_printf(mon, "Device '%s' has no medium\n", device);
+error_report("Device '%s' has no medium", device);
 return;
 }
 
@@ -1105,8 +1105,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 aio_context_release(aio_context);
 }
 if (ret < 0) {
-monitor_printf(mon, "'commit' error for '%s': %s\n", device,
-   strerror(-ret));
+error_report("'commit' error for '%s': %s", device, strerror(-ret));
 }
 }
 
-- 
2.20.1

[PATCH v3] migration: Support QLIST migration

2019-10-15 Thread Eric Auger

Support QLIST migration using the same principle as QTAILQ:
94869d5c52 ("migration: migrate QTAILQ").

The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
The change mainly resides in QLIST_RAW_INSERT_TAIL implementation.

Tests also are provided.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- remove 2 spurious changes in gtree tests

v1 -> v2:
- rebase on top of gtree addition
- add trace points
- add g_free on error
---
 include/migration/vmstate.h |  21 ++
 include/qemu/queue.h|  30 
 migration/trace-events  |   5 ++
 migration/vmstate-types.c   |  69 +++
 tests/test-vmstate.c| 133 
 5 files changed, 258 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ea2f1f4749 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -225,6 +225,7 @@ extern const VMStateInfo vmstate_info_tmp;
 extern const VMStateInfo vmstate_info_bitmap;
 extern const VMStateInfo vmstate_info_qtailq;
 extern const VMStateInfo vmstate_info_gtree;
+extern const VMStateInfo vmstate_info_qlist;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 /*
@@ -794,6 +795,26 @@ extern const VMStateInfo vmstate_info_gtree;
 .offset   = offsetof(_state, _field),  
\
 }
 
+/*
+ * For migrating a QLIST
+ * Target QLIST needs be properly initialized.
+ * _type: type of QLIST element
+ * _next: name of QLIST_ENTRY entry field in QLIST element
+ * _vmsd: VMSD for QLIST element
+ * size: size of QLIST element
+ * start: offset of QLIST_ENTRY in QTAILQ element
+ */
+#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qlist, \
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 73bf4a984d..e965b4d18d 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -491,4 +491,34 @@ union {
 \
 QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry); 
 \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_RAW_FIRST(head) \
+    field_at_offset(head, 0, void *)
+
+#define QLIST_RAW_NEXT(elm, entry) \
+    field_at_offset(elm, entry, void *)
+
+#define QLIST_RAW_PREVIOUS(elm, entry) \
+    field_at_offset(elm, entry + sizeof(void *), void *)
+
+#define QLIST_RAW_FOREACH(elm, head, entry) \
+    for ((elm) = *QLIST_RAW_FIRST(head); \
+         (elm); \
+         (elm) = *QLIST_RAW_NEXT(elm, entry))
+
+#define QLIST_RAW_INSERT_TAIL(head, elm, entry) do { \
+    void *iter, *last = NULL; \
+    *QLIST_RAW_NEXT(elm, entry) = NULL; \
+    if (!*QLIST_RAW_FIRST(head)) { \
+        *QLIST_RAW_FIRST(head) = elm; \
+        *QLIST_RAW_PREVIOUS(elm, entry) = head; \
+        break; \
+    } \
+    for (iter = *QLIST_RAW_FIRST(head); \
+         iter; last = iter, iter = *QLIST_RAW_NEXT(iter, entry)) \
+    { } \
+    *QLIST_RAW_NEXT(last, entry) = elm; \
+    *QLIST_RAW_PREVIOUS(elm, entry) = last; \
+} while (0)
+
+
 #endif /* QEMU_SYS_QUEUE_H */
diff --git a/migration/trace-events b/migration/trace-events
index 6dee7b5389..e0a33cffca 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -76,6 +76,11 @@ get_gtree_end(const char *field_name, const char 
*key_vmsd_name, const char *val
 put_gtree(const char *field_name, const char *key_vmsd_name, const char 
*val_vmsd_name, uint32_t nnodes) "%s(%s/%

RE: [PATCH v4 2/2] i386: Add support to get/set/migrate Intel Processor Trace feature

2019-10-15 Thread Kang, Luwei

> > diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > index f9f4cd1..097c953 100644
> > --- a/target/i386/kvm.c
> > +++ b/target/i386/kvm.c
> > @@ -1811,6 +1811,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> >  kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
> >  }
> >  }
> > +if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
> > +int addr_num = kvm_arch_get_supported_cpuid(kvm_state,
> > +0x14, 1, R_EAX) &
> > + 0x7;
> > +
> > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL,
> > +env->msr_rtit_ctrl);
> > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_STATUS,
> > +env->msr_rtit_status);
> > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_BASE,
> > +env->msr_rtit_output_base);
> 
> This causes the following crash on some hosts:
> 
>   qemu-system-x86_64: error: failed to set MSR 0x560 to 0x0
>   qemu-system-x86_64: target/i386/kvm.c:2673: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> 
> Checking for CPUID_7_0_EBX_INTEL_PT is not enough: KVM has additional 
> conditions that might prevent writing to this MSR
> (PT_CAP_topa_output && PT_CAP_single_range_output).  This causes QEMU to 
> crash if some of the conditions aren't met.
> 
> Writing and reading this MSR (and the ones below) need to be conditional on 
> KVM_GET_MSR_INDEX_LIST.
> 

Hi Eduardo,
I can't reproduce this issue with the upstream source code, but I can
reproduce it on RHEL 8.1. I don't have the QEMU source code for RHEL 8.1, but
after adding some tracing in KVM I found that KVM reports the complete Intel PT
CPUID information to QEMU, yet the Intel PT CPUID leaf (0x14) is lost when QEMU
sets the CPUID in KVM (the cpuid level is 0xd). It looks like the patch below
is missing.

commit f24c3a79a415042f6dc195f029a2ba7247d14cac
Author: Luwei Kang 
Date:   Tue Jan 29 18:52:59 2019 -0500
i386: extended the cpuid_level when Intel PT is enabled

Intel Processor Trace required CPUID[0x14] but the cpuid_level
have no change when create a kvm guest with
e.g. "-cpu qemu64,+intel-pt".

Thanks,
Luwei Kang

Re: [PULL 01/19] util/hbitmap: strict hbitmap_reset

2019-10-15 Thread John Snow




On 10/15/19 4:44 AM, Kevin Wolf wrote:
> Am 14.10.2019 um 20:10 hat John Snow geschrieben:
>>
>>
>> On 10/11/19 7:18 PM, John Snow wrote:
>>>
>>>
>>> On 10/11/19 5:48 PM, Eric Blake wrote:
>>>> On 10/11/19 4:25 PM, John Snow wrote:
>>>>> From: Vladimir Sementsov-Ogievskiy
>>>>>
>>>>> hbitmap_reset has an unobvious property: it rounds requested region up.
>>>>> It may provoke bugs, like in recently fixed write-blocking mode of
>>>>> mirror: user calls reset on unaligned region, not keeping in mind that
>>>>> there are possible unrelated dirty bytes, covered by rounded-up region
>>>>> and information of this unrelated "dirtiness" will be lost.
>>>>>
>>>>> Make hbitmap_reset strict: assert that arguments are aligned, allowing
>>>>> only one exception when @start + @count == hb->orig_size. It's needed
>>>>> to comfort users of hbitmap_next_dirty_area, which cares about
>>>>> hb->orig_size.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>>>> Reviewed-by: Max Reitz
>>>>> Message-Id: <20190806152611.280389-1-vsement...@virtuozzo.com>
>>>>> [Maintainer edit: Max's suggestions from on-list. --js]
>>>>> Signed-off-by: John Snow
>>>>> ---
>>>>>   include/qemu/hbitmap.h | 5 +
>>>>>   tests/test-hbitmap.c   | 2 +-
>>>>>   util/hbitmap.c | 4 
>>>>>   3 files changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>
>>>>> +++ b/util/hbitmap.c
>>>>> @@ -476,6 +476,10 @@ void hbitmap_reset(HBitmap *hb, uint64_t start,
>>>>> uint64_t count)
>>>>>   /* Compute range in the last layer.  */
>>>>>   uint64_t first;
>>>>>   uint64_t last = start + count - 1;
>>>>> +    uint64_t gran = 1ULL << hb->granularity;
>>>>> +
>>>>> +    assert(!(start & (gran - 1)));
>>>>> +    assert(!(count & (gran - 1)) || (start + count == hb->orig_size));
>>>>
>>>> I know I'm replying a bit late (since this is now a pull request), but
>>>> would it be worth using the dedicated macro:
>>>>
>>>> assert(QEMU_IS_ALIGNED(start, gran));
>>>> assert(QEMU_IS_ALIGNED(count, gran) || start + count == hb->orig_size);
>>>>
>>>> instead of open-coding it?  (I would also drop the extra () around the
>>>> right half of ||). If we want it, that would now be a followup patch.
>>
>> I've noticed that seasoned C programmers hate extra parentheses a lot.
>> I've noticed that I cannot remember operator precedence enough to ever
>> feel like this is actually an improvement.
>>
>> Something about a nice weighted tree of ((expr1) || (expr2)) feels
>> soothing to my weary eyes. So, if it's not terribly important, I'd
>> prefer to leave it as-is.
> 
> I don't mind the parentheses, but I do prefer QEMU_IS_ALIGNED() to the
> open-coded version. Would that be a viable compromise?
> 

Oh, I'm sorry! I did change that. I didn't mean to appear any more
stubborn than I actually am.

--js
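
For reference, the two spellings discussed in this thread can be compared in isolation. The sketch below is not the hbitmap code: `IS_ALIGNED` is a local stand-in for the macro named in the review, and `aligned_mask` and `reset_args_ok` are hypothetical helpers that only mirror the asserted conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an alignment macro; modulo works for any nonzero m. */
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Open-coded variant; only valid when gran is a power of two. */
static bool aligned_mask(uint64_t value, uint64_t gran)
{
    return !(value & (gran - 1));
}

/*
 * Mirrors the two asserted conditions: start must be aligned to the
 * granularity, and count must be aligned too unless the range ends
 * exactly at orig_size.
 */
static bool reset_args_ok(uint64_t start, uint64_t count,
                          int granularity, uint64_t orig_size)
{
    assert(granularity >= 0 && granularity < 64);

    uint64_t gran = 1ULL << granularity;

    return IS_ALIGNED(start, gran) &&
           (IS_ALIGNED(count, gran) || start + count == orig_size);
}
```

Because the granularity is always a power of two here, the mask form and the modulo form agree; the macro form mainly states the intent more directly.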

Re: [PATCH 1/2] migration: Boost SaveStateEntry.instance_id to 64 bits

2019-10-15 Thread Juan Quintela

"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> Peter Xu  wrote:
>> > It was "int" and used as 32bits fields (see save_section_header()).
>> > It's unsafe already because sizeof(int) could be 2 on i386, I think.
>> > So at least uint32_t would suite more.  While it also uses "-1" as a
>> > placeholder of "we want to generate the instance ID automatically".
>> > Hence a more proper value should be int64_t.
>> >
>> > This will start to be useful after next patch in which we can start to
>> > convert a real uint32_t value as instance ID.
>> >
>> > Signed-off-by: Peter Xu 
>> 
>> Hi
>> 
>> Being more helpful,  I think that it is better to just:
>> 
>> * change instance_id to be an uint32_t (notice that for all architectures
>>   that we support, it is actually int32_t).
>> 
>> * export calculate_new_instance_id() and adjust callers that use -1.
>> 
>> or
>> 
>> * export a new function that just use the calculate_new_instance_id()
>
> Do you mean that we end up with two functions, one that does it
> automatically, and one that takes an ID?

That is one option.

The other is that we export calculate_new_instance_id(), and we use that
instead of -1.

Later, Juan.
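
The two-function option discussed above can be sketched as follows, assuming a toy registry. None of these names are QEMU APIs: `toy_register_with_id` takes an explicit `uint32_t` ID, `toy_register_auto` picks one, and `toy_next_instance_id` plays the role the thread assigns to `calculate_new_instance_id()`. The point is only that no `-1` sentinel is needed, so the ID type can stay 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

struct toy_entry {
    const char *idstr;
    uint32_t instance_id;
};

static struct toy_entry entries[MAX_ENTRIES];
static size_t nentries;

/* Next free ID for idstr: one above the highest ID already registered.
 * Wrap-around at UINT32_MAX is not handled in this sketch. */
static uint32_t toy_next_instance_id(const char *idstr)
{
    uint32_t id = 0;

    for (size_t i = 0; i < nentries; i++) {
        if (strcmp(entries[i].idstr, idstr) == 0 &&
            entries[i].instance_id >= id) {
            id = entries[i].instance_id + 1;
        }
    }
    return id;
}

/* Caller supplies the ID; returns the ID that was stored. */
static uint32_t toy_register_with_id(const char *idstr, uint32_t id)
{
    assert(nentries < MAX_ENTRIES);
    entries[nentries].idstr = idstr;
    entries[nentries].instance_id = id;
    nentries++;
    return id;
}

/* The ID is generated automatically; replaces passing -1 as a sentinel. */
static uint32_t toy_register_auto(const char *idstr)
{
    return toy_register_with_id(idstr, toy_next_instance_id(idstr));
}
```

The other option from the thread, exporting the ID calculation and letting callers pass its result, corresponds to calling `toy_next_instance_id()` at the call site instead of going through `toy_register_auto()`.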

Re: [PATCH v4 2/2] i386: Add support to get/set/migrate Intel Processor Trace feature

2019-10-15 Thread Eduardo Habkost

On Tue, Oct 15, 2019 at 12:51:48PM +, Kang, Luwei wrote:
> > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > index f9f4cd1..097c953 100644
> > > --- a/target/i386/kvm.c
> > > +++ b/target/i386/kvm.c
> > > @@ -1811,6 +1811,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> > >  kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
> > >  }
> > >  }
> > > +if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
> > > +int addr_num = kvm_arch_get_supported_cpuid(kvm_state,
> > > +0x14, 1, R_EAX) &
> > > + 0x7;
> > > +
> > > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL,
> > > +env->msr_rtit_ctrl);
> > > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_STATUS,
> > > +env->msr_rtit_status);
> > > +kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_BASE,
> > > +env->msr_rtit_output_base);
> > 
> > This causes the following crash on some hosts:
> > 
> >   qemu-system-x86_64: error: failed to set MSR 0x560 to 0x0
> >   qemu-system-x86_64: target/i386/kvm.c:2673: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> > 
> > Checking for CPUID_7_0_EBX_INTEL_PT is not enough: KVM has additional 
> > conditions that might prevent writing to this MSR
> > (PT_CAP_topa_output && PT_CAP_single_range_output).  This causes QEMU to 
> > crash if some of the conditions aren't met.
> > 
> > Writing and reading this MSR (and the ones below) need to be conditional on 
> > KVM_GET_MSR_INDEX_LIST.
> > 
> 
> Hi Eduardo,
> I can't reproduce this issue with the upstream source code, but I can
> reproduce it on RHEL 8.1. I don't have the QEMU source code for RHEL 8.1, but
> after adding some tracing in KVM I found that KVM reports the complete Intel
> PT CPUID information to QEMU, yet the Intel PT CPUID leaf (0x14) is lost when
> QEMU sets the CPUID in KVM (the cpuid level is 0xd). It looks like the patch
> below is missing.
> 
> commit f24c3a79a415042f6dc195f029a2ba7247d14cac
> Author: Luwei Kang 
> Date:   Tue Jan 29 18:52:59 2019 -0500
> i386: extended the cpuid_level when Intel PT is enabled
> 
> Intel Processor Trace required CPUID[0x14] but the cpuid_level
> have no change when create a kvm guest with
> e.g. "-cpu qemu64,+intel-pt".

Thanks for the pointer.  This may avoid triggering the bug in the
default configuration, but we still need to make the MSR writing
conditional on KVM_GET_MSR_INDEX_LIST.  Older machine-types have
x-intel-pt-auto-level=off, and the user may set `level` manually.

-- 
Eduardo
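
The request above is to make the MSR writes conditional on what KVM_GET_MSR_INDEX_LIST reports, not on the CPUID bit alone. A minimal sketch of that gating follows. It is not QEMU code: `msr_is_listed`, `toy_msr_buf` and `toy_msr_add_if_listed` are hypothetical, and the supported list would in practice come from the kernel. Index 0x560 is the one named in the crash message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_RTIT_OUTPUT_BASE 0x560  /* index from the crash message */
#define TOY_MSR_MAX 8

/* Hypothetical helper: is msr present in a list of supported indices? */
static bool msr_is_listed(const uint32_t *indices, size_t n, uint32_t msr)
{
    assert(indices != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (indices[i] == msr) {
            return true;
        }
    }
    return false;
}

/* Toy stand-in for the buffer of MSR entries queued for writing. */
struct toy_msr_buf {
    uint32_t index[TOY_MSR_MAX];
    uint64_t value[TOY_MSR_MAX];
    size_t n;
};

/* Queue the MSR only if the host lists it; report whether it was queued. */
static bool toy_msr_add_if_listed(struct toy_msr_buf *buf,
                                  const uint32_t *supported, size_t nsupported,
                                  uint32_t msr, uint64_t value)
{
    if (buf->n >= TOY_MSR_MAX || !msr_is_listed(supported, nsupported, msr)) {
        return false;
    }
    buf->index[buf->n] = msr;
    buf->value[buf->n] = value;
    buf->n++;
    return true;
}
```

With this gating, an MSR the host does not list is never queued, so the number of entries written can match the number queued, which is the condition the failed assertion checks.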

Re: [PATCH] doc: Describe missing generic -blockdev options

2019-10-15 Thread Peter Maydell

On Tue, 15 Oct 2019 at 13:40, Kevin Wolf  wrote:
>
> We added more generic options after introducing -blockdev and forgot to
> update the documentation (man page and --help output) accordingly. Do
> that now.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-options.hx | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 793d70ff93..9f6aa3dde3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -849,7 +849,8 @@ ETEXI
>  DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
>  "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
>  "  [,cache.direct=on|off][,cache.no-flush=on|off]\n"
> -"  [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
> +"  [,read-only=on|off][,auto-read-only=on|off]\n"
> +"  [,force-share=on|off][,detect-zeroes=on|off|unmap]\n"
>  "  [,driver specific parameters...]\n"
>  "configure a block backend\n", QEMU_ARCH_ALL)
>  STEXI
> @@ -885,6 +886,22 @@ name is not intended to be predictable and changes between QEMU invocations.
>  For the top level, an explicit node name must be specified.
>  @item read-only
>  Open the node read-only. Guest write attempts will fail.
> +
> +Note that some block drivers support only read-only access, either generally or
> +in certain configurations. In this case, the default value
> +@option{read-only=off} does not work and the option must be specified
> +explicitly.
> +@item auto-read-only
> +If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
> +read-write even if @option{read-only=off} is requested, but fall back to
> +read-only instead (and switch between the modes later), e.g. depending on
> +whether the image file is writable or whether a writing user is attached to the
> +node.
> +@item force-share
> +Override the image locking system of QEMU and force the node to allowing
> +sharing all permissions with other uses.

Grammar nit: "to allow sharing"; but maybe the phrasing could
be clarified anyway -- I'm not entirely sure what "sharing
permissions" would be. The first part of the sentence suggests
this option is "force the image file to be opened even if some
other QEMU instance has it open already", but the second half
sounds like "don't lock the image, so that some other use later
is allowed to open it"? Or is it both, or something else?

> +
> +Enabling @option{force-share=on} requires @option{read-only=on}.

thanks
-- PMM

Re: [PATCH] doc: Describe missing generic -blockdev options

2019-10-15 Thread Eric Blake


On 10/15/19 7:38 AM, Kevin Wolf wrote:

> We added more generic options after introducing -blockdev and forgot to
> update the documentation (man page and --help output) accordingly. Do
> that now.
>
> Signed-off-by: Kevin Wolf
> ---
>  qemu-options.hx | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> @@ -885,6 +886,22 @@ name is not intended to be predictable and changes between QEMU invocations.
>  For the top level, an explicit node name must be specified.
>  @item read-only
>  Open the node read-only. Guest write attempts will fail.
> +
> +Note that some block drivers support only read-only access, either generally or
> +in certain configurations. In this case, the default value
> +@option{read-only=off} does not work and the option must be specified
> +explicitly.
> +@item auto-read-only
> +If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
> +read-write even if @option{read-only=off} is requested, but fall back to
> +read-only instead (and switch between the modes later), e.g. depending on
> +whether the image file is writable or whether a writing user is attached to the
> +node.


Hard to read.  Maybe:

If @option{auto-read-only=on} is set, QEMU may fall back to read-only 
usage even when @option{read-only=off} is requested, or even switch 
between modes as needed, e.g. depending on whether the image file is 
writable or whether a writing user is attached to the node.
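
As a rough illustration of the option being described (not part of the
patch), the sketch below only assembles and prints a -blockdev argument
string. The image path, the node name, and the choice of the file driver
with its filename parameter are assumptions made for the example.

```shell
# Hypothetical image path and node name, for illustration only.
IMAGE=/path/to/disk.raw

# read-only=off requests read-write access; auto-read-only=on lets QEMU
# fall back to read-only, as the documentation text above describes.
BLOCKDEV_OPTS="driver=file,node-name=disk0,filename=${IMAGE},read-only=off,auto-read-only=on"

# Print the invocation instead of running it, so the sketch has no side effects.
echo "qemu-system-x86_64 -blockdev ${BLOCKDEV_OPTS}"
```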


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] blockdev: Use error_report() in hmp_commit()

2019-10-15 Thread Eric Blake


On 10/15/19 7:39 AM, Kevin Wolf wrote:

> Instead of using monitor_printf() to report errors, hmp_commit() should
> use error_report() like other places do.
>
> Signed-off-by: Kevin Wolf
> ---
>  blockdev.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
