date:20170804

Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > +{
> > +/*
> > + * This means source VM is ready to resume the postcopy migration.
> > + * It's time to switch state and release the fault thread to
> > + * continue service page faults.
> > + */
> > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > +  MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > +qemu_sem_post(&mis->postcopy_pause_sem_fault);
> 
> Is it worth sanity checking that you were in RECOVER at this point?

Yeah, it never hurts.  Will do.

-- 
Peter Xu

Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME

2017-08-04 Thread Peter Xu

On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > +{
> > > +/*
> > > + * This means source VM is ready to resume the postcopy migration.
> > > + * It's time to switch state and release the fault thread to
> > > + * continue service page faults.
> > > + */
> > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > +  MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > +qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > 
> > Is it worth sanity checking that you were in RECOVER at this point?
> 
> Yeah, it never hurts.  Will do.

Not sure whether this would be good (note: I returned 0 in the if):

diff --git a/migration/savevm.c b/migration/savevm.c
index b7843c2..b34f59b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1709,6 +1709,12 @@ static int 
loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 
 static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
 {
+if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+error_report("%s: illegal resume received", __func__);
+/* Don't fail the load, only for this. */
+return 0;
+}
+
 /*
  * This means source VM is ready to resume the postcopy migration.
  * It's time to switch state and release the fault thread to

Basically I just don't want to crash the dest VM (it holds hot dirty
pages) even if it receives a faulty RESUME command.

-- 
Peter Xu

Re: [Qemu-devel] [PATCH for-2.11 0/8] tcg/s390 improvments

2017-08-04 Thread Fam Zheng

On Thu, 08/03 22:50, no-re...@patchew.org wrote:
> /var/tmp/patchew-tester-tmp-_vw086tj/src/accel/tcg/translate-all.c:2218:1: 
> fatal error: error writing to /tmp/cctUT4t6.s: No space left on device

Sorry for the false positive. I've cleaned up the disk space and restarted the
test.

Fam

Re: [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 12:21:41PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> > +{
> > +trace_source_return_path_thread_resume_ack(value);
> > +
> > +/*
> > + * Currently value will always be one. It can be used in the
> > + * future to notify source that destination cannot continue.
> > + */
> > +assert(value == 1);
> 
> Again I prefer the routine to fail than to assert.
> Maybe it's worth having a constant rather than the magic 1.

Will do.  Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 0/2] scsi: enclosure support

2017-08-04 Thread Hannes Reinecke

On 08/04/2017 08:10 AM, Paolo Bonzini wrote:
> 
>> On 08/03/2017 05:10 PM, Paolo Bonzini wrote:
>>> On 03/08/2017 15:26, Hannes Reinecke wrote:
 Hi all,

 due to a customer issue I've added simple subenclosure support
 to the SCSI emulation. The patch simply converts the current invisible
 LUN0 into an enclosure device; existing setups using LUN0 as disks or
 CD-ROMs will not be affected.
>>>
>>> What is the issue exactly?  That is, for what is it necessary to have a
>>> dummy enclosure?
>>>
>> Well, stock linux displays some very interesting error messages for
>> these types of enclosures. Which was the prime mover for doing this.
> 
> --verbose?
> 
[   12.958454] scsi 1:0:0:254: Wrong diagnostic page; asked for 2 got 0
[   12.958456] scsi 1:0:0:254: Failed to get diagnostic page 0xffea
[   12.958457] scsi 1:0:0:254: Failed to bind enclosure -19
[   12.959392] ses 1:0:0:254: Attached Enclosure device

>>> I agree with Dan that this need machine type compatibility gunk.  For
>>> example, could the new device affect /dev/sgN numbering?
>>
>> Yes, indeed it would.
>>
>> What about a new option to the scsi driver?
> 
> If you do that, you've done 99% of the work to do compatibility so I
> won't complain and do the 1% myself. :)
> 
Okay, will be doing so.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 12:38:01PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > This is hook function to be called when a postcopy migration wants to
> > resume from a failure. For each module, it should provide its own
> > recovery logic before we switch to the postcopy-active state.
> 
> Would a change-state handler be able to do this,

We don't have such a change-state handler, do we?

> or perhaps
> the notifier chain I have in my shared memory world:
>  https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06459.html

The postcopy_notify() can do this as well. I just need a way to hook
up all the system modules for migration. In our case, it's only RAM,
but I think maybe one day we need block support. So as long as the
mechanism (either current SaveVMHandlers interface, or
postcopy_notify) can do the notification, IMHO it'll be fine.

-- 
Peter Xu

Re: [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 12:56:31PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > @@ -256,6 +257,8 @@ struct RAMState {
> >  RAMBlock *last_req_rb;
> >  /* Queue of outstanding page requests from the destination */
> >  QemuMutex src_page_req_mutex;
> > +/* Ramblock counts to sync dirty bitmap. Only used for recovery */
> > +int ramblock_to_sync;
> >  QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> >  };
> >  typedef struct RAMState RAMState;
> > @@ -2731,6 +2734,57 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > version_id)
> >  return ret;
> >  }
> >  
> > +/* Sync all the dirty bitmap with destination VM.  */
> > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> > +{
> > +RAMBlock *block;
> > +QEMUFile *file = s->to_dst_file;
> > +int ramblock_count = 0;
> > +
> > +trace_ram_dirty_bitmap_sync("start");
> 
> Most (but not all) of our trace_ uses have separate trace_ entries for
> each step; e.g.   trace_ram_dirty_bitmap_sync_start9)

Okay.

> 
> > +/*
> > + * We need to take the resume lock to make sure that the send
> > + * thread (current thread) and the rp-thread will do their work in
> > + * order.
> > + */
> > +qemu_mutex_lock(&s->resume_lock);
> > +
> > +/* Request for receive-bitmap for each block */
> > +RAMBLOCK_FOREACH(block) {
> > +ramblock_count++;
> > +qemu_savevm_send_recv_bitmap(file, block->idstr);
> > +}
> > +
> > +/* Init the ramblock count to total */
> > +atomic_set(&rs->ramblock_to_sync, ramblock_count);
> > +
> > +trace_ram_dirty_bitmap_sync("wait-bitmap");
> > +
> > +/* Wait until all the ramblocks' dirty bitmap synced */
> > +while (rs->ramblock_to_sync) {
> > +qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > +}
> 
> Does the locking here get simpler if you:
>   a) count the number of RAMBlocks 'n'
>   b) Initialise a sempahore to -(n-1)
>   c) Call qemu_savevm_send_recv_bitmap for each bitmap
>   d) sem_wait on the semaphore - which is waiting for the semaphore to
>  be >0
> 
> as you receive each bitmap do a sem_post; on the last one
> it should go from 0->1 and the sem_wait should wake up?

I think you are right. :-) A single semaphore suffice here (and also
for the following up handshake).

I will touch up the commit message as well.  Thanks,

-- 
Peter Xu

[Qemu-devel] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files

2017-08-04 Thread Greg Kurz

Hi Maxim,

No I didn't...

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1336794

Title:
  9pfs does not honor open file handles on unlinked files

Status in QEMU:
  In Progress
Status in Ubuntu:
  Confirmed

Bug description:
  This was originally filed over here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1114221

  The open-unlink-fstat idiom used in some places to create an anonymous
  private temporary file does not work in a QEMU guest over a virtio-9p
  filesystem.

  Version-Release number of selected component (if applicable):

  qemu-kvm-1.6.2-6.fc20.x86_64
  qemu-system-x86-1.6.2-6.fc20.x86_64
  (those are fedora RPMs)

  How reproducible:

  Always. See this example C program:

  https://bugzilla.redhat.com/attachment.cgi?id=913069

  Steps to Reproduce:
  1. Export a filesystem with virt-manager for the guest.
(type: mount, driver: default, mode: passthrough)
  2. Start guest and mount that filesystem
(mount -t 9p -o trans=virtio,version=9p2000.L  ...)
  3. Run a program that uses open-unlink-fstat
(in my case it was trying to compile Perl 5.20)

  Actual results:

  fstat fails:

  open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
  unlink("/home/tst/filename")= 0
  fstat(3, 0x23aa1a8) = -1 ENOENT (No such file or 
directory)
  close(3)

  Expected results:

  open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
  unlink("/home/tst/filename")= 0
  fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
  fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
  close(3) 

  Additional info:

  There was a patch put into the kernel back in '07 to handle this very
  problem for other filesystems; maybe its helpful:

http://lwn.net/Articles/251228/

  There is also a thread on LKML from last December specifically about
  this very problem:

https://lkml.org/lkml/2013/12/31/163

  There was a discussion on the QEMU list back in '11 that doesn't seem
  to have come to a conclusion, but did provide the test program that
  i've attached to this report:

http://marc.info/?l=qemu-devel&m=130443605720648&w=2

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1336794/+subscriptions

Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME

2017-08-04 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > +{
> > > > +/*
> > > > + * This means source VM is ready to resume the postcopy migration.
> > > > + * It's time to switch state and release the fault thread to
> > > > + * continue service page faults.
> > > > + */
> > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > +  MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > +qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > 
> > > Is it worth sanity checking that you were in RECOVER at this point?
> > 
> > Yeah, it never hurts.  Will do.
> 
> Not sure whether this would be good (note: I returned 0 in the if):
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7843c2..b34f59b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1709,6 +1709,12 @@ static int 
> loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>  
>  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>  {
> +if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +error_report("%s: illegal resume received", __func__);
> +/* Don't fail the load, only for this. */
> +return 0;
> +}
> +
>  /*
>   * This means source VM is ready to resume the postcopy migration.
>   * It's time to switch state and release the fault thread to
> 
> Basically I just don't want to crash the dest VM (it holds hot dirty
> pages) even if it receives a faulty RESUME command.

Yes, so now that's a fun problem; effectively you then have 3 valid
failure modes:
a) An IO failure so we need to go into POSTCOPY_PAUSE
b) A fatal migration stream problem to quit
c) A non-fatal migration stream problem to go .. back into PAUSE?

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCHv2 1/4] scsi: Make LUN 0 a simple enclosure

2017-08-04 Thread Hannes Reinecke

Instead of having an 'invisible' LUN0 (in case LUN 0 is not connected)
this patch maks LUN0 a enclosure service, exposing it to the OS.

Signed-off-by: Hannes Reinecke 
---
 hw/scsi/scsi-bus.c | 56 +-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 23c51de..c89e82d 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -493,10 +493,11 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r)
 if (r->req.lun != 0) {
 r->buf[0] = TYPE_NO_LUN;
 } else {
-r->buf[0] = TYPE_NOT_PRESENT | TYPE_INACTIVE;
+r->buf[0] = TYPE_ENCLOSURE;
 r->buf[2] = 5; /* Version */
 r->buf[3] = 2 | 0x10; /* HiSup, response data format */
 r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */
+r->buf[6] = 0x40; /* Enclosure service */
 r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ.  
*/
 memcpy(&r->buf[8], "QEMU", 8);
 memcpy(&r->buf[16], "QEMU TARGET ", 16);
@@ -505,6 +506,54 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r)
 return true;
 }
 
+static bool scsi_target_emulate_receive_diagnostic(SCSITargetReq *r)
+{
+uint8_t page_code = r->req.cmd.buf[2];
+unsigned char *enc_desc, *type_desc;
+
+assert(r->req.dev->lun != r->req.lun);
+
+scsi_target_alloc_buf(&r->req, 0x38);
+
+switch (page_code) {
+case 0x00:
+r->buf[r->len++] = page_code ; /* this page */
+r->buf[r->len++] = 0x00;
+r->buf[r->len++] = 0x00;
+r->buf[r->len++] = 0x03;
+r->buf[r->len++] = 0x00;
+r->buf[r->len++] = 0x01;
+r->buf[r->len++] = 0x08;
+break;
+case 0x01:
+memset(r->buf, 0, 0x38);
+r->buf[0] = page_code;
+r->buf[3] = 0x30;
+enc_desc = &r->buf[8];
+enc_desc[0] = 0x09;
+enc_desc[2] = 1;
+enc_desc[3] = 0x24;
+memcpy(&enc_desc[12], "QEMU", 8);
+memcpy(&enc_desc[20], "QEMU TARGET ", 16);
+pstrcpy((char *)&enc_desc[36], 4, qemu_hw_version());
+type_desc = &r->buf[48];
+type_desc[1] = 1;
+r->len = 0x38;
+break;
+case 0x08:
+r->buf[0] = page_code;
+r->buf[1] = 0x00;
+r->buf[2] = 0x00;
+r->buf[3] = 0x00;
+r->len = 4;
+break;
+default:
+return false;
+}
+r->len = MIN(r->req.cmd.xfer, r->len);
+return true;
+}
+
 static size_t scsi_sense_len(SCSIRequest *req)
 {
 if (req->dev->type == TYPE_SCANNER)
@@ -528,6 +577,11 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
uint8_t *buf)
 goto illegal_request;
 }
 break;
+case RECEIVE_DIAGNOSTIC:
+if (!scsi_target_emulate_receive_diagnostic(r)) {
+goto illegal_request;
+}
+break;
 case REQUEST_SENSE:
 scsi_target_alloc_buf(&r->req, scsi_sense_len(req));
 r->len = scsi_device_get_sense(r->req.dev, r->buf,
-- 
1.8.5.6

[Qemu-devel] [PATCHv2 2/4] scsi: use qemu_uuid to generate logical identifier for SES

2017-08-04 Thread Hannes Reinecke

The SES enclosure descriptor requires a logical identifier,
so generate one using the qemu_uuid and the Qumranet OUI.

Signed-off-by: Hannes Reinecke 
---
 hw/scsi/scsi-bus.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c89e82d..8419c75 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -10,6 +10,7 @@
 #include "trace.h"
 #include "sysemu/dma.h"
 #include "qemu/cutils.h"
+#include "qemu/crc32c.h"
 
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
@@ -533,6 +534,22 @@ static bool 
scsi_target_emulate_receive_diagnostic(SCSITargetReq *r)
 enc_desc[0] = 0x09;
 enc_desc[2] = 1;
 enc_desc[3] = 0x24;
+if (qemu_uuid_set) {
+uint32_t crc;
+
+/*
+ * Make this a NAA IEEE Registered identifier
+ * using Qumranet OUI (0x001A4A) and the
+ * crc32 from the system UUID.
+ */
+enc_desc[4] = 0x50;
+enc_desc[5] = 0x01;
+enc_desc[6] = 0xa4;
+enc_desc[7] = 0xa0;
+crc = crc32c(0x, qemu_uuid.data, 16);
+cpu_to_le32s(&crc);
+memcpy(&enc_desc[8], &crc, 4);
+}
 memcpy(&enc_desc[12], "QEMU", 8);
 memcpy(&enc_desc[20], "QEMU TARGET ", 16);
 pstrcpy((char *)&enc_desc[36], 4, qemu_hw_version());
-- 
1.8.5.6

[Qemu-devel] [PATCHv2 0/4] scsi: enclosure support

2017-08-04 Thread Hannes Reinecke

Hi all,

due to a customer issue I've added simple subenclosure support
to the SCSI emulation. By setting the 'enclosure' option to a
SCSI device we will now present an enclosure device on LUN0
(if LUN0 is otherwise unassigned) or setting the 'EncServ' bit
in the inquiry data if LUN0 is assigned to a device.

Changes to v1:
- Add patch to clarify sense code responses
- Add 'enclosure' option for SCSI devices

Hannes Reinecke (4):
  scsi: Make LUN 0 a simple enclosure
  scsi: use qemu_uuid to generate logical identifier for SES
  scsi: clarify sense codes for LUN0 emulation
  scsi: Add 'enclosure' option for scsi devices

 hw/scsi/scsi-bus.c | 85 --
 hw/scsi/scsi-disk.c|  4 ++-
 include/hw/scsi/scsi.h |  1 +
 3 files changed, 87 insertions(+), 3 deletions(-)

-- 
1.8.5.6

[Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices

2017-08-04 Thread Hannes Reinecke

Make enclosure support optional with the 'enclosure' argument to
the scsi device.
Adding 'enclosure=on' as option to the SCSI device will present
an enclosure service device on LUN0, either as a stand-alone
LUN (in case LUN0 is not assigned) or by setting the EncServ bit
int the inquiry data if LUN0 is assigned to a block devices.

Signed-off-by: Hannes Reinecke 
---
 hw/scsi/scsi-bus.c | 11 ---
 hw/scsi/scsi-disk.c|  4 +++-
 include/hw/scsi/scsi.h |  1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 79a222f..c11422b 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -22,6 +22,7 @@ static Property scsi_props[] = {
 DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0),
 DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1),
 DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1),
+DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r)
 if (r->req.lun != 0) {
 r->buf[0] = TYPE_NO_LUN;
 } else {
-r->buf[0] = TYPE_ENCLOSURE;
+r->buf[0] = r->req.dev->enclosure ?
+TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE;
 r->buf[2] = 5; /* Version */
 r->buf[3] = 2 | 0x10; /* HiSup, response data format */
 r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */
-r->buf[6] = 0x40; /* Enclosure service */
+if (r->req.dev->enclosure) {
+r->buf[6] = 0x40; /* Enclosure service */
+}
 r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ.  
*/
 memcpy(&r->buf[8], "QEMU", 8);
 memcpy(&r->buf[16], "QEMU TARGET ", 16);
@@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
uint8_t *buf)
 }
 break;
 case RECEIVE_DIAGNOSTIC:
-if (!scsi_target_emulate_receive_diagnostic(r)) {
+if (!r->req.dev->enclosure ||
+!scsi_target_emulate_receive_diagnostic(r)) {
 goto illegal_request;
 }
 break;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 5f1e5e8..153d97d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
the additional length is not adjusted */
 outbuf[4] = 36 - 5;
 }
-
+if (s->qdev.lun == 0 && s->qdev.enclosure) {
+outbuf[6] = 0x40; /* Enclosure service */
+}
 /* Sync data transfer and TCQ.  */
 outbuf[7] = 0x10 | (req->bus->info->tcq ? 0x02 : 0);
 return buflen;
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 6b85786..243c185 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -100,6 +100,7 @@ struct SCSIDevice
 BlockConf conf;
 SCSISense unit_attention;
 bool sense_is_ua;
+bool enclosure;
 uint8_t sense[SCSI_SENSE_BUF_SIZE];
 uint32_t sense_len;
 QTAILQ_HEAD(, SCSIRequest) requests;
-- 
1.8.5.6

[Qemu-devel] [PATCHv2 3/4] scsi: clarify sense codes for LUN0 emulation

2017-08-04 Thread Hannes Reinecke

The LUN0 emulation is just that, an emulation for a non-existing
LUN0. So we should be returning LUN_NOT_SUPPORTED for any request
coming from any other LUN.
And we should be aborting unhandled commands with INVALID OPCODE,
not LUN NOT SUPPORTED.

Signed-off-by: Hannes Reinecke 
---
 hw/scsi/scsi-bus.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 8419c75..79a222f 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -583,6 +583,11 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
uint8_t *buf)
 {
 SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req);
 
+if (req->lun != 0) {
+scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED));
+scsi_req_complete(req, CHECK_CONDITION);
+return 0;
+}
 switch (buf[0]) {
 case REPORT_LUNS:
 if (!scsi_target_emulate_report_luns(r)) {
@@ -613,7 +618,7 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
uint8_t *buf)
 case TEST_UNIT_READY:
 break;
 default:
-scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED));
+scsi_req_build_sense(req, SENSE_CODE(INVALID_OPCODE));
 scsi_req_complete(req, CHECK_CONDITION);
 return 0;
 illegal_request:
-- 
1.8.5.6

[Qemu-devel] [Bug 1708617] [NEW] qemu2.9 meet a question using reconnect about ovs+dpdk

2017-08-04 Thread jk

Public bug reported:

env:
qemu-2.9
dpdk-16.11
ovs-2.6.1
host os:Linux compute 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux
guest os is same with host

1. ovs service is normal
2. start 2 qemu vm, using vhostuser port as server
taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive 
file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object 
memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on 
-numa node,memdev=mem -mem-prealloc -chardev 
socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev 
type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device 
virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6  -nographic 
-vnc :0 
qemu-system-x86_64: -chardev 
socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU 
waiting for connection on: 
disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser0,server

taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive 
file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object 
memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on 
-numa node,memdev=mem -mem-prealloc -chardev 
socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev 
type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device 
virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6  -nographic 
-vnc :0 
qemu-system-x86_64: -chardev 
socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU 
waiting for connection on: 
disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser1,server

3. add br and vhostuser port as client
ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev
ovs-vsctl add-port br1 dpdkvhostuser0 -- set Interface dpdkvhostuser0 
type=dpdkvhostuserclient  
options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser0"   

ovs-vsctl add-port br1 dpdkvhostuser1 -- set Interface dpdkvhostuser1 
type=dpdkvhostuserclient  
options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser1"   


4. ping between 2 vm is ok
5. restart ovs process
 systemctl restart openvswitch

6. there is a question ping between 2 vm error

7. change qemu from 2.9 to 2.7, everything become ok

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1708617

Title:
  qemu2.9 meet a question using reconnect about ovs+dpdk

Status in QEMU:
  New

Bug description:
  env:
  qemu-2.9
  dpdk-16.11
  ovs-2.6.1
  host os:Linux compute 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 
2016 x86_64 x86_64 x86_64 GNU/Linux
  guest os is same with host

  1. ovs service is normal
  2. start 2 qemu vm, using vhostuser port as server
  taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive 
file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object 
memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on 
-numa node,memdev=mem -mem-prealloc -chardev 
socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev 
type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device 
virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6  -nographic 
-vnc :0 
  qemu-system-x86_64: -chardev 
socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU 
waiting for connection on: 
disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser0,server

  taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive 
file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object 
memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on 
-numa node,memdev=mem -mem-prealloc -chardev 
socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev 
type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device 
virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6  -nographic 
-vnc :0 
  qemu-system-x86_64: -chardev 
socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU 
waiting for connection on: 
disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser1,server

  3. add br and vhostuser port as client
  ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev
  ovs-vsctl add-port br1 dpdkvhostuser0 -- set Interface dpdkvhostuser0 
type=dpdkvhostuserclient  
options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser0"   
  
  ovs-vsctl add-port br1 dpdkvhostuser1 -- set Interface dpdkvhostuser1 
type=dpdkvhostuserclient  
options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser1"   
  

  4. ping between 2 vm is ok
  5. restart ovs process
   systemctl restart openvswitch

  6. there is a que

Re: [Qemu-devel] [RFC 27/29] migration: setup ramstate for resume

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 01:37:04PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > After we updated the dirty bitmaps of ramblocks, we also need to update
> > the critical fields in RAMState to make sure it is ready for a resume.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/ram.c | 35 ++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index c695b13..427bf6e 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1947,6 +1947,31 @@ static int ram_state_init(RAMState **rsp)
> >  return 0;
> >  }
> >  
> > +static void ram_state_resume_prepare(RAMState *rs)
> > +{
> > +RAMBlock *block;
> > +long pages = 0;
> > +
> > +/*
> > + * Postcopy is not using xbzrle/compression, so no need for that.
> > + * Also, since source are already halted, we don't need to care
> > + * about dirty page logging as well.
> > + */
> > +
> > +RAMBLOCK_FOREACH(block) {
> > +pages += bitmap_count_one(block->bmap,
> > +  block->max_length >> TARGET_PAGE_BITS);
> 
> Again I think that needs to be block->used_length (see
> migration_bitmap_sync).

Fixing.

> 
> > +}
> > +
> > +/* This may not be aligned with current bitmaps. Recalculate. */
> > +rs->migration_dirty_pages = pages;
> > +
> > +rs->last_seen_block = NULL;
> > +rs->last_sent_block = NULL;
> > +rs->last_page = 0;
> > +rs->last_version = ram_list.version;
> 
> A trace at this point with the pages count might be worthwhile.

Added.

> 
> > +}
> > +
> >  /*
> >   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
> >   * long-running RCU critical section.  When rcu-reclaims in the code
> > @@ -2842,8 +2867,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, 
> > RAMBlock *block)
> >  static int ram_resume_prepare(MigrationState *s, void *opaque)
> >  {
> >  RAMState *rs = *(RAMState **)opaque;
> > +int ret;
> >  
> > -return ram_dirty_bitmap_sync_all(s, rs);
> > +ret = ram_dirty_bitmap_sync_all(s, rs);
> 
> Interesting; I'd assumed you'd load directly into this
> bitmap rather than loading into the bitmap on each block.
> Do we ever get the case where a bitmap is set on the source
> bitmap but not in the loaded bitmap?

(confirmed with Dave offlist that this is not a problem, blame myself
 on using a poor function name "ram_dirty_bitmap_sync_all" which is
 too close to existng "migration_bitmap_sync")

-- 
Peter Xu

Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Firstly, MigThrError enumeration is introduced to describe the error in
> > migration_detect_error() better. This gives the migration_thread() a
> > chance to know whether a recovery has happened.
> > 
> > Then, if a recovery is detected, migration_thread() will reset its local
> > variables to prepare for that.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.c | 40 +---
> >  1 file changed, 29 insertions(+), 11 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index ecebe30..439bc22 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
> >  return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> >  }
> >  
> > +typedef enum MigThrError {
> > +/* No error detected */
> > +MIG_THR_ERR_NONE = 0,
> > +/* Detected error, but resumed successfully */
> > +MIG_THR_ERR_RECOVERED = 1,
> > +/* Detected fatal error, need to exit */
> > +MIG_THR_ERR_FATAL = 2,
> > +} MigThrError;
> > +
> 
> Could you move this patch earlier to when postcopy_pause is created
> so it's created with this enum?

Sure.

[...]

> > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> >  /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> >  enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> >  bool enable_colo = migrate_colo_enabled();
> > +MigThrError thr_error;
> >  
> >  rcu_register_thread();
> >  
> > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> >   * Try to detect any kind of failures, and see whether we
> >   * should stop the migration now.
> >   */
> > -if (migration_detect_error(s)) {
> > +thr_error = migration_detect_error(s);
> > +if (thr_error == MIG_THR_ERR_FATAL) {
> > +/* Stop migration */
> >  break;
> > +} else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > +/*
> > + * Just recovered from a e.g. network failure, reset all
> > + * the local variables.
> > + */
> > +initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +initial_bytes = 0;
> 
> They don't seem that important to reset?

The problem is that we have this in migration_thread():

if (current_time >= initial_time + BUFFER_DELAY) {
uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
 initial_bytes;
uint64_t time_spent = current_time - initial_time;
double bandwidth = (double)transferred_bytes / time_spent;
threshold_size = bandwidth * s->parameters.downtime_limit;
...
}

Here qemu_ftell() would possibly be very small since we have just
resumed... and then transferred_bytes will be extremely huge since
"qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative...
Then, with luck, we'll got extremely huge "bandwidth" as well.

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] pc/acpi: Fix booting of macOS with Clover EFI bootloader

2017-08-04 Thread Igor Mammedov

On Fri,  4 Aug 2017 12:15:40 +0530
Dhiru Kholia  wrote:

> This was tested with macOS 10.12.5 and Clover r4114.
> 
> Without this patch, the macOS boot process gets stuck at the Apple logo
> without showing any progress bar.
> 
> I have documented the process of running macOS on QEMU/KVM at,
> 
> https://github.com/kholia/OSX-KVM/
> 
> Instead of using this patch, adding an additional command-line knob
> which exposes this setting (force_rev1_fadt) to the user might be a more
> general solution.
it's been reported that OVMF had issues that were fixed,
you probably want to read this thread:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg468456.html

CCing interested parties

> ---
>  hw/i386/acpi-build.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index b9c245c..0f8df19 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -145,6 +145,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
>  object_property_get_uint(obj, ACPI_PCIHP_IO_LEN_PROP, NULL);
>  }
>  if (lpc) {
> +pm->force_rev1_fadt = true;
>  obj = lpc;
>  pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>  }

Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume

2017-08-04 Thread Peter Xu

On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int postcopy_resume_handshake(MigrationState *s)
> > +{
> > +qemu_mutex_lock(&s->resume_lock);
> > +
> > +qemu_savevm_send_postcopy_resume(s->to_dst_file);
> > +
> > +while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > +}
> > +
> > +qemu_mutex_unlock(&s->resume_lock);
> > +
> > +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > +return 0;
> > +}
> 
> That feels to be a small racy - couldn't that validly become a
> MIGRATION_STATUS_COMPLETED before that check?

Since postcopy_resume_handshake() is called in migration_thread()
context, so it won't change to complete at this point (confirmed with
Dave offlist on the question).

> 
> I wonder if we need to change migrate_fd_cancel to be able to
> cause a cancel in this case?

Yeah that's important, but haven't considered in current series. Do
you mind to postpone it as TODO as well (along with the work to allow
the user to manually switch to PAUSED state, as Dan suggested)?

-- 
Peter Xu

[Qemu-devel] [PATCH 1/3] i386/kvm: use a switch statement for MSR detection

2017-08-04 Thread Ladi Prosek

Switch is easier on the eye and might lead to better codegen.

Signed-off-by: Ladi Prosek 
---
 target/i386/kvm.c | 76 +++
 1 file changed, 32 insertions(+), 44 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 6db7783..ed119ca 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1081,66 +1081,54 @@ static int kvm_get_supported_msrs(KVMState *s)
 int i;
 
 for (i = 0; i < kvm_msr_list->nmsrs; i++) {
-if (kvm_msr_list->indices[i] == MSR_STAR) {
+switch (kvm_msr_list->indices[i]) {
+case MSR_STAR:
 has_msr_star = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) {
+break;
+case MSR_VM_HSAVE_PA:
 has_msr_hsave_pa = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_TSC_AUX) {
+break;
+case MSR_TSC_AUX:
 has_msr_tsc_aux = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_TSC_ADJUST) {
+break;
+case MSR_TSC_ADJUST:
 has_msr_tsc_adjust = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
+break;
+case MSR_IA32_TSCDEADLINE:
 has_msr_tsc_deadline = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_IA32_SMBASE) {
+break;
+case MSR_IA32_SMBASE:
 has_msr_smbase = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_IA32_MISC_ENABLE) {
+break;
+case MSR_IA32_MISC_ENABLE:
 has_msr_misc_enable = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_IA32_BNDCFGS) {
+break;
+case MSR_IA32_BNDCFGS:
 has_msr_bndcfgs = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == MSR_IA32_XSS) {
+break;
+case MSR_IA32_XSS:
 has_msr_xss = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_CRASH_CTL) {
+break;;
+case HV_X64_MSR_CRASH_CTL:
 has_msr_hv_crash = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_RESET) {
+break;
+case HV_X64_MSR_RESET:
 has_msr_hv_reset = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_VP_INDEX) {
+break;
+case HV_X64_MSR_VP_INDEX:
 has_msr_hv_vpindex = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_VP_RUNTIME) {
+break;
+case HV_X64_MSR_VP_RUNTIME:
 has_msr_hv_runtime = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+break;
+case HV_X64_MSR_SCONTROL:
 has_msr_hv_synic = true;
-continue;
-}
-if (kvm_msr_list->indices[i] == HV_X64_MSR_STIMER0_CONFIG) {
+break;
+case HV_X64_MSR_STIMER0_CONFIG:
 has_msr_hv_stimer = true;
-continue;
+break;
 }
+
 }
 }
 
-- 
2.9.3

[Qemu-devel] [PATCH 3/3] i386/kvm: advertise Hyper-V frequency MSRs

2017-08-04 Thread Ladi Prosek

As of kernel commit eb82feea59d6 ("KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY
and HV_X64_MSR_APIC_FREQUENCY"), KVM supports two new MSRs which are required
for nested Hyper-V to read timestamps with RDTSC + TSC page.

This commit makes QEMU advertise the MSRs with CPUID.4003H:EAX[11] and
CPUID.4003H:EDX[8] as specified in the Hyper-V TLFS and experimentally
verified on a Hyper-V host. The feature is enabled with the existing hv-time CPU
flag, and only if the TSC frequency is stable across migration and known.

Signed-off-by: Ladi Prosek 
---
 target/i386/kvm.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 77b6373..7e484a7 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -89,6 +89,7 @@ static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
 static bool has_msr_hv_synic;
 static bool has_msr_hv_stimer;
+static bool has_msr_hv_frequencies;
 static bool has_msr_xss;
 
 static bool has_msr_architectural_pmu;
@@ -631,7 +632,17 @@ static int hyperv_handle_properties(CPUState *cs)
 if (cpu->hyperv_time) {
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
-env->features[FEAT_HYPERV_EAX] |= 0x200;
+env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_REFERENCE_TSC_AVAILABLE;
+if (has_msr_hv_frequencies
+/* TSC clock must be stable and known for this feature. */
+&& ((env->features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC)
+|| env->user_tsc_khz != 0)
+&& env->tsc_khz != 0) {
+
+env->features[FEAT_HYPERV_EAX] |= HV_X64_ACCESS_FREQUENCY_MSRS;
+env->features[FEAT_HYPERV_EDX] |=
+HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
+}
 }
 if (cpu->hyperv_crash && has_msr_hv_crash) {
 env->features[FEAT_HYPERV_EDX] |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
@@ -1127,6 +1138,9 @@ static int kvm_get_supported_msrs(KVMState *s)
 case HV_X64_MSR_STIMER0_CONFIG:
 has_msr_hv_stimer = true;
 break;
+case HV_X64_MSR_TSC_FREQUENCY:
+has_msr_hv_frequencies = true;
+break;
 }
 
 }
-- 
2.9.3

[Qemu-devel] [PATCH 0/3] i386/kvm: advertise Hyper-V frequency MSRs

2017-08-04 Thread Ladi Prosek

This is the QEMU part of the changes required for nested Hyper-V to read
timestamps with RDTSC + TSC page. Without exposing the frequency MSRs,
Windows with the Hyper-V role enabled use the much slower
HV_X64_MSR_TIME_REF_COUNT (0x4020) RDMSR to read timestamps.

The new registers are exposed only if the TSC frequency is stable across
migration and known, as suggested by Paolo.

Ladi Prosek (3):
  i386/kvm: use a switch statement for MSR detection
  i386/kvm: set tsc_khz before configuring Hyper-V CPUID
  i386/kvm: advertise Hyper-V frequency MSRs

 target/i386/kvm.c | 130 +++---
 1 file changed, 66 insertions(+), 64 deletions(-)

-- 
2.9.3

[Qemu-devel] [PATCH 2/3] i386/kvm: set tsc_khz before configuring Hyper-V CPUID

2017-08-04 Thread Ladi Prosek

Timing-related Hyper-V enlightenments will benefit from knowing the final
tsc_khz value. This commit just moves the code in preparation for further
changes.

Signed-off-by: Ladi Prosek 
---
 target/i386/kvm.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index ed119ca..77b6373 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -695,6 +695,25 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 cpuid_i = 0;
 
+r = kvm_arch_set_tsc_khz(cs);
+if (r < 0) {
+goto fail;
+}
+
+/* vcpu's TSC frequency is either specified by user, or following
+ * the value used by KVM if the former is not present. In the
+ * latter case, we query it from KVM and record in env->tsc_khz,
+ * so that vcpu's TSC frequency can be migrated later via this field.
+ */
+if (!env->tsc_khz) {
+r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
+kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
+-ENOTSUP;
+if (r > 0) {
+env->tsc_khz = r;
+}
+}
+
 /* Paravirtualization CPUIDs */
 if (hyperv_enabled(cpu)) {
 c = &cpuid_data.entries[cpuid_i++];
@@ -961,25 +980,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
-r = kvm_arch_set_tsc_khz(cs);
-if (r < 0) {
-goto fail;
-}
-
-/* vcpu's TSC frequency is either specified by user, or following
- * the value used by KVM if the former is not present. In the
- * latter case, we query it from KVM and record in env->tsc_khz,
- * so that vcpu's TSC frequency can be migrated later via this field.
- */
-if (!env->tsc_khz) {
-r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
--ENOTSUP;
-if (r > 0) {
-env->tsc_khz = r;
-}
-}
-
 if (cpu->vmware_cpuid_freq
 /* Guests depend on 0x4000 to detect this feature, so only expose
  * it if KVM exposes leaf 0x4000. (Conflicts with Hyper-V) */
-- 
2.9.3

Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME

2017-08-04 Thread Peter Xu

On Fri, Aug 04, 2017 at 09:30:01AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > > 
> > > [...]
> > > 
> > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > > +{
> > > > > +/*
> > > > > + * This means source VM is ready to resume the postcopy 
> > > > > migration.
> > > > > + * It's time to switch state and release the fault thread to
> > > > > + * continue service page faults.
> > > > > + */
> > > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > > +  MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > > +qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > > 
> > > > Is it worth sanity checking that you were in RECOVER at this point?
> > > 
> > > Yeah, it never hurts.  Will do.
> > 
> > Not sure whether this would be good (note: I returned 0 in the if):
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index b7843c2..b34f59b 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1709,6 +1709,12 @@ static int 
> > loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> >  
> >  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> >  {
> > +if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +error_report("%s: illegal resume received", __func__);
> > +/* Don't fail the load, only for this. */
> > +return 0;
> > +}
> > +
> >  /*
> >   * This means source VM is ready to resume the postcopy migration.
> >   * It's time to switch state and release the fault thread to
> > 
> > Basically I just don't want to crash the dest VM (it holds hot dirty
> > pages) even if it receives a faulty RESUME command.
> 
> Yes, so now that's a fun problem; effectively you then have 3 valid
> failure modes:
> a) An IO failure so we need to go into POSTCOPY_PAUSE
> b) A fatal migration stream problem to quit
> c) A non-fatal migration stream problem to go .. back into PAUSE?

Hmm yes...

So I got at least three TODO ITEMs now:

- support manual switch source into PAUSED state
- support migrate_cancel during PAUSED/RECOVER state
- when anything wrong happens during PAUSED/RECOVER, switching back to
  PAUSED state on both sides

It just depends on whether we would like to postpone these work, or we
think any of them are essential even for the first version.

IMHO we can postpone this 3rd one as well.

-- 
Peter Xu

Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy

2017-08-04 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > 
> > > [...]
> > > 
> > > > > +/* Return true if we should continue the migration, or false. */
> > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > +{
> > > > > +trace_postcopy_pause_incoming();
> > > > > +
> > > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > +  MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > +
> > > > > +assert(mis->from_src_file);
> > > > > +qemu_file_shutdown(mis->from_src_file);
> > > > > +qemu_fclose(mis->from_src_file);
> > > > > +mis->from_src_file = NULL;
> > > > > +
> > > > > +assert(mis->to_src_file);
> > > > > +qemu_mutex_lock(&mis->rp_mutex);
> > > > > +qemu_file_shutdown(mis->to_src_file);
> > > > > +qemu_fclose(mis->to_src_file);
> > > > > +mis->to_src_file = NULL;
> > > > > +qemu_mutex_unlock(&mis->rp_mutex);
> > > > 
> > > > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > > > 
> > > > static void migrate_send_rp_message(MigrationIncomingState *mis,
> > > > enum mig_rp_message_type 
> > > > message_type,
> > > > uint16_t len, void *data)
> > > > {
> > > > trace_migrate_send_rp_message((int)message_type, len);
> > > > qemu_mutex_lock(&mis->rp_mutex);
> > > > qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > > > qemu_put_be16(mis->to_src_file, len);
> > > > qemu_put_buffer(mis->to_src_file, data, len);
> > > > qemu_fflush(mis->to_src_file);
> > > > qemu_mutex_unlock(&mis->rp_mutex);
> > > > }
> > > > 
> > > > If we came into postcopy_pause_incoming at about the same time
> > > > migrate_send_rp_message was being called and pause_incoming took the
> > > > lock first, then once it release the lock, send_rp_message carries on
> > > > and uses mis->to_src_file that's now NULL.
> > > > 
> > > > One solution here is to just call qemu_file_shutdown() but leave the
> > > > files open at this point, but clean the files up sometime later.
> > > 
> > > I see the commnent on patch 14 as well - yeah, we need patch 14 to
> > > co-op here, and as long as we are with patch 14, we should be ok.
> > > 
> > > > 
> > > > > +
> > > > > +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > > +qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > > +}
> > > > > +
> > > > > +trace_postcopy_pause_incoming_continued();
> > > > > +
> > > > > +return true;
> > > > > +}
> > > > > +
> > > > >  static int qemu_loadvm_state_main(QEMUFile *f, 
> > > > > MigrationIncomingState *mis)
> > > > >  {
> > > > >  uint8_t section_type;
> > > > >  int ret = 0;
> > > > >  
> > > > > +retry:
> > > > >  while (true) {
> > > > >  section_type = qemu_get_byte(f);
> > > > >  
> > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, 
> > > > > MigrationIncomingState *mis)
> > > > >  out:
> > > > >  if (ret < 0) {
> > > > >  qemu_file_set_error(f, ret);
> > > > > +
> > > > > +/*
> > > > > + * Detect whether it is:
> > > > > + *
> > > > > + * 1. postcopy running
> > > > > + * 2. network failure (-EIO)
> > > > > + *
> > > > > + * If so, we try to wait for a recovery.
> > > > > + */
> > > > > +if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > > +ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > > +/* Reset f to point to the newly created channel */
> > > > > +f = mis->from_src_file;
> > > > > +goto retry;
> > > > > +}
> > > > 
> > > > I wonder if:
> > > > 
> > > >if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > >ret == -EIO && postcopy_pause_incoming(mis)) {
> > > >/* Try again after postcopy recovery */
> > > >return qemu_loadvm_state_main(mis->from_src_file, mis);
> > > >}
> > > > would be nicer; it avoids the goto loop.
> > > 
> > > I agree we should avoid using goto loops. However I do see vast usages
> > > of goto like this one when we want to redo part of the procedures of a
> > > function (or, of course, when handling errors in "C-style").
> > 
> > We mostly use them to jump forward to an error exit; only rarely do
> > we do loops with them;  so if we can sensibly avoid them it's best.
> > 
> > > Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> > > also has defect: stack usage would be out of control, or even, it can
> > > be controled by malicious users. E.g., if someone used program to
> > > period

Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy

2017-08-04 Thread Peter Xu

On Fri, Aug 04, 2017 at 10:33:19AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (pet...@redhat.com) wrote:
> > > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > +/* Return true if we should continue the migration, or false. */
> > > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > > +{
> > > > > > +trace_postcopy_pause_incoming();
> > > > > > +
> > > > > > +migrate_set_state(&mis->state, 
> > > > > > MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > > +  MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > > +
> > > > > > +assert(mis->from_src_file);
> > > > > > +qemu_file_shutdown(mis->from_src_file);
> > > > > > +qemu_fclose(mis->from_src_file);
> > > > > > +mis->from_src_file = NULL;
> > > > > > +
> > > > > > +assert(mis->to_src_file);
> > > > > > +qemu_mutex_lock(&mis->rp_mutex);
> > > > > > +qemu_file_shutdown(mis->to_src_file);
> > > > > > +qemu_fclose(mis->to_src_file);
> > > > > > +mis->to_src_file = NULL;
> > > > > > +qemu_mutex_unlock(&mis->rp_mutex);
> > > > > 
> > > > > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > > > > 
> > > > > static void migrate_send_rp_message(MigrationIncomingState *mis,
> > > > > enum mig_rp_message_type 
> > > > > message_type,
> > > > > uint16_t len, void *data)
> > > > > {
> > > > > trace_migrate_send_rp_message((int)message_type, len);
> > > > > qemu_mutex_lock(&mis->rp_mutex);
> > > > > qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > > > > qemu_put_be16(mis->to_src_file, len);
> > > > > qemu_put_buffer(mis->to_src_file, data, len);
> > > > > qemu_fflush(mis->to_src_file);
> > > > > qemu_mutex_unlock(&mis->rp_mutex);
> > > > > }
> > > > > 
> > > > > If we came into postcopy_pause_incoming at about the same time
> > > > > migrate_send_rp_message was being called and pause_incoming took the
> > > > > lock first, then once it release the lock, send_rp_message carries on
> > > > > and uses mis->to_src_file that's now NULL.
> > > > > 
> > > > > One solution here is to just call qemu_file_shutdown() but leave the
> > > > > files open at this point, but clean the files up sometime later.
> > > > 
> > > > I see the commnent on patch 14 as well - yeah, we need patch 14 to
> > > > co-op here, and as long as we are with patch 14, we should be ok.
> > > > 
> > > > > 
> > > > > > +
> > > > > > +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > > > +qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > > > +}
> > > > > > +
> > > > > > +trace_postcopy_pause_incoming_continued();
> > > > > > +
> > > > > > +return true;
> > > > > > +}
> > > > > > +
> > > > > >  static int qemu_loadvm_state_main(QEMUFile *f, 
> > > > > > MigrationIncomingState *mis)
> > > > > >  {
> > > > > >  uint8_t section_type;
> > > > > >  int ret = 0;
> > > > > >  
> > > > > > +retry:
> > > > > >  while (true) {
> > > > > >  section_type = qemu_get_byte(f);
> > > > > >  
> > > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile 
> > > > > > *f, MigrationIncomingState *mis)
> > > > > >  out:
> > > > > >  if (ret < 0) {
> > > > > >  qemu_file_set_error(f, ret);
> > > > > > +
> > > > > > +/*
> > > > > > + * Detect whether it is:
> > > > > > + *
> > > > > > + * 1. postcopy running
> > > > > > + * 2. network failure (-EIO)
> > > > > > + *
> > > > > > + * If so, we try to wait for a recovery.
> > > > > > + */
> > > > > > +if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > > > +ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > > > +/* Reset f to point to the newly created channel */
> > > > > > +f = mis->from_src_file;
> > > > > > +goto retry;
> > > > > > +}
> > > > > 
> > > > > I wonder if:
> > > > > 
> > > > >if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > >ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > >/* Try again after postcopy recovery */
> > > > >return qemu_loadvm_state_main(mis->from_src_file, mis);
> > > > >}
> > > > > would be nicer; it avoids the goto loop.
> > > > 
> > > > I agree we should avoid using goto loops. However I do see vast usages
> > > > of goto like this one when we want to redo part of the procedures of a
> > > > function (or, of course, when handling errors in "C-style").
> > > 
> > > We mostly use them to jump forward to an error exit; only rarely do
> > >

[Qemu-devel] [PATCH v9 1/3] ACPI: add APEI/HEST/CPER structures and macros

2017-08-04 Thread Dongjiu Geng

(1) Add related APEI/HEST table structures and  macros, these
definition refer to ACPI 6.1 and UEFI 2.6 spec.
(2) Add generic error status block and CPER memory section
definition, user space only handle memory section errors.

Signed-off-by: Dongjiu Geng 
---
thanks Michael and Laszlo's review:

change since v5:
(1) update the commit message title
(2) remove UUID_BE and UEFI_CPER_SEC_PLATFORM_MEM macros
(3) remove including "qemu/uuid.h" file in this patch
(4) remove including "hest_ghes.h" file in this patch
(5) drop "this is" from the comment of structures and macros
(6) replace the "QemuUUID section_type_le" to "uint8_t section_type_le[16]"

chnage since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) separate the original one patch into three patches: one is new
ACPI structures and macros, another is C source file to generate ACPI HEST
table and dynamically record CPER ,final patch is the change about Makefile
and configuration
(2) add comments about where the ACPI structures and macros come from, for 
example,
they come from the UEFI Spec 2.6, ""; ACPI 6.1 spec, 
"xx".
(3) correct the macros name, for emaple, prefix some macro names with
"UEFI_".
(4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
(5) remove the duplicate ACPI address space, because it already defined in 
the "include/hw/acpi/aml-build.h"
(6) remove the acpi_generic_address structure because same definition exists in 
the 
AcpiGenericAddress.
(7) rename the struct acpi_hest_notify to AcpiHestNotifyType
(8) rename the struct acpi_hest_types to AcpiHestSourceType
(9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
 ACPI_HEST_TYPE_xxx
(10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
(11) add missed QEMU_PACKED for the struct definition.
(12) remove the defnition of AcpiGenericErrorData, and rename the
 AcpiGenericErrorDataV300 to AcpiGenericErrorData.
(13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, 
and rename it
 to section_type_le.
(14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
 AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
 that they take their values from AcpiGenericErrorSeverity
(15) remove the wrongly call to BERT(Boot Error Record Table)
(16) add comments for the struction member, for example, pint out that
 the block_status member in the AcpiGenericErrorStatus is a bitmask
 composed of ACPI_GEBS_xxx macros
(17) remove the hardware error source notification type list, and rename
 the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
(18) remove the physical_addr member of GhesErrorState
(19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t 
ghes_addr_le
(20) change the second parameter to "error_physical_addr" in the 
ghes_update_guest
 API statement
---
 include/hw/acpi/acpi-defs.h | 193 
 1 file changed, 193 insertions(+)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 4cc3630..ff9525e 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -295,6 +295,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
 #define ACPI_APIC_GENERIC_TRANSLATOR15
 #define ACPI_APIC_RESERVED  16   /* 16 and greater are reserved */
 
+/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
+#define UEFI_CPER_MEM_VALID_ERROR_STATUS 0x0001
+#define UEFI_CPER_MEM_VALID_PA   0x0002
+#define UEFI_CPER_MEM_VALID_PA_MASK  0x0004
+#define UEFI_CPER_MEM_VALID_NODE 0x0008
+#define UEFI_CPER_MEM_VALID_CARD 0x0010
+#define UEFI_CPER_MEM_VALID_MODULE   0x0020
+#define UEFI_CPER_MEM_VALID_BANK 0x0040
+#define UEFI_CPER_MEM_VALID_DEVICE   0x0080
+#define UEFI_CPER_MEM_VALID_ROW  0x0100
+#define UEFI_CPER_MEM_VALID_COLUMN   0x0200
+#define UEFI_CPER_MEM_VALID_BIT_POSITION 0x0400
+#define UEFI_CPER_MEM_VALID_REQUESTOR0x0800
+#define UEFI_CPER_MEM_VALID_RESPONDER0x1000
+#define UEFI_CPER_MEM_VALID_TARGET   0x2000
+#define UEFI_CPER_MEM_VALID_ERROR_TYPE   0x4000
+#define UEFI_CPER_MEM_VALID_RANK_NUMBER  0x8000
+#define UEFI_CPER_MEM_VALID_CARD_HANDLE  0x1
+#define UEFI_CPER_MEM_VALID_MODULE_HANDLE0x2
+#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
+
+/* From the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
+
+enum AcpiHestNotifyType {
+ACPI_HEST_NOTIFY_POLLED = 0,
+ACPI_HEST_NOTIFY_EXTERNAL = 1,
+ACPI_HEST_NOTIFY_LOCAL = 2,
+ACPI_HEST_NOTIFY_SCI = 3,
+ACPI_HEST_NOTIFY_NMI = 4,
+ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
+ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */

[Qemu-devel] [PATCH v9 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration

2017-08-04 Thread Dongjiu Geng

Add CONFIG_ACPI_APEI configuration in the Makefile and
enable it in the arm-softmmu.mak

Signed-off-by: Dongjiu Geng 
---
thanks a lot Michael and Laszlo's review and comments: 

change since v5:
(1) no change

change since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) change name to "CONFIG_ACPI_APEI" from CONFIG_ACPI_APEI_GENERATION
---
 default-configs/arm-softmmu.mak | 1 +
 hw/acpi/Makefile.objs   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1e3bd2b..ee6f5fc 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -121,3 +121,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 11c35bc..bafb148 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
-- 
1.8.3.1

[Qemu-devel] [PATCH v9 2/3] ACPI: Add APEI GHES Table Generation support

2017-08-04 Thread Dongjiu Geng

This implements APEI GHES Table by passing the error CPER info
to the guest via a fw_cfg_blob. After a CPER info is recorded, an
SEA(Synchronous External Abort)/SEI(SError Interrupt) exception
will be injected into the guest OS.

Below is the table layout, the max number of error soure is 11,
which is classified by notification type.

 etc/acpi/tables   etc/hardware_errors

==
+ +--++--+
| | HEST ||address   |  
+--+
| +--+|registers |  | 
Error Status |
| | GHES0|| ++  | 
Data Block 0 |
| +--+ +->| |status_address0 |->| 
++
| | .| |  | ++  | | 
 CPER  |
| | error_status_address-+-+ +--->| |status_address1 |--+   | | 
 CPER  |
| | .|   || ++  |   | | 
   |
| | read_ack_register+-+ ||  .   |  |   | | 
 CPER  |
| | read_ack_preserve| | |+--+  |   | 
+-++
| | read_ack_write   | | | +->| |status_address10|+ |   | 
Error Status |
+ +--+ | | |  | ++| |   | 
Data Block 1 |
| | GHES1| +-+-+->| | ack_value0 || +-->| 
++
+ +--+   | |  | ++| | | 
 CPER  |
| | .|   | | +--->| | ack_value1 || | | 
 CPER  |
| | error_status_address-+---+ | || ++| | | 
   |
| | .| | || |  . || | | 
 CPER  |
| | read_ack_register+-+-+| ++| 
+-++
| | read_ack_preserve| |   +->| | ack_value10|| | 
|..  |
| | read_ack_write   | |   |  | ++| | 
++
+ +--| |   |  | | 
Error Status |
| | ...  | |   |  | | 
Data Block 10|
+ +--+ |   |  +>| 
++
| | GHES10   | |   || | 
 CPER  |
+ +--+ |   || | 
 CPER  |
| | .| |   || | 
   |
| | error_status_address-+-+   || | 
 CPER  |
| | .| |
+-++
| | read_ack_register+-+
| | read_ack_preserve|
| | read_ack_write   |
+ +--+

For GHESv2 error source, the OSPM must acknowledges the error via Read Ack 
register.
so user space must check the ack value to avoid read-write race condition.

Signed-off-by: Dongjiu Geng 
---
thanks a lot Michael and Laszlo's review and comments:

change since v8:
(1) remove the ACK value address

change since v6:
(1) update the commit message

change since v5:
(1) move to GHESv2 error source from GHESv1, because we maily use the GHESv2
(2) add the GHESv2 error source layout in the comments
(2) add the logic to read the ack value from OSPM to avoid read-write race 
condition
(3) including header files "aml-build.h" and "hest_ghes.h" in this patch to 
make it
reasonable
(4) calculate the fw_cfg blob offsets that should be patched in more 
fine-grained steps,
with multiple separate increments, using:
- structure type names,
- sizeof operators,
- offsetof macros,
- and possibly a separate comment for each offset increment.
(5) remove page boundary refers to 4096 in the "bios_linker_loader_alloc"
(6) using build_append_int_noprefix to avoid use pointer math, so that the code 
is easily
readable
(7) change memory section definition to array from QemuUUID
(8) only enable SEA/SEI notification hardware error source to avoid guest OS 
ACPI driver
probe warning

change since v4:
1. fix email threading in this series is incorrect issue

change since v3:
1. remove the unnecessary include for "hw/acpi/vmgenid.h" in 
   hw/arm/virt-acpi-build.c
2. add conversion between LE and host-endian for the CPER record
3. handle the case that run out of the preallocated memory for the CPER record
4. change to use g_malloc0 instead of g_malloc 
5. change block_reqr_size name to block_rer_size
6. change QEMU coding style, that

Re: [Qemu-devel] [PATCH 10/15] target/arm: Don't use cpsr_write/cpsr_read to transfer M profile XPSR

2017-08-04 Thread Peter Maydell

On 3 August 2017 at 23:13, Richard Henderson  wrote:
> On 08/02/2017 09:43 AM, Peter Maydell wrote:
>> +if (val & XPSR_EXCP) {
>> +/* This is a CPSR format value from an older QEMU. (We can tell
>> + * because values transferred in XPSR format always have zero
>> + * for the EXCP field, and CPSR format will always have bit 4
>> + * set in CPSR_M.) Rearrange it into XPSR format. The 
>> significant
>> + * differences are that the T bit is not in the same place, the
>> + * primask/faultmask info may be in the CPSR I and F bits, and
>> + * we do not want the mode bits.
>> + */
>> +uint32_t newval = val;
>> +
>> +newval &= (CPSR_NZCV | CPSR_Q | CPSR_IT | CPSR_GE);
>> +if (val & CPSR_T) {
>> +newval |= XPSR_T;
>> +}
>> +/* If the I or F bits are set then this is a migration from
>> + * an old QEMU which still stored the M profile FAULTMASK
>> + * and PRIMASK in env->daif. For a new QEMU, the data is
>> + * transferred using the vmstate_m_faultmask_primask subsection.
>> + */
>
> The second comment seems sort of redundant with the first now.

I felt that the migration-compat stuff was sufficiently subtle
that it was worth retaining the second detailed comment as well
as the brief summary in the new first comment.

thanks
-- PMM

Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP

2017-08-04 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote:
> > > +/* Size of the bitmap, in bytes */
> > > +size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > +qemu_put_be64(file, size);
> > > +qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
> > 
> > Do we need to be careful about endianness and length of long here?
> > The migration stream can (theoretically) migrate between hosts of
> > different endianness, e.g. a Power LE and Power BE host it can also
> > migrate between a 32bit and 64bit host where the 'long' used in our
> > bitmap is a different length.
> 
> Ah, good catch...
> 
> I feel like we'd better provide a new bitmap helper for this when the
> bitmap will be sent to somewhere else, like:
> 
>   void bitmap_to_le(unsigned long *dst, const unsigned long *src,
> long nbits);
>   void bitmap_from_le(unsigned long *dst, const unsigned long *src,
>   long nbits);
> 
> I used little endian since I *think* that should work even cross 32/64
> bits machines (and I think big endian should not work).

Lets think about some combinations:

64 bit LE  G0,G1...G7
64 bit BE  G7,G6...G0
32 bit LE  A0,A1,A2,A3, B0,B1,B2,B3
32 bit BE  A3,A2,A1,A0  B3,B2,B1,B0

considering a 64bit BE src to a 32bit LE dest:
  64 bit BE  G7,G6...G0
  bitmap_to_le swaps that to
 G0,G1,..G7

destination reads two 32bit chunks:
  G0,G1,G2,G3G4,G5,G6,G7

dest is LE so no byteswap is needed.

Yes, I _think_ that's OK.

> > I think that means you have to save it as a series of long's;
> > and also just make sure 'size' is a multiple of 'long' - otherwise
> > you lose the last few bytes, which on a big endian system would
> > be a problem.
> 
> Yeah, then the size should possibly be pre-cooked with
> BITS_TO_LONGS(). However that's slightly tricky as well, maybe I
> should provide another bitmap helper:
> 
>   static inline long bitmap_size(long nbits)
>   {
>   return BITS_TO_LONGS(nbits);
>   }
> 
> Since the whole thing should be part of bitmap APIs imho.

The macro is enough I think.

> > 
> > Also, should we be using 'max_length' or 'used_length' - ram_save_setup
> > stores the used_length.  I don't think we should be accessing outside
> > the used_length?  That might also make the thing about 'size' being
> > rounded to a 'long' more interesting; maybe need to check you don't use
> > the bits outside the used_length.
> 
> Yes. AFAIU max_length and used_length are always the same currently in
> our codes. I used max_length since in ram_state_init() we inited
> block->bmap and block->unsentmap with it. I can switch to used_length
> though.

I remember it went in a couple of years ago because there were cases it
was different; they're rare though - I think it was an ACPI case.

> > > +qemu_fflush(file);
> > > +
> > > +if (qemu_file_get_error(file)) {
> > > +return qemu_file_get_error(file);
> > > +}
> > > +
> > > +return sizeof(size) + size;
> > 
> > I think since size is always sent as a 64bit that's  size + 8.
> 
> Yes. I "offloaded" the calcluation of sizeof(size) to compiler (in
> case I got brain furt when writting the codes...). So you prefer
> digits directly in these cases? It might be just fragile if we changed
> the type of "size" someday (though I guess we won't).
> 
> Let me use "size + 8".

Lets stick with what you have actually; it's OK since size is a
uint64_t.

Dave

> 
> > 
> > > +}
> > > +
> > > +/*
> > >   * An outstanding page request, on the source, having been received
> > >   * and queued
> > >   */
> > > @@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > > version_id)
> > >  return ret;
> > >  }
> > >  
> > > +/*
> > > + * Read the received bitmap, revert it as the initial dirty bitmap.
> > > + * This is only used when the postcopy migration is paused but wants
> > > + * to resume from a middle point.
> > > + */
> > > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > > +{
> > > +QEMUFile *file = s->rp_state.from_dst_file;
> > > +uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > +uint64_t size;
> > > +
> > > +if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > > +error_report("%s: incorrect state %s", __func__,
> > > + MigrationStatus_lookup[s->state]);
> > > +return -EINVAL;
> > > +}
> > > +
> > > +size = qemu_get_be64(file);
> > > +
> > > +/* The size of the bitmap should match with our ramblock */
> > > +if (size != local_size) {
> > > +error_report("%s: ramblock '%s' bitmap size mismatch "
> > > + "(0x%lx != 0x%lx)", __func__, block->idstr,
> > > + size, local_size);
> > > +return -EINVAL;
> > > +}
> > 
> > Coming back to the used_length thing above;  again I think the rule
> > is that the used_length has to match not the max_leng

[Qemu-devel] [PATCH v9 0/3] Generate APEI GHES table and dynamically record CPER

2017-08-04 Thread Dongjiu Geng

In the armv8 platform, the mainly hardware error source are ARMv8
SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS
or use other interface to notify user space, such as Qemu. After Qemu gets
the notification, it will record the CPER and inject the SEA/SEI to KVM. this
series of patches will generate APEI table when guest OS boot up, and 
dynamically
record CPER for the guest OS about the generic hardware errors, currently the
userspace only handle the memory section hardware errors. Before Qemu record the
CPER, it needs to check the ACK value written by the guest OS to avoid 
read-write
race condition.

Below is the APEI/GHESV2/CPER table layout, the max number of error soure is 11,
which is classified by notification type, now only enable the SEA/SEI 
notification type
error source.

 etc/acpi/tables   etc/hardware_errors

==
+ +--++--+
| | HEST ||address   |  
+--+
| +--+|registers |  | 
Error Status |
| | GHES0|| ++  | 
Data Block 0 |
| +--+ +->| |status_address0 |->| 
++
| | .| |  | ++  | | 
 CPER  |
| | error_status_address-+-+ +--->| |status_address1 |--+   | | 
 CPER  |
| | .|   || ++  |   | | 
   |
| | read_ack_register+-+ ||  .   |  |   | | 
 CPER  |
| | read_ack_preserve| | |+--+  |   | 
+-++
| | read_ack_write   | | | +->| |status_address10|+ |   | 
Error Status |
+ +--+ | | |  | ++| |   | 
Data Block 1 |
| | GHES1| +-+-+->| |ack_address0|--+ | +-->| 
++
+ +--+   | |  | ++  | | | | 
 CPER  |
| | .|   | | +--->| |ack_address1|--+-+   | | | 
 CPER  |
| | error_status_address-+---+ | || ++  | |   | | | 
   |
| | .| | || | .  |  | |   | | | 
 CPER  |
| | read_ack_register+-+-+| ++  | |   | 
+-++
| | read_ack_preserve| |   +->| |ack_address10   |--+-+-+ | | 
|..  |
| | read_ack_write   | |   |  | ++  | | | | | 
++
+ +--| |   |  | |  ack0  |<-+ | | | | 
Error Status |
| | ...  | |   |  | ++| | | | 
Data Block 10|
+ +--+ |   |  | |  ack1  |<---+ | +>| 
++
| | GHES10   | |   |  | ++  |   | | 
 CPER  |
+ +--+ |   |  | |    |  |   | | 
 CPER  |
| | .| |   |  | +--+ |  |   | | 
   |
| | error_status_address-+-+   |  | |  ack10 |< +   | | 
 CPER  |
| | .| |  | ++  
+-++
| | read_ack_register+-+
| | read_ack_preserve|
| | read_ack_write   |
+ +--+

After injecting a SEA/SEI ghes error, the gueset OS kernel log will be shown as 
below:

[  142.95] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 8
[  142.913141] {1}[Hardware Error]: event severity: recoverable
[  142.914498] {1}[Hardware Error]:  Error 0, type: recoverable
[  142.915851] {1}[Hardware Error]:   section_type: memory error
[  142.917163] {1}[Hardware Error]:   physical_address: 0x
[  142.918792] {1}[Hardware Error]:   error_type: 3, multi-bit ECC

how to test:
1. In the guest OS, use this command to dump the APEI table: 
"iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST"
2. And find the address for the generic error status block
   according to the notification type
3. then find the CPER record through the generic error status block.

For example(notification type is SEA):

(1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) root@genericarmv8:~# cat HEST.dsl
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20170728 (64-bit version)
 * Copyright (c) 2000 - 2017 Intel Corporation
 *
 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep  5 07:59:17 2016
 *
 * ACPI Data Table [HEST]
 *
 * Forma

Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume

2017-08-04 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +static int postcopy_resume_handshake(MigrationState *s)
> > > +{
> > > +qemu_mutex_lock(&s->resume_lock);
> > > +
> > > +qemu_savevm_send_postcopy_resume(s->to_dst_file);
> > > +
> > > +while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > > +qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > > +}
> > > +
> > > +qemu_mutex_unlock(&s->resume_lock);
> > > +
> > > +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > > +return 0;
> > > +}
> > 
> > That feels to be a small racy - couldn't that validly become a
> > MIGRATION_STATUS_COMPLETED before that check?
> 
> Since postcopy_resume_handshake() is called in migration_thread()
> context, so it won't change to complete at this point (confirmed with
> Dave offlist on the question).

Yes.

> > 
> > I wonder if we need to change migrate_fd_cancel to be able to
> > cause a cancel in this case?
> 
> Yeah that's important, but haven't considered in current series. Do
> you mind to postpone it as TODO as well (along with the work to allow
> the user to manually switch to PAUSED state, as Dan suggested)?

Yes I don't the cancel in that case is that important; it's already in
the recovery from a bad situation.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed

2017-08-04 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > Firstly, MigThrError enumeration is introduced to describe the error in
> > > migration_detect_error() better. This gives the migration_thread() a
> > > chance to know whether a recovery has happened.
> > > 
> > > Then, if a recovery is detected, migration_thread() will reset its local
> > > variables to prepare for that.
> > > 
> > > Signed-off-by: Peter Xu 
> > > ---
> > >  migration/migration.c | 40 +---
> > >  1 file changed, 29 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ecebe30..439bc22 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState 
> > > *s)
> > >  return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> > >  }
> > >  
> > > +typedef enum MigThrError {
> > > +/* No error detected */
> > > +MIG_THR_ERR_NONE = 0,
> > > +/* Detected error, but resumed successfully */
> > > +MIG_THR_ERR_RECOVERED = 1,
> > > +/* Detected fatal error, need to exit */
> > > +MIG_THR_ERR_FATAL = 2,
> > > +} MigThrError;
> > > +
> > 
> > Could you move this patch earlier to when postcopy_pause is created
> > so it's created with this enum?
> 
> Sure.
> 
> [...]
> 
> > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> > >  /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > >  enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> > >  bool enable_colo = migrate_colo_enabled();
> > > +MigThrError thr_error;
> > >  
> > >  rcu_register_thread();
> > >  
> > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> > >   * Try to detect any kind of failures, and see whether we
> > >   * should stop the migration now.
> > >   */
> > > -if (migration_detect_error(s)) {
> > > +thr_error = migration_detect_error(s);
> > > +if (thr_error == MIG_THR_ERR_FATAL) {
> > > +/* Stop migration */
> > >  break;
> > > +} else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > > +/*
> > > + * Just recovered from a e.g. network failure, reset all
> > > + * the local variables.
> > > + */
> > > +initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +initial_bytes = 0;
> > 
> > They don't seem that important to reset?
> 
> The problem is that we have this in migration_thread():
> 
> if (current_time >= initial_time + BUFFER_DELAY) {
> uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>  initial_bytes;
> uint64_t time_spent = current_time - initial_time;
> double bandwidth = (double)transferred_bytes / time_spent;
> threshold_size = bandwidth * s->parameters.downtime_limit;
> ...
> }
> 
> Here qemu_ftell() would possibly be very small since we have just
> resumed... and then transferred_bytes will be extremely huge since
> "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative...
> Then, with luck, we'll got extremely huge "bandwidth" as well.

Ah yes that's a good reason to reset it then; add a comment like
'important to avoid breaking transferred_bytes and bandwidth
calculation'

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

2017-08-04 Thread Stefan Hajnoczi

On Thu, Aug 3, 2017 at 11:36 PM, Paolo Bonzini  wrote:
> - Original Message -
>> From: "Dr. David Alan Gilbert" 
>> To: "Alberto Garcia" 
>> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com
>> Sent: Thursday, August 3, 2017 6:45:17 PM
>> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block 
>> devices
>>
>> * Alberto Garcia (be...@igalia.com) wrote:
>> > On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>> > wrote:
>> > > --- a/vl.c
>> > > +++ b/vl.c
>> > > @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>> > >  replay_disable_events();
>> > >  iothread_stop_all();
>> > >
>> > > -bdrv_close_all();
>> > >  pause_all_vcpus();
>> > > +bdrv_close_all();
>> > >  res_free();
>> >
>> > I haven't debugged it yet, but in my computer iotest 093 stops working
>> > (it never finishes) after this change.
>>
>> Yes, I can reproduce that here (I've got to explicitly run 093 - it
>> doesn't do it automatically for me):
>
> The culprit so to speak is this:
>
> if (qtest_enabled()) {
> /* For testing block IO throttling only */
> tg->clock_type = QEMU_CLOCK_VIRTUAL;
> }
>
> So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all
> hangs.  Should throttling be disabled by the time bdrv_close drains the
> BlockDriverState, and likewise for bdrv_close_all?

bdrv_drain_all() is called before blk_remove_all_bs(), so throttling
is still enabled while draining:

void bdrv_close_all(void)
{
block_job_cancel_sync_all();
nbd_export_close_all();

/* Drop references from requests still in flight, such as canceled block
 * jobs whose AIO context has not been polled yet */
bdrv_drain_all();

blk_remove_all_bs();

Perhaps we need a bdrv_throttle_disable_all() function.  Manos is
moving throttling to a block driver so there is no longer a 1:1
relationship between BB and throttling - there might be multiple
throtting nodes in a BDS graph.

Stefan

Re: [Qemu-devel] [PATCH v3 1/9] kvm: remove hard dependency on pci

2017-08-04 Thread Cornelia Huck

On Wed, 26 Jul 2017 10:26:23 +0200
Cornelia Huck  wrote:

> On Wed, 26 Jul 2017 08:52:44 +0200
> Thomas Huth  wrote:

> > Since this missed the 2.10 freeze anyway and we've got some more time to
> > discuss it: I still think it would be much cleaner to move the functions
> > from kvm-all.c that use these PCI functions into a separate file
> > kvm-pci.c instead and compile that only with CONFIG_PCI=y. That way we
> > likely also could prevent that more dependencies on CONFIG_PCI creep
> > into kvm-all.c again at later points in time.
> > 
> > I've now had a closer look, and it seems like the only affected
> > functions are kvm_irqchip_add_msi_route() and
> > kvm_irqchip_update_msi_route() ... and these seems only to be used in
> > code we would not care about on s390x with CONFIG_PCI=n. So could you
> > please have at least a try to move these two functions (and other
> > related msi functions) to a new file kvm-pci.c instead to see whether
> > that would work, too?  
> 
> I can try, as this missed the 2.10 boat anyway.
> 
> The code contains some parts that are not relevant in all cases (msi
> routes if we don't use pci, adapter routes for anything not
> s390x, ...), but I don't think trying to rip this out would be much of
> an improvement. I'll try the minimal (well, not quite as minimal as
> this patch) route instead.

I tried this; but it turned out to be one of those cases where you pull
on a stick and then find half a tree attached to it... Much of the code
is used by all routing variants, and I ended up exposing so many
functions that it was just ugly and unreadable.

The variant I'm now going with is keeping the original approach and
adding a pci_available variable, which I also use in the patch that
exposes the zpci facility patch only when pci is compiled in. I'll try
to post this after lunch.

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

2017-08-04 Thread Stefan Hajnoczi

On Mon, Jul 17, 2017 at 5:43 PM, John Snow  wrote:
> On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) 
>>> wrote:
 From: "Dr. David Alan Gilbert" 

 There's a rare exit seg if the guest is accessing
 IO during exit.
 It's always hitting the atomic_inc(&bs->in_flight) with a NULL
 bs. This was added recently in 99723548  but I don't see it
 as the cause.

 Flip vl.c around so we pause the cpus before closing the block devices,
 that way we shouldn't have anything trying to access them when
 they're gone.

 This was originally Red Hat bz 
 https://bugzilla.redhat.com/show_bug.cgi?id=1451015

 Signed-off-by: Dr. David Alan Gilbert 
 Reported-by: Cong Li 

 --
 This is a very rare race, I'll leave it running in a loop to see if
 we hit anything else and to check this really fixes it.

 I do worry if there are other cases that can trigger this - e.g.
 hot-unplug or ejecting a CD.

 ---
  vl.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Reviewed-by: Stefan Hajnoczi 
>>
>> Thanks;  and the test I left running seems solid - ~12k runs
>> over the weekend with no seg.
>>
>> Dave
>>
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>
>
> the root cause of this bug is related to this as well:
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>
> From commit 99723548 we started assuming (incorrectly?) that blk_
> functions always WILL have an attached BDS, but this is not always true,
> for instance, flushing the cache from an empty CDROM.
>
> Paolo, can we move the flight counter increment outside of the
> block-backend layer, is that safe?

I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
regardless of the throttling timer issue discussed below.  BB cannot
assume that the BDS graph is non-empty.

Stefan

Re: [Qemu-devel] Prohibit Windows from running in QEMU

2017-08-04 Thread Peter Lieven

Am 29.10.2013 um 10:59 schrieb Paolo Bonzini:
> Il 29/10/2013 10:48, Peter Lieven ha scritto:
>> Hi all,
>>
>> this question might seem a bit weird, but does anyone see a good way to
>> avoid
>> that Windows is able to boot inside qemu?
>>
>> We have defined several profiles for different operation systems and I want
>> to avoid that someone chooses Linux and then installs Windows within
>> a VM. Reason is licensing.
> Patch QEMU to crash when Hyper-V extensions are enabled...

Hi all,

this is an old topic that has become important for me again recently.
Now all Linux versions should be able to detect KVM even if Hyper-V is enabled.

But how do I detect from Qemu userspace that Hyper-V is enabled?

Thanks,
Peter

[Qemu-devel] [PATCH v10 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration

2017-08-04 Thread Dongjiu Geng

Add CONFIG_ACPI_APEI configuration in the Makefile and
enable it in the arm-softmmu.mak

Signed-off-by: Dongjiu Geng 
---
thanks a lot Michael and Laszlo's review and comments: 

change since v5:
(1) no change

change since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) change name to "CONFIG_ACPI_APEI" from CONFIG_ACPI_APEI_GENERATION
---
 default-configs/arm-softmmu.mak | 1 +
 hw/acpi/Makefile.objs   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1e3bd2b..ee6f5fc 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -121,3 +121,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 11c35bc..bafb148 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
-- 
1.8.3.1

[Qemu-devel] [PATCH v10 2/3] ACPI: Add APEI GHES Table Generation support

2017-08-04 Thread Dongjiu Geng

This implements APEI GHES Table by passing the error CPER info
to the guest via a fw_cfg_blob. After a CPER info is recorded, an
SEA(Synchronous External Abort)/SEI(SError Interrupt) exception
will be injected into the guest OS.

Below is the table layout, the max number of error soure is 11,
which is classified by notification type.

 etc/acpi/tables   etc/hardware_errors

==
+ +--++--+
| | HEST ||address   |  
+--+
| +--+|registers |  | 
Error Status |
| | GHES0|| ++  | 
Data Block 0 |
| +--+ +->| |status_address0 |->| 
++
| | .| |  | ++  | | 
 CPER  |
| | error_status_address-+-+ +--->| |status_address1 |--+   | | 
 CPER  |
| | .|   || ++  |   | | 
   |
| | read_ack_register+-+ ||  .   |  |   | | 
 CPER  |
| | read_ack_preserve| | |+--+  |   | 
++
| | read_ack_write   | | | +->| |status_address10|+ |   | 
Error Status |
+ +--+ | | |  | ++| |   | 
Data Block 1 |
| | GHES1| +-+-+->| | ack_value0 || +-->| 
++
+ +--+   | |  | ++| | | 
 CPER  |
| | .|   | | +--->| | ack_value1 || | | 
 CPER  |
| | error_status_address-+---+ | || ++| | | 
   |
| | .| | || |  . || | | 
 CPER  |
| | read_ack_register+-+-+| ++| 
+-++
| | read_ack_preserve| |   +->| | ack_value10|| | 
|..  |
| | read_ack_write   | |   |  | ++| | 
++
+ +--| |   |  | | 
Error Status |
| | ...  | |   |  | | 
Data Block 10|
+ +--+ |   |  +>| 
++
| | GHES10   | |   || | 
 CPER  |
+ +--+ |   || | 
 CPER  |
| | .| |   || | 
   |
| | error_status_address-+-+   || | 
 CPER  |
| | .| |
+-++
| | read_ack_register+-+
| | read_ack_preserve|
| | read_ack_write   |
+ +--+

For GHESv2 error source, the OSPM must acknowledges the error via Read Ack 
register.
so user space must check the ack value to avoid read-write race condition.

Signed-off-by: Dongjiu Geng 
---
thanks a lot Michael and Laszlo's review and comments:

change since v8:
(1) remove the ACK value address

change since v6:
(1) update the commit message

change since v5:
(1) move to GHESv2 error source from GHESv1, because we maily use the GHESv2
(2) add the GHESv2 error source layout in the comments
(2) add the logic to read the ack value from OSPM to avoid read-write race 
condition
(3) including header files "aml-build.h" and "hest_ghes.h" in this patch to 
make it
reasonable
(4) calculate the fw_cfg blob offsets that should be patched in more 
fine-grained steps,
with multiple separate increments, using:
- structure type names,
- sizeof operators,
- offsetof macros,
- and possibly a separate comment for each offset increment.
(5) remove page boundary refers to 4096 in the "bios_linker_loader_alloc"
(6) using build_append_int_noprefix to avoid use pointer math, so that the code 
is easily
readable
(7) change memory section definition to array from QemuUUID
(8) only enable SEA/SEI notification hardware error source to avoid guest OS 
ACPI driver
probe warning

change since v4:
1. fix email threading in this series is incorrect issue

change since v3:
1. remove the unnecessary include for "hw/acpi/vmgenid.h" in 
   hw/arm/virt-acpi-build.c
2. add conversion between LE and host-endian for the CPER record
3. handle the case that run out of the preallocated memory for the CPER record
4. change to use g_malloc0 instead of g_malloc 
5. change block_reqr_size name to block_rer_size
6. change QEMU coding style, that is

[Qemu-devel] [PATCH v10 0/3] Generate APEI GHES table and dynamically record CPER

2017-08-04 Thread Dongjiu Geng

In the armv8 platform, the mainly hardware error source are ARMv8
SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS
or use other interface to notify user space, such as Qemu. After Qemu gets
the notification, it will record the CPER and inject the SEA/SEI to KVM. this
series of patches will generate APEI table when guest OS boot up, and 
dynamically
record CPER for the guest OS about the generic hardware errors, currently the
userspace only handle the memory section hardware errors. Before Qemu record the
CPER, it needs to check the ACK value written by the guest OS to avoid 
read-write
race condition.

Below is the APEI/GHESV2/CPER table layout, the max number of error soure is 11,
which is classified by notification type, now only enable the SEA/SEI 
notification type
error source.

   etc/acpi/tables   etc/hardware_errors

==
+ +--++--+
| | HEST ||address   |  
+--+
| +--+|registers |  | 
Error Status |
| | GHES0|| ++  | 
Data Block 0 |
| +--+ +->| |status_address0 |->| 
++
| | .| |  | ++  | | 
 CPER  |
| | error_status_address-+-+ +--->| |status_address1 |--+   | | 
 CPER  |
| | .|   || ++  |   | | 
   |
| | read_ack_register+-+ ||  .   |  |   | | 
 CPER  |
| | read_ack_preserve| | |+--+  |   | 
++
| | read_ack_write   | | | +->| |status_address10|+ |   | 
Error Status |
+ +--+ | | |  | ++| |   | 
Data Block 1 |
| | GHES1| +-+-+->| | ack_value0 || +-->| 
++
+ +--+   | |  | ++| | | 
 CPER  |
| | .|   | | +--->| | ack_value1 || | | 
 CPER  |
| | error_status_address-+---+ | || ++| | | 
   |
| | .| | || |  . || | | 
 CPER  |
| | read_ack_register+-+-+| ++| 
+-++
| | read_ack_preserve| |   +->| | ack_value10|| | 
|..  |
| | read_ack_write   | |   |  | ++| | 
++
+ +--| |   |  | | 
Error Status |
| | ...  | |   |  | | 
Data Block 10|
+ +--+ |   |  +>| 
++
| | GHES10   | |   || | 
 CPER  |
+ +--+ |   || | 
 CPER  |
| | .| |   || | 
   |
| | error_status_address-+-+   || | 
 CPER  |
| | .| |
+-++
| | read_ack_register+-+
| | read_ack_preserve|
| | read_ack_write   |
+ +--+

After injecting a SEA/SEI ghes error, the gueset OS kernel log will be shown as 
below:

[  142.95] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 8
[  142.913141] {1}[Hardware Error]: event severity: recoverable
[  142.914498] {1}[Hardware Error]:  Error 0, type: recoverable
[  142.915851] {1}[Hardware Error]:   section_type: memory error
[  142.917163] {1}[Hardware Error]:   physical_address: 0x
[  142.918792] {1}[Hardware Error]:   error_type: 3, multi-bit ECC

how to test:
1. In the guest OS, use this command to dump the APEI table: 
"iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST"
2. And find the address for the generic error status block
   according to the notification type
3. then find the CPER record through the generic error status block.

For example(notification type is SEA):

(1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) root@genericarmv8:~# cat HEST.dsl
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20170728 (64-bit version)
 * Copyright (c) 2000 - 2017 Intel Corporation
 *
 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep  5 07:59:17 2016
 *
 * ACPI Data Table [HEST]
 *
 * F

[Qemu-devel] [PATCH v10 1/3] ACPI: add APEI/HEST/CPER structures and macros

2017-08-04 Thread Dongjiu Geng

(1) Add related APEI/HEST table structures and  macros, these
definition refer to ACPI 6.1 and UEFI 2.6 spec.
(2) Add generic error status block and CPER memory section
definition, user space only handle memory section errors.

Signed-off-by: Dongjiu Geng 
---
thanks Michael and Laszlo's review:

change since v5:
(1) update the commit message title
(2) remove UUID_BE and UEFI_CPER_SEC_PLATFORM_MEM macros
(3) remove including "qemu/uuid.h" file in this patch
(4) remove including "hest_ghes.h" file in this patch
(5) drop "this is" from the comment of structures and macros
(6) replace the "QemuUUID section_type_le" to "uint8_t section_type_le[16]"

chnage since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) separate the original one patch into three patches: one is new
ACPI structures and macros, another is C source file to generate ACPI HEST
table and dynamically record CPER ,final patch is the change about Makefile
and configuration
(2) add comments about where the ACPI structures and macros come from, for 
example,
they come from the UEFI Spec 2.6, ""; ACPI 6.1 spec, 
"xx".
(3) correct the macros name, for emaple, prefix some macro names with
"UEFI_".
(4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
(5) remove the duplicate ACPI address space, because it already defined in 
the "include/hw/acpi/aml-build.h"
(6) remove the acpi_generic_address structure because same definition exists in 
the 
AcpiGenericAddress.
(7) rename the struct acpi_hest_notify to AcpiHestNotifyType
(8) rename the struct acpi_hest_types to AcpiHestSourceType
(9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
 ACPI_HEST_TYPE_xxx
(10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
(11) add missed QEMU_PACKED for the struct definition.
(12) remove the defnition of AcpiGenericErrorData, and rename the
 AcpiGenericErrorDataV300 to AcpiGenericErrorData.
(13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, 
and rename it
 to section_type_le.
(14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
 AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
 that they take their values from AcpiGenericErrorSeverity
(15) remove the wrongly call to BERT(Boot Error Record Table)
(16) add comments for the struction member, for example, pint out that
 the block_status member in the AcpiGenericErrorStatus is a bitmask
 composed of ACPI_GEBS_xxx macros
(17) remove the hardware error source notification type list, and rename
 the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
(18) remove the physical_addr member of GhesErrorState
(19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t 
ghes_addr_le
(20) change the second parameter to "error_physical_addr" in the 
ghes_update_guest
 API statement
---
 include/hw/acpi/acpi-defs.h | 193 
 1 file changed, 193 insertions(+)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 4cc3630..ff9525e 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -295,6 +295,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
 #define ACPI_APIC_GENERIC_TRANSLATOR15
 #define ACPI_APIC_RESERVED  16   /* 16 and greater are reserved */
 
+/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
+#define UEFI_CPER_MEM_VALID_ERROR_STATUS 0x0001
+#define UEFI_CPER_MEM_VALID_PA   0x0002
+#define UEFI_CPER_MEM_VALID_PA_MASK  0x0004
+#define UEFI_CPER_MEM_VALID_NODE 0x0008
+#define UEFI_CPER_MEM_VALID_CARD 0x0010
+#define UEFI_CPER_MEM_VALID_MODULE   0x0020
+#define UEFI_CPER_MEM_VALID_BANK 0x0040
+#define UEFI_CPER_MEM_VALID_DEVICE   0x0080
+#define UEFI_CPER_MEM_VALID_ROW  0x0100
+#define UEFI_CPER_MEM_VALID_COLUMN   0x0200
+#define UEFI_CPER_MEM_VALID_BIT_POSITION 0x0400
+#define UEFI_CPER_MEM_VALID_REQUESTOR0x0800
+#define UEFI_CPER_MEM_VALID_RESPONDER0x1000
+#define UEFI_CPER_MEM_VALID_TARGET   0x2000
+#define UEFI_CPER_MEM_VALID_ERROR_TYPE   0x4000
+#define UEFI_CPER_MEM_VALID_RANK_NUMBER  0x8000
+#define UEFI_CPER_MEM_VALID_CARD_HANDLE  0x1
+#define UEFI_CPER_MEM_VALID_MODULE_HANDLE0x2
+#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
+
+/* From the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
+
+enum AcpiHestNotifyType {
+ACPI_HEST_NOTIFY_POLLED = 0,
+ACPI_HEST_NOTIFY_EXTERNAL = 1,
+ACPI_HEST_NOTIFY_LOCAL = 2,
+ACPI_HEST_NOTIFY_SCI = 3,
+ACPI_HEST_NOTIFY_NMI = 4,
+ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
+ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */

Re: [Qemu-devel] Prohibit Windows from running in QEMU

2017-08-04 Thread Paolo Bonzini

On 04/08/2017 11:58, Peter Lieven wrote:
> Am 29.10.2013 um 10:59 schrieb Paolo Bonzini:
>> Il 29/10/2013 10:48, Peter Lieven ha scritto:
>>> Hi all,
>>>
>>> this question might seem a bit weird, but does anyone see a good way to
>>> avoid
>>> that Windows is able to boot inside qemu?
>>>
>>> We have defined several profiles for different operation systems and I want
>>> to avoid that someone chooses Linux and then installs Windows within
>>> a VM. Reason is licensing.
>> Patch QEMU to crash when Hyper-V extensions are enabled...
> 
> Hi all,
> 
> this is an old topic that has become important for me again recently.
> Now all Linux versions should be able to detect KVM even if Hyper-V is 
> enabled.
> 
> But how do I detect from Qemu userspace that Hyper-V is enabled?

Maybe a better one: make KVM crash the guest if CR8 is nonzero on a
vmexit.  Linux doesn't use it, Windows should not survive long.
Warning, I don't know if UEFI firmware uses CR8.

Paolo

Re: [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR

2017-08-04 Thread Kevin Wolf

Am 04.08.2017 um 03:49 hat Eric Blake geschrieben:
> On 08/03/2017 10:21 AM, Eric Blake wrote:
> > On 08/03/2017 10:02 AM, Kevin Wolf wrote:
> >> BDRV_O_ALLOW_RDWR is a flag that tells whether qemu can internally
> >> reopen a node read-write temporarily because the user requested
> >> read-write for the top-level image, but qemu decided that read-only is
> >> enough for this node (a backing file).
> >>
> >> bdrv_reopen() is different, it is also used for cases where the user
> >> changed their mind and wants to update the options. There is no reason
> >> to forbid making a node read-write in that case.
> > 
> > Hmm, I wonder.  https://bugzilla.redhat.com/show_bug.cgi?id=1465320
> > details a failure when starting qemu with a read-write NBD disk, then
> > taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where
> > intermediate commit (snap2 into nbd) works but live commit (snap3 into
> > nbd) fails with a message that nbd does not support reopening.  I'm
> > presuming that your series may help to address that; I'll give it a spin
> > and see what happens.
> 
> Nope, even with your patches, I'm still getting:
> 
> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}}
> {"return": {}}
> {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event":
> "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len":
> 2097152, "offset": 2097152, "speed": 0, "type": "commit"}}
> 
> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}}
> {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by
> node '#block048' does not support reopening files"}}

That's simply NBD not implementing .bdrv_reopen_*. In other words, not a
bug, but just a missing feature.

Kevin


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err

2017-08-04 Thread Mao Zhongyi

Pass the error message to errp directly rather than the local
variable local_err and propagate it to errp via error_propagate().

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Keith Busch 
Cc: Stefan Hajnoczi 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Gerd Hoffmann 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/block/fdc.c| 17 ++---
 hw/block/nvme.c   |  8 +++-
 hw/block/virtio-blk.c | 17 +
 hw/ide/qdev.c | 12 
 hw/scsi/scsi-disk.c   | 13 -
 hw/usb/dev-storage.c  |  9 +++--
 6 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 02f4a46..192a4aa 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -473,16 +473,13 @@ static void fd_revalidate(FDrive *drv)
 static void fd_change_cb(void *opaque, bool load, Error **errp)
 {
 FDrive *drive = opaque;
-Error *local_err = NULL;
 
 if (!load) {
 blk_set_perm(drive->blk, 0, BLK_PERM_ALL, &error_abort);
 } else {
-blkconf_apply_backend_options(drive->conf,
-  blk_is_read_only(drive->blk), false,
-  &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(drive->conf,
+   blk_is_read_only(drive->blk),
+   false, errp)) {
 return;
 }
 }
@@ -522,7 +519,6 @@ static void floppy_drive_realize(DeviceState *qdev, Error 
**errp)
 FloppyDrive *dev = FLOPPY_DRIVE(qdev);
 FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus);
 FDrive *drive;
-Error *local_err = NULL;
 int ret;
 
 if (dev->unit == -1) {
@@ -568,10 +564,9 @@ static void floppy_drive_realize(DeviceState *qdev, Error 
**errp)
 dev->conf.rerror = BLOCKDEV_ON_ERROR_AUTO;
 dev->conf.werror = BLOCKDEV_ON_ERROR_AUTO;
 
-blkconf_apply_backend_options(&dev->conf, blk_is_read_only(dev->conf.blk),
-  false, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(&dev->conf,
+   blk_is_read_only(dev->conf.blk),
+   false, errp)) {
 return;
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 43c60ab..1856053 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -928,7 +928,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 int i;
 int64_t bs_size;
 uint8_t *pci_conf;
-Error *local_err = NULL;
 
 if (!n->conf.blk) {
 error_setg(errp, "drive property not set");
@@ -947,10 +946,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 blkconf_blocksizes(&n->conf);
-blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-  false, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(&n->conf,
+   blk_is_read_only(n->conf.blk),
+   false, errp)) {
 return;
 }
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b750bd8..2940337 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -911,7 +911,6 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 VirtIOBlkConf *conf = &s->conf;
-Error *err = NULL;
 unsigned i;
 
 if (!conf->conf.blk) {
@@ -928,17 +927,13 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 blkconf_serial(&conf->conf, &conf->serial);
-blkconf_apply_backend_options(&conf->conf,
-  blk_is_read_only(conf->conf.blk), true,
-  &err);
-if (err) {
-error_propagate(errp, err);
+if (!blkconf_apply_backend_options(&conf->conf,
+   blk_is_read_only(conf->conf.blk),
+   true, errp)) {
 return;
 }
 s->original_wce = blk_enable_write_cache(conf->conf.blk);
-blkconf_geometry(&conf->conf, NULL, 65535, 255, 255, &err);
-if (err) {
-error_propagate(errp, err);
+if (!blkconf_geometry(&conf->conf, NULL, 65535, 255, 255, errp)) {
 return;
 }
 blkconf_blocksizes(&conf->conf);
@@ -953,9 +948,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 for (i = 0; i < conf->num_queues; i++) {
 virtio_add_queue(vdev, 128, virtio_blk_handle_output);
 }
-virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
-if (err != NULL) {
-error_propagate(errp, err);
+if (!virtio_blk_data_plane_create(vde

Re: [Qemu-devel] Prohibit Windows from running in QEMU

2017-08-04 Thread Peter Lieven

Am 04.08.2017 um 12:23 schrieb Paolo Bonzini:
> On 04/08/2017 11:58, Peter Lieven wrote:
>> Am 29.10.2013 um 10:59 schrieb Paolo Bonzini:
>>> Il 29/10/2013 10:48, Peter Lieven ha scritto:
 Hi all,

 this question might seem a bit weird, but does anyone see a good way to
 avoid
 that Windows is able to boot inside qemu?

 We have defined several profiles for different operation systems and I want
 to avoid that someone chooses Linux and then installs Windows within
 a VM. Reason is licensing.
>>> Patch QEMU to crash when Hyper-V extensions are enabled...
>> Hi all,
>>
>> this is an old topic that has become important for me again recently.
>> Now all Linux versions should be able to detect KVM even if Hyper-V is 
>> enabled.
>>
>> But how do I detect from Qemu userspace that Hyper-V is enabled?
> Maybe a better one: make KVM crash the guest if CR8 is nonzero on a
> vmexit.  Linux doesn't use it, Windows should not survive long.

You mean the kvm kernel module? Or can I access this register also
from Qemu on any call that is handled in userspace? It would be easier
to have a cmdline option to Qemu than an option to a kernel module.

> Warning, I don't know if UEFI firmware uses CR8.

UEFI firmware is not important in this case.
Do you know if FreeBSD, OpenBSD or NetBSD use it?

Thank for your ideas,
Peter

[Qemu-devel] [PATCH v2 4/6] hw/block: Fix the return type

2017-08-04 Thread Mao Zhongyi

When the function no success value to transmit, it usually make the
function return void. It has turned out not to be a success, because
it means that the extra local_err variable and error_propagate() will
be needed. It leads to cumbersome code, therefore, transmit success/
failure in the return value is worth.

So fix the return type of blkconf_apply_backend_options(),
blkconf_geometry() and virtio_blk_data_plane_create() to avoid it.

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Stefan Hajnoczi 

Signed-off-by: Mao Zhongyi 
---
 hw/block/block.c| 15 +--
 hw/block/dataplane/virtio-blk.c | 12 +++-
 hw/block/dataplane/virtio-blk.h |  2 +-
 include/hw/block/block.h|  4 ++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 27878d0..b0269c8 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -51,7 +51,7 @@ void blkconf_blocksizes(BlockConf *conf)
 }
 }
 
-void blkconf_apply_backend_options(BlockConf *conf, bool readonly,
+bool blkconf_apply_backend_options(BlockConf *conf, bool readonly,
bool resizable, Error **errp)
 {
 BlockBackend *blk = conf->blk;
@@ -76,7 +76,7 @@ void blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,
 
 ret = blk_set_perm(blk, perm, shared_perm, errp);
 if (ret < 0) {
-return;
+return false;
 }
 
 switch (conf->wce) {
@@ -99,9 +99,11 @@ void blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,
 
 blk_set_enable_write_cache(blk, wce);
 blk_set_on_error(blk, rerror, werror);
+
+return true;
 }
 
-void blkconf_geometry(BlockConf *conf, int *ptrans,
+bool blkconf_geometry(BlockConf *conf, int *ptrans,
   unsigned cyls_max, unsigned heads_max, unsigned secs_max,
   Error **errp)
 {
@@ -129,15 +131,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
 if (conf->cyls || conf->heads || conf->secs) {
 if (conf->cyls < 1 || conf->cyls > cyls_max) {
 error_setg(errp, "cyls must be between 1 and %u", cyls_max);
-return;
+return false;
 }
 if (conf->heads < 1 || conf->heads > heads_max) {
 error_setg(errp, "heads must be between 1 and %u", heads_max);
-return;
+return false;
 }
 if (conf->secs < 1 || conf->secs > secs_max) {
 error_setg(errp, "secs must be between 1 and %u", secs_max);
-return;
+return false;
 }
 }
+return true;
 }
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5556f0e..f6fc639 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -76,7 +76,7 @@ static void notify_guest_bh(void *opaque)
 }
 
 /* Context: QEMU global mutex held */
-void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
+bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
   VirtIOBlockDataPlane **dataplane,
   Error **errp)
 {
@@ -91,11 +91,11 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 error_setg(errp,
"device is incompatible with iothread "
"(transport does not support notifiers)");
-return;
+return false;
 }
 if (!virtio_device_ioeventfd_enabled(vdev)) {
 error_setg(errp, "ioeventfd is required for iothread");
-return;
+return false;
 }
 
 /* If dataplane is (re-)enabled while the guest is running there could
@@ -103,12 +103,12 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
  */
 if (blk_op_is_blocked(conf->conf.blk, BLOCK_OP_TYPE_DATAPLANE, errp)) {
 error_prepend(errp, "cannot start virtio-blk dataplane: ");
-return;
+return false;
 }
 }
 /* Don't try if transport does not support notifiers. */
 if (!virtio_device_ioeventfd_enabled(vdev)) {
-return;
+return false;
 }
 
 s = g_new0(VirtIOBlockDataPlane, 1);
@@ -126,6 +126,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
+
+return true;
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/block/dataplane/virtio-blk.h b/hw/block/dataplane/virtio-blk.h
index db3f47b..5e18bb9 100644
--- a/hw/block/dataplane/virtio-blk.h
+++ b/hw/block/dataplane/virtio-blk.h
@@ -19,7 +19,7 @@
 
 typedef struct VirtIOBlockDataPlane VirtIOBlockDataPlane;
 
-void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
+bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
   VirtIOBlockData

[Qemu-devel] [PATCH v2 2/6] hw/block/fdc: Convert to realize

2017-08-04 Thread Mao Zhongyi

Convert floppy_drive_init() to realize and rename it to
floppy_drive_realize().

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/block/fdc.c | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 4011290..02f4a46 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -517,7 +517,7 @@ static Property floppy_drive_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static int floppy_drive_init(DeviceState *qdev)
+static void floppy_drive_realize(DeviceState *qdev, Error **errp)
 {
 FloppyDrive *dev = FLOPPY_DRIVE(qdev);
 FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus);
@@ -535,15 +535,15 @@ static int floppy_drive_init(DeviceState *qdev)
 }
 
 if (dev->unit >= MAX_FD) {
-error_report("Can't create floppy unit %d, bus supports only %d units",
- dev->unit, MAX_FD);
-return -1;
+error_setg(errp, "Can't create floppy unit %d, bus supports "
+   "only %d units", dev->unit, MAX_FD);
+return;
 }
 
 drive = get_drv(bus->fdc, dev->unit);
 if (drive->blk) {
-error_report("Floppy unit %d is in use", dev->unit);
-return -1;
+error_setg(errp, "Floppy unit %d is in use", dev->unit);
+return;
 }
 
 if (!dev->conf.blk) {
@@ -557,8 +557,9 @@ static int floppy_drive_init(DeviceState *qdev)
 if (dev->conf.logical_block_size != 512 ||
 dev->conf.physical_block_size != 512)
 {
-error_report("Physical and logical block size must be 512 for floppy");
-return -1;
+error_setg(errp, "Physical and logical block size must "
+   "be 512 for floppy");
+return;
 }
 
 /* rerror/werror aren't supported by fdc and therefore not even registered
@@ -570,20 +571,20 @@ static int floppy_drive_init(DeviceState *qdev)
 blkconf_apply_backend_options(&dev->conf, blk_is_read_only(dev->conf.blk),
   false, &local_err);
 if (local_err) {
-error_report_err(local_err);
-return -1;
+error_propagate(errp, local_err);
+return;
 }
 
 /* 'enospc' is the default for -drive, 'report' is what blk_new() gives us
  * for empty drives. */
 if (blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC &&
 blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_REPORT) {
-error_report("fdc doesn't support drive option werror");
-return -1;
+error_setg(errp, "fdc doesn't support drive option werror");
+return;
 }
 if (blk_get_on_error(dev->conf.blk, 1) != BLOCKDEV_ON_ERROR_REPORT) {
-error_report("fdc doesn't support drive option rerror");
-return -1;
+error_setg(errp, "fdc doesn't support drive option rerror");
+return;
 }
 
 drive->conf = &dev->conf;
@@ -599,14 +600,12 @@ static int floppy_drive_init(DeviceState *qdev)
 dev->type = drive->drive;
 
 fd_revalidate(drive);
-
-return 0;
 }
 
 static void floppy_drive_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
-k->init = floppy_drive_init;
+k->realize = floppy_drive_realize;
 set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
 k->bus_type = TYPE_FLOPPY_BUS;
 k->props = floppy_drive_properties;
-- 
2.9.4

[Qemu-devel] [PATCH v2 1/6] hw/ide: Convert DeviceClass init to realize

2017-08-04 Thread Mao Zhongyi

Replace init with realize in IDEDeviceClass, which has errp
as a parameter. So all the implementations now use error_setg
instead of error_report for reporting error.

Cc: John Snow 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/ide/core.c |  7 ++--
 hw/ide/qdev.c | 86 +++
 include/hw/ide/internal.h |  5 +--
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0b48b64..7c86fc7 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -34,6 +34,7 @@
 #include "hw/block/block.h"
 #include "sysemu/block-backend.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #include "hw/ide/internal.h"
 
@@ -2398,7 +2399,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
const char *version, const char *serial, const char *model,
uint64_t wwn,
uint32_t cylinders, uint32_t heads, uint32_t secs,
-   int chs_trans)
+   int chs_trans, Error **errp)
 {
 uint64_t nb_sectors;
 
@@ -2423,11 +2424,11 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
 blk_set_guest_block_size(blk, 2048);
 } else {
 if (!blk_is_inserted(s->blk)) {
-error_report("Device needs media, but drive is empty");
+error_setg(errp, "Device needs media, but drive is empty");
 return -1;
 }
 if (blk_is_read_only(blk)) {
-error_report("Can't use a read-only drive");
+error_setg(errp, "Can't use a read-only drive");
 return -1;
 }
 blk_set_dev_ops(blk, &ide_hd_block_ops, s);
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index cc2f5bd..d60ac25 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -80,7 +80,7 @@ static char *idebus_get_fw_dev_path(DeviceState *dev)
 return g_strdup(path);
 }
 
-static int ide_qdev_init(DeviceState *qdev)
+static void ide_qdev_realize(DeviceState *qdev, Error **errp)
 {
 IDEDevice *dev = IDE_DEVICE(qdev);
 IDEDeviceClass *dc = IDE_DEVICE_GET_CLASS(dev);
@@ -91,34 +91,31 @@ static int ide_qdev_init(DeviceState *qdev)
 }
 
 if (dev->unit >= bus->max_units) {
-error_report("Can't create IDE unit %d, bus supports only %d units",
+error_setg(errp, "Can't create IDE unit %d, bus supports only %d 
units",
  dev->unit, bus->max_units);
-goto err;
+return;
 }
 
 switch (dev->unit) {
 case 0:
 if (bus->master) {
-error_report("IDE unit %d is in use", dev->unit);
-goto err;
+error_setg(errp, "IDE unit %d is in use", dev->unit);
+return;
 }
 bus->master = dev;
 break;
 case 1:
 if (bus->slave) {
-error_report("IDE unit %d is in use", dev->unit);
-goto err;
+error_setg(errp, "IDE unit %d is in use", dev->unit);
+return;
 }
 bus->slave = dev;
 break;
 default:
-error_report("Invalid IDE unit %d", dev->unit);
-goto err;
+error_setg(errp, "Invalid IDE unit %d", dev->unit);
+return;
 }
-return dc->init(dev);
-
-err:
-return -1;
+dc->realize(dev, errp);
 }
 
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive)
@@ -159,7 +156,7 @@ typedef struct IDEDrive {
 IDEDevice dev;
 } IDEDrive;
 
-static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
+static void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp)
 {
 IDEBus *bus = DO_UPCAST(IDEBus, qbus, dev->qdev.parent_bus);
 IDEState *s = bus->ifs + dev->unit;
@@ -168,8 +165,8 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
 
 if (!dev->conf.blk) {
 if (kind != IDE_CD) {
-error_report("No drive specified");
-return -1;
+error_setg(errp, "No drive specified");
+return;
 } else {
 /* Anonymous BlockBackend for an empty drive */
 dev->conf.blk = blk_new(0, BLK_PERM_ALL);
@@ -182,36 +179,36 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind 
kind)
 dev->conf.discard_granularity = 512;
 } else if (dev->conf.discard_granularity &&
dev->conf.discard_granularity != 512) {
-error_report("discard_granularity must be 512 for ide");
-return -1;
+error_setg(errp, "discard_granularity must be 512 for ide");
+return;
 }
 
 blkconf_blocksizes(&dev->conf);
 if (dev->conf.logical_block_size != 512) {
-error_report("logical_block_size must be 512 for IDE");
-return -1;
+error_setg(errp, "logical_block_size must be 512 for IDE");
+return;
 }
 
 blkconf_serial(&dev->conf, &dev->serial);
 if (kind != IDE_CD) {
 blkconf_geometry(&dev->conf, &dev->chs_trans, 65535,

[Qemu-devel] [PATCH v2 0/6] Convert to realize and improve error handling

2017-08-04 Thread Mao Zhongyi

This series mainly implements the conversions of ide, floppy and nvme
device to realize. Add some error handling messages and remove the local
variable local_err, use errp to propagate the error directly. Also
fix the unusual function name.

v2:
  -use bool as the return type instead of int. [Markus Armbruster & Stefan 
Hajnoczi]

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Keith Busch 
Cc: Stefan Hajnoczi 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Gerd Hoffmann 
Cc: Markus Armbruster 

Mao Zhongyi (6):
  hw/ide: Convert DeviceClass init to realize
  hw/block/fdc: Convert to realize
  hw/block/nvme: Convert to realize
  hw/block: Fix the return type
  hw/block: Use errp directly rather than local_err
  dev-storage: Fix the unusual function name

 hw/block/block.c| 15 ---
 hw/block/dataplane/virtio-blk.c | 12 +++---
 hw/block/dataplane/virtio-blk.h |  2 +-
 hw/block/fdc.c  | 48 +
 hw/block/nvme.c | 24 +--
 hw/block/virtio-blk.c   | 17 +++-
 hw/ide/core.c   |  7 +--
 hw/ide/qdev.c   | 94 +++--
 hw/scsi/scsi-disk.c | 13 ++
 hw/usb/dev-storage.c| 29 ++---
 include/hw/block/block.h|  4 +-
 include/hw/ide/internal.h   |  5 ++-
 12 files changed, 125 insertions(+), 145 deletions(-)

-- 
2.9.4

[Qemu-devel] [PATCH v2 3/6] hw/block/nvme: Convert to realize

2017-08-04 Thread Mao Zhongyi

Convert nvme_init() to realize and rename it to nvme_realize().

Cc: John Snow 
Cc: Keith Busch 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/block/nvme.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6071dc1..43c60ab 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -920,7 +920,7 @@ static const MemoryRegionOps nvme_cmb_ops = {
 },
 };
 
-static int nvme_init(PCIDevice *pci_dev)
+static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeIdCtrl *id = &n->id_ctrl;
@@ -931,24 +931,27 @@ static int nvme_init(PCIDevice *pci_dev)
 Error *local_err = NULL;
 
 if (!n->conf.blk) {
-return -1;
+error_setg(errp, "drive property not set");
+return;
 }
 
 bs_size = blk_getlength(n->conf.blk);
 if (bs_size < 0) {
-return -1;
+error_setg(errp, "could not get backing file size");
+return;
 }
 
 blkconf_serial(&n->conf, &n->serial);
 if (!n->serial) {
-return -1;
+error_setg(errp, "serial property not set");
+return;
 }
 blkconf_blocksizes(&n->conf);
 blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
   false, &local_err);
 if (local_err) {
-error_report_err(local_err);
-return -1;
+error_propagate(errp, local_err);
+return;
 }
 
 pci_conf = pci_dev->config;
@@ -1046,7 +1049,6 @@ static int nvme_init(PCIDevice *pci_dev)
 cpu_to_le64(n->ns_size >>
 id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas)].ds);
 }
-return 0;
 }
 
 static void nvme_exit(PCIDevice *pci_dev)
@@ -1081,7 +1083,7 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
-pc->init = nvme_init;
+pc->realize = nvme_realize;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
-- 
2.9.4

Re: [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking

2017-08-04 Thread Paolo Bonzini

On 04/08/2017 07:44, Richard Henderson wrote:
> Dispense with TCGBackendData, as it has never been used for more than
> holding a single pointer.  Use a define in the cpu/tcg-target.h to
> signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
> tcg-be-null.h stubs.  Rename tcg-be-ldst.h to tcg-ldst.inc.c.
> 
> Signed-off-by: Richard Henderson 

Reviewed-by: Paolo Bonzini 

> ---
>  tcg/aarch64/tcg-target.h  |  4 
>  tcg/arm/tcg-target.h  |  4 
>  tcg/i386/tcg-target.h |  4 
>  tcg/ia64/tcg-target.h |  4 
>  tcg/mips/tcg-target.h |  4 
>  tcg/ppc/tcg-target.h  |  4 
>  tcg/s390/tcg-target.h |  4 
>  tcg/tcg-be-null.h | 44 
> ---
>  tcg/tcg.h |  6 +++--
>  tcg/aarch64/tcg-target.inc.c  |  3 ++-
>  tcg/arm/tcg-target.inc.c  |  3 ++-
>  tcg/i386/tcg-target.inc.c |  4 ++--
>  tcg/ia64/tcg-target.inc.c | 19 ---
>  tcg/mips/tcg-target.inc.c |  4 ++--
>  tcg/ppc/tcg-target.inc.c  |  4 ++--
>  tcg/s390/tcg-target.inc.c |  4 ++--
>  tcg/sparc/tcg-target.inc.c|  2 --
>  tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} | 27 -
>  tcg/tcg.c | 17 +++---
>  tcg/tci/tcg-target.inc.c  |  2 --
>  20 files changed, 61 insertions(+), 106 deletions(-)
>  delete mode 100644 tcg/tcg-be-null.h
>  rename tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} (85%)
> 
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 3c3b1e603d..484cf6236c 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -120,4 +120,8 @@ static inline void flush_icache_range(uintptr_t start, 
> uintptr_t stop)
>  
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif /* AARCH64_TCG_TARGET_H */
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index b836f7f127..55de35a691 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -138,4 +138,8 @@ static inline void flush_icache_range(uintptr_t start, 
> uintptr_t stop)
>  /* not defined -- call should be eliminated at compile time */
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 2fd28fa6a5..11ee7fadd1 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -186,4 +186,8 @@ static inline void tb_target_set_jmp_target(uintptr_t 
> tc_ptr,
>  
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 5c9ca8c1ce..83107e1407 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -199,4 +199,8 @@ static inline void flush_icache_range(uintptr_t start, 
> uintptr_t stop)
>  /* not defined -- call should be eliminated at compile time */
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 557c8ddc46..bea5290b9f 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -209,4 +209,8 @@ static inline void flush_icache_range(uintptr_t start, 
> uintptr_t stop)
>  
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 5bab3387e5..c1226ea5b6 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -127,4 +127,8 @@ extern bool have_isa_3_00;
>  void flush_icache_range(uintptr_t start, uintptr_t stop);
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 1398952d6b..8fea9646b4 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -153,4 +153,8 @@ static inline void tb_target_set_jmp_target(uintptr_t 
> tc_ptr,
>  /* no need to flush icache explicitly */
>  }
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/tcg-be-null.h b/tcg/tcg-be-null.h
> deleted file mode 100644
> index 5222fe29e2..00
> --- a/tcg/tcg-be-null.h
> +++ /dev/null
> @@ -1,44 +0,0 @@
> -/*
> - * TCG Backend Data: No backend data
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a 
> copy

[Qemu-devel] [PATCH v2 6/6] dev-storage: Fix the unusual function name

2017-08-04 Thread Mao Zhongyi

The function name of usb_msd_{realize,unrealize}_*,
usb_msd_class_initfn_* are unusual. Rename it to
usb_msd_*_{realize,unrealize}, usb_msd_class_*_initfn.

Cc: Gerd Hoffmann 

Signed-off-by: Mao Zhongyi 
---
 hw/usb/dev-storage.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 801f552..2a05cd5 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -596,7 +596,7 @@ static void usb_msd_unrealize_storage(USBDevice *dev, Error 
**errp)
 object_unref(OBJECT(&s->bus));
 }
 
-static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
+static void usb_msd_storage_realize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 BlockBackend *blk = s->conf.blk;
@@ -643,14 +643,14 @@ static void usb_msd_realize_storage(USBDevice *dev, Error 
**errp)
 s->scsi_dev = scsi_dev;
 }
 
-static void usb_msd_unrealize_bot(USBDevice *dev, Error **errp)
+static void usb_msd_bot_unrealize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 
 object_unref(OBJECT(&s->bus));
 }
 
-static void usb_msd_realize_bot(USBDevice *dev, Error **errp)
+static void usb_msd_bot_realize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 DeviceState *d = DEVICE(dev);
@@ -764,12 +764,12 @@ static void usb_msd_class_initfn_common(ObjectClass 
*klass, void *data)
 dc->vmsd = &vmstate_usb_msd;
 }
 
-static void usb_msd_class_initfn_storage(ObjectClass *klass, void *data)
+static void usb_msd_class_storage_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
 
-uc->realize = usb_msd_realize_storage;
+uc->realize = usb_msd_storage_realize;
 uc->unrealize = usb_msd_unrealize_storage;
 dc->props = msd_properties;
 }
@@ -828,26 +828,26 @@ static void usb_msd_instance_init(Object *obj)
 object_property_set_int(obj, -1, "bootindex", NULL);
 }
 
-static void usb_msd_class_initfn_bot(ObjectClass *klass, void *data)
+static void usb_msd_class_bot_initfn(ObjectClass *klass, void *data)
 {
 USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
 
-uc->realize = usb_msd_realize_bot;
-uc->unrealize = usb_msd_unrealize_bot;
+uc->realize = usb_msd_bot_realize;
+uc->unrealize = usb_msd_bot_unrealize;
 uc->attached_settable = true;
 }
 
 static const TypeInfo msd_info = {
 .name  = "usb-storage",
 .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_storage,
+.class_init= usb_msd_class_storage_initfn,
 .instance_init = usb_msd_instance_init,
 };
 
 static const TypeInfo bot_info = {
 .name  = "usb-bot",
 .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_bot,
+.class_init= usb_msd_class_bot_initfn,
 };
 
 static void usb_msd_register_types(void)
-- 
2.9.4

Re: [Qemu-devel] [PATCH] pc/acpi: Fix booting of macOS with Clover EFI bootloader

2017-08-04 Thread Dhiru Kholia

On Fri, Aug 4, 2017 at 2:35 PM, Igor Mammedov  wrote:
> On Fri,  4 Aug 2017 12:15:40 +0530
> Dhiru Kholia  wrote:
>
>> This was tested with macOS 10.12.5 and Clover r4114.
>>
>> Without this patch, the macOS boot process gets stuck at the Apple logo
>> without showing any progress bar.
>>
>> I have documented the process of running macOS on QEMU/KVM at,
>>
>> https://github.com/kholia/OSX-KVM/
>>
>> Instead of using this patch, adding an additional command-line knob
>> which exposes this setting (force_rev1_fadt) to the user might be a more
>> general solution.
>
> it's been reported that OVMF had issues that were fixed,
> you probably want to read this thread:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg468456.html

Hi Igor,

I have now tested various OVMF versions with the latest QEMU (commit
aaaec6acad7c).

When using edk2.git-ovmf-x64-0-20170726.b2826.gbb4831c03d.noarch.rpm
from [1] macOS does not boot and gets stuck. The CPU usage goes to
100%. I haven't confirmed this but this OVMF build should have both
198a46d and 072060a edk2 commits in it, based on the build date.

The OVMF blob from Gabriel Somlo [2] hangs on the OVMF logo screen and
it too results in 100% CPU usage after hanging.

I am using "boot-clover.sh" script from my repository [4] to test the
various OVMF versions.

The only OVMF blob which works with the current QEMU for booting macOS
is the one from Proxmox [3]. Unfortunately, I don't know the
corresponding commit in the edk2 repository for this working OVMF
blob.

References,

[1] https://www.kraxel.org/repos/jenkins/edk2/

[2] https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/

[3] https://git.proxmox.com/?p=pve-qemu-kvm.git;a=tree;f=debian

[4] https://github.com/kholia/OSX-KVM/

It would be great if someone else could reproduce these results too.

Thanks,
Dhiru

Re: [Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices

2017-08-04 Thread Paolo Bonzini

On 04/08/2017 10:36, Hannes Reinecke wrote:
> Make enclosure support optional with the 'enclosure' argument to
> the scsi device.
> Adding 'enclosure=on' as option to the SCSI device will present
> an enclosure service device on LUN0, either as a stand-alone
> LUN (in case LUN0 is not assigned) or by setting the EncServ bit
> int the inquiry data if LUN0 is assigned to a block devices.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  hw/scsi/scsi-bus.c | 11 ---
>  hw/scsi/scsi-disk.c|  4 +++-
>  include/hw/scsi/scsi.h |  1 +
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index 79a222f..c11422b 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -22,6 +22,7 @@ static Property scsi_props[] = {
>  DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0),
>  DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1),
>  DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1),
> +DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq 
> *r)
>  if (r->req.lun != 0) {
>  r->buf[0] = TYPE_NO_LUN;
>  } else {
> -r->buf[0] = TYPE_ENCLOSURE;
> +r->buf[0] = r->req.dev->enclosure ?
> +TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE;
>  r->buf[2] = 5; /* Version */
>  r->buf[3] = 2 | 0x10; /* HiSup, response data format */
>  r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */
> -r->buf[6] = 0x40; /* Enclosure service */
> +if (r->req.dev->enclosure) {
> +r->buf[6] = 0x40; /* Enclosure service */
> +}
>  r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ. 
>  */
>  memcpy(&r->buf[8], "QEMU", 8);
>  memcpy(&r->buf[16], "QEMU TARGET ", 16);
> @@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
> uint8_t *buf)
>  }
>  break;
>  case RECEIVE_DIAGNOSTIC:
> -if (!scsi_target_emulate_receive_diagnostic(r)) {
> +if (!r->req.dev->enclosure ||
> +!scsi_target_emulate_receive_diagnostic(r)) {
>  goto illegal_request;
>  }
>  break;
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 5f1e5e8..153d97d 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
> uint8_t *outbuf)
> the additional length is not adjusted */
>  outbuf[4] = 36 - 5;
>  }
> -
> +if (s->qdev.lun == 0 && s->qdev.enclosure) {
> +outbuf[6] = 0x40; /* Enclosure service */
> +}

Should this really be set on disks even if you do have a LUN0?

Why don't you instead add a very simple scsi-enclosure device that you
can specify as the LUN0?

Thanks,

Paolo

>  /* Sync data transfer and TCQ.  */
>  outbuf[7] = 0x10 | (req->bus->info->tcq ? 0x02 : 0);
>  return buflen;
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index 6b85786..243c185 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -100,6 +100,7 @@ struct SCSIDevice
>  BlockConf conf;
>  SCSISense unit_attention;
>  bool sense_is_ua;
> +bool enclosure;
>  uint8_t sense[SCSI_SENSE_BUF_SIZE];
>  uint32_t sense_len;
>  QTAILQ_HEAD(, SCSIRequest) requests;
>

Re: [Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices

2017-08-04 Thread Hannes Reinecke

On 08/04/2017 12:48 PM, Paolo Bonzini wrote:
> On 04/08/2017 10:36, Hannes Reinecke wrote:
>> Make enclosure support optional with the 'enclosure' argument to
>> the scsi device.
>> Adding 'enclosure=on' as option to the SCSI device will present
>> an enclosure service device on LUN0, either as a stand-alone
>> LUN (in case LUN0 is not assigned) or by setting the EncServ bit
>> int the inquiry data if LUN0 is assigned to a block devices.
>>
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  hw/scsi/scsi-bus.c | 11 ---
>>  hw/scsi/scsi-disk.c|  4 +++-
>>  include/hw/scsi/scsi.h |  1 +
>>  3 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>> index 79a222f..c11422b 100644
>> --- a/hw/scsi/scsi-bus.c
>> +++ b/hw/scsi/scsi-bus.c
>> @@ -22,6 +22,7 @@ static Property scsi_props[] = {
>>  DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0),
>>  DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1),
>>  DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1),
>> +DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false),
>>  DEFINE_PROP_END_OF_LIST(),
>>  };
>>  
>> @@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq 
>> *r)
>>  if (r->req.lun != 0) {
>>  r->buf[0] = TYPE_NO_LUN;
>>  } else {
>> -r->buf[0] = TYPE_ENCLOSURE;
>> +r->buf[0] = r->req.dev->enclosure ?
>> +TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE;
>>  r->buf[2] = 5; /* Version */
>>  r->buf[3] = 2 | 0x10; /* HiSup, response data format */
>>  r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */
>> -r->buf[6] = 0x40; /* Enclosure service */
>> +if (r->req.dev->enclosure) {
>> +r->buf[6] = 0x40; /* Enclosure service */
>> +}
>>  r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, 
>> TCQ.  */
>>  memcpy(&r->buf[8], "QEMU", 8);
>>  memcpy(&r->buf[16], "QEMU TARGET ", 16);
>> @@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest 
>> *req, uint8_t *buf)
>>  }
>>  break;
>>  case RECEIVE_DIAGNOSTIC:
>> -if (!scsi_target_emulate_receive_diagnostic(r)) {
>> +if (!r->req.dev->enclosure ||
>> +!scsi_target_emulate_receive_diagnostic(r)) {
>>  goto illegal_request;
>>  }
>>  break;
>> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
>> index 5f1e5e8..153d97d 100644
>> --- a/hw/scsi/scsi-disk.c
>> +++ b/hw/scsi/scsi-disk.c
>> @@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
>> uint8_t *outbuf)
>> the additional length is not adjusted */
>>  outbuf[4] = 36 - 5;
>>  }
>> -
>> +if (s->qdev.lun == 0 && s->qdev.enclosure) {
>> +outbuf[6] = 0x40; /* Enclosure service */
>> +}
> 
> Should this really be set on disks even if you do have a LUN0?
> 
Why not? You _can_ have embedded enclosure devices; that's what this bit
is for...

> Why don't you instead add a very simple scsi-enclosure device that you
> can specify as the LUN0?
> 
You don't have to.
Once you leave LUN0 free (eg by assigning your first disk LUN1) you'll
get this device by virtue of the current LUN0 emulation.

But yeah, maybe I can be doing a separate enclosure device.
Will be checking if and how I can make it to work.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [PATCH v2 4/6] hw/block: Fix the return type

2017-08-04 Thread Stefan Hajnoczi

On Fri, Aug 04, 2017 at 06:26:41PM +0800, Mao Zhongyi wrote:
> When the function no success value to transmit, it usually make the
> function return void. It has turned out not to be a success, because
> it means that the extra local_err variable and error_propagate() will
> be needed. It leads to cumbersome code, therefore, transmit success/
> failure in the return value is worth.
> 
> So fix the return type of blkconf_apply_backend_options(),
> blkconf_geometry() and virtio_blk_data_plane_create() to avoid it.
> 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Stefan Hajnoczi 
> 
> Signed-off-by: Mao Zhongyi 
> ---
>  hw/block/block.c| 15 +--
>  hw/block/dataplane/virtio-blk.c | 12 +++-
>  hw/block/dataplane/virtio-blk.h |  2 +-
>  include/hw/block/block.h|  4 ++--
>  4 files changed, 19 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?

2017-08-04 Thread Denis V. Lunev

On 08/04/2017 03:16 PM, Markus Armbruster wrote:
> Denis, you added this in commit d50d822:
>
> #ifdef CONFIG_FALLOCATE
> if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
> aiocb->aio_nbytes);
> if (ret == 0 || ret != -ENOTSUP) {
> return ret;
> }
> s->has_fallocate = false;
> }
> #endif
>
> bdrv_getlength() can fail.  Does it do the right thing then?  For what
> it's worth, the comparison of its value is signed.
fallocate() with 0 flags can work only beyond end of file
or on top of the hole. Thus the check is made to validate
that we are beyond EOF.

Technically fallocate should fail if that condition will be
violated. But you right, we can add sanity check here.
This would not harm.

Should I send it?

Den

Re: [Qemu-devel] Cleaning up handling of CPU exceptions accessing non-existent memory/devices

2017-08-04 Thread Peter Maydell

On 1 August 2017 at 11:50, Peter Maydell  wrote:
> My proposal is that we should define a new QOM CPU hook:
>   void (*do_transaction_failed)(CPUState *cpu, hwaddr physaddr, vaddr addr,
> unsigned size, MMUAccessType access_type,
> MemTxAttrs attrs, MemTxResult response);

I realised that we probably also want the 'uintptr_t retaddr'
and 'int mmu_idx' that tlb_fill() and the do_unaligned_access
hook both already have.

I have some patches mostly-ready which implement the
common code changes plus the ARM mode implementation of the
hook function.

thanks
-- PMM

Re: [Qemu-devel] Is the use of bdrv_getlength() in parallels.c kosher?

2017-08-04 Thread Denis V. Lunev

On 08/04/2017 03:31 PM, Markus Armbruster wrote:
> Same question for allocate_clusters() in parallels.c, commit 5a41e1f,
> modified in commit ddd2ef2:
>
> if (s->data_end + space > bdrv_getlength(bs->file->bs) >> 
> BDRV_SECTOR_BITS) {
>
> bdrv_getlength() can fail.  Does it do the right thing then?  For what
> it's worth, the comparison of its value is signed.
>
> There's another one in parallels_open():
>
> if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
> bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
>   PREALLOC_MODE_OFF, NULL) != 0) {
> s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
> }
the same as in the previous letter. Saniti checks will not harm.
I'll make a patch.

Thank you for the suggestion.

Den

Re: [Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Daniel P. Berrange

On Fri, Aug 04, 2017 at 01:48:01PM +0100, Stefan Hajnoczi wrote:
> On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote:
> > Using 16KB bounce buffers creates a significant performance
> > penalty for I/O to encrypted volumes on storage with high
> > I/O latency (rotating rust & network drives), because it
> > triggers lots of fairly small I/O operations.
> > 
> > On tests with rotating rust, and cache=none|directsync,
> > write speed increased from 2MiB/s to 32MiB/s, on a par
> > with that achieved by the in-kernel luks driver.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  block/crypto.c | 12 +---
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 58ef6f2f52..207941db9a 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> > @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
> >  }
> >  
> >  
> > -#define BLOCK_CRYPTO_MAX_SECTORS 32
> > +#define BLOCK_CRYPTO_MAX_SECTORS 2048
> >  
> >  static coroutine_fn int
> >  block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
> > @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
> > sector_num,
> >  
> >  qemu_iovec_init(&hd_qiov, qiov->niov);
> >  
> > -/* Bounce buffer so we have a linear mem region for
> > - * entire sector. XXX optimize so we avoid bounce
> > - * buffer in case that qiov->niov == 1
> > +/* Bounce buffer because we're not permitted to touch
> > + * contents of qiov - it points to guest memory.
> >   */
> >  cipher_data =
> >  qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 
> > 512,
> 
> In the *read* case you can modify the data buffers in-place.  But the
> guest might see intermediate states in its buffers - not sure whether
> this could pose a security problem.

Whether its a risk or not depends on the choice of crypto parameters, as
exposing ciphertext to the guest might make watermarking attacks easier
to perform. Probably not a problem in practice, but I prefer to err on
the side of caution since I can't be sure it is safe.

> Reviewed-by: Stefan Hajnoczi 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Stefan Hajnoczi

On Fri, Aug 04, 2017 at 11:50:36AM +0100, Daniel P. Berrange wrote:
> There are trace probes in bdrv_co_readv|writev, however, the
> block drivers are being gradually moved over to using the
> bdrv_co_preadv|pwritev functions instead. As a result some
> block drivers miss the current probes. Move the probes
> into bdrv_co_preadv|pwritev instead, so that they are triggered
> by more (all?) I/O code paths.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/io.c | 8 
>  block/trace-events | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err

2017-08-04 Thread Stefan Hajnoczi

On Fri, Aug 04, 2017 at 06:26:42PM +0800, Mao Zhongyi wrote:
> Pass the error message to errp directly rather than the local
> variable local_err and propagate it to errp via error_propagate().
> 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Keith Busch 
> Cc: Stefan Hajnoczi 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Gerd Hoffmann 
> Cc: Markus Armbruster 
> 
> Signed-off-by: Mao Zhongyi 
> ---
>  hw/block/fdc.c| 17 ++---
>  hw/block/nvme.c   |  8 +++-
>  hw/block/virtio-blk.c | 17 +
>  hw/ide/qdev.c | 12 
>  hw/scsi/scsi-disk.c   | 13 -
>  hw/usb/dev-storage.c  |  9 +++--
>  6 files changed, 25 insertions(+), 51 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 0/8] target-mips queue

2017-08-04 Thread Peter Maydell

On 3 August 2017 at 15:45, Yongbok Kim  wrote:
> The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8:
>
>   Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100)
>
> are available in the git repository at:
>
>   git://github.com/yongbok/upstream-qemu.git tags/mips-20170803
>
> for you to fetch changes up to d673a68db6963e86536b125af464bb6ed03eba33:
>
>   target/mips: Fix RDHWR CC with icount (2017-08-02 22:18:13 +0100)
>
> 
> MIPS patches 2017-08-03
>
> Changes:
> KVM T&E segment support for TCG
> malta: leave space for the bootmap after the initrd
> Apply CP0.PageMask before writing into TLB entry
> Fix fallout from indirect branch optimisation
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v4 7/9] s390x/pci: fence off instructions for non-pci

2017-08-04 Thread David Hildenbrand

On 04.08.2017 13:29, Cornelia Huck wrote:
> If a guest running on a machine without zpci issues a pci instruction,
> throw them an exception.
> 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Cornelia Huck 
> ---
>  target/s390x/kvm.c | 54 
> +-
>  1 file changed, 41 insertions(+), 13 deletions(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index bc62bba5b7..9de165d8b1 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -1191,7 +1191,11 @@ static int kvm_clp_service_call(S390CPU *cpu, struct 
> kvm_run *run)
>  {
>  uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
>  
> -return clp_service_call(cpu, r2);
> +if (s390_has_feat(S390_FEAT_ZPCI)) {
> +return clp_service_call(cpu, r2);
> +} else {
> +return -1;
> +}

I am a fan of dropping these else case and returning directly. But that
is just my opinion.

(applies to all changes in this patch)


-- 

Thanks,

David

Re: [Qemu-devel] [PATCH v4 5/9] s390x/ccw: create s390 phb conditionally

2017-08-04 Thread David Hildenbrand

On 04.08.2017 13:29, Cornelia Huck wrote:
> Don't create the s390 pci host bridge if we do not provide the zpci
> facility.
> 
> Reviewed-by: Thomas Huth 
> Acked-by: Christian Borntraeger 
> Signed-off-by: Cornelia Huck 

This works, because s390_init_cpus(machine) (which sets up features) is
called before adding the bus. Mind to add a comment so nobody tries to
reshuffle these functions?

> ---
>  hw/s390x/s390-virtio-ccw.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 1c7af39ce6..8be4a541c1 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -118,7 +118,6 @@ static void ccw_init(MachineState *machine)
>  {
>  int ret;
>  VirtualCssBus *css_bus;
> -DeviceState *dev;
>  
>  s390_sclp_init();
>  s390_memory_init(machine->ram_size);
> @@ -134,10 +133,13 @@ static void ccw_init(MachineState *machine)
>machine->initrd_filename, "s390-ccw.img",
>"s390-netboot.img", true);
>  
> -dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE);
> -object_property_add_child(qdev_get_machine(), TYPE_S390_PCI_HOST_BRIDGE,
> -  OBJECT(dev), NULL);
> -qdev_init_nofail(dev);
> +if (s390_has_feat(S390_FEAT_ZPCI)) {
> +DeviceState *dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE);
> +object_property_add_child(qdev_get_machine(),
> +  TYPE_S390_PCI_HOST_BRIDGE,
> +  OBJECT(dev), NULL);
> +qdev_init_nofail(dev);
> +}
>  
>  /* register hypercalls */
>  virtio_ccw_register_hcalls();
> 


-- 

Thanks,

David

Re: [Qemu-devel] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Stefan Hajnoczi

On Fri, Aug 04, 2017 at 11:50:59AM +0100, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/block/block_int.h | 29 +
>  1 file changed, 29 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 8/9] s390x/kvm: msi route fixup for non-pci

2017-08-04 Thread David Hildenbrand

On 04.08.2017 13:29, Cornelia Huck wrote:
> If we don't provide pci, we cannot have a pci device for which we
> have to translate to adapter routes: just return -ENODEV.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  target/s390x/kvm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index 9de165d8b1..d8db1cbf6e 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -2424,6 +2424,12 @@ int kvm_arch_fixup_msi_route(struct 
> kvm_irq_routing_entry *route,
>  uint32_t idx = data >> ZPCI_MSI_VEC_BITS;
>  uint32_t vec = data & ZPCI_MSI_VEC_MASK;
>  
> +if (!s390_has_feat(S390_FEAT_ZPCI)) {
> +/* How can we get here without pci enabled? */
> +g_assert(false);
> +return -ENODEV;
> +}
> +

/* How can we get here without pci enabled? */
g_assert(s390_has_feat(S390_FEAT_ZPCI));

?

>  pbdev = s390_pci_find_dev_by_idx(s390_get_phb(), idx);
>  if (!pbdev) {
>  DPRINTF("add_msi_route no dev\n");
> 


-- 

Thanks,

David

[Qemu-devel] Is the use of bdrv_getlength() in quorum_co_flush() kosher?

2017-08-04 Thread Markus Armbruster

Have a look at quorum_co_flush():

quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
  bdrv_getlength(s->children[i]->bs),
  s->children[i]->bs->node_name, result);

bdrv_getlength() can fail.  Does it do the right thing then?

Re: [Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Stefan Hajnoczi

On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote:
> Using 16KB bounce buffers creates a significant performance
> penalty for I/O to encrypted volumes on storage with high
> I/O latency (rotating rust & network drives), because it
> triggers lots of fairly small I/O operations.
> 
> On tests with rotating rust, and cache=none|directsync,
> write speed increased from 2MiB/s to 32MiB/s, on a par
> with that achieved by the in-kernel luks driver.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 58ef6f2f52..207941db9a 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
>  }
>  
>  
> -#define BLOCK_CRYPTO_MAX_SECTORS 32
> +#define BLOCK_CRYPTO_MAX_SECTORS 2048
>  
>  static coroutine_fn int
>  block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
> @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
> sector_num,
>  
>  qemu_iovec_init(&hd_qiov, qiov->niov);
>  
> -/* Bounce buffer so we have a linear mem region for
> - * entire sector. XXX optimize so we avoid bounce
> - * buffer in case that qiov->niov == 1
> +/* Bounce buffer because we're not permitted to touch
> + * contents of qiov - it points to guest memory.
>   */
>  cipher_data =
>  qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,

In the *read* case you can modify the data buffers in-place.  But the
guest might see intermediate states in its buffers - not sure whether
this could pose a security problem.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

[Qemu-devel] Is the use of bdrv_getlength() in vhdx*.c kosher?

2017-08-04 Thread Markus Armbruster

bdrv_getlength() can fail.  There are several calls in vhdx*.c that
don't seem to check.  Bug or not?

Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Eric Blake

On 08/04/2017 08:55 AM, Fam Zheng wrote:
> It's not too surprising when a user specifies the backing file relative
> to the current working directory instead of the top layer image. This
> causes error when they differ. Though the error message has enough
> information to infer the fact about the misunderstanding, it is better
> if we document this explicitly, so that users don't have to learn from
> mistakes.
> 
> Signed-off-by: Fam Zheng 
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 72dabd6b3e..cf079f90e2 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size 
> needs to be specified in
>  this case. @var{backing_file} will never be modified unless you use the
>  @code{commit} monitor command (or qemu-img commit).
>  
> +If a relative path name is given, the backing file is looked up against
> +@var{filename}, rather than the current working directory.

Maybe better as:
the backing file is looked up relative to the directory containing
@var{filename}

In all three places

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake

On 08/04/2017 05:50 AM, Daniel P. Berrange wrote:
> There are trace probes in bdrv_co_readv|writev, however, the
> block drivers are being gradually moved over to using the
> bdrv_co_preadv|pwritev functions instead. As a result some
> block drivers miss the current probes. Move the probes
> into bdrv_co_preadv|pwritev instead, so that they are triggered
> by more (all?) I/O code paths.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/io.c | 8 
>  block/trace-events | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

Arguably, the traces are broken, so this is a bug fix worthy of 2.10.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v4 9/9] s390x: refine pci dependencies

2017-08-04 Thread Cornelia Huck

VIRTIO_PCI should properly depend on CONFIG_PCI.
With this change, we can switch off pci for s390x by removing
'CONFIG_PCI=y' from the default config.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 default-configs/s390x-softmmu.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/default-configs/s390x-softmmu.mak 
b/default-configs/s390x-softmmu.mak
index b227a36179..7adaa4a4fe 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1,5 +1,5 @@
 CONFIG_PCI=y
-CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_PCI=$(CONFIG_PCI)
 CONFIG_VHOST_USER_SCSI=$(CONFIG_LINUX)
 CONFIG_VIRTIO=y
 CONFIG_SCLPCONSOLE=y
-- 
2.13.3

Re: [Qemu-devel] [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake

On 08/04/2017 05:50 AM, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/block/block_int.h | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d4f4ea7584..deb81a58bd 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -147,12 +147,41 @@ struct BlockDriver {
>  
>  int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> +
> +/**
> + * @offset: position in bytes to read at
> + * @bytes: number of bytes to read
> + * @qiov: the buffers to fill with read data
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Documenting flags would also be nice (at the moment, flags is always 0)

>  int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>  int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
> +/**
> + * @offset: position in bytes to write at
> + * @bytes: number of bytes to write
> + * @qiov: the buffers containing data to write
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Likewise for documentation, but here, flags will never exceed
bs->supported_write_flags.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake

On 08/04/2017 09:08 AM, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
> 
>  - Clarify that @bytes matches @qiov total size (Kevin)
> 
>  include/block/block_int.h | 31 +++
>  1 file changed, 31 insertions(+)

[looks like the nongnu.org infrastructure is having heavy load today, so
mails are getting through more slowly than usual - leads to lots of
crossed emails, where I'm seeing replies or direct sends sooner than
list copies]

> +/**
> + * @offset: position in bytes to write at
> + * @bytes: number of bytes to write
> + * @qiov: the buffers containing data to write
> + * @flags: zero or more of bits allowed by 'supported_write_flags'

maybe s/of //

> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes will always equal the total size of @qiov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Do we make guarantees that the driver callback is never reached if the
image is currently read-only?  If so, is that a guarantee worth documenting?

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 1/7] block: move ThrottleGroup membership to ThrottleGroupMember

2017-08-04 Thread Alberto Garcia

On Mon 31 Jul 2017 11:54:37 AM CEST, Manos Pitsidianakis wrote:
> This commit eliminates the 1:1 relationship between BlockBackend and
> throttle group state.  Users will be able to create multiple throttle
> nodes, each with its own throttle group state, in the future.  The
> throttle group state cannot be per-BlockBackend anymore, it must be
> per-throttle node. This is done by gathering ThrottleGroup membership
> details from BlockBackendPublic into ThrottleGroupMember and refactoring
> existing code to use the structure.
>
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Manos Pitsidianakis 

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-devel] [PATCH v4 9/9] s390x: refine pci dependencies

2017-08-04 Thread David Hildenbrand

On 04.08.2017 13:29, Cornelia Huck wrote:
> VIRTIO_PCI should properly depend on CONFIG_PCI.
> With this change, we can switch off pci for s390x by removing
> 'CONFIG_PCI=y' from the default config.
> 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Cornelia Huck 
> ---
>  default-configs/s390x-softmmu.mak | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/default-configs/s390x-softmmu.mak 
> b/default-configs/s390x-softmmu.mak
> index b227a36179..7adaa4a4fe 100644
> --- a/default-configs/s390x-softmmu.mak
> +++ b/default-configs/s390x-softmmu.mak
> @@ -1,5 +1,5 @@
>  CONFIG_PCI=y
> -CONFIG_VIRTIO_PCI=y
> +CONFIG_VIRTIO_PCI=$(CONFIG_PCI)
>  CONFIG_VHOST_USER_SCSI=$(CONFIG_LINUX)
>  CONFIG_VIRTIO=y
>  CONFIG_SCLPCONSOLE=y
> 

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David

Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Fam Zheng

On Fri, 08/04 09:23, Eric Blake wrote:
> > +If a relative path name is given, the backing file is looked up against
> > +@var{filename}, rather than the current working directory.
> 
> Maybe better as:
> the backing file is looked up relative to the directory containing
> @var{filename}

Sounds good, will update.

[Qemu-devel] [PATCH v4 0/9] s390x: zPCI detangling

2017-08-04 Thread Cornelia Huck

Next version, not so many changes from v3.

As you might have guessed, the goals are still the same:
- Being able to disable PCI support in a build completely.
- Properly fencing off PCI if the relevant facility bit is not provided.

Changes v3->v4:
- introduce pci_available boolean
- use pci_available to fence off setting the zcpi facility bit
- collected tags

Branch is still git://github.com/cohuck/qemu no-zpci-cpumodel

Cornelia Huck (9):
  kvm: remove hard dependency on pci
  s390x/pci: add stubs
  s390x: chsc nt2 events are pci-only
  s390x/pci: do not advertise pci on non-pci builds
  s390x/ccw: create s390 phb conditionally
  s390x/sclp: properly guard pci-specific functions
  s390x/pci: fence off instructions for non-pci
  s390x/kvm: msi route fixup for non-pci
  s390x: refine pci dependencies

 accel/kvm/kvm-all.c   |  6 ++--
 default-configs/s390x-softmmu.mak |  2 +-
 hw/pci/pci-stub.c | 13 +++
 hw/pci/pci.c  |  2 ++
 hw/s390x/Makefile.objs|  3 +-
 hw/s390x/s390-pci-bus.c   |  4 +--
 hw/s390x/s390-pci-bus.h   |  4 +--
 hw/s390x/s390-pci-stub.c  | 74 +++
 hw/s390x/s390-virtio-ccw.c| 12 ---
 hw/s390x/sclp.c   | 19 +++---
 include/hw/pci/pci.h  |  2 ++
 target/s390x/ioinst.c | 16 +
 target/s390x/kvm.c| 64 +
 13 files changed, 189 insertions(+), 32 deletions(-)
 create mode 100644 hw/s390x/s390-pci-stub.c

-- 
2.13.3

[Qemu-devel] [PATCH v6 0/3] Generate APEI GHES table and dynamically record CPER

2017-08-04 Thread Programmingkid


> On Aug 4, 2017, at 1:22 AM, qemu-devel-requ...@nongnu.org wrote:
> 
> Date: Fri, 4 Aug 2017 12:37:52 +0800
> From: Dongjiu Geng 
> To: , , ,
>   , ,
>   , 
> Cc: , ,
>   
> Subject: [Qemu-devel] [PATCH v6 0/3] Generate APEI GHES table and
>   dynamically record CPER
> Message-ID: <1501821475-14647-1-git-send-email-gengdong...@huawei.com>
> Content-Type: text/plain
> 
> In the armv8 platform, the mainly hardware error source are ARMv8
> SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS
> or use other interface to notify user space, such as Qemu. After Qemu gets
> the notification, it will record the CPER and inject the SEA/SEI to KVM. this
> series of patches will generate APEI table when guest OS boot up, and 
> dynamically
> record CPER for the guest OS about the generic hardware errors, currently the
> userspace only handle the memory section hardware errors. Before Qemu record 
> the
> CPER, it needs to check the ACK value written by the guest OS to avoid 
> read-write
> race condition.
> 
> Below is the APEI/GHESV2/CPER table layout, the max number of error soure is 
> 11,
> which is classified by notification type, now only enable the SEA/SEI 
> notification type
> error source.
> 
> etc/acpi/tables   etc/hardware_errors
>
> ==
>  +--+
> ++|address   |  
> +--+
> |HEST+|registers |  | 
> Error Status |
> + +--+| ++  | 
> Data Block 0 |
> | | GHES0| +->| |status_address0 |->| 
> ++
> +--+ |  | ++  | | 
>  CPER  |
> | | .| | +--->| |status_address1 |--+   | 
> |  CPER  |
> | | error_status_address | | || ++  |   | 
> |    |
> | | .| | ||  .   |  |   | 
> |  CPER  |
> | | error_status_address-+-+ |+--+  |   | 
> +-++
> | | .|   | +->| |status_address10|+ |   | 
> Error Status |
> | | read_ack_register+-+ | |  | ++| |   | 
> Data Block 1 |
> | | read_ack_preserve| +-+-+->| |ack_address0|--+ | +-->| 
> ++
> | | read_ack_write   |   | |  | ++  | | | 
> |  CPER  |
> + +--+   | | +--->| |ack_address1|--+-+   | | 
> |  CPER  |
> | | GHES1|   | | || ++  | |   | | 
> |    |
> + +--+   | | || | .  |  | |   | | 
> |  CPER  |
> | | .|   | | || ++  | |   | 
> +-++
> | | error_status_address-+---+ | | +->| |ack_address10   |--+-+-+ | | 
> |..  |
> | | .| | | |  | ++  | | | | | 
> ++
> | | read_ack_register+-+-+ |  | |  ack0  |<-+ | | | | 
> Error Status |
> | | read_ack_preserve| |   |  | ++| | | | 
> Data Block 10|
> | | read_ack_write   | |   |  | |  ack1  |<---+ | +>| 
> ++
> + +--+ |   |  | ++  |   | 
> |  CPER  |
> | | ...  | |   |  | |    |  |   | 
> |  CPER  |
> + +--+ |   |  | +--+ |  |   | 
> |    |
> | | GHES10   | |   |  | |  ack10 |< +   | 
> |  CPER  |
> + +--+ |   |  | ++  
> +-++
> | | .| |   |
> | | error_status_address-+-+   |
> | | .| |
> | | read_ack_register+-+
> | | read_ack_preserve|
> | | read_ack_write   |
> + +--+


Excellent job with the ASCII drawing.

[Qemu-devel] [PATCH v4 3/9] s390x: chsc nt2 events are pci-only

2017-08-04 Thread Cornelia Huck

The nt2 event class is pci-only - don't look for events if pci is
not in the active cpu model.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c  |  4 ++--
 hw/s390x/s390-pci-bus.h  |  4 ++--
 hw/s390x/s390-pci-stub.c |  4 ++--
 target/s390x/ioinst.c| 16 
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 61cfd2138f..c57f6ebae0 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -47,7 +47,7 @@ S390pciState *s390_get_phb(void)
 return phb;
 }
 
-int chsc_sei_nt2_get_event(void *res)
+int pci_chsc_sei_nt2_get_event(void *res)
 {
 ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
 PciCcdfAvail *accdf;
@@ -87,7 +87,7 @@ int chsc_sei_nt2_get_event(void *res)
 return rc;
 }
 
-int chsc_sei_nt2_have_event(void)
+int pci_chsc_sei_nt2_have_event(void)
 {
 S390pciState *s = s390_get_phb();
 
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 67af2c12ff..5df6292509 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -319,8 +319,8 @@ typedef struct S390pciState {
 } S390pciState;
 
 S390pciState *s390_get_phb(void);
-int chsc_sei_nt2_get_event(void *res);
-int chsc_sei_nt2_have_event(void);
+int pci_chsc_sei_nt2_get_event(void *res);
+int pci_chsc_sei_nt2_have_event(void);
 void s390_pci_sclp_configure(SCCB *sccb);
 void s390_pci_sclp_deconfigure(SCCB *sccb);
 void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
diff --git a/hw/s390x/s390-pci-stub.c b/hw/s390x/s390-pci-stub.c
index 2e7e42a2af..cc7278a865 100644
--- a/hw/s390x/s390-pci-stub.c
+++ b/hw/s390x/s390-pci-stub.c
@@ -7,12 +7,12 @@
 #include "s390-pci-bus.h"
 
 /* target/s390x/ioinst.c */
-int chsc_sei_nt2_get_event(void *res)
+int pci_chsc_sei_nt2_get_event(void *res)
 {
 return 1;
 }
 
-int chsc_sei_nt2_have_event(void)
+int pci_chsc_sei_nt2_have_event(void)
 {
 return 0;
 }
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index 51fbea620d..3fa3301f50 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -599,6 +599,22 @@ static int chsc_sei_nt0_have_event(void)
 return 0;
 }
 
+static int chsc_sei_nt2_get_event(void *res)
+{
+if (s390_has_feat(S390_FEAT_ZPCI)) {
+return pci_chsc_sei_nt2_get_event(res);
+}
+return 1;
+}
+
+static int chsc_sei_nt2_have_event(void)
+{
+if (s390_has_feat(S390_FEAT_ZPCI)) {
+return pci_chsc_sei_nt2_have_event();
+}
+return 0;
+}
+
 #define CHSC_SEI_NT0(1ULL << 63)
 #define CHSC_SEI_NT2(1ULL << 61)
 static void ioinst_handle_chsc_sei(ChscReq *req, ChscResp *res)
-- 
2.13.3

[Qemu-devel] [PATCH v2] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Fam Zheng

It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.

Signed-off-by: Fam Zheng 

---
v2: Improve wording. [Eric]
---
 qemu-img.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..90c7eab4a8 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size needs 
to be specified in
 this case. @var{backing_file} will never be modified unless you use the
 @code{commit} monitor command (or qemu-img commit).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 Note that a given backing file will be opened to check that it is valid. Use
 the @code{-u} option to enable unsafe backing file mode, which means that the
 image will be created even if the associated backing file cannot be opened. A
@@ -343,6 +346,9 @@ created as a copy on write image of the specified base 
image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{output_filename}.
+
 If the @code{-n} option is specified, the target volume creation will be
 skipped. This is useful for formats such as @code{rbd} if the target
 volume has already been created with site specific options that cannot
@@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if 
the image format of
 string), then the image is rebased onto no backing file (i.e. it will exist
 independently of any backing file).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 @var{cache} specifies the cache mode to be used for @var{filename}, whereas
 @var{src_cache} specifies the cache mode for reading backing files.
 
-- 
2.13.3

Re: [Qemu-devel] [PATCHv2 3/4] scsi: clarify sense codes for LUN0 emulation

2017-08-04 Thread Paolo Bonzini

On 04/08/2017 10:36, Hannes Reinecke wrote:
> The LUN0 emulation is just that, an emulation for a non-existing
> LUN0. So we should be returning LUN_NOT_SUPPORTED for any request
> coming from any other LUN.
> And we should be aborting unhandled commands with INVALID OPCODE,
> not LUN NOT SUPPORTED.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  hw/scsi/scsi-bus.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index 8419c75..79a222f 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -583,6 +583,11 @@ static int32_t scsi_target_send_command(SCSIRequest 
> *req, uint8_t *buf)
>  {
>  SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req);
>  
> +if (req->lun != 0) {
> +scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED));
> +scsi_req_complete(req, CHECK_CONDITION);
> +return 0;
> +}
>  switch (buf[0]) {
>  case REPORT_LUNS:
>  if (!scsi_target_emulate_report_luns(r)) {
> @@ -613,7 +618,7 @@ static int32_t scsi_target_send_command(SCSIRequest *req, 
> uint8_t *buf)
>  case TEST_UNIT_READY:
>  break;
>  default:
> -scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED));
> +scsi_req_build_sense(req, SENSE_CODE(INVALID_OPCODE));
>  scsi_req_complete(req, CHECK_CONDITION);
>  return 0;
>  illegal_request:
> 

I am queuing this one since it's an independent bugfix.

Paolo

Re: [Qemu-devel] Prohibit Windows from running in QEMU

2017-08-04 Thread Paolo Bonzini

On 04/08/2017 12:27, Peter Lieven wrote:
> Am 04.08.2017 um 12:23 schrieb Paolo Bonzini:
>> On 04/08/2017 11:58, Peter Lieven wrote:
>>> Am 29.10.2013 um 10:59 schrieb Paolo Bonzini:
 Il 29/10/2013 10:48, Peter Lieven ha scritto:
> Hi all,
>
> this question might seem a bit weird, but does anyone see a good way to
> avoid
> that Windows is able to boot inside qemu?
>
> We have defined several profiles for different operation systems and I 
> want
> to avoid that someone chooses Linux and then installs Windows within
> a VM. Reason is licensing.
 Patch QEMU to crash when Hyper-V extensions are enabled...
>>> Hi all,
>>>
>>> this is an old topic that has become important for me again recently.
>>> Now all Linux versions should be able to detect KVM even if Hyper-V is 
>>> enabled.
>>>
>>> But how do I detect from Qemu userspace that Hyper-V is enabled?
>> Maybe a better one: make KVM crash the guest if CR8 is nonzero on a
>> vmexit.  Linux doesn't use it, Windows should not survive long.
> 
> You mean the kvm kernel module? Or can I access this register also
> from Qemu on any call that is handled in userspace? It would be easier
> to have a cmdline option to Qemu than an option to a kernel module.

Yes, the kernel module.  Accessing it in QEMU requires
cpu_synchronize_state so it's slow.  However, you could piggyback on
some other functionality that is never used by Linux to do the check,
for example the RTC I/O port.

That is, in the RTC I/O port code you call cpu_synchronize_state and
check CR8.

Paolo

>> Warning, I don't know if UEFI firmware uses CR8.
> 
> UEFI firmware is not important in this case.
> Do you know if FreeBSD, OpenBSD or NetBSD use it?
> 
> 
> Thank for your ideas,
> Peter
>

[Qemu-devel] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange

Signed-off-by: Daniel P. Berrange 
---
 include/block/block_int.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d4f4ea7584..deb81a58bd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -147,12 +147,41 @@ struct BlockDriver {
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+
+/**
+ * @offset: position in bytes to read at
+ * @bytes: number of bytes to read
+ * @qiov: the buffers to fill with read data
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
+/**
+ * @offset: position in bytes to write at
+ * @bytes: number of bytes to write
+ * @qiov: the buffers containing data to write
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 
-- 
2.13.3

Re: [Qemu-devel] Prohibit Windows from running in QEMU

2017-08-04 Thread Peter Lieven

Am 04.08.2017 um 12:43 schrieb Paolo Bonzini:
> On 04/08/2017 12:27, Peter Lieven wrote:
>> Am 04.08.2017 um 12:23 schrieb Paolo Bonzini:
>>> On 04/08/2017 11:58, Peter Lieven wrote:
 Am 29.10.2013 um 10:59 schrieb Paolo Bonzini:
> Il 29/10/2013 10:48, Peter Lieven ha scritto:
>> Hi all,
>>
>> this question might seem a bit weird, but does anyone see a good way to
>> avoid
>> that Windows is able to boot inside qemu?
>>
>> We have defined several profiles for different operation systems and I 
>> want
>> to avoid that someone chooses Linux and then installs Windows within
>> a VM. Reason is licensing.
> Patch QEMU to crash when Hyper-V extensions are enabled...
 Hi all,

 this is an old topic that has become important for me again recently.
 Now all Linux versions should be able to detect KVM even if Hyper-V is 
 enabled.

 But how do I detect from Qemu userspace that Hyper-V is enabled?
>>> Maybe a better one: make KVM crash the guest if CR8 is nonzero on a
>>> vmexit.  Linux doesn't use it, Windows should not survive long.
>> You mean the kvm kernel module? Or can I access this register also
>> from Qemu on any call that is handled in userspace? It would be easier
>> to have a cmdline option to Qemu than an option to a kernel module.
> Yes, the kernel module.  Accessing it in QEMU requires
> cpu_synchronize_state so it's slow.  However, you could piggyback on
> some other functionality that is never used by Linux to do the check,
> for example the RTC I/O port.
>
> That is, in the RTC I/O port code you call cpu_synchronize_state and
> check CR8.

Thanks, I will try that out and let you know.

Peter

[Qemu-devel] [PATCH v4 4/9] s390x/pci: do not advertise pci on non-pci builds

2017-08-04 Thread Cornelia Huck

Only set the zpci feature bit on builds that actually support pci.

Signed-off-by: Cornelia Huck 
---
 target/s390x/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index c4c5791d27..bc62bba5b7 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -2662,7 +2662,9 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, 
Error **errp)
 }
 
 /* We emulate a zPCI bus and AEN, therefore we don't need HW support */
-set_bit(S390_FEAT_ZPCI, model->features);
+if (pci_available) {
+set_bit(S390_FEAT_ZPCI, model->features);
+}
 set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features);
 
 if (s390_known_cpu_type(cpu_type)) {
-- 
2.13.3

Re: [Qemu-devel] [PULL 0/2] slirp updates

2017-08-04 Thread Peter Maydell

On 2 August 2017 at 23:28, Samuel Thibault  wrote:
> warning: redirection vers https://people.debian.org/~sthibault/qemu.git/
> The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8:
>
>   Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100)
>
> are available in the git repository at:
>
>   http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault
>
> for you to fetch changes up to 413d463f43fbc4dd3a601e80a5724aa384a265a0:
>
>   slirp: check len against dhcp options array end (2017-08-03 00:26:44 +0200)
>
> 
> slirp updates
>
> 
> Hervé Poussineau (1):
>   slirp: fill error when failing to initialize user network
>
> Prasad J Pandit (1):
>   slirp: check len against dhcp options array end
>
>  net/slirp.c   | 134 
> --
>  slirp/bootp.c |   3 ++
>  2 files changed, 97 insertions(+), 40 deletions(-)

Applied, thanks.

-- PMM

[Qemu-devel] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange

There are trace probes in bdrv_co_readv|writev, however, the
block drivers are being gradually moved over to using the
bdrv_co_preadv|pwritev functions instead. As a result some
block drivers miss the current probes. Move the probes
into bdrv_co_preadv|pwritev instead, so that they are triggered
by more (all?) I/O code paths.

Signed-off-by: Daniel P. Berrange 
---
 block/io.c | 8 
 block/trace-events | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index d9dc822173..26003814eb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1135,6 +1135,8 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 bool use_local_qiov = false;
 int ret;
 
+trace_bdrv_co_preadv(child->bs, offset, bytes, flags);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -1207,8 +1209,6 @@ static int coroutine_fn bdrv_co_do_readv(BdrvChild *child,
 int coroutine_fn bdrv_co_readv(BdrvChild *child, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
 {
-trace_bdrv_co_readv(child->bs, sector_num, nb_sectors);
-
 return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0);
 }
 
@@ -1526,6 +1526,8 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 bool use_local_qiov = false;
 int ret;
 
+trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
+
 if (!bs->drv) {
 return -ENOMEDIUM;
 }
@@ -1660,8 +1662,6 @@ static int coroutine_fn bdrv_co_do_writev(BdrvChild 
*child,
 int coroutine_fn bdrv_co_writev(BdrvChild *child, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-trace_bdrv_co_writev(child->bs, sector_num, nb_sectors);
-
 return bdrv_co_do_writev(child, sector_num, nb_sectors, qiov, 0);
 }
 
diff --git a/block/trace-events b/block/trace-events
index 071a8d77ba..25dd5a3026 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -9,8 +9,8 @@ blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int 
bytes, int flags
 blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
 
 # block/io.c
-bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
-bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
+bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
+bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p 
offset %"PRId64" count %d flags 0x%x"
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t 
cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u 
cluster_offset %"PRId64" cluster_bytes %u"
 
-- 
2.13.3

Re: [Qemu-devel] [PATCH v4 4/9] s390x/pci: do not advertise pci on non-pci builds

2017-08-04 Thread David Hildenbrand

On 04.08.2017 13:29, Cornelia Huck wrote:
> Only set the zpci feature bit on builds that actually support pci.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  target/s390x/kvm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index c4c5791d27..bc62bba5b7 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -2662,7 +2662,9 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, 
> Error **errp)
>  }
>  
>  /* We emulate a zPCI bus and AEN, therefore we don't need HW support */
> -set_bit(S390_FEAT_ZPCI, model->features);
> +if (pci_available) {
> +set_bit(S390_FEAT_ZPCI, model->features);
> +}
>  set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features);
>  
>  if (s390_known_cpu_type(cpu_type)) {
> 

Yup, looks good to me.

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David

Re: [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR

2017-08-04 Thread Eric Blake

On 08/04/2017 05:20 AM, Kevin Wolf wrote:
>>>
>>> Hmm, I wonder.  https://bugzilla.redhat.com/show_bug.cgi?id=1465320
>>> details a failure when starting qemu with a read-write NBD disk, then
>>> taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where
>>> intermediate commit (snap2 into nbd) works but live commit (snap3 into
>>> nbd) fails with a message that nbd does not support reopening.  I'm
>>> presuming that your series may help to address that; I'll give it a spin
>>> and see what happens.
>>
>> Nope, even with your patches, I'm still getting:
>>
>> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}}
>> {"return": {}}
>> {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event":
>> "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len":
>> 2097152, "offset": 2097152, "speed": 0, "type": "commit"}}
>>
>> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}}
>> {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by
>> node '#block048' does not support reopening files"}}
> 
> That's simply NBD not implementing .bdrv_reopen_*. In other words, not a
> bug, but just a missing feature.

But why does intermediate commit work, while live commit fails?  Both
require that the image being committed into be read-write, and the NBD
image was opened read-write (even if it can be treated as read-only for
the duration of time that it is a backing image).  I understand missing
.bdrv_reopen making it impossible to change read-only to read-write, but
when missing it means you can never change read-write down to the
tighter read-only originally, then what is making live commit different
from intermediate in not being able to handle it?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PULL] virtio: fix for rc2

2017-08-04 Thread Peter Maydell

On 3 August 2017 at 15:07, Michael S. Tsirkin  wrote:
> The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8:
>
>   Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to e6a74868d92f858ac33915b6772999d7de2fd288:
>
>   build-sys: add --disable-vhost-user (2017-08-03 15:55:41 +0300)
>
> 
> virtio: fix for rc2
>
> Looks like the constant stream of additions of vhost-user devices is a
> problem for some people who are concerned about external connections
> from qemu. A per-device flag seems like an overkill, but a single
> configure flag seems like a sane way to support that, and it looks like
> we need to do it before the release.
>
> Signed-off-by: Michael S. Tsirkin 
>
> 
> Marc-André Lureau (1):
>   build-sys: add --disable-vhost-user

Applied, thanks.

-- PMM

[Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Daniel P. Berrange

Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage with high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver.

Signed-off-by: Daniel P. Berrange 
---
 block/crypto.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 58ef6f2f52..207941db9a 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
 }
 
 
-#define BLOCK_CRYPTO_MAX_SECTORS 32
+#define BLOCK_CRYPTO_MAX_SECTORS 2048
 
 static coroutine_fn int
 block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
sector_num,
 
 qemu_iovec_init(&hd_qiov, qiov->niov);
 
-/* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+/* Bounce buffer because we're not permitted to touch
+ * contents of qiov - it points to guest memory.
  */
 cipher_data =
 qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
@@ -464,9 +463,8 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t 
sector_num,
 
 qemu_iovec_init(&hd_qiov, qiov->niov);
 
-/* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+/* Bounce buffer because we're not permitted to touch
+ * contents of qiov - it points to guest memory.
  */
 cipher_data =
 qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
-- 
2.13.3

[Qemu-devel] [PATCH for 2.11] vga/migration: Update memory map in post_load

2017-08-04 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

After migration the chain4 alias mapping added by 80763888 (in 2011)
might be missing, since there's no call to vga_update_memory_access
in the post_load after the registers are updated.  Add it back.

Signed-off-by: Dr. David Alan Gilbert 
---
 hw/display/vga.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 63421f9ee8..5d3848a37e 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2044,6 +2044,7 @@ static int vga_common_post_load(void *opaque, int 
version_id)
 /* force refresh */
 s->graphic_mode = -1;
 vbe_update_vgaregs(s);
+vga_update_memory_access(s);
 return 0;
 }
 
-- 
2.13.3

Re: [Qemu-devel] Is the use of bdrv_getlength() in quorum_co_flush() kosher?

2017-08-04 Thread Alberto Garcia

On Fri 04 Aug 2017 02:48:03 PM CEST, Markus Armbruster wrote:
> Have a look at quorum_co_flush():
>
> quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
>   bdrv_getlength(s->children[i]->bs),
>   s->children[i]->bs->node_name, result);
>
> bdrv_getlength() can fail.  Does it do the right thing then?

If it fails then it returns -errno, but then quorum_report_bad() turns
into uint64_t and assumes it's valid.

Since that number is then rounded up to the next multiple of
BDRV_SECTOR_SIZE in order to calculate end_sector, I think that what
happens in practice is that the user sees a QUORUM_REPORT_BAD event with
sectors-count = 0 (in most cases) or with a very high value in
sectors-count (if errno > BDRV_SECTOR_SIZE).

The result of bdrv_getlength() is only used to report the number of
affected sectors in the QUORUM_REPORT_BAD event, so there are no other
consequences.

Anyway I think it's a good idea not to make assumptions, detect the
error and pass 0 instead.

--- a/block/quorum.c
+++ b/block/quorum.c
@@ -785,8 +785,9 @@ static coroutine_fn int
quorum_co_flush(BlockDriverState *bs)
 for (i = 0; i < s->num_children; i++) {
 result = bdrv_co_flush(s->children[i]->bs);
 if (result) {
+int64_t length = bdrv_getlength(s->children[i]->bs);
 quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-  bdrv_getlength(s->children[i]->bs),
+  length > 0 ? length : 0,
   s->children[i]->bs->node_name, result);
 result_value.l = result;
 quorum_count_vote(&error_votes, &result_value, i);

Berto

Re: [Qemu-devel] [PATCH v4 00/22] Clean up around qmp() and hmp()

2017-08-04 Thread Eric Blake

On 08/03/2017 08:54 PM, no-re...@patchew.org wrote:
> Hi,
> 
> This series failed automatic build test. Please find the testing commands and
> their output below. If you have docker installed, you can probably reproduce 
> it
> locally.

When will patchew have a syntax for stating dependencies? (Of course, I
should actually mention those dependencies in my cover letter, not after
the fact).

> /tmp/qemu-test/src/tests/test-qga.c: In function ‘test_qga_set_time’:
> /tmp/qemu-test/src/tests/test-qga.c:652: warning: implicit declaration of 
> function ‘qmp_fd’

This patch series depends on my earlier cleanup:
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00385.html
[PATCH] test-qga: Kill broken and dead QGA_TEST_SIDE_EFFECT

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/3] i386/kvm: use a switch statement for MSR detection

2017-08-04 Thread David Hildenbrand

On 04.08.2017 11:14, Ladi Prosek wrote:
> Switch is easier on the eye and might lead to better codegen.
> 
> Signed-off-by: Ladi Prosek 


Reviewed-by: David Hildenbrand 

-- 

Thanks,

David

Re: [Qemu-devel] [PATCH 3/3] i386/kvm: advertise Hyper-V frequency MSRs

2017-08-04 Thread David Hildenbrand

On 04.08.2017 11:14, Ladi Prosek wrote:
> As of kernel commit eb82feea59d6 ("KVM: hyperv: support 
> HV_X64_MSR_TSC_FREQUENCY
> and HV_X64_MSR_APIC_FREQUENCY"), KVM supports two new MSRs which are required
> for nested Hyper-V to read timestamps with RDTSC + TSC page.
> 
> This commit makes QEMU advertise the MSRs with CPUID.4003H:EAX[11] and
> CPUID.4003H:EDX[8] as specified in the Hyper-V TLFS and experimentally
> verified on a Hyper-V host. The feature is enabled with the existing hv-time 
> CPU
> flag, and only if the TSC frequency is stable across migration and known.
> 
> Signed-off-by: Ladi Prosek 
> ---
>  target/i386/kvm.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 77b6373..7e484a7 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -89,6 +89,7 @@ static bool has_msr_hv_vpindex;
>  static bool has_msr_hv_runtime;
>  static bool has_msr_hv_synic;
>  static bool has_msr_hv_stimer;
> +static bool has_msr_hv_frequencies;
>  static bool has_msr_xss;
>  
>  static bool has_msr_architectural_pmu;
> @@ -631,7 +632,17 @@ static int hyperv_handle_properties(CPUState *cs)
>  if (cpu->hyperv_time) {
>  env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
>  env->features[FEAT_HYPERV_EAX] |= 
> HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
> -env->features[FEAT_HYPERV_EAX] |= 0x200;
> +env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_REFERENCE_TSC_AVAILABLE;
> +if (has_msr_hv_frequencies> +/* TSC clock must be stable 
> and known for this feature. */
> +&& ((env->features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC)
> +|| env->user_tsc_khz != 0)
> +&& env->tsc_khz != 0) {

I'd drop the != 0 in both cases and move the env->tsc_khz check up to
has_msr_hv_frequencies.

if (has_msr_hv_frequencies && env->tsc_khz && ...

Wonder if it even would make sense to move some parts of this check into
a helper function, to beautify this a bit. tsc_stable(env)
tsc_known(env) ...

> +
> +env->features[FEAT_HYPERV_EAX] |= HV_X64_ACCESS_FREQUENCY_MSRS;
> +env->features[FEAT_HYPERV_EDX] |=
> +HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> +}
>  }
>  if (cpu->hyperv_crash && has_msr_hv_crash) {
>  env->features[FEAT_HYPERV_EDX] |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
> @@ -1127,6 +1138,9 @@ static int kvm_get_supported_msrs(KVMState *s)
>  case HV_X64_MSR_STIMER0_CONFIG:
>  has_msr_hv_stimer = true;
>  break;
> +case HV_X64_MSR_TSC_FREQUENCY:
> +has_msr_hv_frequencies = true;
> +break;
>  }
>  
>  }
> 


-- 

Thanks,

David

[Qemu-devel] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?

2017-08-04 Thread Markus Armbruster

Denis, you added this in commit d50d822:

#ifdef CONFIG_FALLOCATE
if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
if (ret == 0 || ret != -ENOTSUP) {
return ret;
}
s->has_fallocate = false;
}
#endif

bdrv_getlength() can fail.  Does it do the right thing then?  For what
it's worth, the comparison of its value is signed.

1 2 3 >

1 - 100 of 231 matches

Mail list logo