Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote: [...] > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > +{ > > +/* > > + * This means source VM is ready to resume the postcopy migration. > > + * It's time to switch state and release the fault thread to > > + * continue service page faults. > > + */ > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > +qemu_sem_post(&mis->postcopy_pause_sem_fault); > > Is it worth sanity checking that you were in RECOVER at this point? Yeah, it never hurts. Will do. -- Peter Xu
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote: > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote: > > [...] > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > > +{ > > > +/* > > > + * This means source VM is ready to resume the postcopy migration. > > > + * It's time to switch state and release the fault thread to > > > + * continue service page faults. > > > + */ > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > +qemu_sem_post(&mis->postcopy_pause_sem_fault); > > > > Is it worth sanity checking that you were in RECOVER at this point? > > Yeah, it never hurts. Will do. Not sure whether this would be good (note: I returned 0 in the if): diff --git a/migration/savevm.c b/migration/savevm.c index b7843c2..b34f59b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1709,6 +1709,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) { +if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { +error_report("%s: illegal resume received", __func__); +/* Don't fail the load, only for this. */ +return 0; +} + /* * This means source VM is ready to resume the postcopy migration. * It's time to switch state and release the fault thread to Basically I just don't want to crash the dest VM (it holds hot dirty pages) even if it receives a faulty RESUME command. -- Peter Xu
Re: [Qemu-devel] [PATCH for-2.11 0/8] tcg/s390 improvments
On Thu, 08/03 22:50, no-re...@patchew.org wrote: > /var/tmp/patchew-tester-tmp-_vw086tj/src/accel/tcg/translate-all.c:2218:1: > fatal error: error writing to /tmp/cctUT4t6.s: No space left on device Sorry for the false positive. I've cleaned up the disk space and restarted the test. Fam
Re: [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK
On Thu, Aug 03, 2017 at 12:21:41PM +0100, Dr. David Alan Gilbert wrote: [...] > > +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) > > +{ > > +trace_source_return_path_thread_resume_ack(value); > > + > > +/* > > + * Currently value will always be one. It can be used in the > > + * future to notify source that destination cannot continue. > > + */ > > +assert(value == 1); > > Again I prefer the routine to fail than to assert. > Maybe it's worth having a constant rather than the magic 1. Will do. Thanks, -- Peter Xu
Re: [Qemu-devel] [PATCH 0/2] scsi: enclosure support
On 08/04/2017 08:10 AM, Paolo Bonzini wrote: > >> On 08/03/2017 05:10 PM, Paolo Bonzini wrote: >>> On 03/08/2017 15:26, Hannes Reinecke wrote: Hi all, due to a customer issue I've added simple subenclosure support to the SCSI emulation. The patch simply converts the current invisible LUN0 into an enclosure device; existing setups using LUN0 as disks or CD-ROMs will not be affected. >>> >>> What is the issue exactly? That is, for what is it necessary to have a >>> dummy enclosure? >>> >> Well, stock linux displays some very interesting error messages for >> these types of enclosures. Which was the prime mover for doing this. > > --verbose? > [ 12.958454] scsi 1:0:0:254: Wrong diagnostic page; asked for 2 got 0 [ 12.958456] scsi 1:0:0:254: Failed to get diagnostic page 0xffea [ 12.958457] scsi 1:0:0:254: Failed to bind enclosure -19 [ 12.959392] ses 1:0:0:254: Attached Enclosure device >>> I agree with Dan that this need machine type compatibility gunk. For >>> example, could the new device affect /dev/sgN numbering? >> >> Yes, indeed it would. >> >> What about a new option to the scsi driver? > > If you do that, you've done 99% of the work to do compatibility so I > won't complain and do the 1% myself. :) > Okay, will be doing so. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare
On Thu, Aug 03, 2017 at 12:38:01PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > This is hook function to be called when a postcopy migration wants to > > resume from a failure. For each module, it should provide its own > > recovery logic before we switch to the postcopy-active state. > > Would a change-state handler be able to do this, We don't have such a change-state handler, do we? > or perhaps > the notifier chain I have in my shared memory world: > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06459.html The postcopy_notify() can do this as well. I just need a way to hook up all the system modules for migration. In our case, it's only RAM, but I think maybe one day we need block support. So as long as the mechanism (either current SaveVMHandlers interface, or postcopy_notify) can do the notification, IMHO it'll be fine. -- Peter Xu
Re: [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume
On Thu, Aug 03, 2017 at 12:56:31PM +0100, Dr. David Alan Gilbert wrote: [...] > > @@ -256,6 +257,8 @@ struct RAMState { > > RAMBlock *last_req_rb; > > /* Queue of outstanding page requests from the destination */ > > QemuMutex src_page_req_mutex; > > +/* Ramblock counts to sync dirty bitmap. Only used for recovery */ > > +int ramblock_to_sync; > > QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests; > > }; > > typedef struct RAMState RAMState; > > @@ -2731,6 +2734,57 @@ static int ram_load(QEMUFile *f, void *opaque, int > > version_id) > > return ret; > > } > > > > +/* Sync all the dirty bitmap with destination VM. */ > > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) > > +{ > > +RAMBlock *block; > > +QEMUFile *file = s->to_dst_file; > > +int ramblock_count = 0; > > + > > +trace_ram_dirty_bitmap_sync("start"); > > Most (but not all) of our trace_ uses have separate trace_ entries for > each step; e.g. trace_ram_dirty_bitmap_sync_start9) Okay. > > > +/* > > + * We need to take the resume lock to make sure that the send > > + * thread (current thread) and the rp-thread will do their work in > > + * order. > > + */ > > +qemu_mutex_lock(&s->resume_lock); > > + > > +/* Request for receive-bitmap for each block */ > > +RAMBLOCK_FOREACH(block) { > > +ramblock_count++; > > +qemu_savevm_send_recv_bitmap(file, block->idstr); > > +} > > + > > +/* Init the ramblock count to total */ > > +atomic_set(&rs->ramblock_to_sync, ramblock_count); > > + > > +trace_ram_dirty_bitmap_sync("wait-bitmap"); > > + > > +/* Wait until all the ramblocks' dirty bitmap synced */ > > +while (rs->ramblock_to_sync) { > > +qemu_cond_wait(&s->resume_cond, &s->resume_lock); > > +} > > Does the locking here get simpler if you: > a) count the number of RAMBlocks 'n' > b) Initialise a sempahore to -(n-1) > c) Call qemu_savevm_send_recv_bitmap for each bitmap > d) sem_wait on the semaphore - which is waiting for the semaphore to > be >0 > > as you receive each bitmap do a sem_post; on the last one > it should go from 0->1 and the sem_wait should wake up? I think you are right. :-) A single semaphore suffice here (and also for the following up handshake). I will touch up the commit message as well. Thanks, -- Peter Xu
[Qemu-devel] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files
Hi Maxim, No I didn't... -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1336794 Title: 9pfs does not honor open file handles on unlinked files Status in QEMU: In Progress Status in Ubuntu: Confirmed Bug description: This was originally filed over here: https://bugzilla.redhat.com/show_bug.cgi?id=1114221 The open-unlink-fstat idiom used in some places to create an anonymous private temporary file does not work in a QEMU guest over a virtio-9p filesystem. Version-Release number of selected component (if applicable): qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 (those are fedora RPMs) How reproducible: Always. See this example C program: https://bugzilla.redhat.com/attachment.cgi?id=913069 Steps to Reproduce: 1. Export a filesystem with virt-manager for the guest. (type: mount, driver: default, mode: passthrough) 2. Start guest and mount that filesystem (mount -t 9p -o trans=virtio,version=9p2000.L ...) 3. Run a program that uses open-unlink-fstat (in my case it was trying to compile Perl 5.20) Actual results: fstat fails: open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/home/tst/filename")= 0 fstat(3, 0x23aa1a8) = -1 ENOENT (No such file or directory) close(3) Expected results: open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/home/tst/filename")= 0 fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 close(3) Additional info: There was a patch put into the kernel back in '07 to handle this very problem for other filesystems; maybe its helpful: http://lwn.net/Articles/251228/ There is also a thread on LKML from last December specifically about this very problem: https://lkml.org/lkml/2013/12/31/163 There was a discussion on the QEMU list back in '11 that doesn't seem to have come to a conclusion, but did provide the test program that i've attached to this report: http://marc.info/?l=qemu-devel&m=130443605720648&w=2 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1336794/+subscriptions
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
* Peter Xu (pet...@redhat.com) wrote: > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote: > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote: > > > > [...] > > > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > > > +{ > > > > +/* > > > > + * This means source VM is ready to resume the postcopy migration. > > > > + * It's time to switch state and release the fault thread to > > > > + * continue service page faults. > > > > + */ > > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > > +qemu_sem_post(&mis->postcopy_pause_sem_fault); > > > > > > Is it worth sanity checking that you were in RECOVER at this point? > > > > Yeah, it never hurts. Will do. > > Not sure whether this would be good (note: I returned 0 in the if): > > diff --git a/migration/savevm.c b/migration/savevm.c > index b7843c2..b34f59b 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1709,6 +1709,12 @@ static int > loadvm_postcopy_handle_run(MigrationIncomingState *mis) > > static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > { > +if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { > +error_report("%s: illegal resume received", __func__); > +/* Don't fail the load, only for this. */ > +return 0; > +} > + > /* > * This means source VM is ready to resume the postcopy migration. > * It's time to switch state and release the fault thread to > > Basically I just don't want to crash the dest VM (it holds hot dirty > pages) even if it receives a faulty RESUME command. Yes, so now that's a fun problem; effectively you then have 3 valid failure modes: a) An IO failure so we need to go into POSTCOPY_PAUSE b) A fatal migration stream problem to quit c) A non-fatal migration stream problem to go .. back into PAUSE? Dave > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCHv2 1/4] scsi: Make LUN 0 a simple enclosure
Instead of having an 'invisible' LUN0 (in case LUN 0 is not connected) this patch maks LUN0 a enclosure service, exposing it to the OS. Signed-off-by: Hannes Reinecke --- hw/scsi/scsi-bus.c | 56 +- 1 file changed, 55 insertions(+), 1 deletion(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 23c51de..c89e82d 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -493,10 +493,11 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r) if (r->req.lun != 0) { r->buf[0] = TYPE_NO_LUN; } else { -r->buf[0] = TYPE_NOT_PRESENT | TYPE_INACTIVE; +r->buf[0] = TYPE_ENCLOSURE; r->buf[2] = 5; /* Version */ r->buf[3] = 2 | 0x10; /* HiSup, response data format */ r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */ +r->buf[6] = 0x40; /* Enclosure service */ r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ. */ memcpy(&r->buf[8], "QEMU", 8); memcpy(&r->buf[16], "QEMU TARGET ", 16); @@ -505,6 +506,54 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r) return true; } +static bool scsi_target_emulate_receive_diagnostic(SCSITargetReq *r) +{ +uint8_t page_code = r->req.cmd.buf[2]; +unsigned char *enc_desc, *type_desc; + +assert(r->req.dev->lun != r->req.lun); + +scsi_target_alloc_buf(&r->req, 0x38); + +switch (page_code) { +case 0x00: +r->buf[r->len++] = page_code ; /* this page */ +r->buf[r->len++] = 0x00; +r->buf[r->len++] = 0x00; +r->buf[r->len++] = 0x03; +r->buf[r->len++] = 0x00; +r->buf[r->len++] = 0x01; +r->buf[r->len++] = 0x08; +break; +case 0x01: +memset(r->buf, 0, 0x38); +r->buf[0] = page_code; +r->buf[3] = 0x30; +enc_desc = &r->buf[8]; +enc_desc[0] = 0x09; +enc_desc[2] = 1; +enc_desc[3] = 0x24; +memcpy(&enc_desc[12], "QEMU", 8); +memcpy(&enc_desc[20], "QEMU TARGET ", 16); +pstrcpy((char *)&enc_desc[36], 4, qemu_hw_version()); +type_desc = &r->buf[48]; +type_desc[1] = 1; +r->len = 0x38; +break; +case 0x08: +r->buf[0] = page_code; +r->buf[1] = 0x00; +r->buf[2] = 0x00; +r->buf[3] = 0x00; +r->len = 4; +break; +default: +return false; +} +r->len = MIN(r->req.cmd.xfer, r->len); +return true; +} + static size_t scsi_sense_len(SCSIRequest *req) { if (req->dev->type == TYPE_SCANNER) @@ -528,6 +577,11 @@ static int32_t scsi_target_send_command(SCSIRequest *req, uint8_t *buf) goto illegal_request; } break; +case RECEIVE_DIAGNOSTIC: +if (!scsi_target_emulate_receive_diagnostic(r)) { +goto illegal_request; +} +break; case REQUEST_SENSE: scsi_target_alloc_buf(&r->req, scsi_sense_len(req)); r->len = scsi_device_get_sense(r->req.dev, r->buf, -- 1.8.5.6
[Qemu-devel] [PATCHv2 2/4] scsi: use qemu_uuid to generate logical identifier for SES
The SES enclosure descriptor requires a logical identifier, so generate one using the qemu_uuid and the Qumranet OUI. Signed-off-by: Hannes Reinecke --- hw/scsi/scsi-bus.c | 17 + 1 file changed, 17 insertions(+) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index c89e82d..8419c75 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -10,6 +10,7 @@ #include "trace.h" #include "sysemu/dma.h" #include "qemu/cutils.h" +#include "qemu/crc32c.h" static char *scsibus_get_dev_path(DeviceState *dev); static char *scsibus_get_fw_dev_path(DeviceState *dev); @@ -533,6 +534,22 @@ static bool scsi_target_emulate_receive_diagnostic(SCSITargetReq *r) enc_desc[0] = 0x09; enc_desc[2] = 1; enc_desc[3] = 0x24; +if (qemu_uuid_set) { +uint32_t crc; + +/* + * Make this a NAA IEEE Registered identifier + * using Qumranet OUI (0x001A4A) and the + * crc32 from the system UUID. + */ +enc_desc[4] = 0x50; +enc_desc[5] = 0x01; +enc_desc[6] = 0xa4; +enc_desc[7] = 0xa0; +crc = crc32c(0x, qemu_uuid.data, 16); +cpu_to_le32s(&crc); +memcpy(&enc_desc[8], &crc, 4); +} memcpy(&enc_desc[12], "QEMU", 8); memcpy(&enc_desc[20], "QEMU TARGET ", 16); pstrcpy((char *)&enc_desc[36], 4, qemu_hw_version()); -- 1.8.5.6
[Qemu-devel] [PATCHv2 0/4] scsi: enclosure support
Hi all, due to a customer issue I've added simple subenclosure support to the SCSI emulation. By setting the 'enclosure' option to a SCSI device we will now present an enclosure device on LUN0 (if LUN0 is otherwise unassigned) or setting the 'EncServ' bit in the inquiry data if LUN0 is assigned to a device. Changes to v1: - Add patch to clarify sense code responses - Add 'enclosure' option for SCSI devices Hannes Reinecke (4): scsi: Make LUN 0 a simple enclosure scsi: use qemu_uuid to generate logical identifier for SES scsi: clarify sense codes for LUN0 emulation scsi: Add 'enclosure' option for scsi devices hw/scsi/scsi-bus.c | 85 -- hw/scsi/scsi-disk.c| 4 ++- include/hw/scsi/scsi.h | 1 + 3 files changed, 87 insertions(+), 3 deletions(-) -- 1.8.5.6
[Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices
Make enclosure support optional with the 'enclosure' argument to the scsi device. Adding 'enclosure=on' as option to the SCSI device will present an enclosure service device on LUN0, either as a stand-alone LUN (in case LUN0 is not assigned) or by setting the EncServ bit int the inquiry data if LUN0 is assigned to a block devices. Signed-off-by: Hannes Reinecke --- hw/scsi/scsi-bus.c | 11 --- hw/scsi/scsi-disk.c| 4 +++- include/hw/scsi/scsi.h | 1 + 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 79a222f..c11422b 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -22,6 +22,7 @@ static Property scsi_props[] = { DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0), DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1), DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1), +DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false), DEFINE_PROP_END_OF_LIST(), }; @@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r) if (r->req.lun != 0) { r->buf[0] = TYPE_NO_LUN; } else { -r->buf[0] = TYPE_ENCLOSURE; +r->buf[0] = r->req.dev->enclosure ? +TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE; r->buf[2] = 5; /* Version */ r->buf[3] = 2 | 0x10; /* HiSup, response data format */ r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */ -r->buf[6] = 0x40; /* Enclosure service */ +if (r->req.dev->enclosure) { +r->buf[6] = 0x40; /* Enclosure service */ +} r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ. */ memcpy(&r->buf[8], "QEMU", 8); memcpy(&r->buf[16], "QEMU TARGET ", 16); @@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest *req, uint8_t *buf) } break; case RECEIVE_DIAGNOSTIC: -if (!scsi_target_emulate_receive_diagnostic(r)) { +if (!r->req.dev->enclosure || +!scsi_target_emulate_receive_diagnostic(r)) { goto illegal_request; } break; diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 5f1e5e8..153d97d 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) the additional length is not adjusted */ outbuf[4] = 36 - 5; } - +if (s->qdev.lun == 0 && s->qdev.enclosure) { +outbuf[6] = 0x40; /* Enclosure service */ +} /* Sync data transfer and TCQ. */ outbuf[7] = 0x10 | (req->bus->info->tcq ? 0x02 : 0); return buflen; diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 6b85786..243c185 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -100,6 +100,7 @@ struct SCSIDevice BlockConf conf; SCSISense unit_attention; bool sense_is_ua; +bool enclosure; uint8_t sense[SCSI_SENSE_BUF_SIZE]; uint32_t sense_len; QTAILQ_HEAD(, SCSIRequest) requests; -- 1.8.5.6
[Qemu-devel] [PATCHv2 3/4] scsi: clarify sense codes for LUN0 emulation
The LUN0 emulation is just that, an emulation for a non-existing LUN0. So we should be returning LUN_NOT_SUPPORTED for any request coming from any other LUN. And we should be aborting unhandled commands with INVALID OPCODE, not LUN NOT SUPPORTED. Signed-off-by: Hannes Reinecke --- hw/scsi/scsi-bus.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 8419c75..79a222f 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -583,6 +583,11 @@ static int32_t scsi_target_send_command(SCSIRequest *req, uint8_t *buf) { SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req); +if (req->lun != 0) { +scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED)); +scsi_req_complete(req, CHECK_CONDITION); +return 0; +} switch (buf[0]) { case REPORT_LUNS: if (!scsi_target_emulate_report_luns(r)) { @@ -613,7 +618,7 @@ static int32_t scsi_target_send_command(SCSIRequest *req, uint8_t *buf) case TEST_UNIT_READY: break; default: -scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED)); +scsi_req_build_sense(req, SENSE_CODE(INVALID_OPCODE)); scsi_req_complete(req, CHECK_CONDITION); return 0; illegal_request: -- 1.8.5.6
[Qemu-devel] [Bug 1708617] [NEW] qemu2.9 meet a question using reconnect about ovs+dpdk
Public bug reported: env: qemu-2.9 dpdk-16.11 ovs-2.6.1 host os:Linux compute 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux guest os is same with host 1. ovs service is normal 2. start 2 qemu vm, using vhostuser port as server taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 -nographic -vnc :0 qemu-system-x86_64: -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU waiting for connection on: disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser0,server taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 -nographic -vnc :0 qemu-system-x86_64: -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU waiting for connection on: disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser1,server 3. add br and vhostuser port as client ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev ovs-vsctl add-port br1 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuserclient options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser0" ovs-vsctl add-port br1 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuserclient options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser1" 4. ping between 2 vm is ok 5. restart ovs process systemctl restart openvswitch 6. there is a question ping between 2 vm error 7. change qemu from 2.9 to 2.7, everything become ok ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1708617 Title: qemu2.9 meet a question using reconnect about ovs+dpdk Status in QEMU: New Bug description: env: qemu-2.9 dpdk-16.11 ovs-2.6.1 host os:Linux compute 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux guest os is same with host 1. ovs service is normal 2. start 2 qemu vm, using vhostuser port as server taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 -nographic -vnc :0 qemu-system-x86_64: -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU waiting for connection on: disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser0,server taskset 0x3f qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6 -nographic -vnc :0 qemu-system-x86_64: -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/dpdkvhostuser0,server: QEMU waiting for connection on: disconnected:unix:/usr/local/var/run/openvswitch/dpdkvhostuser1,server 3. add br and vhostuser port as client ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev ovs-vsctl add-port br1 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuserclient options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser0" ovs-vsctl add-port br1 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuserclient options:vhost-server-path="/usr/local/var/run/openvswitch/dpdkvhostuser1" 4. ping between 2 vm is ok 5. restart ovs process systemctl restart openvswitch 6. there is a que
Re: [Qemu-devel] [RFC 27/29] migration: setup ramstate for resume
On Thu, Aug 03, 2017 at 01:37:04PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > After we updated the dirty bitmaps of ramblocks, we also need to update > > the critical fields in RAMState to make sure it is ready for a resume. > > > > Signed-off-by: Peter Xu > > --- > > migration/ram.c | 35 ++- > > 1 file changed, 34 insertions(+), 1 deletion(-) > > > > diff --git a/migration/ram.c b/migration/ram.c > > index c695b13..427bf6e 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1947,6 +1947,31 @@ static int ram_state_init(RAMState **rsp) > > return 0; > > } > > > > +static void ram_state_resume_prepare(RAMState *rs) > > +{ > > +RAMBlock *block; > > +long pages = 0; > > + > > +/* > > + * Postcopy is not using xbzrle/compression, so no need for that. > > + * Also, since source are already halted, we don't need to care > > + * about dirty page logging as well. > > + */ > > + > > +RAMBLOCK_FOREACH(block) { > > +pages += bitmap_count_one(block->bmap, > > + block->max_length >> TARGET_PAGE_BITS); > > Again I think that needs to be block->used_length (see > migration_bitmap_sync). Fixing. > > > +} > > + > > +/* This may not be aligned with current bitmaps. Recalculate. */ > > +rs->migration_dirty_pages = pages; > > + > > +rs->last_seen_block = NULL; > > +rs->last_sent_block = NULL; > > +rs->last_page = 0; > > +rs->last_version = ram_list.version; > > A trace at this point with the pages count might be worthwhile. Added. > > > +} > > + > > /* > > * Each of ram_save_setup, ram_save_iterate and ram_save_complete has > > * long-running RCU critical section. When rcu-reclaims in the code > > @@ -2842,8 +2867,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, > > RAMBlock *block) > > static int ram_resume_prepare(MigrationState *s, void *opaque) > > { > > RAMState *rs = *(RAMState **)opaque; > > +int ret; > > > > -return ram_dirty_bitmap_sync_all(s, rs); > > +ret = ram_dirty_bitmap_sync_all(s, rs); > > Interesting; I'd assumed you'd load directly into this > bitmap rather than loading into the bitmap on each block. > Do we ever get the case where a bitmap is set on the source > bitmap but not in the loaded bitmap? (confirmed with Dave offlist that this is not a problem, blame myself on using a poor function name "ram_dirty_bitmap_sync_all" which is too close to existng "migration_bitmap_sync") -- Peter Xu
Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > Firstly, MigThrError enumeration is introduced to describe the error in > > migration_detect_error() better. This gives the migration_thread() a > > chance to know whether a recovery has happened. > > > > Then, if a recovery is detected, migration_thread() will reset its local > > variables to prepare for that. > > > > Signed-off-by: Peter Xu > > --- > > migration/migration.c | 40 +--- > > 1 file changed, 29 insertions(+), 11 deletions(-) > > > > diff --git a/migration/migration.c b/migration/migration.c > > index ecebe30..439bc22 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s) > > return atomic_read(&s->start_postcopy) || s->start_postcopy_fast; > > } > > > > +typedef enum MigThrError { > > +/* No error detected */ > > +MIG_THR_ERR_NONE = 0, > > +/* Detected error, but resumed successfully */ > > +MIG_THR_ERR_RECOVERED = 1, > > +/* Detected fatal error, need to exit */ > > +MIG_THR_ERR_FATAL = 2, > > +} MigThrError; > > + > > Could you move this patch earlier to when postcopy_pause is created > so it's created with this enum? Sure. [...] > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque) > > /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */ > > enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE; > > bool enable_colo = migrate_colo_enabled(); > > +MigThrError thr_error; > > > > rcu_register_thread(); > > > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque) > > * Try to detect any kind of failures, and see whether we > > * should stop the migration now. > > */ > > -if (migration_detect_error(s)) { > > +thr_error = migration_detect_error(s); > > +if (thr_error == MIG_THR_ERR_FATAL) { > > +/* Stop migration */ > > break; > > +} else if (thr_error == MIG_THR_ERR_RECOVERED) { > > +/* > > + * Just recovered from a e.g. network failure, reset all > > + * the local variables. > > + */ > > +initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > > +initial_bytes = 0; > > They don't seem that important to reset? The problem is that we have this in migration_thread(): if (current_time >= initial_time + BUFFER_DELAY) { uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) - initial_bytes; uint64_t time_spent = current_time - initial_time; double bandwidth = (double)transferred_bytes / time_spent; threshold_size = bandwidth * s->parameters.downtime_limit; ... } Here qemu_ftell() would possibly be very small since we have just resumed... and then transferred_bytes will be extremely huge since "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative... Then, with luck, we'll got extremely huge "bandwidth" as well. -- Peter Xu
Re: [Qemu-devel] [PATCH] pc/acpi: Fix booting of macOS with Clover EFI bootloader
On Fri, 4 Aug 2017 12:15:40 +0530 Dhiru Kholia wrote: > This was tested with macOS 10.12.5 and Clover r4114. > > Without this patch, the macOS boot process gets stuck at the Apple logo > without showing any progress bar. > > I have documented the process of running macOS on QEMU/KVM at, > > https://github.com/kholia/OSX-KVM/ > > Instead of using this patch, adding an additional command-line knob > which exposes this setting (force_rev1_fadt) to the user might be a more > general solution. it's been reported that OVMF had issues that were fixed, you probably want to read this thread: https://www.mail-archive.com/qemu-devel@nongnu.org/msg468456.html CCing interested parties > --- > hw/i386/acpi-build.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index b9c245c..0f8df19 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -145,6 +145,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) > object_property_get_uint(obj, ACPI_PCIHP_IO_LEN_PROP, NULL); > } > if (lpc) { > +pm->force_rev1_fadt = true; > obj = lpc; > pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE; > }
Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote: [...] > > +static int postcopy_resume_handshake(MigrationState *s) > > +{ > > +qemu_mutex_lock(&s->resume_lock); > > + > > +qemu_savevm_send_postcopy_resume(s->to_dst_file); > > + > > +while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { > > +qemu_cond_wait(&s->resume_cond, &s->resume_lock); > > +} > > + > > +qemu_mutex_unlock(&s->resume_lock); > > + > > +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { > > +return 0; > > +} > > That feels to be a small racy - couldn't that validly become a > MIGRATION_STATUS_COMPLETED before that check? Since postcopy_resume_handshake() is called in migration_thread() context, so it won't change to complete at this point (confirmed with Dave offlist on the question). > > I wonder if we need to change migrate_fd_cancel to be able to > cause a cancel in this case? Yeah that's important, but haven't considered in current series. Do you mind to postpone it as TODO as well (along with the work to allow the user to manually switch to PAUSED state, as Dan suggested)? -- Peter Xu
[Qemu-devel] [PATCH 1/3] i386/kvm: use a switch statement for MSR detection
Switch is easier on the eye and might lead to better codegen. Signed-off-by: Ladi Prosek --- target/i386/kvm.c | 76 +++ 1 file changed, 32 insertions(+), 44 deletions(-) diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 6db7783..ed119ca 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -1081,66 +1081,54 @@ static int kvm_get_supported_msrs(KVMState *s) int i; for (i = 0; i < kvm_msr_list->nmsrs; i++) { -if (kvm_msr_list->indices[i] == MSR_STAR) { +switch (kvm_msr_list->indices[i]) { +case MSR_STAR: has_msr_star = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) { +break; +case MSR_VM_HSAVE_PA: has_msr_hsave_pa = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_TSC_AUX) { +break; +case MSR_TSC_AUX: has_msr_tsc_aux = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_TSC_ADJUST) { +break; +case MSR_TSC_ADJUST: has_msr_tsc_adjust = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) { +break; +case MSR_IA32_TSCDEADLINE: has_msr_tsc_deadline = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_IA32_SMBASE) { +break; +case MSR_IA32_SMBASE: has_msr_smbase = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_IA32_MISC_ENABLE) { +break; +case MSR_IA32_MISC_ENABLE: has_msr_misc_enable = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_IA32_BNDCFGS) { +break; +case MSR_IA32_BNDCFGS: has_msr_bndcfgs = true; -continue; -} -if (kvm_msr_list->indices[i] == MSR_IA32_XSS) { +break; +case MSR_IA32_XSS: has_msr_xss = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_CRASH_CTL) { +break;; +case HV_X64_MSR_CRASH_CTL: has_msr_hv_crash = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_RESET) { +break; +case HV_X64_MSR_RESET: has_msr_hv_reset = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_VP_INDEX) { +break; +case HV_X64_MSR_VP_INDEX: has_msr_hv_vpindex = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_VP_RUNTIME) { +break; +case HV_X64_MSR_VP_RUNTIME: has_msr_hv_runtime = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) { +break; +case HV_X64_MSR_SCONTROL: has_msr_hv_synic = true; -continue; -} -if (kvm_msr_list->indices[i] == HV_X64_MSR_STIMER0_CONFIG) { +break; +case HV_X64_MSR_STIMER0_CONFIG: has_msr_hv_stimer = true; -continue; +break; } + } } -- 2.9.3
[Qemu-devel] [PATCH 3/3] i386/kvm: advertise Hyper-V frequency MSRs
As of kernel commit eb82feea59d6 ("KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCY"), KVM supports two new MSRs which are required for nested Hyper-V to read timestamps with RDTSC + TSC page. This commit makes QEMU advertise the MSRs with CPUID.4003H:EAX[11] and CPUID.4003H:EDX[8] as specified in the Hyper-V TLFS and experimentally verified on a Hyper-V host. The feature is enabled with the existing hv-time CPU flag, and only if the TSC frequency is stable across migration and known. Signed-off-by: Ladi Prosek --- target/i386/kvm.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 77b6373..7e484a7 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -89,6 +89,7 @@ static bool has_msr_hv_vpindex; static bool has_msr_hv_runtime; static bool has_msr_hv_synic; static bool has_msr_hv_stimer; +static bool has_msr_hv_frequencies; static bool has_msr_xss; static bool has_msr_architectural_pmu; @@ -631,7 +632,17 @@ static int hyperv_handle_properties(CPUState *cs) if (cpu->hyperv_time) { env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE; env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE; -env->features[FEAT_HYPERV_EAX] |= 0x200; +env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_REFERENCE_TSC_AVAILABLE; +if (has_msr_hv_frequencies +/* TSC clock must be stable and known for this feature. */ +&& ((env->features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC) +|| env->user_tsc_khz != 0) +&& env->tsc_khz != 0) { + +env->features[FEAT_HYPERV_EAX] |= HV_X64_ACCESS_FREQUENCY_MSRS; +env->features[FEAT_HYPERV_EDX] |= +HV_FEATURE_FREQUENCY_MSRS_AVAILABLE; +} } if (cpu->hyperv_crash && has_msr_hv_crash) { env->features[FEAT_HYPERV_EDX] |= HV_X64_GUEST_CRASH_MSR_AVAILABLE; @@ -1127,6 +1138,9 @@ static int kvm_get_supported_msrs(KVMState *s) case HV_X64_MSR_STIMER0_CONFIG: has_msr_hv_stimer = true; break; +case HV_X64_MSR_TSC_FREQUENCY: +has_msr_hv_frequencies = true; +break; } } -- 2.9.3
[Qemu-devel] [PATCH 0/3] i386/kvm: advertise Hyper-V frequency MSRs
This is the QEMU part of the changes required for nested Hyper-V to read timestamps with RDTSC + TSC page. Without exposing the frequency MSRs, Windows with the Hyper-V role enabled use the much slower HV_X64_MSR_TIME_REF_COUNT (0x4020) RDMSR to read timestamps. The new registers are exposed only if the TSC frequency is stable across migration and known, as suggested by Paolo. Ladi Prosek (3): i386/kvm: use a switch statement for MSR detection i386/kvm: set tsc_khz before configuring Hyper-V CPUID i386/kvm: advertise Hyper-V frequency MSRs target/i386/kvm.c | 130 +++--- 1 file changed, 66 insertions(+), 64 deletions(-) -- 2.9.3
[Qemu-devel] [PATCH 2/3] i386/kvm: set tsc_khz before configuring Hyper-V CPUID
Timing-related Hyper-V enlightenments will benefit from knowing the final tsc_khz value. This commit just moves the code in preparation for further changes. Signed-off-by: Ladi Prosek --- target/i386/kvm.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/target/i386/kvm.c b/target/i386/kvm.c index ed119ca..77b6373 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -695,6 +695,25 @@ int kvm_arch_init_vcpu(CPUState *cs) cpuid_i = 0; +r = kvm_arch_set_tsc_khz(cs); +if (r < 0) { +goto fail; +} + +/* vcpu's TSC frequency is either specified by user, or following + * the value used by KVM if the former is not present. In the + * latter case, we query it from KVM and record in env->tsc_khz, + * so that vcpu's TSC frequency can be migrated later via this field. + */ +if (!env->tsc_khz) { +r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ? +kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) : +-ENOTSUP; +if (r > 0) { +env->tsc_khz = r; +} +} + /* Paravirtualization CPUIDs */ if (hyperv_enabled(cpu)) { c = &cpuid_data.entries[cpuid_i++]; @@ -961,25 +980,6 @@ int kvm_arch_init_vcpu(CPUState *cs) } } -r = kvm_arch_set_tsc_khz(cs); -if (r < 0) { -goto fail; -} - -/* vcpu's TSC frequency is either specified by user, or following - * the value used by KVM if the former is not present. In the - * latter case, we query it from KVM and record in env->tsc_khz, - * so that vcpu's TSC frequency can be migrated later via this field. - */ -if (!env->tsc_khz) { -r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ? -kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) : --ENOTSUP; -if (r > 0) { -env->tsc_khz = r; -} -} - if (cpu->vmware_cpuid_freq /* Guests depend on 0x4000 to detect this feature, so only expose * it if KVM exposes leaf 0x4000. (Conflicts with Hyper-V) */ -- 2.9.3
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
On Fri, Aug 04, 2017 at 09:30:01AM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote: > > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote: > > > > > > [...] > > > > > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > > > > +{ > > > > > +/* > > > > > + * This means source VM is ready to resume the postcopy > > > > > migration. > > > > > + * It's time to switch state and release the fault thread to > > > > > + * continue service page faults. > > > > > + */ > > > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > > > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > > > +qemu_sem_post(&mis->postcopy_pause_sem_fault); > > > > > > > > Is it worth sanity checking that you were in RECOVER at this point? > > > > > > Yeah, it never hurts. Will do. > > > > Not sure whether this would be good (note: I returned 0 in the if): > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > index b7843c2..b34f59b 100644 > > --- a/migration/savevm.c > > +++ b/migration/savevm.c > > @@ -1709,6 +1709,12 @@ static int > > loadvm_postcopy_handle_run(MigrationIncomingState *mis) > > > > static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > { > > +if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { > > +error_report("%s: illegal resume received", __func__); > > +/* Don't fail the load, only for this. */ > > +return 0; > > +} > > + > > /* > > * This means source VM is ready to resume the postcopy migration. > > * It's time to switch state and release the fault thread to > > > > Basically I just don't want to crash the dest VM (it holds hot dirty > > pages) even if it receives a faulty RESUME command. > > Yes, so now that's a fun problem; effectively you then have 3 valid > failure modes: > a) An IO failure so we need to go into POSTCOPY_PAUSE > b) A fatal migration stream problem to quit > c) A non-fatal migration stream problem to go .. back into PAUSE? Hmm yes... So I got at least three TODO ITEMs now: - support manual switch source into PAUSED state - support migrate_cancel during PAUSED/RECOVER state - when anything wrong happens during PAUSED/RECOVER, switching back to PAUSED state on both sides It just depends on whether we would like to postpone these work, or we think any of them are essential even for the first version. IMHO we can postpone this 3rd one as well. -- Peter Xu
Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
* Peter Xu (pet...@redhat.com) wrote: > On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote: > > * Peter Xu (pet...@redhat.com) wrote: > > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote: > > > > * Peter Xu (pet...@redhat.com) wrote: > > > > > > [...] > > > > > > > > +/* Return true if we should continue the migration, or false. */ > > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis) > > > > > +{ > > > > > +trace_postcopy_pause_incoming(); > > > > > + > > > > > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > > > > > + MIGRATION_STATUS_POSTCOPY_PAUSED); > > > > > + > > > > > +assert(mis->from_src_file); > > > > > +qemu_file_shutdown(mis->from_src_file); > > > > > +qemu_fclose(mis->from_src_file); > > > > > +mis->from_src_file = NULL; > > > > > + > > > > > +assert(mis->to_src_file); > > > > > +qemu_mutex_lock(&mis->rp_mutex); > > > > > +qemu_file_shutdown(mis->to_src_file); > > > > > +qemu_fclose(mis->to_src_file); > > > > > +mis->to_src_file = NULL; > > > > > +qemu_mutex_unlock(&mis->rp_mutex); > > > > > > > > Hmm is that safe? If we look at migrate_send_rp_message we have: > > > > > > > > static void migrate_send_rp_message(MigrationIncomingState *mis, > > > > enum mig_rp_message_type > > > > message_type, > > > > uint16_t len, void *data) > > > > { > > > > trace_migrate_send_rp_message((int)message_type, len); > > > > qemu_mutex_lock(&mis->rp_mutex); > > > > qemu_put_be16(mis->to_src_file, (unsigned int)message_type); > > > > qemu_put_be16(mis->to_src_file, len); > > > > qemu_put_buffer(mis->to_src_file, data, len); > > > > qemu_fflush(mis->to_src_file); > > > > qemu_mutex_unlock(&mis->rp_mutex); > > > > } > > > > > > > > If we came into postcopy_pause_incoming at about the same time > > > > migrate_send_rp_message was being called and pause_incoming took the > > > > lock first, then once it release the lock, send_rp_message carries on > > > > and uses mis->to_src_file that's now NULL. > > > > > > > > One solution here is to just call qemu_file_shutdown() but leave the > > > > files open at this point, but clean the files up sometime later. > > > > > > I see the commnent on patch 14 as well - yeah, we need patch 14 to > > > co-op here, and as long as we are with patch 14, we should be ok. > > > > > > > > > > > > + > > > > > +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > > > > > +qemu_sem_wait(&mis->postcopy_pause_sem_dst); > > > > > +} > > > > > + > > > > > +trace_postcopy_pause_incoming_continued(); > > > > > + > > > > > +return true; > > > > > +} > > > > > + > > > > > static int qemu_loadvm_state_main(QEMUFile *f, > > > > > MigrationIncomingState *mis) > > > > > { > > > > > uint8_t section_type; > > > > > int ret = 0; > > > > > > > > > > +retry: > > > > > while (true) { > > > > > section_type = qemu_get_byte(f); > > > > > > > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, > > > > > MigrationIncomingState *mis) > > > > > out: > > > > > if (ret < 0) { > > > > > qemu_file_set_error(f, ret); > > > > > + > > > > > +/* > > > > > + * Detect whether it is: > > > > > + * > > > > > + * 1. postcopy running > > > > > + * 2. network failure (-EIO) > > > > > + * > > > > > + * If so, we try to wait for a recovery. > > > > > + */ > > > > > +if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && > > > > > +ret == -EIO && postcopy_pause_incoming(mis)) { > > > > > +/* Reset f to point to the newly created channel */ > > > > > +f = mis->from_src_file; > > > > > +goto retry; > > > > > +} > > > > > > > > I wonder if: > > > > > > > >if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && > > > >ret == -EIO && postcopy_pause_incoming(mis)) { > > > >/* Try again after postcopy recovery */ > > > >return qemu_loadvm_state_main(mis->from_src_file, mis); > > > >} > > > > would be nicer; it avoids the goto loop. > > > > > > I agree we should avoid using goto loops. However I do see vast usages > > > of goto like this one when we want to redo part of the procedures of a > > > function (or, of course, when handling errors in "C-style"). > > > > We mostly use them to jump forward to an error exit; only rarely do > > we do loops with them; so if we can sensibly avoid them it's best. > > > > > Calling qemu_loadvm_state_main() inside itself is ok as well, but it > > > also has defect: stack usage would be out of control, or even, it can > > > be controled by malicious users. E.g., if someone used program to > > > period
Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
On Fri, Aug 04, 2017 at 10:33:19AM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote: > > > * Peter Xu (pet...@redhat.com) wrote: > > > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote: > > > > > * Peter Xu (pet...@redhat.com) wrote: > > > > > > > > [...] > > > > > > > > > > +/* Return true if we should continue the migration, or false. */ > > > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis) > > > > > > +{ > > > > > > +trace_postcopy_pause_incoming(); > > > > > > + > > > > > > +migrate_set_state(&mis->state, > > > > > > MIGRATION_STATUS_POSTCOPY_ACTIVE, > > > > > > + MIGRATION_STATUS_POSTCOPY_PAUSED); > > > > > > + > > > > > > +assert(mis->from_src_file); > > > > > > +qemu_file_shutdown(mis->from_src_file); > > > > > > +qemu_fclose(mis->from_src_file); > > > > > > +mis->from_src_file = NULL; > > > > > > + > > > > > > +assert(mis->to_src_file); > > > > > > +qemu_mutex_lock(&mis->rp_mutex); > > > > > > +qemu_file_shutdown(mis->to_src_file); > > > > > > +qemu_fclose(mis->to_src_file); > > > > > > +mis->to_src_file = NULL; > > > > > > +qemu_mutex_unlock(&mis->rp_mutex); > > > > > > > > > > Hmm is that safe? If we look at migrate_send_rp_message we have: > > > > > > > > > > static void migrate_send_rp_message(MigrationIncomingState *mis, > > > > > enum mig_rp_message_type > > > > > message_type, > > > > > uint16_t len, void *data) > > > > > { > > > > > trace_migrate_send_rp_message((int)message_type, len); > > > > > qemu_mutex_lock(&mis->rp_mutex); > > > > > qemu_put_be16(mis->to_src_file, (unsigned int)message_type); > > > > > qemu_put_be16(mis->to_src_file, len); > > > > > qemu_put_buffer(mis->to_src_file, data, len); > > > > > qemu_fflush(mis->to_src_file); > > > > > qemu_mutex_unlock(&mis->rp_mutex); > > > > > } > > > > > > > > > > If we came into postcopy_pause_incoming at about the same time > > > > > migrate_send_rp_message was being called and pause_incoming took the > > > > > lock first, then once it release the lock, send_rp_message carries on > > > > > and uses mis->to_src_file that's now NULL. > > > > > > > > > > One solution here is to just call qemu_file_shutdown() but leave the > > > > > files open at this point, but clean the files up sometime later. > > > > > > > > I see the commnent on patch 14 as well - yeah, we need patch 14 to > > > > co-op here, and as long as we are with patch 14, we should be ok. > > > > > > > > > > > > > > > + > > > > > > +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > > > > > > +qemu_sem_wait(&mis->postcopy_pause_sem_dst); > > > > > > +} > > > > > > + > > > > > > +trace_postcopy_pause_incoming_continued(); > > > > > > + > > > > > > +return true; > > > > > > +} > > > > > > + > > > > > > static int qemu_loadvm_state_main(QEMUFile *f, > > > > > > MigrationIncomingState *mis) > > > > > > { > > > > > > uint8_t section_type; > > > > > > int ret = 0; > > > > > > > > > > > > +retry: > > > > > > while (true) { > > > > > > section_type = qemu_get_byte(f); > > > > > > > > > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile > > > > > > *f, MigrationIncomingState *mis) > > > > > > out: > > > > > > if (ret < 0) { > > > > > > qemu_file_set_error(f, ret); > > > > > > + > > > > > > +/* > > > > > > + * Detect whether it is: > > > > > > + * > > > > > > + * 1. postcopy running > > > > > > + * 2. network failure (-EIO) > > > > > > + * > > > > > > + * If so, we try to wait for a recovery. > > > > > > + */ > > > > > > +if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && > > > > > > +ret == -EIO && postcopy_pause_incoming(mis)) { > > > > > > +/* Reset f to point to the newly created channel */ > > > > > > +f = mis->from_src_file; > > > > > > +goto retry; > > > > > > +} > > > > > > > > > > I wonder if: > > > > > > > > > >if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && > > > > >ret == -EIO && postcopy_pause_incoming(mis)) { > > > > >/* Try again after postcopy recovery */ > > > > >return qemu_loadvm_state_main(mis->from_src_file, mis); > > > > >} > > > > > would be nicer; it avoids the goto loop. > > > > > > > > I agree we should avoid using goto loops. However I do see vast usages > > > > of goto like this one when we want to redo part of the procedures of a > > > > function (or, of course, when handling errors in "C-style"). > > > > > > We mostly use them to jump forward to an error exit; only rarely do > > >
[Qemu-devel] [PATCH v9 1/3] ACPI: add APEI/HEST/CPER structures and macros
(1) Add related APEI/HEST table structures and macros, these definition refer to ACPI 6.1 and UEFI 2.6 spec. (2) Add generic error status block and CPER memory section definition, user space only handle memory section errors. Signed-off-by: Dongjiu Geng --- thanks Michael and Laszlo's review: change since v5: (1) update the commit message title (2) remove UUID_BE and UEFI_CPER_SEC_PLATFORM_MEM macros (3) remove including "qemu/uuid.h" file in this patch (4) remove including "hest_ghes.h" file in this patch (5) drop "this is" from the comment of structures and macros (6) replace the "QemuUUID section_type_le" to "uint8_t section_type_le[16]" chnage since v4: (1) fix email threading in this series is incorrect issue change since v3: (1) separate the original one patch into three patches: one is new ACPI structures and macros, another is C source file to generate ACPI HEST table and dynamically record CPER ,final patch is the change about Makefile and configuration (2) add comments about where the ACPI structures and macros come from, for example, they come from the UEFI Spec 2.6, ""; ACPI 6.1 spec, "xx". (3) correct the macros name, for emaple, prefix some macro names with "UEFI_". (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h" (5) remove the duplicate ACPI address space, because it already defined in the "include/hw/acpi/aml-build.h" (6) remove the acpi_generic_address structure because same definition exists in the AcpiGenericAddress. (7) rename the struct acpi_hest_notify to AcpiHestNotifyType (8) rename the struct acpi_hest_types to AcpiHestSourceType (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from ACPI_HEST_TYPE_xxx (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType. (11) add missed QEMU_PACKED for the struct definition. (12) remove the defnition of AcpiGenericErrorData, and rename the AcpiGenericErrorDataV300 to AcpiGenericErrorData. (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it to section_type_le. (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and AcpiGenericErrorDataV300, and remarking on the "error_severity" fields that they take their values from AcpiGenericErrorSeverity (15) remove the wrongly call to BERT(Boot Error Record Table) (16) add comments for the struction member, for example, pint out that the block_status member in the AcpiGenericErrorStatus is a bitmask composed of ACPI_GEBS_xxx macros (17) remove the hardware error source notification type list, and rename the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED. (18) remove the physical_addr member of GhesErrorState (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le (20) change the second parameter to "error_physical_addr" in the ghes_update_guest API statement --- include/hw/acpi/acpi-defs.h | 193 1 file changed, 193 insertions(+) diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index 4cc3630..ff9525e 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -295,6 +295,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable; #define ACPI_APIC_GENERIC_TRANSLATOR15 #define ACPI_APIC_RESERVED 16 /* 16 and greater are reserved */ +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */ +#define UEFI_CPER_MEM_VALID_ERROR_STATUS 0x0001 +#define UEFI_CPER_MEM_VALID_PA 0x0002 +#define UEFI_CPER_MEM_VALID_PA_MASK 0x0004 +#define UEFI_CPER_MEM_VALID_NODE 0x0008 +#define UEFI_CPER_MEM_VALID_CARD 0x0010 +#define UEFI_CPER_MEM_VALID_MODULE 0x0020 +#define UEFI_CPER_MEM_VALID_BANK 0x0040 +#define UEFI_CPER_MEM_VALID_DEVICE 0x0080 +#define UEFI_CPER_MEM_VALID_ROW 0x0100 +#define UEFI_CPER_MEM_VALID_COLUMN 0x0200 +#define UEFI_CPER_MEM_VALID_BIT_POSITION 0x0400 +#define UEFI_CPER_MEM_VALID_REQUESTOR0x0800 +#define UEFI_CPER_MEM_VALID_RESPONDER0x1000 +#define UEFI_CPER_MEM_VALID_TARGET 0x2000 +#define UEFI_CPER_MEM_VALID_ERROR_TYPE 0x4000 +#define UEFI_CPER_MEM_VALID_RANK_NUMBER 0x8000 +#define UEFI_CPER_MEM_VALID_CARD_HANDLE 0x1 +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE0x2 +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC 3 + +/* From the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */ + +enum AcpiHestNotifyType { +ACPI_HEST_NOTIFY_POLLED = 0, +ACPI_HEST_NOTIFY_EXTERNAL = 1, +ACPI_HEST_NOTIFY_LOCAL = 2, +ACPI_HEST_NOTIFY_SCI = 3, +ACPI_HEST_NOTIFY_NMI = 4, +ACPI_HEST_NOTIFY_CMCI = 5, /* ACPI 5.0 */ +ACPI_HEST_NOTIFY_MCE = 6, /* ACPI 5.0 */ +ACPI_HEST_NOTIFY_GPIO = 7, /* ACPI 6.0 */ +ACPI_HEST_NOTIFY_SEA = 8, /* ACPI 6.1 */
[Qemu-devel] [PATCH v9 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration
Add CONFIG_ACPI_APEI configuration in the Makefile and enable it in the arm-softmmu.mak Signed-off-by: Dongjiu Geng --- thanks a lot Michael and Laszlo's review and comments: change since v5: (1) no change change since v4: (1) fix email threading in this series is incorrect issue change since v3: (1) change name to "CONFIG_ACPI_APEI" from CONFIG_ACPI_APEI_GENERATION --- default-configs/arm-softmmu.mak | 1 + hw/acpi/Makefile.objs | 1 + 2 files changed, 2 insertions(+) diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 1e3bd2b..ee6f5fc 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -121,3 +121,4 @@ CONFIG_ACPI=y CONFIG_SMBIOS=y CONFIG_ASPEED_SOC=y CONFIG_GPIO_KEY=y +CONFIG_ACPI_APEI=y diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs index 11c35bc..bafb148 100644 --- a/hw/acpi/Makefile.objs +++ b/hw/acpi/Makefile.objs @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o +common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o common-obj-y += acpi_interface.o -- 1.8.3.1
[Qemu-devel] [PATCH v9 2/3] ACPI: Add APEI GHES Table Generation support
This implements APEI GHES Table by passing the error CPER info to the guest via a fw_cfg_blob. After a CPER info is recorded, an SEA(Synchronous External Abort)/SEI(SError Interrupt) exception will be injected into the guest OS. Below is the table layout, the max number of error soure is 11, which is classified by notification type. etc/acpi/tables etc/hardware_errors == + +--++--+ | | HEST ||address | +--+ | +--+|registers | | Error Status | | | GHES0|| ++ | Data Block 0 | | +--+ +->| |status_address0 |->| ++ | | .| | | ++ | | CPER | | | error_status_address-+-+ +--->| |status_address1 |--+ | | CPER | | | .| || ++ | | | | | | read_ack_register+-+ || . | | | | CPER | | | read_ack_preserve| | |+--+ | | +-++ | | read_ack_write | | | +->| |status_address10|+ | | Error Status | + +--+ | | | | ++| | | Data Block 1 | | | GHES1| +-+-+->| | ack_value0 || +-->| ++ + +--+ | | | ++| | | CPER | | | .| | | +--->| | ack_value1 || | | CPER | | | error_status_address-+---+ | || ++| | | | | | .| | || | . || | | CPER | | | read_ack_register+-+-+| ++| +-++ | | read_ack_preserve| | +->| | ack_value10|| | |.. | | | read_ack_write | | | | ++| | ++ + +--| | | | | Error Status | | | ... | | | | | Data Block 10| + +--+ | | +>| ++ | | GHES10 | | || | CPER | + +--+ | || | CPER | | | .| | || | | | | error_status_address-+-+ || | CPER | | | .| | +-++ | | read_ack_register+-+ | | read_ack_preserve| | | read_ack_write | + +--+ For GHESv2 error source, the OSPM must acknowledges the error via Read Ack register. so user space must check the ack value to avoid read-write race condition. Signed-off-by: Dongjiu Geng --- thanks a lot Michael and Laszlo's review and comments: change since v8: (1) remove the ACK value address change since v6: (1) update the commit message change since v5: (1) move to GHESv2 error source from GHESv1, because we maily use the GHESv2 (2) add the GHESv2 error source layout in the comments (2) add the logic to read the ack value from OSPM to avoid read-write race condition (3) including header files "aml-build.h" and "hest_ghes.h" in this patch to make it reasonable (4) calculate the fw_cfg blob offsets that should be patched in more fine-grained steps, with multiple separate increments, using: - structure type names, - sizeof operators, - offsetof macros, - and possibly a separate comment for each offset increment. (5) remove page boundary refers to 4096 in the "bios_linker_loader_alloc" (6) using build_append_int_noprefix to avoid use pointer math, so that the code is easily readable (7) change memory section definition to array from QemuUUID (8) only enable SEA/SEI notification hardware error source to avoid guest OS ACPI driver probe warning change since v4: 1. fix email threading in this series is incorrect issue change since v3: 1. remove the unnecessary include for "hw/acpi/vmgenid.h" in hw/arm/virt-acpi-build.c 2. add conversion between LE and host-endian for the CPER record 3. handle the case that run out of the preallocated memory for the CPER record 4. change to use g_malloc0 instead of g_malloc 5. change block_reqr_size name to block_rer_size 6. change QEMU coding style, that
Re: [Qemu-devel] [PATCH 10/15] target/arm: Don't use cpsr_write/cpsr_read to transfer M profile XPSR
On 3 August 2017 at 23:13, Richard Henderson wrote: > On 08/02/2017 09:43 AM, Peter Maydell wrote: >> +if (val & XPSR_EXCP) { >> +/* This is a CPSR format value from an older QEMU. (We can tell >> + * because values transferred in XPSR format always have zero >> + * for the EXCP field, and CPSR format will always have bit 4 >> + * set in CPSR_M.) Rearrange it into XPSR format. The >> significant >> + * differences are that the T bit is not in the same place, the >> + * primask/faultmask info may be in the CPSR I and F bits, and >> + * we do not want the mode bits. >> + */ >> +uint32_t newval = val; >> + >> +newval &= (CPSR_NZCV | CPSR_Q | CPSR_IT | CPSR_GE); >> +if (val & CPSR_T) { >> +newval |= XPSR_T; >> +} >> +/* If the I or F bits are set then this is a migration from >> + * an old QEMU which still stored the M profile FAULTMASK >> + * and PRIMASK in env->daif. For a new QEMU, the data is >> + * transferred using the vmstate_m_faultmask_primask subsection. >> + */ > > The second comment seems sort of redundant with the first now. I felt that the migration-compat stuff was sufficiently subtle that it was worth retaining the second detailed comment as well as the brief summary in the new first comment. thanks -- PMM
Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
* Peter Xu (pet...@redhat.com) wrote: > On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote: > > > +/* Size of the bitmap, in bytes */ > > > +size = (block->max_length >> TARGET_PAGE_BITS) / 8; > > > +qemu_put_be64(file, size); > > > +qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size); > > > > Do we need to be careful about endianness and length of long here? > > The migration stream can (theoretically) migrate between hosts of > > different endianness, e.g. a Power LE and Power BE host it can also > > migrate between a 32bit and 64bit host where the 'long' used in our > > bitmap is a different length. > > Ah, good catch... > > I feel like we'd better provide a new bitmap helper for this when the > bitmap will be sent to somewhere else, like: > > void bitmap_to_le(unsigned long *dst, const unsigned long *src, > long nbits); > void bitmap_from_le(unsigned long *dst, const unsigned long *src, > long nbits); > > I used little endian since I *think* that should work even cross 32/64 > bits machines (and I think big endian should not work). Lets think about some combinations: 64 bit LE G0,G1...G7 64 bit BE G7,G6...G0 32 bit LE A0,A1,A2,A3, B0,B1,B2,B3 32 bit BE A3,A2,A1,A0 B3,B2,B1,B0 considering a 64bit BE src to a 32bit LE dest: 64 bit BE G7,G6...G0 bitmap_to_le swaps that to G0,G1,..G7 destination reads two 32bit chunks: G0,G1,G2,G3G4,G5,G6,G7 dest is LE so no byteswap is needed. Yes, I _think_ that's OK. > > I think that means you have to save it as a series of long's; > > and also just make sure 'size' is a multiple of 'long' - otherwise > > you lose the last few bytes, which on a big endian system would > > be a problem. > > Yeah, then the size should possibly be pre-cooked with > BITS_TO_LONGS(). However that's slightly tricky as well, maybe I > should provide another bitmap helper: > > static inline long bitmap_size(long nbits) > { > return BITS_TO_LONGS(nbits); > } > > Since the whole thing should be part of bitmap APIs imho. The macro is enough I think. > > > > Also, should we be using 'max_length' or 'used_length' - ram_save_setup > > stores the used_length. I don't think we should be accessing outside > > the used_length? That might also make the thing about 'size' being > > rounded to a 'long' more interesting; maybe need to check you don't use > > the bits outside the used_length. > > Yes. AFAIU max_length and used_length are always the same currently in > our codes. I used max_length since in ram_state_init() we inited > block->bmap and block->unsentmap with it. I can switch to used_length > though. I remember it went in a couple of years ago because there were cases it was different; they're rare though - I think it was an ACPI case. > > > +qemu_fflush(file); > > > + > > > +if (qemu_file_get_error(file)) { > > > +return qemu_file_get_error(file); > > > +} > > > + > > > +return sizeof(size) + size; > > > > I think since size is always sent as a 64bit that's size + 8. > > Yes. I "offloaded" the calcluation of sizeof(size) to compiler (in > case I got brain furt when writting the codes...). So you prefer > digits directly in these cases? It might be just fragile if we changed > the type of "size" someday (though I guess we won't). > > Let me use "size + 8". Lets stick with what you have actually; it's OK since size is a uint64_t. Dave > > > > > > +} > > > + > > > +/* > > > * An outstanding page request, on the source, having been received > > > * and queued > > > */ > > > @@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int > > > version_id) > > > return ret; > > > } > > > > > > +/* > > > + * Read the received bitmap, revert it as the initial dirty bitmap. > > > + * This is only used when the postcopy migration is paused but wants > > > + * to resume from a middle point. > > > + */ > > > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) > > > +{ > > > +QEMUFile *file = s->rp_state.from_dst_file; > > > +uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8; > > > +uint64_t size; > > > + > > > +if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { > > > +error_report("%s: incorrect state %s", __func__, > > > + MigrationStatus_lookup[s->state]); > > > +return -EINVAL; > > > +} > > > + > > > +size = qemu_get_be64(file); > > > + > > > +/* The size of the bitmap should match with our ramblock */ > > > +if (size != local_size) { > > > +error_report("%s: ramblock '%s' bitmap size mismatch " > > > + "(0x%lx != 0x%lx)", __func__, block->idstr, > > > + size, local_size); > > > +return -EINVAL; > > > +} > > > > Coming back to the used_length thing above; again I think the rule > > is that the used_length has to match not the max_leng
[Qemu-devel] [PATCH v9 0/3] Generate APEI GHES table and dynamically record CPER
In the armv8 platform, the mainly hardware error source are ARMv8 SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS or use other interface to notify user space, such as Qemu. After Qemu gets the notification, it will record the CPER and inject the SEA/SEI to KVM. this series of patches will generate APEI table when guest OS boot up, and dynamically record CPER for the guest OS about the generic hardware errors, currently the userspace only handle the memory section hardware errors. Before Qemu record the CPER, it needs to check the ACK value written by the guest OS to avoid read-write race condition. Below is the APEI/GHESV2/CPER table layout, the max number of error soure is 11, which is classified by notification type, now only enable the SEA/SEI notification type error source. etc/acpi/tables etc/hardware_errors == + +--++--+ | | HEST ||address | +--+ | +--+|registers | | Error Status | | | GHES0|| ++ | Data Block 0 | | +--+ +->| |status_address0 |->| ++ | | .| | | ++ | | CPER | | | error_status_address-+-+ +--->| |status_address1 |--+ | | CPER | | | .| || ++ | | | | | | read_ack_register+-+ || . | | | | CPER | | | read_ack_preserve| | |+--+ | | +-++ | | read_ack_write | | | +->| |status_address10|+ | | Error Status | + +--+ | | | | ++| | | Data Block 1 | | | GHES1| +-+-+->| |ack_address0|--+ | +-->| ++ + +--+ | | | ++ | | | | CPER | | | .| | | +--->| |ack_address1|--+-+ | | | CPER | | | error_status_address-+---+ | || ++ | | | | | | | | .| | || | . | | | | | | CPER | | | read_ack_register+-+-+| ++ | | | +-++ | | read_ack_preserve| | +->| |ack_address10 |--+-+-+ | | |.. | | | read_ack_write | | | | ++ | | | | | ++ + +--| | | | | ack0 |<-+ | | | | Error Status | | | ... | | | | ++| | | | Data Block 10| + +--+ | | | | ack1 |<---+ | +>| ++ | | GHES10 | | | | ++ | | | CPER | + +--+ | | | | | | | | CPER | | | .| | | | +--+ | | | | | | | error_status_address-+-+ | | | ack10 |< + | | CPER | | | .| | | ++ +-++ | | read_ack_register+-+ | | read_ack_preserve| | | read_ack_write | + +--+ After injecting a SEA/SEI ghes error, the gueset OS kernel log will be shown as below: [ 142.95] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 8 [ 142.913141] {1}[Hardware Error]: event severity: recoverable [ 142.914498] {1}[Hardware Error]: Error 0, type: recoverable [ 142.915851] {1}[Hardware Error]: section_type: memory error [ 142.917163] {1}[Hardware Error]: physical_address: 0x [ 142.918792] {1}[Hardware Error]: error_type: 3, multi-bit ECC how to test: 1. In the guest OS, use this command to dump the APEI table: "iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST" 2. And find the address for the generic error status block according to the notification type 3. then find the CPER record through the generic error status block. For example(notification type is SEA): (1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST (2) root@genericarmv8:~# cat HEST.dsl /* * Intel ACPI Component Architecture * AML/ASL+ Disassembler version 20170728 (64-bit version) * Copyright (c) 2000 - 2017 Intel Corporation * * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep 5 07:59:17 2016 * * ACPI Data Table [HEST] * * Forma
Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
* Peter Xu (pet...@redhat.com) wrote: > On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote: > > [...] > > > > +static int postcopy_resume_handshake(MigrationState *s) > > > +{ > > > +qemu_mutex_lock(&s->resume_lock); > > > + > > > +qemu_savevm_send_postcopy_resume(s->to_dst_file); > > > + > > > +while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { > > > +qemu_cond_wait(&s->resume_cond, &s->resume_lock); > > > +} > > > + > > > +qemu_mutex_unlock(&s->resume_lock); > > > + > > > +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { > > > +return 0; > > > +} > > > > That feels to be a small racy - couldn't that validly become a > > MIGRATION_STATUS_COMPLETED before that check? > > Since postcopy_resume_handshake() is called in migration_thread() > context, so it won't change to complete at this point (confirmed with > Dave offlist on the question). Yes. > > > > I wonder if we need to change migrate_fd_cancel to be able to > > cause a cancel in this case? > > Yeah that's important, but haven't considered in current series. Do > you mind to postpone it as TODO as well (along with the work to allow > the user to manually switch to PAUSED state, as Dan suggested)? Yes I don't the cancel in that case is that important; it's already in the recovery from a bad situation. Dave > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
* Peter Xu (pet...@redhat.com) wrote: > On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote: > > * Peter Xu (pet...@redhat.com) wrote: > > > Firstly, MigThrError enumeration is introduced to describe the error in > > > migration_detect_error() better. This gives the migration_thread() a > > > chance to know whether a recovery has happened. > > > > > > Then, if a recovery is detected, migration_thread() will reset its local > > > variables to prepare for that. > > > > > > Signed-off-by: Peter Xu > > > --- > > > migration/migration.c | 40 +--- > > > 1 file changed, 29 insertions(+), 11 deletions(-) > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > index ecebe30..439bc22 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState > > > *s) > > > return atomic_read(&s->start_postcopy) || s->start_postcopy_fast; > > > } > > > > > > +typedef enum MigThrError { > > > +/* No error detected */ > > > +MIG_THR_ERR_NONE = 0, > > > +/* Detected error, but resumed successfully */ > > > +MIG_THR_ERR_RECOVERED = 1, > > > +/* Detected fatal error, need to exit */ > > > +MIG_THR_ERR_FATAL = 2, > > > +} MigThrError; > > > + > > > > Could you move this patch earlier to when postcopy_pause is created > > so it's created with this enum? > > Sure. > > [...] > > > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque) > > > /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */ > > > enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE; > > > bool enable_colo = migrate_colo_enabled(); > > > +MigThrError thr_error; > > > > > > rcu_register_thread(); > > > > > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque) > > > * Try to detect any kind of failures, and see whether we > > > * should stop the migration now. > > > */ > > > -if (migration_detect_error(s)) { > > > +thr_error = migration_detect_error(s); > > > +if (thr_error == MIG_THR_ERR_FATAL) { > > > +/* Stop migration */ > > > break; > > > +} else if (thr_error == MIG_THR_ERR_RECOVERED) { > > > +/* > > > + * Just recovered from a e.g. network failure, reset all > > > + * the local variables. > > > + */ > > > +initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > > > +initial_bytes = 0; > > > > They don't seem that important to reset? > > The problem is that we have this in migration_thread(): > > if (current_time >= initial_time + BUFFER_DELAY) { > uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) - > initial_bytes; > uint64_t time_spent = current_time - initial_time; > double bandwidth = (double)transferred_bytes / time_spent; > threshold_size = bandwidth * s->parameters.downtime_limit; > ... > } > > Here qemu_ftell() would possibly be very small since we have just > resumed... and then transferred_bytes will be extremely huge since > "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative... > Then, with luck, we'll got extremely huge "bandwidth" as well. Ah yes that's a good reason to reset it then; add a comment like 'important to avoid breaking transferred_bytes and bandwidth calculation' Dave > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Aug 3, 2017 at 11:36 PM, Paolo Bonzini wrote: > - Original Message - >> From: "Dr. David Alan Gilbert" >> To: "Alberto Garcia" >> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com >> Sent: Thursday, August 3, 2017 6:45:17 PM >> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block >> devices >> >> * Alberto Garcia (be...@igalia.com) wrote: >> > On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) >> > wrote: >> > > --- a/vl.c >> > > +++ b/vl.c >> > > @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp) >> > > replay_disable_events(); >> > > iothread_stop_all(); >> > > >> > > -bdrv_close_all(); >> > > pause_all_vcpus(); >> > > +bdrv_close_all(); >> > > res_free(); >> > >> > I haven't debugged it yet, but in my computer iotest 093 stops working >> > (it never finishes) after this change. >> >> Yes, I can reproduce that here (I've got to explicitly run 093 - it >> doesn't do it automatically for me): > > The culprit so to speak is this: > > if (qtest_enabled()) { > /* For testing block IO throttling only */ > tg->clock_type = QEMU_CLOCK_VIRTUAL; > } > > So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all > hangs. Should throttling be disabled by the time bdrv_close drains the > BlockDriverState, and likewise for bdrv_close_all? bdrv_drain_all() is called before blk_remove_all_bs(), so throttling is still enabled while draining: void bdrv_close_all(void) { block_job_cancel_sync_all(); nbd_export_close_all(); /* Drop references from requests still in flight, such as canceled block * jobs whose AIO context has not been polled yet */ bdrv_drain_all(); blk_remove_all_bs(); Perhaps we need a bdrv_throttle_disable_all() function. Manos is moving throttling to a block driver so there is no longer a 1:1 relationship between BB and throttling - there might be multiple throtting nodes in a BDS graph. Stefan
Re: [Qemu-devel] [PATCH v3 1/9] kvm: remove hard dependency on pci
On Wed, 26 Jul 2017 10:26:23 +0200 Cornelia Huck wrote: > On Wed, 26 Jul 2017 08:52:44 +0200 > Thomas Huth wrote: > > Since this missed the 2.10 freeze anyway and we've got some more time to > > discuss it: I still think it would be much cleaner to move the functions > > from kvm-all.c that use these PCI functions into a separate file > > kvm-pci.c instead and compile that only with CONFIG_PCI=y. That way we > > likely also could prevent that more dependencies on CONFIG_PCI creep > > into kvm-all.c again at later points in time. > > > > I've now had a closer look, and it seems like the only affected > > functions are kvm_irqchip_add_msi_route() and > > kvm_irqchip_update_msi_route() ... and these seems only to be used in > > code we would not care about on s390x with CONFIG_PCI=n. So could you > > please have at least a try to move these two functions (and other > > related msi functions) to a new file kvm-pci.c instead to see whether > > that would work, too? > > I can try, as this missed the 2.10 boat anyway. > > The code contains some parts that are not relevant in all cases (msi > routes if we don't use pci, adapter routes for anything not > s390x, ...), but I don't think trying to rip this out would be much of > an improvement. I'll try the minimal (well, not quite as minimal as > this patch) route instead. I tried this; but it turned out to be one of those cases where you pull on a stick and then find half a tree attached to it... Much of the code is used by all routing variants, and I ended up exposing so many functions that it was just ugly and unreadable. The variant I'm now going with is keeping the original approach and adding a pci_available variable, which I also use in the patch that exposes the zpci facility patch only when pci is compiled in. I'll try to post this after lunch.
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Mon, Jul 17, 2017 at 5:43 PM, John Snow wrote: > On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote: >> * Stefan Hajnoczi (stefa...@gmail.com) wrote: >>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) >>> wrote: From: "Dr. David Alan Gilbert" There's a rare exit seg if the guest is accessing IO during exit. It's always hitting the atomic_inc(&bs->in_flight) with a NULL bs. This was added recently in 99723548 but I don't see it as the cause. Flip vl.c around so we pause the cpus before closing the block devices, that way we shouldn't have anything trying to access them when they're gone. This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015 Signed-off-by: Dr. David Alan Gilbert Reported-by: Cong Li -- This is a very rare race, I'll leave it running in a loop to see if we hit anything else and to check this really fixes it. I do worry if there are other cases that can trigger this - e.g. hot-unplug or ejecting a CD. --- vl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> Reviewed-by: Stefan Hajnoczi >> >> Thanks; and the test I left running seems solid - ~12k runs >> over the weekend with no seg. >> >> Dave >> >> -- >> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >> > > the root cause of this bug is related to this as well: > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html > > From commit 99723548 we started assuming (incorrectly?) that blk_ > functions always WILL have an attached BDS, but this is not always true, > for instance, flushing the cache from an empty CDROM. > > Paolo, can we move the flight counter increment outside of the > block-backend layer, is that safe? I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed regardless of the throttling timer issue discussed below. BB cannot assume that the BDS graph is non-empty. Stefan
Re: [Qemu-devel] Prohibit Windows from running in QEMU
Am 29.10.2013 um 10:59 schrieb Paolo Bonzini: > Il 29/10/2013 10:48, Peter Lieven ha scritto: >> Hi all, >> >> this question might seem a bit weird, but does anyone see a good way to >> avoid >> that Windows is able to boot inside qemu? >> >> We have defined several profiles for different operation systems and I want >> to avoid that someone chooses Linux and then installs Windows within >> a VM. Reason is licensing. > Patch QEMU to crash when Hyper-V extensions are enabled... Hi all, this is an old topic that has become important for me again recently. Now all Linux versions should be able to detect KVM even if Hyper-V is enabled. But how do I detect from Qemu userspace that Hyper-V is enabled? Thanks, Peter
[Qemu-devel] [PATCH v10 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration
Add CONFIG_ACPI_APEI configuration in the Makefile and enable it in the arm-softmmu.mak Signed-off-by: Dongjiu Geng --- thanks a lot Michael and Laszlo's review and comments: change since v5: (1) no change change since v4: (1) fix email threading in this series is incorrect issue change since v3: (1) change name to "CONFIG_ACPI_APEI" from CONFIG_ACPI_APEI_GENERATION --- default-configs/arm-softmmu.mak | 1 + hw/acpi/Makefile.objs | 1 + 2 files changed, 2 insertions(+) diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 1e3bd2b..ee6f5fc 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -121,3 +121,4 @@ CONFIG_ACPI=y CONFIG_SMBIOS=y CONFIG_ASPEED_SOC=y CONFIG_GPIO_KEY=y +CONFIG_ACPI_APEI=y diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs index 11c35bc..bafb148 100644 --- a/hw/acpi/Makefile.objs +++ b/hw/acpi/Makefile.objs @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o +common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o common-obj-y += acpi_interface.o -- 1.8.3.1
[Qemu-devel] [PATCH v10 2/3] ACPI: Add APEI GHES Table Generation support
This implements APEI GHES Table by passing the error CPER info to the guest via a fw_cfg_blob. After a CPER info is recorded, an SEA(Synchronous External Abort)/SEI(SError Interrupt) exception will be injected into the guest OS. Below is the table layout, the max number of error soure is 11, which is classified by notification type. etc/acpi/tables etc/hardware_errors == + +--++--+ | | HEST ||address | +--+ | +--+|registers | | Error Status | | | GHES0|| ++ | Data Block 0 | | +--+ +->| |status_address0 |->| ++ | | .| | | ++ | | CPER | | | error_status_address-+-+ +--->| |status_address1 |--+ | | CPER | | | .| || ++ | | | | | | read_ack_register+-+ || . | | | | CPER | | | read_ack_preserve| | |+--+ | | ++ | | read_ack_write | | | +->| |status_address10|+ | | Error Status | + +--+ | | | | ++| | | Data Block 1 | | | GHES1| +-+-+->| | ack_value0 || +-->| ++ + +--+ | | | ++| | | CPER | | | .| | | +--->| | ack_value1 || | | CPER | | | error_status_address-+---+ | || ++| | | | | | .| | || | . || | | CPER | | | read_ack_register+-+-+| ++| +-++ | | read_ack_preserve| | +->| | ack_value10|| | |.. | | | read_ack_write | | | | ++| | ++ + +--| | | | | Error Status | | | ... | | | | | Data Block 10| + +--+ | | +>| ++ | | GHES10 | | || | CPER | + +--+ | || | CPER | | | .| | || | | | | error_status_address-+-+ || | CPER | | | .| | +-++ | | read_ack_register+-+ | | read_ack_preserve| | | read_ack_write | + +--+ For GHESv2 error source, the OSPM must acknowledges the error via Read Ack register. so user space must check the ack value to avoid read-write race condition. Signed-off-by: Dongjiu Geng --- thanks a lot Michael and Laszlo's review and comments: change since v8: (1) remove the ACK value address change since v6: (1) update the commit message change since v5: (1) move to GHESv2 error source from GHESv1, because we maily use the GHESv2 (2) add the GHESv2 error source layout in the comments (2) add the logic to read the ack value from OSPM to avoid read-write race condition (3) including header files "aml-build.h" and "hest_ghes.h" in this patch to make it reasonable (4) calculate the fw_cfg blob offsets that should be patched in more fine-grained steps, with multiple separate increments, using: - structure type names, - sizeof operators, - offsetof macros, - and possibly a separate comment for each offset increment. (5) remove page boundary refers to 4096 in the "bios_linker_loader_alloc" (6) using build_append_int_noprefix to avoid use pointer math, so that the code is easily readable (7) change memory section definition to array from QemuUUID (8) only enable SEA/SEI notification hardware error source to avoid guest OS ACPI driver probe warning change since v4: 1. fix email threading in this series is incorrect issue change since v3: 1. remove the unnecessary include for "hw/acpi/vmgenid.h" in hw/arm/virt-acpi-build.c 2. add conversion between LE and host-endian for the CPER record 3. handle the case that run out of the preallocated memory for the CPER record 4. change to use g_malloc0 instead of g_malloc 5. change block_reqr_size name to block_rer_size 6. change QEMU coding style, that is
[Qemu-devel] [PATCH v10 0/3] Generate APEI GHES table and dynamically record CPER
In the armv8 platform, the mainly hardware error source are ARMv8 SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS or use other interface to notify user space, such as Qemu. After Qemu gets the notification, it will record the CPER and inject the SEA/SEI to KVM. this series of patches will generate APEI table when guest OS boot up, and dynamically record CPER for the guest OS about the generic hardware errors, currently the userspace only handle the memory section hardware errors. Before Qemu record the CPER, it needs to check the ACK value written by the guest OS to avoid read-write race condition. Below is the APEI/GHESV2/CPER table layout, the max number of error soure is 11, which is classified by notification type, now only enable the SEA/SEI notification type error source. etc/acpi/tables etc/hardware_errors == + +--++--+ | | HEST ||address | +--+ | +--+|registers | | Error Status | | | GHES0|| ++ | Data Block 0 | | +--+ +->| |status_address0 |->| ++ | | .| | | ++ | | CPER | | | error_status_address-+-+ +--->| |status_address1 |--+ | | CPER | | | .| || ++ | | | | | | read_ack_register+-+ || . | | | | CPER | | | read_ack_preserve| | |+--+ | | ++ | | read_ack_write | | | +->| |status_address10|+ | | Error Status | + +--+ | | | | ++| | | Data Block 1 | | | GHES1| +-+-+->| | ack_value0 || +-->| ++ + +--+ | | | ++| | | CPER | | | .| | | +--->| | ack_value1 || | | CPER | | | error_status_address-+---+ | || ++| | | | | | .| | || | . || | | CPER | | | read_ack_register+-+-+| ++| +-++ | | read_ack_preserve| | +->| | ack_value10|| | |.. | | | read_ack_write | | | | ++| | ++ + +--| | | | | Error Status | | | ... | | | | | Data Block 10| + +--+ | | +>| ++ | | GHES10 | | || | CPER | + +--+ | || | CPER | | | .| | || | | | | error_status_address-+-+ || | CPER | | | .| | +-++ | | read_ack_register+-+ | | read_ack_preserve| | | read_ack_write | + +--+ After injecting a SEA/SEI ghes error, the gueset OS kernel log will be shown as below: [ 142.95] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 8 [ 142.913141] {1}[Hardware Error]: event severity: recoverable [ 142.914498] {1}[Hardware Error]: Error 0, type: recoverable [ 142.915851] {1}[Hardware Error]: section_type: memory error [ 142.917163] {1}[Hardware Error]: physical_address: 0x [ 142.918792] {1}[Hardware Error]: error_type: 3, multi-bit ECC how to test: 1. In the guest OS, use this command to dump the APEI table: "iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST" 2. And find the address for the generic error status block according to the notification type 3. then find the CPER record through the generic error status block. For example(notification type is SEA): (1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST (2) root@genericarmv8:~# cat HEST.dsl /* * Intel ACPI Component Architecture * AML/ASL+ Disassembler version 20170728 (64-bit version) * Copyright (c) 2000 - 2017 Intel Corporation * * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep 5 07:59:17 2016 * * ACPI Data Table [HEST] * * F
[Qemu-devel] [PATCH v10 1/3] ACPI: add APEI/HEST/CPER structures and macros
(1) Add related APEI/HEST table structures and macros, these definition refer to ACPI 6.1 and UEFI 2.6 spec. (2) Add generic error status block and CPER memory section definition, user space only handle memory section errors. Signed-off-by: Dongjiu Geng --- thanks Michael and Laszlo's review: change since v5: (1) update the commit message title (2) remove UUID_BE and UEFI_CPER_SEC_PLATFORM_MEM macros (3) remove including "qemu/uuid.h" file in this patch (4) remove including "hest_ghes.h" file in this patch (5) drop "this is" from the comment of structures and macros (6) replace the "QemuUUID section_type_le" to "uint8_t section_type_le[16]" chnage since v4: (1) fix email threading in this series is incorrect issue change since v3: (1) separate the original one patch into three patches: one is new ACPI structures and macros, another is C source file to generate ACPI HEST table and dynamically record CPER ,final patch is the change about Makefile and configuration (2) add comments about where the ACPI structures and macros come from, for example, they come from the UEFI Spec 2.6, ""; ACPI 6.1 spec, "xx". (3) correct the macros name, for emaple, prefix some macro names with "UEFI_". (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h" (5) remove the duplicate ACPI address space, because it already defined in the "include/hw/acpi/aml-build.h" (6) remove the acpi_generic_address structure because same definition exists in the AcpiGenericAddress. (7) rename the struct acpi_hest_notify to AcpiHestNotifyType (8) rename the struct acpi_hest_types to AcpiHestSourceType (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from ACPI_HEST_TYPE_xxx (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType. (11) add missed QEMU_PACKED for the struct definition. (12) remove the defnition of AcpiGenericErrorData, and rename the AcpiGenericErrorDataV300 to AcpiGenericErrorData. (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it to section_type_le. (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and AcpiGenericErrorDataV300, and remarking on the "error_severity" fields that they take their values from AcpiGenericErrorSeverity (15) remove the wrongly call to BERT(Boot Error Record Table) (16) add comments for the struction member, for example, pint out that the block_status member in the AcpiGenericErrorStatus is a bitmask composed of ACPI_GEBS_xxx macros (17) remove the hardware error source notification type list, and rename the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED. (18) remove the physical_addr member of GhesErrorState (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le (20) change the second parameter to "error_physical_addr" in the ghes_update_guest API statement --- include/hw/acpi/acpi-defs.h | 193 1 file changed, 193 insertions(+) diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index 4cc3630..ff9525e 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -295,6 +295,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable; #define ACPI_APIC_GENERIC_TRANSLATOR15 #define ACPI_APIC_RESERVED 16 /* 16 and greater are reserved */ +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */ +#define UEFI_CPER_MEM_VALID_ERROR_STATUS 0x0001 +#define UEFI_CPER_MEM_VALID_PA 0x0002 +#define UEFI_CPER_MEM_VALID_PA_MASK 0x0004 +#define UEFI_CPER_MEM_VALID_NODE 0x0008 +#define UEFI_CPER_MEM_VALID_CARD 0x0010 +#define UEFI_CPER_MEM_VALID_MODULE 0x0020 +#define UEFI_CPER_MEM_VALID_BANK 0x0040 +#define UEFI_CPER_MEM_VALID_DEVICE 0x0080 +#define UEFI_CPER_MEM_VALID_ROW 0x0100 +#define UEFI_CPER_MEM_VALID_COLUMN 0x0200 +#define UEFI_CPER_MEM_VALID_BIT_POSITION 0x0400 +#define UEFI_CPER_MEM_VALID_REQUESTOR0x0800 +#define UEFI_CPER_MEM_VALID_RESPONDER0x1000 +#define UEFI_CPER_MEM_VALID_TARGET 0x2000 +#define UEFI_CPER_MEM_VALID_ERROR_TYPE 0x4000 +#define UEFI_CPER_MEM_VALID_RANK_NUMBER 0x8000 +#define UEFI_CPER_MEM_VALID_CARD_HANDLE 0x1 +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE0x2 +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC 3 + +/* From the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */ + +enum AcpiHestNotifyType { +ACPI_HEST_NOTIFY_POLLED = 0, +ACPI_HEST_NOTIFY_EXTERNAL = 1, +ACPI_HEST_NOTIFY_LOCAL = 2, +ACPI_HEST_NOTIFY_SCI = 3, +ACPI_HEST_NOTIFY_NMI = 4, +ACPI_HEST_NOTIFY_CMCI = 5, /* ACPI 5.0 */ +ACPI_HEST_NOTIFY_MCE = 6, /* ACPI 5.0 */ +ACPI_HEST_NOTIFY_GPIO = 7, /* ACPI 6.0 */ +ACPI_HEST_NOTIFY_SEA = 8, /* ACPI 6.1 */
Re: [Qemu-devel] Prohibit Windows from running in QEMU
On 04/08/2017 11:58, Peter Lieven wrote: > Am 29.10.2013 um 10:59 schrieb Paolo Bonzini: >> Il 29/10/2013 10:48, Peter Lieven ha scritto: >>> Hi all, >>> >>> this question might seem a bit weird, but does anyone see a good way to >>> avoid >>> that Windows is able to boot inside qemu? >>> >>> We have defined several profiles for different operation systems and I want >>> to avoid that someone chooses Linux and then installs Windows within >>> a VM. Reason is licensing. >> Patch QEMU to crash when Hyper-V extensions are enabled... > > Hi all, > > this is an old topic that has become important for me again recently. > Now all Linux versions should be able to detect KVM even if Hyper-V is > enabled. > > But how do I detect from Qemu userspace that Hyper-V is enabled? Maybe a better one: make KVM crash the guest if CR8 is nonzero on a vmexit. Linux doesn't use it, Windows should not survive long. Warning, I don't know if UEFI firmware uses CR8. Paolo
Re: [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR
Am 04.08.2017 um 03:49 hat Eric Blake geschrieben: > On 08/03/2017 10:21 AM, Eric Blake wrote: > > On 08/03/2017 10:02 AM, Kevin Wolf wrote: > >> BDRV_O_ALLOW_RDWR is a flag that tells whether qemu can internally > >> reopen a node read-write temporarily because the user requested > >> read-write for the top-level image, but qemu decided that read-only is > >> enough for this node (a backing file). > >> > >> bdrv_reopen() is different, it is also used for cases where the user > >> changed their mind and wants to update the options. There is no reason > >> to forbid making a node read-write in that case. > > > > Hmm, I wonder. https://bugzilla.redhat.com/show_bug.cgi?id=1465320 > > details a failure when starting qemu with a read-write NBD disk, then > > taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where > > intermediate commit (snap2 into nbd) works but live commit (snap3 into > > nbd) fails with a message that nbd does not support reopening. I'm > > presuming that your series may help to address that; I'll give it a spin > > and see what happens. > > Nope, even with your patches, I'm still getting: > > {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}} > {"return": {}} > {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event": > "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len": > 2097152, "offset": 2097152, "speed": 0, "type": "commit"}} > > {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}} > {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by > node '#block048' does not support reopening files"}} That's simply NBD not implementing .bdrv_reopen_*. In other words, not a bug, but just a missing feature. Kevin signature.asc Description: PGP signature
[Qemu-devel] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err
Pass the error message to errp directly rather than the local variable local_err and propagate it to errp via error_propagate(). Cc: John Snow Cc: Kevin Wolf Cc: Max Reitz Cc: Keith Busch Cc: Stefan Hajnoczi Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Gerd Hoffmann Cc: Markus Armbruster Signed-off-by: Mao Zhongyi --- hw/block/fdc.c| 17 ++--- hw/block/nvme.c | 8 +++- hw/block/virtio-blk.c | 17 + hw/ide/qdev.c | 12 hw/scsi/scsi-disk.c | 13 - hw/usb/dev-storage.c | 9 +++-- 6 files changed, 25 insertions(+), 51 deletions(-) diff --git a/hw/block/fdc.c b/hw/block/fdc.c index 02f4a46..192a4aa 100644 --- a/hw/block/fdc.c +++ b/hw/block/fdc.c @@ -473,16 +473,13 @@ static void fd_revalidate(FDrive *drv) static void fd_change_cb(void *opaque, bool load, Error **errp) { FDrive *drive = opaque; -Error *local_err = NULL; if (!load) { blk_set_perm(drive->blk, 0, BLK_PERM_ALL, &error_abort); } else { -blkconf_apply_backend_options(drive->conf, - blk_is_read_only(drive->blk), false, - &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!blkconf_apply_backend_options(drive->conf, + blk_is_read_only(drive->blk), + false, errp)) { return; } } @@ -522,7 +519,6 @@ static void floppy_drive_realize(DeviceState *qdev, Error **errp) FloppyDrive *dev = FLOPPY_DRIVE(qdev); FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus); FDrive *drive; -Error *local_err = NULL; int ret; if (dev->unit == -1) { @@ -568,10 +564,9 @@ static void floppy_drive_realize(DeviceState *qdev, Error **errp) dev->conf.rerror = BLOCKDEV_ON_ERROR_AUTO; dev->conf.werror = BLOCKDEV_ON_ERROR_AUTO; -blkconf_apply_backend_options(&dev->conf, blk_is_read_only(dev->conf.blk), - false, &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!blkconf_apply_backend_options(&dev->conf, + blk_is_read_only(dev->conf.blk), + false, errp)) { return; } diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 43c60ab..1856053 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -928,7 +928,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) int i; int64_t bs_size; uint8_t *pci_conf; -Error *local_err = NULL; if (!n->conf.blk) { error_setg(errp, "drive property not set"); @@ -947,10 +946,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } blkconf_blocksizes(&n->conf); -blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk), - false, &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!blkconf_apply_backend_options(&n->conf, + blk_is_read_only(n->conf.blk), + false, errp)) { return; } diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index b750bd8..2940337 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -911,7 +911,6 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIOBlock *s = VIRTIO_BLK(dev); VirtIOBlkConf *conf = &s->conf; -Error *err = NULL; unsigned i; if (!conf->conf.blk) { @@ -928,17 +927,13 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) } blkconf_serial(&conf->conf, &conf->serial); -blkconf_apply_backend_options(&conf->conf, - blk_is_read_only(conf->conf.blk), true, - &err); -if (err) { -error_propagate(errp, err); +if (!blkconf_apply_backend_options(&conf->conf, + blk_is_read_only(conf->conf.blk), + true, errp)) { return; } s->original_wce = blk_enable_write_cache(conf->conf.blk); -blkconf_geometry(&conf->conf, NULL, 65535, 255, 255, &err); -if (err) { -error_propagate(errp, err); +if (!blkconf_geometry(&conf->conf, NULL, 65535, 255, 255, errp)) { return; } blkconf_blocksizes(&conf->conf); @@ -953,9 +948,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) for (i = 0; i < conf->num_queues; i++) { virtio_add_queue(vdev, 128, virtio_blk_handle_output); } -virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err); -if (err != NULL) { -error_propagate(errp, err); +if (!virtio_blk_data_plane_create(vde
Re: [Qemu-devel] Prohibit Windows from running in QEMU
Am 04.08.2017 um 12:23 schrieb Paolo Bonzini: > On 04/08/2017 11:58, Peter Lieven wrote: >> Am 29.10.2013 um 10:59 schrieb Paolo Bonzini: >>> Il 29/10/2013 10:48, Peter Lieven ha scritto: Hi all, this question might seem a bit weird, but does anyone see a good way to avoid that Windows is able to boot inside qemu? We have defined several profiles for different operation systems and I want to avoid that someone chooses Linux and then installs Windows within a VM. Reason is licensing. >>> Patch QEMU to crash when Hyper-V extensions are enabled... >> Hi all, >> >> this is an old topic that has become important for me again recently. >> Now all Linux versions should be able to detect KVM even if Hyper-V is >> enabled. >> >> But how do I detect from Qemu userspace that Hyper-V is enabled? > Maybe a better one: make KVM crash the guest if CR8 is nonzero on a > vmexit. Linux doesn't use it, Windows should not survive long. You mean the kvm kernel module? Or can I access this register also from Qemu on any call that is handled in userspace? It would be easier to have a cmdline option to Qemu than an option to a kernel module. > Warning, I don't know if UEFI firmware uses CR8. UEFI firmware is not important in this case. Do you know if FreeBSD, OpenBSD or NetBSD use it? Thank for your ideas, Peter
[Qemu-devel] [PATCH v2 4/6] hw/block: Fix the return type
When the function no success value to transmit, it usually make the function return void. It has turned out not to be a success, because it means that the extra local_err variable and error_propagate() will be needed. It leads to cumbersome code, therefore, transmit success/ failure in the return value is worth. So fix the return type of blkconf_apply_backend_options(), blkconf_geometry() and virtio_blk_data_plane_create() to avoid it. Cc: John Snow Cc: Kevin Wolf Cc: Max Reitz Cc: Stefan Hajnoczi Signed-off-by: Mao Zhongyi --- hw/block/block.c| 15 +-- hw/block/dataplane/virtio-blk.c | 12 +++- hw/block/dataplane/virtio-blk.h | 2 +- include/hw/block/block.h| 4 ++-- 4 files changed, 19 insertions(+), 14 deletions(-) diff --git a/hw/block/block.c b/hw/block/block.c index 27878d0..b0269c8 100644 --- a/hw/block/block.c +++ b/hw/block/block.c @@ -51,7 +51,7 @@ void blkconf_blocksizes(BlockConf *conf) } } -void blkconf_apply_backend_options(BlockConf *conf, bool readonly, +bool blkconf_apply_backend_options(BlockConf *conf, bool readonly, bool resizable, Error **errp) { BlockBackend *blk = conf->blk; @@ -76,7 +76,7 @@ void blkconf_apply_backend_options(BlockConf *conf, bool readonly, ret = blk_set_perm(blk, perm, shared_perm, errp); if (ret < 0) { -return; +return false; } switch (conf->wce) { @@ -99,9 +99,11 @@ void blkconf_apply_backend_options(BlockConf *conf, bool readonly, blk_set_enable_write_cache(blk, wce); blk_set_on_error(blk, rerror, werror); + +return true; } -void blkconf_geometry(BlockConf *conf, int *ptrans, +bool blkconf_geometry(BlockConf *conf, int *ptrans, unsigned cyls_max, unsigned heads_max, unsigned secs_max, Error **errp) { @@ -129,15 +131,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans, if (conf->cyls || conf->heads || conf->secs) { if (conf->cyls < 1 || conf->cyls > cyls_max) { error_setg(errp, "cyls must be between 1 and %u", cyls_max); -return; +return false; } if (conf->heads < 1 || conf->heads > heads_max) { error_setg(errp, "heads must be between 1 and %u", heads_max); -return; +return false; } if (conf->secs < 1 || conf->secs > secs_max) { error_setg(errp, "secs must be between 1 and %u", secs_max); -return; +return false; } } +return true; } diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 5556f0e..f6fc639 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -76,7 +76,7 @@ static void notify_guest_bh(void *opaque) } /* Context: QEMU global mutex held */ -void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, +bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, VirtIOBlockDataPlane **dataplane, Error **errp) { @@ -91,11 +91,11 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, error_setg(errp, "device is incompatible with iothread " "(transport does not support notifiers)"); -return; +return false; } if (!virtio_device_ioeventfd_enabled(vdev)) { error_setg(errp, "ioeventfd is required for iothread"); -return; +return false; } /* If dataplane is (re-)enabled while the guest is running there could @@ -103,12 +103,12 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, */ if (blk_op_is_blocked(conf->conf.blk, BLOCK_OP_TYPE_DATAPLANE, errp)) { error_prepend(errp, "cannot start virtio-blk dataplane: "); -return; +return false; } } /* Don't try if transport does not support notifiers. */ if (!virtio_device_ioeventfd_enabled(vdev)) { -return; +return false; } s = g_new0(VirtIOBlockDataPlane, 1); @@ -126,6 +126,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, s->batch_notify_vqs = bitmap_new(conf->num_queues); *dataplane = s; + +return true; } /* Context: QEMU global mutex held */ diff --git a/hw/block/dataplane/virtio-blk.h b/hw/block/dataplane/virtio-blk.h index db3f47b..5e18bb9 100644 --- a/hw/block/dataplane/virtio-blk.h +++ b/hw/block/dataplane/virtio-blk.h @@ -19,7 +19,7 @@ typedef struct VirtIOBlockDataPlane VirtIOBlockDataPlane; -void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, +bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf, VirtIOBlockData
[Qemu-devel] [PATCH v2 2/6] hw/block/fdc: Convert to realize
Convert floppy_drive_init() to realize and rename it to floppy_drive_realize(). Cc: John Snow Cc: Kevin Wolf Cc: Max Reitz Cc: Markus Armbruster Signed-off-by: Mao Zhongyi --- hw/block/fdc.c | 33 - 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/hw/block/fdc.c b/hw/block/fdc.c index 4011290..02f4a46 100644 --- a/hw/block/fdc.c +++ b/hw/block/fdc.c @@ -517,7 +517,7 @@ static Property floppy_drive_properties[] = { DEFINE_PROP_END_OF_LIST(), }; -static int floppy_drive_init(DeviceState *qdev) +static void floppy_drive_realize(DeviceState *qdev, Error **errp) { FloppyDrive *dev = FLOPPY_DRIVE(qdev); FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus); @@ -535,15 +535,15 @@ static int floppy_drive_init(DeviceState *qdev) } if (dev->unit >= MAX_FD) { -error_report("Can't create floppy unit %d, bus supports only %d units", - dev->unit, MAX_FD); -return -1; +error_setg(errp, "Can't create floppy unit %d, bus supports " + "only %d units", dev->unit, MAX_FD); +return; } drive = get_drv(bus->fdc, dev->unit); if (drive->blk) { -error_report("Floppy unit %d is in use", dev->unit); -return -1; +error_setg(errp, "Floppy unit %d is in use", dev->unit); +return; } if (!dev->conf.blk) { @@ -557,8 +557,9 @@ static int floppy_drive_init(DeviceState *qdev) if (dev->conf.logical_block_size != 512 || dev->conf.physical_block_size != 512) { -error_report("Physical and logical block size must be 512 for floppy"); -return -1; +error_setg(errp, "Physical and logical block size must " + "be 512 for floppy"); +return; } /* rerror/werror aren't supported by fdc and therefore not even registered @@ -570,20 +571,20 @@ static int floppy_drive_init(DeviceState *qdev) blkconf_apply_backend_options(&dev->conf, blk_is_read_only(dev->conf.blk), false, &local_err); if (local_err) { -error_report_err(local_err); -return -1; +error_propagate(errp, local_err); +return; } /* 'enospc' is the default for -drive, 'report' is what blk_new() gives us * for empty drives. */ if (blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC && blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_REPORT) { -error_report("fdc doesn't support drive option werror"); -return -1; +error_setg(errp, "fdc doesn't support drive option werror"); +return; } if (blk_get_on_error(dev->conf.blk, 1) != BLOCKDEV_ON_ERROR_REPORT) { -error_report("fdc doesn't support drive option rerror"); -return -1; +error_setg(errp, "fdc doesn't support drive option rerror"); +return; } drive->conf = &dev->conf; @@ -599,14 +600,12 @@ static int floppy_drive_init(DeviceState *qdev) dev->type = drive->drive; fd_revalidate(drive); - -return 0; } static void floppy_drive_class_init(ObjectClass *klass, void *data) { DeviceClass *k = DEVICE_CLASS(klass); -k->init = floppy_drive_init; +k->realize = floppy_drive_realize; set_bit(DEVICE_CATEGORY_STORAGE, k->categories); k->bus_type = TYPE_FLOPPY_BUS; k->props = floppy_drive_properties; -- 2.9.4
[Qemu-devel] [PATCH v2 1/6] hw/ide: Convert DeviceClass init to realize
Replace init with realize in IDEDeviceClass, which has errp as a parameter. So all the implementations now use error_setg instead of error_report for reporting error. Cc: John Snow Cc: Markus Armbruster Signed-off-by: Mao Zhongyi --- hw/ide/core.c | 7 ++-- hw/ide/qdev.c | 86 +++ include/hw/ide/internal.h | 5 +-- 3 files changed, 49 insertions(+), 49 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 0b48b64..7c86fc7 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -34,6 +34,7 @@ #include "hw/block/block.h" #include "sysemu/block-backend.h" #include "qemu/cutils.h" +#include "qemu/error-report.h" #include "hw/ide/internal.h" @@ -2398,7 +2399,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind, const char *version, const char *serial, const char *model, uint64_t wwn, uint32_t cylinders, uint32_t heads, uint32_t secs, - int chs_trans) + int chs_trans, Error **errp) { uint64_t nb_sectors; @@ -2423,11 +2424,11 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind, blk_set_guest_block_size(blk, 2048); } else { if (!blk_is_inserted(s->blk)) { -error_report("Device needs media, but drive is empty"); +error_setg(errp, "Device needs media, but drive is empty"); return -1; } if (blk_is_read_only(blk)) { -error_report("Can't use a read-only drive"); +error_setg(errp, "Can't use a read-only drive"); return -1; } blk_set_dev_ops(blk, &ide_hd_block_ops, s); diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index cc2f5bd..d60ac25 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -80,7 +80,7 @@ static char *idebus_get_fw_dev_path(DeviceState *dev) return g_strdup(path); } -static int ide_qdev_init(DeviceState *qdev) +static void ide_qdev_realize(DeviceState *qdev, Error **errp) { IDEDevice *dev = IDE_DEVICE(qdev); IDEDeviceClass *dc = IDE_DEVICE_GET_CLASS(dev); @@ -91,34 +91,31 @@ static int ide_qdev_init(DeviceState *qdev) } if (dev->unit >= bus->max_units) { -error_report("Can't create IDE unit %d, bus supports only %d units", +error_setg(errp, "Can't create IDE unit %d, bus supports only %d units", dev->unit, bus->max_units); -goto err; +return; } switch (dev->unit) { case 0: if (bus->master) { -error_report("IDE unit %d is in use", dev->unit); -goto err; +error_setg(errp, "IDE unit %d is in use", dev->unit); +return; } bus->master = dev; break; case 1: if (bus->slave) { -error_report("IDE unit %d is in use", dev->unit); -goto err; +error_setg(errp, "IDE unit %d is in use", dev->unit); +return; } bus->slave = dev; break; default: -error_report("Invalid IDE unit %d", dev->unit); -goto err; +error_setg(errp, "Invalid IDE unit %d", dev->unit); +return; } -return dc->init(dev); - -err: -return -1; +dc->realize(dev, errp); } IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive) @@ -159,7 +156,7 @@ typedef struct IDEDrive { IDEDevice dev; } IDEDrive; -static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) +static void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp) { IDEBus *bus = DO_UPCAST(IDEBus, qbus, dev->qdev.parent_bus); IDEState *s = bus->ifs + dev->unit; @@ -168,8 +165,8 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) if (!dev->conf.blk) { if (kind != IDE_CD) { -error_report("No drive specified"); -return -1; +error_setg(errp, "No drive specified"); +return; } else { /* Anonymous BlockBackend for an empty drive */ dev->conf.blk = blk_new(0, BLK_PERM_ALL); @@ -182,36 +179,36 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) dev->conf.discard_granularity = 512; } else if (dev->conf.discard_granularity && dev->conf.discard_granularity != 512) { -error_report("discard_granularity must be 512 for ide"); -return -1; +error_setg(errp, "discard_granularity must be 512 for ide"); +return; } blkconf_blocksizes(&dev->conf); if (dev->conf.logical_block_size != 512) { -error_report("logical_block_size must be 512 for IDE"); -return -1; +error_setg(errp, "logical_block_size must be 512 for IDE"); +return; } blkconf_serial(&dev->conf, &dev->serial); if (kind != IDE_CD) { blkconf_geometry(&dev->conf, &dev->chs_trans, 65535,
[Qemu-devel] [PATCH v2 0/6] Convert to realize and improve error handling
This series mainly implements the conversions of ide, floppy and nvme device to realize. Add some error handling messages and remove the local variable local_err, use errp to propagate the error directly. Also fix the unusual function name. v2: -use bool as the return type instead of int. [Markus Armbruster & Stefan Hajnoczi] Cc: John Snow Cc: Kevin Wolf Cc: Max Reitz Cc: Keith Busch Cc: Stefan Hajnoczi Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Gerd Hoffmann Cc: Markus Armbruster Mao Zhongyi (6): hw/ide: Convert DeviceClass init to realize hw/block/fdc: Convert to realize hw/block/nvme: Convert to realize hw/block: Fix the return type hw/block: Use errp directly rather than local_err dev-storage: Fix the unusual function name hw/block/block.c| 15 --- hw/block/dataplane/virtio-blk.c | 12 +++--- hw/block/dataplane/virtio-blk.h | 2 +- hw/block/fdc.c | 48 + hw/block/nvme.c | 24 +-- hw/block/virtio-blk.c | 17 +++- hw/ide/core.c | 7 +-- hw/ide/qdev.c | 94 +++-- hw/scsi/scsi-disk.c | 13 ++ hw/usb/dev-storage.c| 29 ++--- include/hw/block/block.h| 4 +- include/hw/ide/internal.h | 5 ++- 12 files changed, 125 insertions(+), 145 deletions(-) -- 2.9.4
[Qemu-devel] [PATCH v2 3/6] hw/block/nvme: Convert to realize
Convert nvme_init() to realize and rename it to nvme_realize(). Cc: John Snow Cc: Keith Busch Cc: Kevin Wolf Cc: Max Reitz Cc: Markus Armbruster Signed-off-by: Mao Zhongyi --- hw/block/nvme.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6071dc1..43c60ab 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -920,7 +920,7 @@ static const MemoryRegionOps nvme_cmb_ops = { }, }; -static int nvme_init(PCIDevice *pci_dev) +static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); NvmeIdCtrl *id = &n->id_ctrl; @@ -931,24 +931,27 @@ static int nvme_init(PCIDevice *pci_dev) Error *local_err = NULL; if (!n->conf.blk) { -return -1; +error_setg(errp, "drive property not set"); +return; } bs_size = blk_getlength(n->conf.blk); if (bs_size < 0) { -return -1; +error_setg(errp, "could not get backing file size"); +return; } blkconf_serial(&n->conf, &n->serial); if (!n->serial) { -return -1; +error_setg(errp, "serial property not set"); +return; } blkconf_blocksizes(&n->conf); blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk), false, &local_err); if (local_err) { -error_report_err(local_err); -return -1; +error_propagate(errp, local_err); +return; } pci_conf = pci_dev->config; @@ -1046,7 +1049,6 @@ static int nvme_init(PCIDevice *pci_dev) cpu_to_le64(n->ns_size >> id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas)].ds); } -return 0; } static void nvme_exit(PCIDevice *pci_dev) @@ -1081,7 +1083,7 @@ static void nvme_class_init(ObjectClass *oc, void *data) DeviceClass *dc = DEVICE_CLASS(oc); PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); -pc->init = nvme_init; +pc->realize = nvme_realize; pc->exit = nvme_exit; pc->class_id = PCI_CLASS_STORAGE_EXPRESS; pc->vendor_id = PCI_VENDOR_ID_INTEL; -- 2.9.4
Re: [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking
On 04/08/2017 07:44, Richard Henderson wrote: > Dispense with TCGBackendData, as it has never been used for more than > holding a single pointer. Use a define in the cpu/tcg-target.h to > signal requirement for TCGLabelQemuLdst, so that we can drop the no-op > tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. > > Signed-off-by: Richard Henderson Reviewed-by: Paolo Bonzini > --- > tcg/aarch64/tcg-target.h | 4 > tcg/arm/tcg-target.h | 4 > tcg/i386/tcg-target.h | 4 > tcg/ia64/tcg-target.h | 4 > tcg/mips/tcg-target.h | 4 > tcg/ppc/tcg-target.h | 4 > tcg/s390/tcg-target.h | 4 > tcg/tcg-be-null.h | 44 > --- > tcg/tcg.h | 6 +++-- > tcg/aarch64/tcg-target.inc.c | 3 ++- > tcg/arm/tcg-target.inc.c | 3 ++- > tcg/i386/tcg-target.inc.c | 4 ++-- > tcg/ia64/tcg-target.inc.c | 19 --- > tcg/mips/tcg-target.inc.c | 4 ++-- > tcg/ppc/tcg-target.inc.c | 4 ++-- > tcg/s390/tcg-target.inc.c | 4 ++-- > tcg/sparc/tcg-target.inc.c| 2 -- > tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} | 27 - > tcg/tcg.c | 17 +++--- > tcg/tci/tcg-target.inc.c | 2 -- > 20 files changed, 61 insertions(+), 106 deletions(-) > delete mode 100644 tcg/tcg-be-null.h > rename tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} (85%) > > diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h > index 3c3b1e603d..484cf6236c 100644 > --- a/tcg/aarch64/tcg-target.h > +++ b/tcg/aarch64/tcg-target.h > @@ -120,4 +120,8 @@ static inline void flush_icache_range(uintptr_t start, > uintptr_t stop) > > void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif /* AARCH64_TCG_TARGET_H */ > diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h > index b836f7f127..55de35a691 100644 > --- a/tcg/arm/tcg-target.h > +++ b/tcg/arm/tcg-target.h > @@ -138,4 +138,8 @@ static inline void flush_icache_range(uintptr_t start, > uintptr_t stop) > /* not defined -- call should be eliminated at compile time */ > void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h > index 2fd28fa6a5..11ee7fadd1 100644 > --- a/tcg/i386/tcg-target.h > +++ b/tcg/i386/tcg-target.h > @@ -186,4 +186,8 @@ static inline void tb_target_set_jmp_target(uintptr_t > tc_ptr, > > #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD) > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h > index 5c9ca8c1ce..83107e1407 100644 > --- a/tcg/ia64/tcg-target.h > +++ b/tcg/ia64/tcg-target.h > @@ -199,4 +199,8 @@ static inline void flush_icache_range(uintptr_t start, > uintptr_t stop) > /* not defined -- call should be eliminated at compile time */ > void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h > index 557c8ddc46..bea5290b9f 100644 > --- a/tcg/mips/tcg-target.h > +++ b/tcg/mips/tcg-target.h > @@ -209,4 +209,8 @@ static inline void flush_icache_range(uintptr_t start, > uintptr_t stop) > > void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h > index 5bab3387e5..c1226ea5b6 100644 > --- a/tcg/ppc/tcg-target.h > +++ b/tcg/ppc/tcg-target.h > @@ -127,4 +127,8 @@ extern bool have_isa_3_00; > void flush_icache_range(uintptr_t start, uintptr_t stop); > void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h > index 1398952d6b..8fea9646b4 100644 > --- a/tcg/s390/tcg-target.h > +++ b/tcg/s390/tcg-target.h > @@ -153,4 +153,8 @@ static inline void tb_target_set_jmp_target(uintptr_t > tc_ptr, > /* no need to flush icache explicitly */ > } > > +#ifdef CONFIG_SOFTMMU > +#define TCG_TARGET_NEED_LDST_LABELS > +#endif > + > #endif > diff --git a/tcg/tcg-be-null.h b/tcg/tcg-be-null.h > deleted file mode 100644 > index 5222fe29e2..00 > --- a/tcg/tcg-be-null.h > +++ /dev/null > @@ -1,44 +0,0 @@ > -/* > - * TCG Backend Data: No backend data > - * > - * Permission is hereby granted, free of charge, to any person obtaining a > copy
[Qemu-devel] [PATCH v2 6/6] dev-storage: Fix the unusual function name
The function name of usb_msd_{realize,unrealize}_*, usb_msd_class_initfn_* are unusual. Rename it to usb_msd_*_{realize,unrealize}, usb_msd_class_*_initfn. Cc: Gerd Hoffmann Signed-off-by: Mao Zhongyi --- hw/usb/dev-storage.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c index 801f552..2a05cd5 100644 --- a/hw/usb/dev-storage.c +++ b/hw/usb/dev-storage.c @@ -596,7 +596,7 @@ static void usb_msd_unrealize_storage(USBDevice *dev, Error **errp) object_unref(OBJECT(&s->bus)); } -static void usb_msd_realize_storage(USBDevice *dev, Error **errp) +static void usb_msd_storage_realize(USBDevice *dev, Error **errp) { MSDState *s = USB_STORAGE_DEV(dev); BlockBackend *blk = s->conf.blk; @@ -643,14 +643,14 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp) s->scsi_dev = scsi_dev; } -static void usb_msd_unrealize_bot(USBDevice *dev, Error **errp) +static void usb_msd_bot_unrealize(USBDevice *dev, Error **errp) { MSDState *s = USB_STORAGE_DEV(dev); object_unref(OBJECT(&s->bus)); } -static void usb_msd_realize_bot(USBDevice *dev, Error **errp) +static void usb_msd_bot_realize(USBDevice *dev, Error **errp) { MSDState *s = USB_STORAGE_DEV(dev); DeviceState *d = DEVICE(dev); @@ -764,12 +764,12 @@ static void usb_msd_class_initfn_common(ObjectClass *klass, void *data) dc->vmsd = &vmstate_usb_msd; } -static void usb_msd_class_initfn_storage(ObjectClass *klass, void *data) +static void usb_msd_class_storage_initfn(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); USBDeviceClass *uc = USB_DEVICE_CLASS(klass); -uc->realize = usb_msd_realize_storage; +uc->realize = usb_msd_storage_realize; uc->unrealize = usb_msd_unrealize_storage; dc->props = msd_properties; } @@ -828,26 +828,26 @@ static void usb_msd_instance_init(Object *obj) object_property_set_int(obj, -1, "bootindex", NULL); } -static void usb_msd_class_initfn_bot(ObjectClass *klass, void *data) +static void usb_msd_class_bot_initfn(ObjectClass *klass, void *data) { USBDeviceClass *uc = USB_DEVICE_CLASS(klass); -uc->realize = usb_msd_realize_bot; -uc->unrealize = usb_msd_unrealize_bot; +uc->realize = usb_msd_bot_realize; +uc->unrealize = usb_msd_bot_unrealize; uc->attached_settable = true; } static const TypeInfo msd_info = { .name = "usb-storage", .parent= TYPE_USB_STORAGE, -.class_init= usb_msd_class_initfn_storage, +.class_init= usb_msd_class_storage_initfn, .instance_init = usb_msd_instance_init, }; static const TypeInfo bot_info = { .name = "usb-bot", .parent= TYPE_USB_STORAGE, -.class_init= usb_msd_class_initfn_bot, +.class_init= usb_msd_class_bot_initfn, }; static void usb_msd_register_types(void) -- 2.9.4
Re: [Qemu-devel] [PATCH] pc/acpi: Fix booting of macOS with Clover EFI bootloader
On Fri, Aug 4, 2017 at 2:35 PM, Igor Mammedov wrote: > On Fri, 4 Aug 2017 12:15:40 +0530 > Dhiru Kholia wrote: > >> This was tested with macOS 10.12.5 and Clover r4114. >> >> Without this patch, the macOS boot process gets stuck at the Apple logo >> without showing any progress bar. >> >> I have documented the process of running macOS on QEMU/KVM at, >> >> https://github.com/kholia/OSX-KVM/ >> >> Instead of using this patch, adding an additional command-line knob >> which exposes this setting (force_rev1_fadt) to the user might be a more >> general solution. > > it's been reported that OVMF had issues that were fixed, > you probably want to read this thread: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg468456.html Hi Igor, I have now tested various OVMF versions with the latest QEMU (commit aaaec6acad7c). When using edk2.git-ovmf-x64-0-20170726.b2826.gbb4831c03d.noarch.rpm from [1] macOS does not boot and gets stuck. The CPU usage goes to 100%. I haven't confirmed this but this OVMF build should have both 198a46d and 072060a edk2 commits in it, based on the build date. The OVMF blob from Gabriel Somlo [2] hangs on the OVMF logo screen and it too results in 100% CPU usage after hanging. I am using "boot-clover.sh" script from my repository [4] to test the various OVMF versions. The only OVMF blob which works with the current QEMU for booting macOS is the one from Proxmox [3]. Unfortunately, I don't know the corresponding commit in the edk2 repository for this working OVMF blob. References, [1] https://www.kraxel.org/repos/jenkins/edk2/ [2] https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/ [3] https://git.proxmox.com/?p=pve-qemu-kvm.git;a=tree;f=debian [4] https://github.com/kholia/OSX-KVM/ It would be great if someone else could reproduce these results too. Thanks, Dhiru
Re: [Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices
On 04/08/2017 10:36, Hannes Reinecke wrote: > Make enclosure support optional with the 'enclosure' argument to > the scsi device. > Adding 'enclosure=on' as option to the SCSI device will present > an enclosure service device on LUN0, either as a stand-alone > LUN (in case LUN0 is not assigned) or by setting the EncServ bit > int the inquiry data if LUN0 is assigned to a block devices. > > Signed-off-by: Hannes Reinecke > --- > hw/scsi/scsi-bus.c | 11 --- > hw/scsi/scsi-disk.c| 4 +++- > include/hw/scsi/scsi.h | 1 + > 3 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c > index 79a222f..c11422b 100644 > --- a/hw/scsi/scsi-bus.c > +++ b/hw/scsi/scsi-bus.c > @@ -22,6 +22,7 @@ static Property scsi_props[] = { > DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0), > DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1), > DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1), > +DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false), > DEFINE_PROP_END_OF_LIST(), > }; > > @@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq > *r) > if (r->req.lun != 0) { > r->buf[0] = TYPE_NO_LUN; > } else { > -r->buf[0] = TYPE_ENCLOSURE; > +r->buf[0] = r->req.dev->enclosure ? > +TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE; > r->buf[2] = 5; /* Version */ > r->buf[3] = 2 | 0x10; /* HiSup, response data format */ > r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */ > -r->buf[6] = 0x40; /* Enclosure service */ > +if (r->req.dev->enclosure) { > +r->buf[6] = 0x40; /* Enclosure service */ > +} > r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, TCQ. > */ > memcpy(&r->buf[8], "QEMU", 8); > memcpy(&r->buf[16], "QEMU TARGET ", 16); > @@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest *req, > uint8_t *buf) > } > break; > case RECEIVE_DIAGNOSTIC: > -if (!scsi_target_emulate_receive_diagnostic(r)) { > +if (!r->req.dev->enclosure || > +!scsi_target_emulate_receive_diagnostic(r)) { > goto illegal_request; > } > break; > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c > index 5f1e5e8..153d97d 100644 > --- a/hw/scsi/scsi-disk.c > +++ b/hw/scsi/scsi-disk.c > @@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, > uint8_t *outbuf) > the additional length is not adjusted */ > outbuf[4] = 36 - 5; > } > - > +if (s->qdev.lun == 0 && s->qdev.enclosure) { > +outbuf[6] = 0x40; /* Enclosure service */ > +} Should this really be set on disks even if you do have a LUN0? Why don't you instead add a very simple scsi-enclosure device that you can specify as the LUN0? Thanks, Paolo > /* Sync data transfer and TCQ. */ > outbuf[7] = 0x10 | (req->bus->info->tcq ? 0x02 : 0); > return buflen; > diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h > index 6b85786..243c185 100644 > --- a/include/hw/scsi/scsi.h > +++ b/include/hw/scsi/scsi.h > @@ -100,6 +100,7 @@ struct SCSIDevice > BlockConf conf; > SCSISense unit_attention; > bool sense_is_ua; > +bool enclosure; > uint8_t sense[SCSI_SENSE_BUF_SIZE]; > uint32_t sense_len; > QTAILQ_HEAD(, SCSIRequest) requests; >
Re: [Qemu-devel] [PATCHv2 4/4] scsi: Add 'enclosure' option for scsi devices
On 08/04/2017 12:48 PM, Paolo Bonzini wrote: > On 04/08/2017 10:36, Hannes Reinecke wrote: >> Make enclosure support optional with the 'enclosure' argument to >> the scsi device. >> Adding 'enclosure=on' as option to the SCSI device will present >> an enclosure service device on LUN0, either as a stand-alone >> LUN (in case LUN0 is not assigned) or by setting the EncServ bit >> int the inquiry data if LUN0 is assigned to a block devices. >> >> Signed-off-by: Hannes Reinecke >> --- >> hw/scsi/scsi-bus.c | 11 --- >> hw/scsi/scsi-disk.c| 4 +++- >> include/hw/scsi/scsi.h | 1 + >> 3 files changed, 12 insertions(+), 4 deletions(-) >> >> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c >> index 79a222f..c11422b 100644 >> --- a/hw/scsi/scsi-bus.c >> +++ b/hw/scsi/scsi-bus.c >> @@ -22,6 +22,7 @@ static Property scsi_props[] = { >> DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0), >> DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1), >> DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1), >> +DEFINE_PROP_BOOL("enclosure", SCSIDevice, enclosure, false), >> DEFINE_PROP_END_OF_LIST(), >> }; >> >> @@ -494,11 +495,14 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq >> *r) >> if (r->req.lun != 0) { >> r->buf[0] = TYPE_NO_LUN; >> } else { >> -r->buf[0] = TYPE_ENCLOSURE; >> +r->buf[0] = r->req.dev->enclosure ? >> +TYPE_ENCLOSURE : TYPE_NOT_PRESENT | TYPE_INACTIVE; >> r->buf[2] = 5; /* Version */ >> r->buf[3] = 2 | 0x10; /* HiSup, response data format */ >> r->buf[4] = r->len - 5; /* Additional Length = (Len - 1) - 4 */ >> -r->buf[6] = 0x40; /* Enclosure service */ >> +if (r->req.dev->enclosure) { >> +r->buf[6] = 0x40; /* Enclosure service */ >> +} >> r->buf[7] = 0x10 | (r->req.bus->info->tcq ? 0x02 : 0); /* Sync, >> TCQ. */ >> memcpy(&r->buf[8], "QEMU", 8); >> memcpy(&r->buf[16], "QEMU TARGET ", 16); >> @@ -600,7 +604,8 @@ static int32_t scsi_target_send_command(SCSIRequest >> *req, uint8_t *buf) >> } >> break; >> case RECEIVE_DIAGNOSTIC: >> -if (!scsi_target_emulate_receive_diagnostic(r)) { >> +if (!r->req.dev->enclosure || >> +!scsi_target_emulate_receive_diagnostic(r)) { >> goto illegal_request; >> } >> break; >> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c >> index 5f1e5e8..153d97d 100644 >> --- a/hw/scsi/scsi-disk.c >> +++ b/hw/scsi/scsi-disk.c >> @@ -792,7 +792,9 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, >> uint8_t *outbuf) >> the additional length is not adjusted */ >> outbuf[4] = 36 - 5; >> } >> - >> +if (s->qdev.lun == 0 && s->qdev.enclosure) { >> +outbuf[6] = 0x40; /* Enclosure service */ >> +} > > Should this really be set on disks even if you do have a LUN0? > Why not? You _can_ have embedded enclosure devices; that's what this bit is for... > Why don't you instead add a very simple scsi-enclosure device that you > can specify as the LUN0? > You don't have to. Once you leave LUN0 free (eg by assigning your first disk LUN1) you'll get this device by virtue of the current LUN0 emulation. But yeah, maybe I can be doing a separate enclosure device. Will be checking if and how I can make it to work. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [Qemu-devel] [PATCH v2 4/6] hw/block: Fix the return type
On Fri, Aug 04, 2017 at 06:26:41PM +0800, Mao Zhongyi wrote: > When the function no success value to transmit, it usually make the > function return void. It has turned out not to be a success, because > it means that the extra local_err variable and error_propagate() will > be needed. It leads to cumbersome code, therefore, transmit success/ > failure in the return value is worth. > > So fix the return type of blkconf_apply_backend_options(), > blkconf_geometry() and virtio_blk_data_plane_create() to avoid it. > > Cc: John Snow > Cc: Kevin Wolf > Cc: Max Reitz > Cc: Stefan Hajnoczi > > Signed-off-by: Mao Zhongyi > --- > hw/block/block.c| 15 +-- > hw/block/dataplane/virtio-blk.c | 12 +++- > hw/block/dataplane/virtio-blk.h | 2 +- > include/hw/block/block.h| 4 ++-- > 4 files changed, 19 insertions(+), 14 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-devel] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?
On 08/04/2017 03:16 PM, Markus Armbruster wrote: > Denis, you added this in commit d50d822: > > #ifdef CONFIG_FALLOCATE > if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) { > int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, > aiocb->aio_nbytes); > if (ret == 0 || ret != -ENOTSUP) { > return ret; > } > s->has_fallocate = false; > } > #endif > > bdrv_getlength() can fail. Does it do the right thing then? For what > it's worth, the comparison of its value is signed. fallocate() with 0 flags can work only beyond end of file or on top of the hole. Thus the check is made to validate that we are beyond EOF. Technically fallocate should fail if that condition will be violated. But you right, we can add sanity check here. This would not harm. Should I send it? Den
Re: [Qemu-devel] Cleaning up handling of CPU exceptions accessing non-existent memory/devices
On 1 August 2017 at 11:50, Peter Maydell wrote: > My proposal is that we should define a new QOM CPU hook: > void (*do_transaction_failed)(CPUState *cpu, hwaddr physaddr, vaddr addr, > unsigned size, MMUAccessType access_type, > MemTxAttrs attrs, MemTxResult response); I realised that we probably also want the 'uintptr_t retaddr' and 'int mmu_idx' that tlb_fill() and the do_unaligned_access hook both already have. I have some patches mostly-ready which implement the common code changes plus the ARM mode implementation of the hook function. thanks -- PMM
Re: [Qemu-devel] Is the use of bdrv_getlength() in parallels.c kosher?
On 08/04/2017 03:31 PM, Markus Armbruster wrote: > Same question for allocate_clusters() in parallels.c, commit 5a41e1f, > modified in commit ddd2ef2: > > if (s->data_end + space > bdrv_getlength(bs->file->bs) >> > BDRV_SECTOR_BITS) { > > bdrv_getlength() can fail. Does it do the right thing then? For what > it's worth, the comparison of its value is signed. > > There's another one in parallels_open(): > > if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) || > bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs), > PREALLOC_MODE_OFF, NULL) != 0) { > s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE; > } the same as in the previous letter. Saniti checks will not harm. I'll make a patch. Thank you for the suggestion. Den
Re: [Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB
On Fri, Aug 04, 2017 at 01:48:01PM +0100, Stefan Hajnoczi wrote: > On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote: > > Using 16KB bounce buffers creates a significant performance > > penalty for I/O to encrypted volumes on storage with high > > I/O latency (rotating rust & network drives), because it > > triggers lots of fairly small I/O operations. > > > > On tests with rotating rust, and cache=none|directsync, > > write speed increased from 2MiB/s to 32MiB/s, on a par > > with that achieved by the in-kernel luks driver. > > > > Signed-off-by: Daniel P. Berrange > > --- > > block/crypto.c | 12 +--- > > 1 file changed, 5 insertions(+), 7 deletions(-) > > > > diff --git a/block/crypto.c b/block/crypto.c > > index 58ef6f2f52..207941db9a 100644 > > --- a/block/crypto.c > > +++ b/block/crypto.c > > @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs) > > } > > > > > > -#define BLOCK_CRYPTO_MAX_SECTORS 32 > > +#define BLOCK_CRYPTO_MAX_SECTORS 2048 > > > > static coroutine_fn int > > block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, > > @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t > > sector_num, > > > > qemu_iovec_init(&hd_qiov, qiov->niov); > > > > -/* Bounce buffer so we have a linear mem region for > > - * entire sector. XXX optimize so we avoid bounce > > - * buffer in case that qiov->niov == 1 > > +/* Bounce buffer because we're not permitted to touch > > + * contents of qiov - it points to guest memory. > > */ > > cipher_data = > > qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * > > 512, > > In the *read* case you can modify the data buffers in-place. But the > guest might see intermediate states in its buffers - not sure whether > this could pose a security problem. Whether its a risk or not depends on the choice of crypto parameters, as exposing ciphertext to the guest might make watermarking attacks easier to perform. Probably not a problem in practice, but I prefer to err on the side of caution since I can't be sure it is safe. > Reviewed-by: Stefan Hajnoczi Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-devel] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev
On Fri, Aug 04, 2017 at 11:50:36AM +0100, Daniel P. Berrange wrote: > There are trace probes in bdrv_co_readv|writev, however, the > block drivers are being gradually moved over to using the > bdrv_co_preadv|pwritev functions instead. As a result some > block drivers miss the current probes. Move the probes > into bdrv_co_preadv|pwritev instead, so that they are triggered > by more (all?) I/O code paths. > > Signed-off-by: Daniel P. Berrange > --- > block/io.c | 8 > block/trace-events | 4 ++-- > 2 files changed, 6 insertions(+), 6 deletions(-) Thanks, applied to my tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err
On Fri, Aug 04, 2017 at 06:26:42PM +0800, Mao Zhongyi wrote: > Pass the error message to errp directly rather than the local > variable local_err and propagate it to errp via error_propagate(). > > Cc: John Snow > Cc: Kevin Wolf > Cc: Max Reitz > Cc: Keith Busch > Cc: Stefan Hajnoczi > Cc: "Michael S. Tsirkin" > Cc: Paolo Bonzini > Cc: Gerd Hoffmann > Cc: Markus Armbruster > > Signed-off-by: Mao Zhongyi > --- > hw/block/fdc.c| 17 ++--- > hw/block/nvme.c | 8 +++- > hw/block/virtio-blk.c | 17 + > hw/ide/qdev.c | 12 > hw/scsi/scsi-disk.c | 13 - > hw/usb/dev-storage.c | 9 +++-- > 6 files changed, 25 insertions(+), 51 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-devel] [PULL 0/8] target-mips queue
On 3 August 2017 at 15:45, Yongbok Kim wrote: > The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8: > > Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100) > > are available in the git repository at: > > git://github.com/yongbok/upstream-qemu.git tags/mips-20170803 > > for you to fetch changes up to d673a68db6963e86536b125af464bb6ed03eba33: > > target/mips: Fix RDHWR CC with icount (2017-08-02 22:18:13 +0100) > > > MIPS patches 2017-08-03 > > Changes: > KVM T&E segment support for TCG > malta: leave space for the bootmap after the initrd > Apply CP0.PageMask before writing into TLB entry > Fix fallout from indirect branch optimisation > Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH v4 7/9] s390x/pci: fence off instructions for non-pci
On 04.08.2017 13:29, Cornelia Huck wrote: > If a guest running on a machine without zpci issues a pci instruction, > throw them an exception. > > Reviewed-by: Thomas Huth > Signed-off-by: Cornelia Huck > --- > target/s390x/kvm.c | 54 > +- > 1 file changed, 41 insertions(+), 13 deletions(-) > > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > index bc62bba5b7..9de165d8b1 100644 > --- a/target/s390x/kvm.c > +++ b/target/s390x/kvm.c > @@ -1191,7 +1191,11 @@ static int kvm_clp_service_call(S390CPU *cpu, struct > kvm_run *run) > { > uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16; > > -return clp_service_call(cpu, r2); > +if (s390_has_feat(S390_FEAT_ZPCI)) { > +return clp_service_call(cpu, r2); > +} else { > +return -1; > +} I am a fan of dropping these else case and returning directly. But that is just my opinion. (applies to all changes in this patch) -- Thanks, David
Re: [Qemu-devel] [PATCH v4 5/9] s390x/ccw: create s390 phb conditionally
On 04.08.2017 13:29, Cornelia Huck wrote: > Don't create the s390 pci host bridge if we do not provide the zpci > facility. > > Reviewed-by: Thomas Huth > Acked-by: Christian Borntraeger > Signed-off-by: Cornelia Huck This works, because s390_init_cpus(machine) (which sets up features) is called before adding the bus. Mind to add a comment so nobody tries to reshuffle these functions? > --- > hw/s390x/s390-virtio-ccw.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index 1c7af39ce6..8be4a541c1 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -118,7 +118,6 @@ static void ccw_init(MachineState *machine) > { > int ret; > VirtualCssBus *css_bus; > -DeviceState *dev; > > s390_sclp_init(); > s390_memory_init(machine->ram_size); > @@ -134,10 +133,13 @@ static void ccw_init(MachineState *machine) >machine->initrd_filename, "s390-ccw.img", >"s390-netboot.img", true); > > -dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE); > -object_property_add_child(qdev_get_machine(), TYPE_S390_PCI_HOST_BRIDGE, > - OBJECT(dev), NULL); > -qdev_init_nofail(dev); > +if (s390_has_feat(S390_FEAT_ZPCI)) { > +DeviceState *dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE); > +object_property_add_child(qdev_get_machine(), > + TYPE_S390_PCI_HOST_BRIDGE, > + OBJECT(dev), NULL); > +qdev_init_nofail(dev); > +} > > /* register hypercalls */ > virtio_ccw_register_hcalls(); > -- Thanks, David
Re: [Qemu-devel] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev
On Fri, Aug 04, 2017 at 11:50:59AM +0100, Daniel P. Berrange wrote: > Signed-off-by: Daniel P. Berrange > --- > include/block/block_int.h | 29 + > 1 file changed, 29 insertions(+) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v4 8/9] s390x/kvm: msi route fixup for non-pci
On 04.08.2017 13:29, Cornelia Huck wrote: > If we don't provide pci, we cannot have a pci device for which we > have to translate to adapter routes: just return -ENODEV. > > Signed-off-by: Cornelia Huck > --- > target/s390x/kvm.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > index 9de165d8b1..d8db1cbf6e 100644 > --- a/target/s390x/kvm.c > +++ b/target/s390x/kvm.c > @@ -2424,6 +2424,12 @@ int kvm_arch_fixup_msi_route(struct > kvm_irq_routing_entry *route, > uint32_t idx = data >> ZPCI_MSI_VEC_BITS; > uint32_t vec = data & ZPCI_MSI_VEC_MASK; > > +if (!s390_has_feat(S390_FEAT_ZPCI)) { > +/* How can we get here without pci enabled? */ > +g_assert(false); > +return -ENODEV; > +} > + /* How can we get here without pci enabled? */ g_assert(s390_has_feat(S390_FEAT_ZPCI)); ? > pbdev = s390_pci_find_dev_by_idx(s390_get_phb(), idx); > if (!pbdev) { > DPRINTF("add_msi_route no dev\n"); > -- Thanks, David
[Qemu-devel] Is the use of bdrv_getlength() in quorum_co_flush() kosher?
Have a look at quorum_co_flush(): quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, bdrv_getlength(s->children[i]->bs), s->children[i]->bs->node_name, result); bdrv_getlength() can fail. Does it do the right thing then?
Re: [Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB
On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote: > Using 16KB bounce buffers creates a significant performance > penalty for I/O to encrypted volumes on storage with high > I/O latency (rotating rust & network drives), because it > triggers lots of fairly small I/O operations. > > On tests with rotating rust, and cache=none|directsync, > write speed increased from 2MiB/s to 32MiB/s, on a par > with that achieved by the in-kernel luks driver. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 12 +--- > 1 file changed, 5 insertions(+), 7 deletions(-) > > diff --git a/block/crypto.c b/block/crypto.c > index 58ef6f2f52..207941db9a 100644 > --- a/block/crypto.c > +++ b/block/crypto.c > @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs) > } > > > -#define BLOCK_CRYPTO_MAX_SECTORS 32 > +#define BLOCK_CRYPTO_MAX_SECTORS 2048 > > static coroutine_fn int > block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, > @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t > sector_num, > > qemu_iovec_init(&hd_qiov, qiov->niov); > > -/* Bounce buffer so we have a linear mem region for > - * entire sector. XXX optimize so we avoid bounce > - * buffer in case that qiov->niov == 1 > +/* Bounce buffer because we're not permitted to touch > + * contents of qiov - it points to guest memory. > */ > cipher_data = > qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, In the *read* case you can modify the data buffers in-place. But the guest might see intermediate states in its buffers - not sure whether this could pose a security problem. Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
[Qemu-devel] Is the use of bdrv_getlength() in vhdx*.c kosher?
bdrv_getlength() can fail. There are several calls in vhdx*.c that don't seem to check. Bug or not?
Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options
On 08/04/2017 08:55 AM, Fam Zheng wrote: > It's not too surprising when a user specifies the backing file relative > to the current working directory instead of the top layer image. This > causes error when they differ. Though the error message has enough > information to infer the fact about the misunderstanding, it is better > if we document this explicitly, so that users don't have to learn from > mistakes. > > Signed-off-by: Fam Zheng > --- > qemu-img.texi | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/qemu-img.texi b/qemu-img.texi > index 72dabd6b3e..cf079f90e2 100644 > --- a/qemu-img.texi > +++ b/qemu-img.texi > @@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size > needs to be specified in > this case. @var{backing_file} will never be modified unless you use the > @code{commit} monitor command (or qemu-img commit). > > +If a relative path name is given, the backing file is looked up against > +@var{filename}, rather than the current working directory. Maybe better as: the backing file is looked up relative to the directory containing @var{filename} In all three places -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10] block: move trace probes into bdrv_co_preadv|pwritev
On 08/04/2017 05:50 AM, Daniel P. Berrange wrote: > There are trace probes in bdrv_co_readv|writev, however, the > block drivers are being gradually moved over to using the > bdrv_co_preadv|pwritev functions instead. As a result some > block drivers miss the current probes. Move the probes > into bdrv_co_preadv|pwritev instead, so that they are triggered > by more (all?) I/O code paths. > > Signed-off-by: Daniel P. Berrange > --- > block/io.c | 8 > block/trace-events | 4 ++-- > 2 files changed, 6 insertions(+), 6 deletions(-) Arguably, the traces are broken, so this is a bug fix worthy of 2.10. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH v4 9/9] s390x: refine pci dependencies
VIRTIO_PCI should properly depend on CONFIG_PCI. With this change, we can switch off pci for s390x by removing 'CONFIG_PCI=y' from the default config. Reviewed-by: Thomas Huth Signed-off-by: Cornelia Huck --- default-configs/s390x-softmmu.mak | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak index b227a36179..7adaa4a4fe 100644 --- a/default-configs/s390x-softmmu.mak +++ b/default-configs/s390x-softmmu.mak @@ -1,5 +1,5 @@ CONFIG_PCI=y -CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_PCI=$(CONFIG_PCI) CONFIG_VHOST_USER_SCSI=$(CONFIG_LINUX) CONFIG_VIRTIO=y CONFIG_SCLPCONSOLE=y -- 2.13.3
Re: [Qemu-devel] [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev
On 08/04/2017 05:50 AM, Daniel P. Berrange wrote: > Signed-off-by: Daniel P. Berrange > --- > include/block/block_int.h | 29 + > 1 file changed, 29 insertions(+) > > diff --git a/include/block/block_int.h b/include/block/block_int.h > index d4f4ea7584..deb81a58bd 100644 > --- a/include/block/block_int.h > +++ b/include/block/block_int.h > @@ -147,12 +147,41 @@ struct BlockDriver { > > int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs, > int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); > + > +/** > + * @offset: position in bytes to read at > + * @bytes: number of bytes to read > + * @qiov: the buffers to fill with read data > + * > + * @offset and @bytes will be a multiple of 'request_alignment', > + * but the length of individual @qiov elements does not have to > + * be a multiple. > + * > + * @bytes may be less than the total sizeof @iov, and will be > + * no larger than 'max_transfer'. > + * > + * The buffer in @qiov may point directly to guest memory. > + */ > int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs, > uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags); Documenting flags would also be nice (at the moment, flags is always 0) > int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs, > int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); > int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs, > int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags); > +/** > + * @offset: position in bytes to write at > + * @bytes: number of bytes to write > + * @qiov: the buffers containing data to write > + * > + * @offset and @bytes will be a multiple of 'request_alignment', > + * but the length of individual @qiov elements does not have to > + * be a multiple. > + * > + * @bytes may be less than the total sizeof @iov, and will be > + * no larger than 'max_transfer'. > + * > + * The buffer in @qiov may point directly to guest memory. > + */ > int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs, > uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags); Likewise for documentation, but here, flags will never exceed bs->supported_write_flags. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v3] block: document semanatics of bdrv_co_preadv|pwritev
On 08/04/2017 09:08 AM, Daniel P. Berrange wrote: > Signed-off-by: Daniel P. Berrange > --- > > - Clarify that @bytes matches @qiov total size (Kevin) > > include/block/block_int.h | 31 +++ > 1 file changed, 31 insertions(+) [looks like the nongnu.org infrastructure is having heavy load today, so mails are getting through more slowly than usual - leads to lots of crossed emails, where I'm seeing replies or direct sends sooner than list copies] > +/** > + * @offset: position in bytes to write at > + * @bytes: number of bytes to write > + * @qiov: the buffers containing data to write > + * @flags: zero or more of bits allowed by 'supported_write_flags' maybe s/of // > + * > + * @offset and @bytes will be a multiple of 'request_alignment', > + * but the length of individual @qiov elements does not have to > + * be a multiple. > + * > + * @bytes will always equal the total size of @qiov, and will be > + * no larger than 'max_transfer'. > + * > + * The buffer in @qiov may point directly to guest memory. > + */ > int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs, > uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags); Do we make guarantees that the driver callback is never reached if the image is currently read-only? If so, is that a guarantee worth documenting? Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v3 1/7] block: move ThrottleGroup membership to ThrottleGroupMember
On Mon 31 Jul 2017 11:54:37 AM CEST, Manos Pitsidianakis wrote: > This commit eliminates the 1:1 relationship between BlockBackend and > throttle group state. Users will be able to create multiple throttle > nodes, each with its own throttle group state, in the future. The > throttle group state cannot be per-BlockBackend anymore, it must be > per-throttle node. This is done by gathering ThrottleGroup membership > details from BlockBackendPublic into ThrottleGroupMember and refactoring > existing code to use the structure. > > Reviewed-by: Stefan Hajnoczi > Signed-off-by: Manos Pitsidianakis Reviewed-by: Alberto Garcia Berto
Re: [Qemu-devel] [PATCH v4 9/9] s390x: refine pci dependencies
On 04.08.2017 13:29, Cornelia Huck wrote: > VIRTIO_PCI should properly depend on CONFIG_PCI. > With this change, we can switch off pci for s390x by removing > 'CONFIG_PCI=y' from the default config. > > Reviewed-by: Thomas Huth > Signed-off-by: Cornelia Huck > --- > default-configs/s390x-softmmu.mak | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/default-configs/s390x-softmmu.mak > b/default-configs/s390x-softmmu.mak > index b227a36179..7adaa4a4fe 100644 > --- a/default-configs/s390x-softmmu.mak > +++ b/default-configs/s390x-softmmu.mak > @@ -1,5 +1,5 @@ > CONFIG_PCI=y > -CONFIG_VIRTIO_PCI=y > +CONFIG_VIRTIO_PCI=$(CONFIG_PCI) > CONFIG_VHOST_USER_SCSI=$(CONFIG_LINUX) > CONFIG_VIRTIO=y > CONFIG_SCLPCONSOLE=y > Reviewed-by: David Hildenbrand -- Thanks, David
Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options
On Fri, 08/04 09:23, Eric Blake wrote: > > +If a relative path name is given, the backing file is looked up against > > +@var{filename}, rather than the current working directory. > > Maybe better as: > the backing file is looked up relative to the directory containing > @var{filename} Sounds good, will update.
[Qemu-devel] [PATCH v4 0/9] s390x: zPCI detangling
Next version, not so many changes from v3. As you might have guessed, the goals are still the same: - Being able to disable PCI support in a build completely. - Properly fencing off PCI if the relevant facility bit is not provided. Changes v3->v4: - introduce pci_available boolean - use pci_available to fence off setting the zcpi facility bit - collected tags Branch is still git://github.com/cohuck/qemu no-zpci-cpumodel Cornelia Huck (9): kvm: remove hard dependency on pci s390x/pci: add stubs s390x: chsc nt2 events are pci-only s390x/pci: do not advertise pci on non-pci builds s390x/ccw: create s390 phb conditionally s390x/sclp: properly guard pci-specific functions s390x/pci: fence off instructions for non-pci s390x/kvm: msi route fixup for non-pci s390x: refine pci dependencies accel/kvm/kvm-all.c | 6 ++-- default-configs/s390x-softmmu.mak | 2 +- hw/pci/pci-stub.c | 13 +++ hw/pci/pci.c | 2 ++ hw/s390x/Makefile.objs| 3 +- hw/s390x/s390-pci-bus.c | 4 +-- hw/s390x/s390-pci-bus.h | 4 +-- hw/s390x/s390-pci-stub.c | 74 +++ hw/s390x/s390-virtio-ccw.c| 12 --- hw/s390x/sclp.c | 19 +++--- include/hw/pci/pci.h | 2 ++ target/s390x/ioinst.c | 16 + target/s390x/kvm.c| 64 + 13 files changed, 189 insertions(+), 32 deletions(-) create mode 100644 hw/s390x/s390-pci-stub.c -- 2.13.3
[Qemu-devel] [PATCH v6 0/3] Generate APEI GHES table and dynamically record CPER
> On Aug 4, 2017, at 1:22 AM, qemu-devel-requ...@nongnu.org wrote: > > Date: Fri, 4 Aug 2017 12:37:52 +0800 > From: Dongjiu Geng > To: , , , > , , > , > Cc: , , > > Subject: [Qemu-devel] [PATCH v6 0/3] Generate APEI GHES table and > dynamically record CPER > Message-ID: <1501821475-14647-1-git-send-email-gengdong...@huawei.com> > Content-Type: text/plain > > In the armv8 platform, the mainly hardware error source are ARMv8 > SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS > or use other interface to notify user space, such as Qemu. After Qemu gets > the notification, it will record the CPER and inject the SEA/SEI to KVM. this > series of patches will generate APEI table when guest OS boot up, and > dynamically > record CPER for the guest OS about the generic hardware errors, currently the > userspace only handle the memory section hardware errors. Before Qemu record > the > CPER, it needs to check the ACK value written by the guest OS to avoid > read-write > race condition. > > Below is the APEI/GHESV2/CPER table layout, the max number of error soure is > 11, > which is classified by notification type, now only enable the SEA/SEI > notification type > error source. > > etc/acpi/tables etc/hardware_errors > > == > +--+ > ++|address | > +--+ > |HEST+|registers | | > Error Status | > + +--+| ++ | > Data Block 0 | > | | GHES0| +->| |status_address0 |->| > ++ > +--+ | | ++ | | > CPER | > | | .| | +--->| |status_address1 |--+ | > | CPER | > | | error_status_address | | || ++ | | > | | > | | .| | || . | | | > | CPER | > | | error_status_address-+-+ |+--+ | | > +-++ > | | .| | +->| |status_address10|+ | | > Error Status | > | | read_ack_register+-+ | | | ++| | | > Data Block 1 | > | | read_ack_preserve| +-+-+->| |ack_address0|--+ | +-->| > ++ > | | read_ack_write | | | | ++ | | | > | CPER | > + +--+ | | +--->| |ack_address1|--+-+ | | > | CPER | > | | GHES1| | | || ++ | | | | > | | > + +--+ | | || | . | | | | | > | CPER | > | | .| | | || ++ | | | > +-++ > | | error_status_address-+---+ | | +->| |ack_address10 |--+-+-+ | | > |.. | > | | .| | | | | ++ | | | | | > ++ > | | read_ack_register+-+-+ | | | ack0 |<-+ | | | | > Error Status | > | | read_ack_preserve| | | | ++| | | | > Data Block 10| > | | read_ack_write | | | | | ack1 |<---+ | +>| > ++ > + +--+ | | | ++ | | > | CPER | > | | ... | | | | | | | | > | CPER | > + +--+ | | | +--+ | | | > | | > | | GHES10 | | | | | ack10 |< + | > | CPER | > + +--+ | | | ++ > +-++ > | | .| | | > | | error_status_address-+-+ | > | | .| | > | | read_ack_register+-+ > | | read_ack_preserve| > | | read_ack_write | > + +--+ Excellent job with the ASCII drawing.
[Qemu-devel] [PATCH v4 3/9] s390x: chsc nt2 events are pci-only
The nt2 event class is pci-only - don't look for events if pci is not in the active cpu model. Signed-off-by: Cornelia Huck --- hw/s390x/s390-pci-bus.c | 4 ++-- hw/s390x/s390-pci-bus.h | 4 ++-- hw/s390x/s390-pci-stub.c | 4 ++-- target/s390x/ioinst.c| 16 4 files changed, 22 insertions(+), 6 deletions(-) diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c index 61cfd2138f..c57f6ebae0 100644 --- a/hw/s390x/s390-pci-bus.c +++ b/hw/s390x/s390-pci-bus.c @@ -47,7 +47,7 @@ S390pciState *s390_get_phb(void) return phb; } -int chsc_sei_nt2_get_event(void *res) +int pci_chsc_sei_nt2_get_event(void *res) { ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res; PciCcdfAvail *accdf; @@ -87,7 +87,7 @@ int chsc_sei_nt2_get_event(void *res) return rc; } -int chsc_sei_nt2_have_event(void) +int pci_chsc_sei_nt2_have_event(void) { S390pciState *s = s390_get_phb(); diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h index 67af2c12ff..5df6292509 100644 --- a/hw/s390x/s390-pci-bus.h +++ b/hw/s390x/s390-pci-bus.h @@ -319,8 +319,8 @@ typedef struct S390pciState { } S390pciState; S390pciState *s390_get_phb(void); -int chsc_sei_nt2_get_event(void *res); -int chsc_sei_nt2_have_event(void); +int pci_chsc_sei_nt2_get_event(void *res); +int pci_chsc_sei_nt2_have_event(void); void s390_pci_sclp_configure(SCCB *sccb); void s390_pci_sclp_deconfigure(SCCB *sccb); void s390_pci_iommu_enable(S390PCIIOMMU *iommu); diff --git a/hw/s390x/s390-pci-stub.c b/hw/s390x/s390-pci-stub.c index 2e7e42a2af..cc7278a865 100644 --- a/hw/s390x/s390-pci-stub.c +++ b/hw/s390x/s390-pci-stub.c @@ -7,12 +7,12 @@ #include "s390-pci-bus.h" /* target/s390x/ioinst.c */ -int chsc_sei_nt2_get_event(void *res) +int pci_chsc_sei_nt2_get_event(void *res) { return 1; } -int chsc_sei_nt2_have_event(void) +int pci_chsc_sei_nt2_have_event(void) { return 0; } diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c index 51fbea620d..3fa3301f50 100644 --- a/target/s390x/ioinst.c +++ b/target/s390x/ioinst.c @@ -599,6 +599,22 @@ static int chsc_sei_nt0_have_event(void) return 0; } +static int chsc_sei_nt2_get_event(void *res) +{ +if (s390_has_feat(S390_FEAT_ZPCI)) { +return pci_chsc_sei_nt2_get_event(res); +} +return 1; +} + +static int chsc_sei_nt2_have_event(void) +{ +if (s390_has_feat(S390_FEAT_ZPCI)) { +return pci_chsc_sei_nt2_have_event(); +} +return 0; +} + #define CHSC_SEI_NT0(1ULL << 63) #define CHSC_SEI_NT2(1ULL << 61) static void ioinst_handle_chsc_sei(ChscReq *req, ChscResp *res) -- 2.13.3
[Qemu-devel] [PATCH v2] qemu-img: Clarify about relative backing file options
It's not too surprising when a user specifies the backing file relative to the current working directory instead of the top layer image. This causes error when they differ. Though the error message has enough information to infer the fact about the misunderstanding, it is better if we document this explicitly, so that users don't have to learn from mistakes. Signed-off-by: Fam Zheng --- v2: Improve wording. [Eric] --- qemu-img.texi | 9 + 1 file changed, 9 insertions(+) diff --git a/qemu-img.texi b/qemu-img.texi index 72dabd6b3e..90c7eab4a8 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size needs to be specified in this case. @var{backing_file} will never be modified unless you use the @code{commit} monitor command (or qemu-img commit). +If a relative path name is given, the backing file is looked up relative to +the directory containing @var{filename}. + Note that a given backing file will be opened to check that it is valid. Use the @code{-u} option to enable unsafe backing file mode, which means that the image will be created even if the associated backing file cannot be opened. A @@ -343,6 +346,9 @@ created as a copy on write image of the specified base image; the @var{backing_file} should have the same content as the input's base image, however the path, image format, etc may differ. +If a relative path name is given, the backing file is looked up relative to +the directory containing @var{output_filename}. + If the @code{-n} option is specified, the target volume creation will be skipped. This is useful for formats such as @code{rbd} if the target volume has already been created with site specific options that cannot @@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if the image format of string), then the image is rebased onto no backing file (i.e. it will exist independently of any backing file). +If a relative path name is given, the backing file is looked up relative to +the directory containing @var{filename}. + @var{cache} specifies the cache mode to be used for @var{filename}, whereas @var{src_cache} specifies the cache mode for reading backing files. -- 2.13.3
Re: [Qemu-devel] [PATCHv2 3/4] scsi: clarify sense codes for LUN0 emulation
On 04/08/2017 10:36, Hannes Reinecke wrote: > The LUN0 emulation is just that, an emulation for a non-existing > LUN0. So we should be returning LUN_NOT_SUPPORTED for any request > coming from any other LUN. > And we should be aborting unhandled commands with INVALID OPCODE, > not LUN NOT SUPPORTED. > > Signed-off-by: Hannes Reinecke > --- > hw/scsi/scsi-bus.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c > index 8419c75..79a222f 100644 > --- a/hw/scsi/scsi-bus.c > +++ b/hw/scsi/scsi-bus.c > @@ -583,6 +583,11 @@ static int32_t scsi_target_send_command(SCSIRequest > *req, uint8_t *buf) > { > SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req); > > +if (req->lun != 0) { > +scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED)); > +scsi_req_complete(req, CHECK_CONDITION); > +return 0; > +} > switch (buf[0]) { > case REPORT_LUNS: > if (!scsi_target_emulate_report_luns(r)) { > @@ -613,7 +618,7 @@ static int32_t scsi_target_send_command(SCSIRequest *req, > uint8_t *buf) > case TEST_UNIT_READY: > break; > default: > -scsi_req_build_sense(req, SENSE_CODE(LUN_NOT_SUPPORTED)); > +scsi_req_build_sense(req, SENSE_CODE(INVALID_OPCODE)); > scsi_req_complete(req, CHECK_CONDITION); > return 0; > illegal_request: > I am queuing this one since it's an independent bugfix. Paolo
Re: [Qemu-devel] Prohibit Windows from running in QEMU
On 04/08/2017 12:27, Peter Lieven wrote: > Am 04.08.2017 um 12:23 schrieb Paolo Bonzini: >> On 04/08/2017 11:58, Peter Lieven wrote: >>> Am 29.10.2013 um 10:59 schrieb Paolo Bonzini: Il 29/10/2013 10:48, Peter Lieven ha scritto: > Hi all, > > this question might seem a bit weird, but does anyone see a good way to > avoid > that Windows is able to boot inside qemu? > > We have defined several profiles for different operation systems and I > want > to avoid that someone chooses Linux and then installs Windows within > a VM. Reason is licensing. Patch QEMU to crash when Hyper-V extensions are enabled... >>> Hi all, >>> >>> this is an old topic that has become important for me again recently. >>> Now all Linux versions should be able to detect KVM even if Hyper-V is >>> enabled. >>> >>> But how do I detect from Qemu userspace that Hyper-V is enabled? >> Maybe a better one: make KVM crash the guest if CR8 is nonzero on a >> vmexit. Linux doesn't use it, Windows should not survive long. > > You mean the kvm kernel module? Or can I access this register also > from Qemu on any call that is handled in userspace? It would be easier > to have a cmdline option to Qemu than an option to a kernel module. Yes, the kernel module. Accessing it in QEMU requires cpu_synchronize_state so it's slow. However, you could piggyback on some other functionality that is never used by Linux to do the check, for example the RTC I/O port. That is, in the RTC I/O port code you call cpu_synchronize_state and check CR8. Paolo >> Warning, I don't know if UEFI firmware uses CR8. > > UEFI firmware is not important in this case. > Do you know if FreeBSD, OpenBSD or NetBSD use it? > > > Thank for your ideas, > Peter >
[Qemu-devel] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev
Signed-off-by: Daniel P. Berrange --- include/block/block_int.h | 29 + 1 file changed, 29 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index d4f4ea7584..deb81a58bd 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -147,12 +147,41 @@ struct BlockDriver { int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); + +/** + * @offset: position in bytes to read at + * @bytes: number of bytes to read + * @qiov: the buffers to fill with read data + * + * @offset and @bytes will be a multiple of 'request_alignment', + * but the length of individual @qiov elements does not have to + * be a multiple. + * + * @bytes may be less than the total sizeof @iov, and will be + * no larger than 'max_transfer'. + * + * The buffer in @qiov may point directly to guest memory. + */ int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags); int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags); +/** + * @offset: position in bytes to write at + * @bytes: number of bytes to write + * @qiov: the buffers containing data to write + * + * @offset and @bytes will be a multiple of 'request_alignment', + * but the length of individual @qiov elements does not have to + * be a multiple. + * + * @bytes may be less than the total sizeof @iov, and will be + * no larger than 'max_transfer'. + * + * The buffer in @qiov may point directly to guest memory. + */ int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags); -- 2.13.3
Re: [Qemu-devel] Prohibit Windows from running in QEMU
Am 04.08.2017 um 12:43 schrieb Paolo Bonzini: > On 04/08/2017 12:27, Peter Lieven wrote: >> Am 04.08.2017 um 12:23 schrieb Paolo Bonzini: >>> On 04/08/2017 11:58, Peter Lieven wrote: Am 29.10.2013 um 10:59 schrieb Paolo Bonzini: > Il 29/10/2013 10:48, Peter Lieven ha scritto: >> Hi all, >> >> this question might seem a bit weird, but does anyone see a good way to >> avoid >> that Windows is able to boot inside qemu? >> >> We have defined several profiles for different operation systems and I >> want >> to avoid that someone chooses Linux and then installs Windows within >> a VM. Reason is licensing. > Patch QEMU to crash when Hyper-V extensions are enabled... Hi all, this is an old topic that has become important for me again recently. Now all Linux versions should be able to detect KVM even if Hyper-V is enabled. But how do I detect from Qemu userspace that Hyper-V is enabled? >>> Maybe a better one: make KVM crash the guest if CR8 is nonzero on a >>> vmexit. Linux doesn't use it, Windows should not survive long. >> You mean the kvm kernel module? Or can I access this register also >> from Qemu on any call that is handled in userspace? It would be easier >> to have a cmdline option to Qemu than an option to a kernel module. > Yes, the kernel module. Accessing it in QEMU requires > cpu_synchronize_state so it's slow. However, you could piggyback on > some other functionality that is never used by Linux to do the check, > for example the RTC I/O port. > > That is, in the RTC I/O port code you call cpu_synchronize_state and > check CR8. Thanks, I will try that out and let you know. Peter
[Qemu-devel] [PATCH v4 4/9] s390x/pci: do not advertise pci on non-pci builds
Only set the zpci feature bit on builds that actually support pci. Signed-off-by: Cornelia Huck --- target/s390x/kvm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c index c4c5791d27..bc62bba5b7 100644 --- a/target/s390x/kvm.c +++ b/target/s390x/kvm.c @@ -2662,7 +2662,9 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, Error **errp) } /* We emulate a zPCI bus and AEN, therefore we don't need HW support */ -set_bit(S390_FEAT_ZPCI, model->features); +if (pci_available) { +set_bit(S390_FEAT_ZPCI, model->features); +} set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features); if (s390_known_cpu_type(cpu_type)) { -- 2.13.3
Re: [Qemu-devel] [PULL 0/2] slirp updates
On 2 August 2017 at 23:28, Samuel Thibault wrote: > warning: redirection vers https://people.debian.org/~sthibault/qemu.git/ > The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8: > > Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100) > > are available in the git repository at: > > http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault > > for you to fetch changes up to 413d463f43fbc4dd3a601e80a5724aa384a265a0: > > slirp: check len against dhcp options array end (2017-08-03 00:26:44 +0200) > > > slirp updates > > > Hervé Poussineau (1): > slirp: fill error when failing to initialize user network > > Prasad J Pandit (1): > slirp: check len against dhcp options array end > > net/slirp.c | 134 > -- > slirp/bootp.c | 3 ++ > 2 files changed, 97 insertions(+), 40 deletions(-) Applied, thanks. -- PMM
[Qemu-devel] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev
There are trace probes in bdrv_co_readv|writev, however, the block drivers are being gradually moved over to using the bdrv_co_preadv|pwritev functions instead. As a result some block drivers miss the current probes. Move the probes into bdrv_co_preadv|pwritev instead, so that they are triggered by more (all?) I/O code paths. Signed-off-by: Daniel P. Berrange --- block/io.c | 8 block/trace-events | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/block/io.c b/block/io.c index d9dc822173..26003814eb 100644 --- a/block/io.c +++ b/block/io.c @@ -1135,6 +1135,8 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child, bool use_local_qiov = false; int ret; +trace_bdrv_co_preadv(child->bs, offset, bytes, flags); + if (!drv) { return -ENOMEDIUM; } @@ -1207,8 +1209,6 @@ static int coroutine_fn bdrv_co_do_readv(BdrvChild *child, int coroutine_fn bdrv_co_readv(BdrvChild *child, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) { -trace_bdrv_co_readv(child->bs, sector_num, nb_sectors); - return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0); } @@ -1526,6 +1526,8 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child, bool use_local_qiov = false; int ret; +trace_bdrv_co_pwritev(child->bs, offset, bytes, flags); + if (!bs->drv) { return -ENOMEDIUM; } @@ -1660,8 +1662,6 @@ static int coroutine_fn bdrv_co_do_writev(BdrvChild *child, int coroutine_fn bdrv_co_writev(BdrvChild *child, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) { -trace_bdrv_co_writev(child->bs, sector_num, nb_sectors); - return bdrv_co_do_writev(child, sector_num, nb_sectors, qiov, 0); } diff --git a/block/trace-events b/block/trace-events index 071a8d77ba..25dd5a3026 100644 --- a/block/trace-events +++ b/block/trace-events @@ -9,8 +9,8 @@ blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x" # block/io.c -bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d" -bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d" +bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" +bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x" bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u" -- 2.13.3
Re: [Qemu-devel] [PATCH v4 4/9] s390x/pci: do not advertise pci on non-pci builds
On 04.08.2017 13:29, Cornelia Huck wrote: > Only set the zpci feature bit on builds that actually support pci. > > Signed-off-by: Cornelia Huck > --- > target/s390x/kvm.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > index c4c5791d27..bc62bba5b7 100644 > --- a/target/s390x/kvm.c > +++ b/target/s390x/kvm.c > @@ -2662,7 +2662,9 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, > Error **errp) > } > > /* We emulate a zPCI bus and AEN, therefore we don't need HW support */ > -set_bit(S390_FEAT_ZPCI, model->features); > +if (pci_available) { > +set_bit(S390_FEAT_ZPCI, model->features); > +} > set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features); > > if (s390_known_cpu_type(cpu_type)) { > Yup, looks good to me. Reviewed-by: David Hildenbrand -- Thanks, David
Re: [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR
On 08/04/2017 05:20 AM, Kevin Wolf wrote: >>> >>> Hmm, I wonder. https://bugzilla.redhat.com/show_bug.cgi?id=1465320 >>> details a failure when starting qemu with a read-write NBD disk, then >>> taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where >>> intermediate commit (snap2 into nbd) works but live commit (snap3 into >>> nbd) fails with a message that nbd does not support reopening. I'm >>> presuming that your series may help to address that; I'll give it a spin >>> and see what happens. >> >> Nope, even with your patches, I'm still getting: >> >> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}} >> {"return": {}} >> {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event": >> "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len": >> 2097152, "offset": 2097152, "speed": 0, "type": "commit"}} >> >> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}} >> {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by >> node '#block048' does not support reopening files"}} > > That's simply NBD not implementing .bdrv_reopen_*. In other words, not a > bug, but just a missing feature. But why does intermediate commit work, while live commit fails? Both require that the image being committed into be read-write, and the NBD image was opened read-write (even if it can be treated as read-only for the duration of time that it is a backing image). I understand missing .bdrv_reopen making it impossible to change read-only to read-write, but when missing it means you can never change read-write down to the tighter read-only originally, then what is making live commit different from intermediate in not being able to handle it? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PULL] virtio: fix for rc2
On 3 August 2017 at 15:07, Michael S. Tsirkin wrote: > The following changes since commit aaaec6acad7cf97372d48c1b09126a09697519c8: > > Update version for v2.10.0-rc1 release (2017-08-02 16:36:32 +0100) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > for you to fetch changes up to e6a74868d92f858ac33915b6772999d7de2fd288: > > build-sys: add --disable-vhost-user (2017-08-03 15:55:41 +0300) > > > virtio: fix for rc2 > > Looks like the constant stream of additions of vhost-user devices is a > problem for some people who are concerned about external connections > from qemu. A per-device flag seems like an overkill, but a single > configure flag seems like a sane way to support that, and it looks like > we need to do it before the release. > > Signed-off-by: Michael S. Tsirkin > > > Marc-André Lureau (1): > build-sys: add --disable-vhost-user Applied, thanks. -- PMM
[Qemu-devel] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB
Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage with high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations. On tests with rotating rust, and cache=none|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver. Signed-off-by: Daniel P. Berrange --- block/crypto.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 58ef6f2f52..207941db9a 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs) } -#define BLOCK_CRYPTO_MAX_SECTORS 32 +#define BLOCK_CRYPTO_MAX_SECTORS 2048 static coroutine_fn int block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, qemu_iovec_init(&hd_qiov, qiov->niov); -/* Bounce buffer so we have a linear mem region for - * entire sector. XXX optimize so we avoid bounce - * buffer in case that qiov->niov == 1 +/* Bounce buffer because we're not permitted to touch + * contents of qiov - it points to guest memory. */ cipher_data = qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, @@ -464,9 +463,8 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, qemu_iovec_init(&hd_qiov, qiov->niov); -/* Bounce buffer so we have a linear mem region for - * entire sector. XXX optimize so we avoid bounce - * buffer in case that qiov->niov == 1 +/* Bounce buffer because we're not permitted to touch + * contents of qiov - it points to guest memory. */ cipher_data = qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, -- 2.13.3
[Qemu-devel] [PATCH for 2.11] vga/migration: Update memory map in post_load
From: "Dr. David Alan Gilbert" After migration the chain4 alias mapping added by 80763888 (in 2011) might be missing, since there's no call to vga_update_memory_access in the post_load after the registers are updated. Add it back. Signed-off-by: Dr. David Alan Gilbert --- hw/display/vga.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/display/vga.c b/hw/display/vga.c index 63421f9ee8..5d3848a37e 100644 --- a/hw/display/vga.c +++ b/hw/display/vga.c @@ -2044,6 +2044,7 @@ static int vga_common_post_load(void *opaque, int version_id) /* force refresh */ s->graphic_mode = -1; vbe_update_vgaregs(s); +vga_update_memory_access(s); return 0; } -- 2.13.3
Re: [Qemu-devel] Is the use of bdrv_getlength() in quorum_co_flush() kosher?
On Fri 04 Aug 2017 02:48:03 PM CEST, Markus Armbruster wrote: > Have a look at quorum_co_flush(): > > quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, > bdrv_getlength(s->children[i]->bs), > s->children[i]->bs->node_name, result); > > bdrv_getlength() can fail. Does it do the right thing then? If it fails then it returns -errno, but then quorum_report_bad() turns into uint64_t and assumes it's valid. Since that number is then rounded up to the next multiple of BDRV_SECTOR_SIZE in order to calculate end_sector, I think that what happens in practice is that the user sees a QUORUM_REPORT_BAD event with sectors-count = 0 (in most cases) or with a very high value in sectors-count (if errno > BDRV_SECTOR_SIZE). The result of bdrv_getlength() is only used to report the number of affected sectors in the QUORUM_REPORT_BAD event, so there are no other consequences. Anyway I think it's a good idea not to make assumptions, detect the error and pass 0 instead. --- a/block/quorum.c +++ b/block/quorum.c @@ -785,8 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs) for (i = 0; i < s->num_children; i++) { result = bdrv_co_flush(s->children[i]->bs); if (result) { +int64_t length = bdrv_getlength(s->children[i]->bs); quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, - bdrv_getlength(s->children[i]->bs), + length > 0 ? length : 0, s->children[i]->bs->node_name, result); result_value.l = result; quorum_count_vote(&error_votes, &result_value, i); Berto
Re: [Qemu-devel] [PATCH v4 00/22] Clean up around qmp() and hmp()
On 08/03/2017 08:54 PM, no-re...@patchew.org wrote: > Hi, > > This series failed automatic build test. Please find the testing commands and > their output below. If you have docker installed, you can probably reproduce > it > locally. When will patchew have a syntax for stating dependencies? (Of course, I should actually mention those dependencies in my cover letter, not after the fact). > /tmp/qemu-test/src/tests/test-qga.c: In function ‘test_qga_set_time’: > /tmp/qemu-test/src/tests/test-qga.c:652: warning: implicit declaration of > function ‘qmp_fd’ This patch series depends on my earlier cleanup: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00385.html [PATCH] test-qga: Kill broken and dead QGA_TEST_SIDE_EFFECT -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/3] i386/kvm: use a switch statement for MSR detection
On 04.08.2017 11:14, Ladi Prosek wrote: > Switch is easier on the eye and might lead to better codegen. > > Signed-off-by: Ladi Prosek Reviewed-by: David Hildenbrand -- Thanks, David
Re: [Qemu-devel] [PATCH 3/3] i386/kvm: advertise Hyper-V frequency MSRs
On 04.08.2017 11:14, Ladi Prosek wrote: > As of kernel commit eb82feea59d6 ("KVM: hyperv: support > HV_X64_MSR_TSC_FREQUENCY > and HV_X64_MSR_APIC_FREQUENCY"), KVM supports two new MSRs which are required > for nested Hyper-V to read timestamps with RDTSC + TSC page. > > This commit makes QEMU advertise the MSRs with CPUID.4003H:EAX[11] and > CPUID.4003H:EDX[8] as specified in the Hyper-V TLFS and experimentally > verified on a Hyper-V host. The feature is enabled with the existing hv-time > CPU > flag, and only if the TSC frequency is stable across migration and known. > > Signed-off-by: Ladi Prosek > --- > target/i386/kvm.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c > index 77b6373..7e484a7 100644 > --- a/target/i386/kvm.c > +++ b/target/i386/kvm.c > @@ -89,6 +89,7 @@ static bool has_msr_hv_vpindex; > static bool has_msr_hv_runtime; > static bool has_msr_hv_synic; > static bool has_msr_hv_stimer; > +static bool has_msr_hv_frequencies; > static bool has_msr_xss; > > static bool has_msr_architectural_pmu; > @@ -631,7 +632,17 @@ static int hyperv_handle_properties(CPUState *cs) > if (cpu->hyperv_time) { > env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE; > env->features[FEAT_HYPERV_EAX] |= > HV_X64_MSR_TIME_REF_COUNT_AVAILABLE; > -env->features[FEAT_HYPERV_EAX] |= 0x200; > +env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_REFERENCE_TSC_AVAILABLE; > +if (has_msr_hv_frequencies> +/* TSC clock must be stable > and known for this feature. */ > +&& ((env->features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC) > +|| env->user_tsc_khz != 0) > +&& env->tsc_khz != 0) { I'd drop the != 0 in both cases and move the env->tsc_khz check up to has_msr_hv_frequencies. if (has_msr_hv_frequencies && env->tsc_khz && ... Wonder if it even would make sense to move some parts of this check into a helper function, to beautify this a bit. tsc_stable(env) tsc_known(env) ... > + > +env->features[FEAT_HYPERV_EAX] |= HV_X64_ACCESS_FREQUENCY_MSRS; > +env->features[FEAT_HYPERV_EDX] |= > +HV_FEATURE_FREQUENCY_MSRS_AVAILABLE; > +} > } > if (cpu->hyperv_crash && has_msr_hv_crash) { > env->features[FEAT_HYPERV_EDX] |= HV_X64_GUEST_CRASH_MSR_AVAILABLE; > @@ -1127,6 +1138,9 @@ static int kvm_get_supported_msrs(KVMState *s) > case HV_X64_MSR_STIMER0_CONFIG: > has_msr_hv_stimer = true; > break; > +case HV_X64_MSR_TSC_FREQUENCY: > +has_msr_hv_frequencies = true; > +break; > } > > } > -- Thanks, David
[Qemu-devel] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?
Denis, you added this in commit d50d822: #ifdef CONFIG_FALLOCATE if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) { int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes); if (ret == 0 || ret != -ENOTSUP) { return ret; } s->has_fallocate = false; } #endif bdrv_getlength() can fail. Does it do the right thing then? For what it's worth, the comparison of its value is signed.