[PATCH] KVM: SVM: Pass through the host kernel's IO delay port

2009-06-19 Thread Paolo Bonzini
From: Paolo Bonzini 

KVM's optimization of guest port 80 accesses was removed last May 11
in commit 99f85a.  However, the removal probably carries a speed penalty.
I don't have a machine to test on, but the equivalent VMX patch (fdef3ad)
reported a speedup of 3-5%, and on the Xen mailing list it was mentioned
that passing port 80 through had positive effects on Xen's startup speed.

We can enable passthrough to the same port the host kernel uses instead.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kernel/io_delay.c |    1 +
 arch/x86/kvm/svm.c         |    5 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/io_delay.c b/arch/x86/kernel/io_delay.c
index a979b5b..a478cb2 100644
--- a/arch/x86/kernel/io_delay.c
+++ b/arch/x86/kernel/io_delay.c
@@ -129,3 +129,4 @@ static int __init io_delay_param(char *s)
 }
 
 early_param("io_delay", io_delay_param);
+EXPORT_SYMBOL_GPL(io_delay_type);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f8510c..c49f4db 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include 
@@ -411,6 +412,10 @@ static __init int svm_hardware_setup(void)
 
iopm_va = page_address(iopm_pages);
memset(iopm_va, 0xff, PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
+   if (io_delay_type == CONFIG_IO_DELAY_TYPE_0X80)
+   clear_bit(0x80, iopm_va);
+   else if (io_delay_type == CONFIG_IO_DELAY_TYPE_0XED)
+   clear_bit(0xED, iopm_va);
iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
 
if (boot_cpu_has(X86_FEATURE_NX))
-- 
1.6.0.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: Pass through the host kernel's IO delay port

2009-06-22 Thread Paolo Bonzini

Avi Kivity wrote:

On 06/19/2009 01:15 PM, Paolo Bonzini wrote:

From: Paolo Bonzini

KVM's optimization of guest port 80 accesses was removed last May 11
in commit 99f85a.  However, the removal probably carries a speed penalty.
I don't have a machine to test on, but the equivalent VMX patch (fdef3ad)
reported a speedup of 3-5%, and on the Xen mailing list it was mentioned
that passing port 80 through had positive effects on Xen's startup speed.

We can enable passthrough to the same port the host kernel uses instead.


Since we don't tell the guest to use 0xed, this won't help.


Won't the guest do that automatically through DMI?

I think we have four cases:

1) non-buggy chipset, host uses 0x80 and guest cannot crash the machine; 
this is okay.


2) buggy chipset, both host and guest use 0xed thanks to DMI detection; 
and (since 0x80 is not passthrough) the guest cannot crash the machine, 
so this is okay.


3) host is given explicit io_delay=0xed, but the guest isn't.  However, 
0x80 is not passthrough so the guest can use port 0x80 safely even on 
buggy hardware.


4) a buggy chipset is not detected, so io_delay=0x80 and 0x80 is made 
passthrough.  In principle the guest could crash the machine; however 
the host will have written to 0x80 at startup, most likely causing a 
crash way before a guest is started.  So this should not be a problem 
either.


Paolo


Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-22 Thread Paolo Bonzini



+ * Design note: We create one PIO/MMIO device (iosignalfd_group) which
+ * aggregates  one or more iosignalfd_items.  Each item points to exactly one
+ * eventfd, and can be registered to trigger on any write to the group
+ * (wildcard), or to a write of a specific value.  If more than one item is to
+ * be supported, the addr/len ranges must all be identical in the group.  If a
+ * trigger value is to be supported on a particular item, the group range must
+ * be exactly the width of the trigger.


Some duplicate spaces in the text above, apparently at random places.


-ENOPARSE ;)

Can you elaborate?


I see "aggregates  one".  The others are all at end of sentence, so I 
think that Michael was not talking about those (git grep '\*.*\.  ').
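As an aside, the grouping the quoted design note describes can be sketched roughly as follows. These are simplified illustrative stand-ins, not the actual kvm iosignalfd structures; all field and function names here are invented for the sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One registered trigger: fires its eventfd either on any write to the
 * group's range (wildcard) or on a write of one specific value. */
struct iosignalfd_item {
    uint64_t match;     /* trigger value, meaningful when !wildcard */
    bool     wildcard;  /* fire on any write to the group's range */
    int      eventfd;   /* eventfd signalled when the item triggers */
};

/* One PIO/MMIO device aggregating one or more items; all items share a
 * single addr/len range, which must equal the trigger width if any item
 * carries a specific match value. */
struct iosignalfd_group {
    uint64_t addr;
    size_t   len;
    size_t   nitems;
    struct iosignalfd_item *items;
};

/* Return the first item that should fire for a write of 'val'. */
static struct iosignalfd_item *
iosignalfd_group_match(struct iosignalfd_group *g, uint64_t val)
{
    for (size_t i = 0; i < g->nitems; i++) {
        if (g->items[i].wildcard || g->items[i].match == val)
            return &g->items[i];
    }
    return NULL;
}
```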


Paolo


Re: [PATCH] KVM: SVM: Pass through the host kernel's IO delay port

2009-06-22 Thread Paolo Bonzini

Marcelo Tosatti wrote:

Paolo,

As said in private email you can change the port 80 passthrough to
disabled only on those hosts that have broken port 80 (you can find the
blacklist in arch/x86/kernel/io_delay.c).


Actually I read the code more and, since pv_ops is actually disabling 
outs to port 80 if the guest is sufficiently new, it may even make more 
sense for easier maintenance to remove the port 80 passthrough on Intel 
processors as well...


Paolo


Re: [PATCH] KVM: SVM: Pass through the host kernel's IO delay port

2009-06-22 Thread Paolo Bonzini

Marcelo Tosatti wrote:

On Mon, Jun 22, 2009 at 07:57:22PM +0200, Paolo Bonzini wrote:

Marcelo Tosatti wrote:

Paolo,

As said in private email you can change the port 80 passthrough to
disabled only on those hosts that have broken port 80 (you can find the
blacklist in arch/x86/kernel/io_delay.c).
Actually I read the code more and, since pv_ops is actually disabling  
outs to port 80 if the guest is sufficiently new, it may even make more  
sense for easier maintenance to remove the port 80 passthrough on Intel  
processors as well...


But what about Windows guests?


I don't think they are using port 80 to delay ISA access, are they?

Paolo


Re: virtio scsi host draft specification, v3

2011-06-29 Thread Paolo Bonzini

On 06/12/2011 09:51 AM, Michael S. Tsirkin wrote:

>
>  If a device uses more than one queue it is the responsibility of the
>  device to ensure strict request ordering.

Maybe I misunderstand - how can this be the responsibility of
the device if the device does not get the information about
the original ordering of the requests?

For example, if the driver is crazy enough to put
all write requests on one queue and all barriers
on another one, how is the device supposed to ensure
ordering?


I agree here, in fact I misread Hannes's comment as "if a driver uses 
more than one queue it is responsibility of the driver to ensure strict 
request ordering".  If you send requests to different queues, you know 
that those requests are independent.  I don't think anything else is 
feasible in the virtio framework.


Paolo


Re: virtio scsi host draft specification, v3

2011-06-29 Thread Paolo Bonzini

On 06/14/2011 10:39 AM, Hannes Reinecke wrote:

If, however, we decide to expose some details about the backend, we
could be using the values from the backend directly.
EG we could be forwarding the SCSI target port identifier here
(if backed by real hardware) or creating our own SAS-type
identifier when backed by qemu block. Then we could just query
the backend via a new command on the controlq
(eg 'list target ports') and wouldn't have to worry about any protocol
specific details here.


Besides the controlq command, which I can certainly add, this is 
actually quite similar to what I had in mind (though my plan likely 
would not have worked because I was expecting hierarchical LUNs used 
uniformly).  So, "list target ports" would return a set of LUN values to 
which you can send REPORT LUNS, or something like that?  I suppose that 
if you're using real hardware as the backing storage the in-kernel 
target can provide that.
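A hypothetical wire layout for such a "list target ports" controlq exchange might look like the sketch below. Nothing here is from the actual virtio-scsi spec: the command value, type names, and response shape are all placeholders for the structure Hannes is being asked to draft:

```c
#include <stdint.h>

/* Made-up request type for the proposed controlq command. */
#define VIRTIO_SCSI_T_LIST_TARGET_PORTS  0x0100

/* Driver -> device: ask for the set of target ports behind the HBA. */
struct virtio_scsi_ctrl_list_ports_req {
    uint32_t type;              /* VIRTIO_SCSI_T_LIST_TARGET_PORTS */
} __attribute__((packed));

/* Device -> driver: one identifier per target port.  The driver could
 * then send REPORT LUNS to LUN 0 of each returned port to enumerate
 * its logical units. */
struct virtio_scsi_ctrl_list_ports_resp {
    uint8_t  response;          /* e.g. VIRTIO_SCSI_S_OK on success */
    uint8_t  reserved[3];
    uint32_t num_ports;
    uint64_t port_id[];         /* num_ports entries follow */
} __attribute__((packed));
```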


For the QEMU backend I'd keep hierarchical LUNs, though of course one 
could add a FC or SAS bus to QEMU, each implementing its own identifier 
scheme.


If I understand it correctly, it should remain possible to use a single 
host for both pass-through and emulated targets.


Would you draft the command structure, so I can incorporate it into the 
spec?



Of course, when doing so we would lose the ability to freely remap
LUNs. But then remapping LUNs doesn't gain you much imho.
Plus you could always use qemu block backend here if you want
to hide the details.


And you could always use the QEMU block backend with scsi-generic if you 
want to remap LUNs, instead of true passthrough via the kernel target.


Paolo


Re: virtio scsi host draft specification, v3

2011-06-29 Thread Paolo Bonzini

On 06/29/2011 12:03 PM, Christoph Hellwig wrote:

>  I agree here, in fact I misread Hannes's comment as "if a driver
>  uses more than one queue it is responsibility of the driver to
>  ensure strict request ordering".  If you send requests to different
>  queues, you know that those requests are independent.  I don't think
>  anything else is feasible in the virtio framework.

That doesn't really fit very well with the SAM model.  If we want
to use multiple queues for a single LUN it has to be transparent to
the SCSI command stream.  Then again I don't quite see the use for
that anyway.


Agreed, I see a use for multiple queues (MSI-X), but not for multiple 
queues shared by a single LUN.


Paolo


Re: virtio scsi host draft specification, v3

2011-06-29 Thread Paolo Bonzini

On 06/29/2011 12:31 PM, Michael S. Tsirkin wrote:

On Wed, Jun 29, 2011 at 12:06:29PM +0200, Paolo Bonzini wrote:

On 06/29/2011 12:03 PM, Christoph Hellwig wrote:

  I agree here, in fact I misread Hannes's comment as "if a driver
  uses more than one queue it is responsibility of the driver to
  ensure strict request ordering".  If you send requests to different
  queues, you know that those requests are independent.  I don't think
  anything else is feasible in the virtio framework.


That doesn't really fit very well with the SAM model.  If we want
to use multiple queues for a single LUN it has to be transparent to
the SCSI command stream.  Then again I don't quite see the use for
that anyway.


Agreed, I see a use for multiple queues (MSI-X), but not for
multiple queues shared by a single LUN.


Then let's make it explicit in the spec?


What, forbid it or say ordering is not guaranteed?  The latter is 
already explicit with the wording suggested in the thread.


Paolo


Re: virtio scsi host draft specification, v3

2011-06-30 Thread Paolo Bonzini

On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:

> >  Of course, when doing so we would lose the ability to freely remap
> >  LUNs. But then remapping LUNs doesn't gain you much imho.
> >  Plus you could always use qemu block backend here if you want
> >  to hide the details.
>
>  And you could always use the QEMU block backend with scsi-generic if you
>  want to remap LUNs, instead of true passthrough via the kernel target.

IIUC the in-kernel target always does remapping.  It passes through
individual LUNs rather than entire targets and you pick LU Numbers to
map to the backing storage (which may or may not be a SCSI
pass-through device).  Nicholas Bellinger can confirm whether this is
correct.


But then I don't understand.  If you pick LU numbers both with the 
in-kernel target and with QEMU, you do not need to use e.g. WWPNs with 
Fibre Channel, because we are not passing through the details of the 
transport protocol (one day we might have virtio-fc, but more likely 
not).  So the LUNs you use might as well be represented by hierarchical 
LUNs.


Using NPIV with KVM would be done by mapping the same virtual N_Port ID 
in the host(s) to the same LU number in the guest.  You might already do 
this now with virtio-blk, in fact.


Put in another way: the virtio-scsi device is itself a SCSI target, so 
yes, there is a single target port identifier in virtio-scsi.  But this 
SCSI target just passes requests down to multiple real targets, and so 
will let you do ALUA and all that.


Of course if I am dead wrong please correct me.

Paolo


Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size)
+const void *buf, size_t offset, size_t size)


Wrong commit subject, it seems. :)

Paolo


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for drivers handling several commands
in parallel.


This makes tracing a bit harder to follow.  Perhaps you can keep the 
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?



Signed-off-by: Hannes Reinecke
---
  hw/esp.c  |2 +-
  hw/lsi53c895a.c   |   17 -
  hw/scsi-bus.c |   22 +++---
  hw/scsi-disk.c|5 ++---
  hw/scsi-generic.c |4 ++--
  hw/scsi.h |8 
  hw/spapr_vscsi.c  |   41 -
  hw/usb-msd.c  |   10 +-
  trace-events  |   14 +++---
  9 files changed, 52 insertions(+), 71 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..912ff89 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t 
busid)

  DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
  lun = busid&  7;
-s->current_req = scsi_req_new(s->current_dev, 0, lun);
+s->current_req = scsi_req_new(s->current_dev, lun, s);


Might as well pass NULL here.  The hba_private value is basically 
unnecessary when the adapter doesn't support tagged command queuing.



diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 86582cc..4e2ea03 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -216,8 +216,8 @@ static void usb_msd_transfer_data(SCSIRequest *req, 
uint32_t len)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->packet;

-if (req->tag != s->tag) {
-fprintf(stderr, "usb-msd: Unexpected SCSI Tag 0x%x\n", req->tag);
+if (req->hba_private != s) {
+fprintf(stderr, "usb-msd: Unexpected SCSI command 0x%p\n", req);
  }


Same here, just pass NULL and remove these ifs.

Otherwise looks like a very good idea.

Paolo


Re: virtio scsi host draft specification, v3

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:14 AM, Hannes Reinecke wrote:

Actually, the kernel does _not_ do a LUN remapping.


Not the kernel, the in-kernel target.  The in-kernel target can and will
map hardware LUNs (target_lun in drivers/target/*) to arbitrary LUNs
(mapped_lun).


Put in another way: the virtio-scsi device is itself a SCSI
target,


Argl. No way. The virtio-scsi device has to map to a single LUN.


I think we are talking about different things. By "virtio-scsi device"
I meant the "virtio-scsi HBA".  When I referred to a LUN as seen by the
guest, I was calling it a "virtual SCSI device".  So yes, we were
calling things with different names.  Perhaps from now on
we can call them virtio-scsi {initiator,target,LUN} and have no
ambiguity?  I'll also modify the spec in this sense.


The SCSI spec itself only deals with LUNs, so anything you'll read in
there obviously will only handle the interaction between the
initiator (read: host) and the LUN itself. However, the actual
command is sent via an intermediate target, hence you'll always see
the reference to the ITL (initiator-target-lun) nexus.


Yes, this I understand.


The SCSI spec details discovery of the individual LUNs presented by a
given target, it does _NOT_ detail the discovery of the targets
themselves.  That is being delegated to the underlying transport


And in fact I have this in virtio-scsi too, since virtio-scsi _is_ a
transport:

When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
is sent for LUN 0, the driver should ask the initiator to rescan
the target, in order to detect the case when an entire target has
appeared or disappeared.

[If the device fails] to report an event due to missing buffers,
[...] the driver should poll the logical units for unit attention
conditions, and/or do whatever form of bus scan is appropriate for
the guest operating system.


In the case of NPIV it would make sense to map the virtual SCSI host
 to the backend, so that all devices presented to the virtual SCSI
host will be presented to the backend, too. However, when doing so
these devices will normally be referenced by their original LUN, as
these will be presented to the guest via eg 'REPORT LUNS'.


Right.


The above thread now tries to figure out if we should remap those LUN
numbers or just expose them as they are. If we decide on remapping,
we have to emulate _all_ commands referring explicitly to those LUN
numbers (persistent reservations, anyone?).


But it seems to me that commands referring explicitly to LUN numbers
most likely have to be reimplemented anyway for virtualization.  I'm
thinking exactly of persistent reservations.  If two guests on the same
host try a persistent reservation, they should conflict with each other.
If reservation commands were just passed through, they would be seen
as coming from the same initiator (the HBA driver or iSCSI initiator in
the host OS).

etc.


If we don't, we would expose some hardware detail to the guest, but
would save us _a lot_ of processing.


But can we afford it?  And would the architecture allow that at all?

Paolo


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 03:11 PM, Hannes Reinecke wrote:

On 07/01/2011 10:27 AM, Paolo Bonzini wrote:

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for drivers handling several commands
in parallel.


This makes tracing a bit harder to follow. Perhaps you can keep the
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?


Hmm. The transport tag wouldn't have any meaning outside scsi-bus.c.


It depends, in vmw_pvscsi I take it from a field in the request block 
that is 0..255.  So either you have a small tag that is recycled but 
stays nice, or a large tag that is unwieldy but should not be recycled 
ever.  A pointer is unwieldy _and_ is recycled, so it gives the worst of 
both worlds.


But I'm not very attached to this, I may even do it myself if/when I 
find the need.  Won't ack yet because of the nit with ESP/USB, but even 
if you do not bother I will ack the next respin.


Paolo


Re: [PATCH 2/3] scsi: Add 'hba_private' to SCSIRequest

2011-07-01 Thread Paolo Bonzini
req = vscsi_find_req(s, sreq);
+vscsi_req *req = sreq->hba_private;
  int32_t res_in = 0, res_out = 0;

  dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x status=0x%x, req=%p\n",
@@ -563,15 +553,14 @@ static void vscsi_command_complete(SCSIRequest *sreq, 
uint32_t status)
  }
  }
  vscsi_send_rsp(s, req, 0, res_in, res_out);
-vscsi_put_req(s, req);
+vscsi_put_req(req);
  }

  static void vscsi_request_cancelled(SCSIRequest *sreq)
  {
-VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, sreq->bus->qbus.parent);
-vscsi_req *req = vscsi_find_req(s, sreq);
+vscsi_req *req = sreq->hba_private;

-vscsi_put_req(s, req);
+vscsi_put_req(req);
  }

  static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
@@ -659,7 +648,7 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
  }

  req->lun = lun;
-req->sreq = scsi_req_new(sdev, req->qtag, lun);
+req->sreq = scsi_req_new(sdev, req->qtag, lun, req);
  n = scsi_req_enqueue(req->sreq, srp->cmd.cdb);

  dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
@@ -858,7 +847,7 @@ static void vscsi_got_payload(VSCSIState *s, vscsi_crq *crq)
  }

  if (done) {
-vscsi_put_req(s, req);
+vscsi_put_req(req);
  }
  }

diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 86582cc..bfea096 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -216,10 +216,6 @@ static void usb_msd_transfer_data(SCSIRequest *req, 
uint32_t len)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->packet;

-if (req->tag != s->tag) {
-fprintf(stderr, "usb-msd: Unexpected SCSI Tag 0x%x\n", req->tag);
-}
-
  assert((s->mode == USB_MSDM_DATAOUT) == (req->cmd.mode == 
SCSI_XFER_TO_DEV));
  s->scsi_len = len;
  s->scsi_buf = scsi_req_get_buf(req);
@@ -241,9 +237,6 @@ static void usb_msd_command_complete(SCSIRequest *req, 
uint32_t status)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->packet;

-if (req->tag != s->tag) {
-fprintf(stderr, "usb-msd: Unexpected SCSI Tag 0x%x\n", req->tag);
-}
  DPRINTF("Command complete %d\n", status);
  s->residue = s->data_len;
  s->result = status != 0;
@@ -387,7 +380,7 @@ static int usb_msd_handle_data(USBDevice *dev, USBPacket *p)
  s->tag, cbw.flags, cbw.cmd_len, s->data_len);
  s->residue = 0;
  s->scsi_len = 0;
-s->req = scsi_req_new(s->scsi_dev, s->tag, 0);
+s->req = scsi_req_new(s->scsi_dev, s->tag, 0, NULL);
  scsi_req_enqueue(s->req, cbw.cmd);
  /* ??? Should check that USB and SCSI data transfer
 directions match.  */


Acked-by: Paolo Bonzini 

Have a nice weekend.

Paolo


Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-03 Thread Paolo Bonzini

On 07/02/2011 03:50 PM, Hannes Reinecke wrote:

(And no, I will not be getting into another dog-fight with Paul B. here.
Virtio can do without bounce buffers. AHCI can. So I fail to see why
SCSI has to rely on bounce buffers.)


I agree, but I do see why a SCSI device might prefer to rely on bounce 
buffers for non-I/O commands.  This is why in my last RFC series for 
vmw_pvscsi I let the device choose whether to force a bounce buffer or 
get an external iovec from the HBA.


Paolo


Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-03 Thread Paolo Bonzini

On 07/04/2011 08:13 AM, Hannes Reinecke wrote:

On 07/03/2011 04:36 PM, Paolo Bonzini wrote:

On 07/02/2011 03:50 PM, Hannes Reinecke wrote:

(And no, I will not be getting into another dog-fight with Paul B. here.
Virtio can do without bounce buffers. AHCI can. So I fail to see why
SCSI has to rely on bounce buffers.)


I agree, but I do see why a SCSI device might prefer to rely on
bounce buffers for non-I/O commands. This is why in my last RFC
series for vmw_pvscsi I let the device choose whether to force a
bounce buffer or get an external iovec from the HBA.


Yes, sure, for non-I/O commands it's perfectly okay.
Most of which will be emulated anyway.
It's bounce buffers for I/O which kills performance.

But I seem to have missed your last RFC (I'm not reading qemu-devel on a
regular basis ...).
Care to send me a pointer to it?


Sure, http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg00668.html

Paolo


Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-04 Thread Paolo Bonzini

On 07/04/2011 09:26 AM, Hannes Reinecke wrote:



Cool.
Exactly what I need.

FWIW, feel free to add my 'Acked-by' to it.

Any chance of getting them included?


I'm not very tied to pvscsi; I just needed an HBA that is not a joke by 
modern standards :) to play with the SCSI layer.  There may be hope that 
megasas will come before pvscsi or eliminate the need for it altogether. 
 So, feel free to pick up patches 5-8 from that series.


That said, note that scsi-generic does *not* support scatter/gather in 
that series, so you should still make sure the fallback paths work well. 
:)  In pvscsi I added a qdev property to toggle scatter/gather, it was 
useful for benchmarking.


Paolo


Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-04 Thread Paolo Bonzini

On 07/04/2011 02:52 PM, Hannes Reinecke wrote:

However, I will probably see to fixing up the megasas emulation with the
current infrastructure, get that in, and then move over to the iovec
infrastructure.


Makes sense also for ease of review.


But if you promise to merge the iovec infrastructure I'm more than
willing to fixup the scsi-generic backend.


I am certainly going to get it upstream, but maintainers would like it 
to have users at the time it is committed.


Paolo


Re: KVM call agenda for July 5

2011-07-04 Thread Paolo Bonzini

On 07/04/2011 09:52 AM, Juan Quintela wrote:

Hi

Please send in any agenda items you are interested in covering.


SCSI without bounce buffers: talk now, or forever hold your peace.

Paolo


Re: [Qemu-devel] [PATCH 3/3] Avoid Wunused-but-set warnings (or errors in case of Werror)

2011-07-05 Thread Paolo Bonzini

On 07/05/2011 09:49 AM, Markus Armbruster wrote:

>  If 'ret' has been used to silence compiler warnings about functions
>  which have been declared with attribute __warn_unused_result__
>  (eg write() and various other libc functions) then "(void)write()"
>  is insufficient -- gcc requires the variable.

gcc being silly.  Oh well.


In this particular case I think that the return value should be checked. 
 It's good if something is printed in the log saying that the reset 
wasn't done for some reason---even if it is just defensive.


The really silly thing is in glibc, not gcc.  __warn_unused_result__ was 
added to fwrite, where you have no certainty that the write has been 
done, but not to either fclose or fflush.  Oh well...
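To illustrate the point: glibc declares write() with __attribute__((warn_unused_result)), so gcc warns even on a "(void)write(...)" cast; storing the result in a variable, and better yet acting on it, is what silences the warning. write_all() below is a hypothetical helper sketching that style, not code from the patch under discussion:

```c
#include <stdio.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes.  The return value
 * of write() is checked rather than discarded, which both satisfies
 * -Wunused-result and leaves a trace in the log when a write fails. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);  /* result checked, not discarded */
        if (n < 0) {
            perror("write");            /* defensive: say why it failed */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```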


Paolo


Re: [PATCH 3/5] scsi-disk: Fixup debugging statement

2011-07-05 Thread Paolo Bonzini

On 07/05/2011 01:03 PM, Hannes Reinecke wrote:

A debugging statement wasn't converted to the new interface.

Signed-off-by: Hannes Reinecke
---
  hw/scsi-disk.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index c2a99fe..5804662 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1007,7 +1007,7 @@ static int32_t scsi_send_command(SCSIRequest *req, 
uint8_t *buf)

  command = buf[0];
  outbuf = (uint8_t *)r->iov.iov_base;
-DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", lun, tag, buf[0]);
+DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", req->lun, req->tag, 
buf[0]);

  if (scsi_req_parse(&r->req, buf) != 0) {
  BADF("Unsupported command length, command %x\n", command);


Acked-by: Paolo Bonzini 


Re: [PATCH 4/5] scsi-disk: Mask out serial number EVPD

2011-07-05 Thread Paolo Bonzini

On 07/05/2011 01:03 PM, Hannes Reinecke wrote:

If the serial number is not set we should mask it out in the
list of supported VPD pages and mark it as not supported.

Signed-off-by: Hannes Reinecke
---
  hw/scsi-disk.c |   15 ---
  1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 5804662..05d14ab 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -398,7 +398,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
  "buffer size %zd\n", req->cmd.xfer);
  pages = buflen++;
  outbuf[buflen++] = 0x00; // list of supported pages (this page)
-outbuf[buflen++] = 0x80; // unit serial number
+if (s->serial)
+outbuf[buflen++] = 0x80; // unit serial number
  outbuf[buflen++] = 0x83; // device identification
  if (s->drive_kind == SCSI_HD) {
  outbuf[buflen++] = 0xb0; // block limits
@@ -409,8 +410,14 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
  }
  case 0x80: /* Device serial number, optional */
  {
-int l = strlen(s->serial);
+int l;

+if (!s->serial) {
+DPRINTF("Inquiry (EVPD[Serial number] not supported\n");
+return -1;
+}
+
+l = strlen(s->serial);
  if (l>  req->cmd.xfer)
  l = req->cmd.xfer;
  if (l>  20)
@@ -1203,7 +1210,9 @@ static int scsi_initfn(SCSIDevice *dev, SCSIDriveKind 
kind)
  if (!s->serial) {
  /* try to fall back to value set with legacy -drive serial=... */
  dinfo = drive_get_by_blockdev(s->bs);
-s->serial = qemu_strdup(*dinfo->serial ? dinfo->serial : "0");
+if (*dinfo->serial) {
+s->serial = qemu_strdup(dinfo->serial);
+}
  }

  if (!s->version) {


Acked-by: Paolo Bonzini 

Paolo


Re: [PATCH 5/5] megasas: LSI Megaraid SAS emulation

2011-07-05 Thread Paolo Bonzini
On 07/05/2011 03:38 PM, Alexander Graf wrote:
>>
>> +if (is_sgl64) {
>> +iov_pa = ldq_phys(pa);
>> +} else {
>> +iov_pa = ldl_phys(pa);
> 
> These load data from memory in target endianness. Are you sure that's 
> what you want? I'd expect this to be defined as little endian 
> (especially given that ldq and ldl on the same address work).

Seems to be target endianness from the corresponding Linux code:

if (sge_count) {
scsi_for_each_sg(scp, os_sgl, sge_count, i) {
mfi_sgl->sge32[i].length = sg_dma_len(os_sgl);
mfi_sgl->sge32[i].phys_addr = sg_dma_address(os_sgl);
}
}

...

if (sge_count) {
scsi_for_each_sg(scp, os_sgl, sge_count, i) {
mfi_sgl->sge64[i].length = sg_dma_len(os_sgl);
mfi_sgl->sge64[i].phys_addr = sg_dma_address(os_sgl);
}
}

Note that this is _either_ a ldq or a ldl depending on what the driver told
the device.  It is not accessing a 64-bit value as 32-bit.

Paolo


Re: [Qemu-devel] [PATCH 5/5] megasas: LSI Megaraid SAS emulation

2011-07-06 Thread Paolo Bonzini

On 07/06/2011 08:20 AM, Hannes Reinecke wrote:


We cannot map control structures from guest memory and treat them
as valid request state later on.


Yes, I've been working on that one already.
What I'll be doing is to read in the sge count during 'map_sgl' and
store this value internally (in ->iov_cnt). And during unmap I'll be
using this value instead of the frame-provided one.

That way we'll be checking the sge_count field only once when we slurp
in the entire frame.


Note the flags too.  Perhaps it's easier to simply copy the entire frame 
header...


Paolo


Re: [RFC] New thread for the VM migration

2011-07-15 Thread Paolo Bonzini

On 07/14/2011 06:07 PM, Avi Kivity wrote:


Maybe we can do this via a magic subsection whose contents are the
hotplug event.


What about making the device list just another "thing" that has to be 
migrated live, together with block and ram?


Paolo


Re: [GIT PULL] Native Linux KVM tool for 3.1

2011-07-25 Thread Paolo Bonzini

On 07/25/2011 09:50 AM, Sasha Levin wrote:

Anthony had a talk on last years KVM forum regarding the QEMU threading
model (slide:
http://www.linux-kvm.org/wiki/images/7/70/2010-forum-threading-qemu.pdf) .

It was suggested that the KVM part of QEMU is having a hard time
achieving the ideal threading model due to its need to support TCG -
something which has nothing to do with KVM itself.


No, it is not having a hard time.  The "foot in the door" that Anthony 
mentions has been part of QEMU and qemu-kvm for a long time 
(multi-threading is necessary to support SMP!) and works quite well.



Historically, there were three main loops:

1) QEMU single-threaded;

2) QEMU multi-threaded; clean, but buggy, untested and bitrotting;

3) qemu-kvm multi-threaded, forked from (1), ugly but robust and widely 
deployed.


In 0.15 the two multi-threaded versions have been unified by Jan Kiszka. 
 I have even ported (2) to Windows with little or no pain; porting to 
Mac OS X interestingly is harder than Windows, because non-portable 
Linux assumptions about signal handling have crept into the code (to 
preempt the objections: they weren't just non-portabilities, they were 
latent bugs).  Windows just does not have signals. :)


So, right now, the only difference is that QEMU is still defaulting to 
the single-threaded main loop, while qemu-kvm enables multi-threading by 
default.  In some time even QEMU will switch.


Yes, this is of course worse than getting it right in the first place; 
nobody is saying the opposite.


Paolo


Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()

2011-07-25 Thread Paolo Bonzini

On 07/25/2011 04:23 PM, Blue Swirl wrote:

>  Yes.  We can just make qemu_malloc use g_malloc.

It would be also possible to make g_malloc() use qemu_malloc(). That
way we could keep the tracepoints which would lose their value with
g_malloc() otherwise.


qemu_malloc uses g_malloc => you keep tracepoints, you just do not trace 
memory allocated by glib


g_malloc uses qemu_malloc => you keep and expand tracepoints, you lose 
the very nicely tuned allocator


The former is much less code, however it requires qemu_malloc to be 
always balanced with qemu_free (patches ready and on my github tree, 
won't be sent before KVM Forum though...).


Paolo


Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()

2011-07-25 Thread Paolo Bonzini

On 07/25/2011 04:56 PM, Blue Swirl wrote:

> qemu_malloc uses g_malloc => you keep tracepoints, you just do not trace
> memory allocated by glib

Unless the plan is to replace all qemu_malloc() calls with calls to g_malloc().


We can worry when the day comes... there is already another task 
blocking that replacement (balancing qemu_malloc with qemu_free).



>  The former is much less code, however it requires qemu_malloc to be always
>  balanced with qemu_free (patches ready and on my github tree, won't be sent
>  before KVM Forum though...).

Freeing qemu_malloc() memory with plain free() is a bug.


We have many bugs, then. :)

Paolo


Re: [RFC PATCH 2/2] separate thread for VM migration

2011-07-26 Thread Paolo Bonzini

On 07/22/2011 09:58 PM, Umesh Deshpande wrote:

-qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+qemu_mod_timer(s->timer, qemu_get_clock_ms(migration_clock) + 100);

  if (s->freeze_output)
  return;
@@ -246,8 +246,10 @@ static void buffered_rate_tick(void *opaque)

  buffered_flush(s);

-/* Add some checks around this */
  s->put_ready(s->opaque);
+qemu_mutex_unlock_iothread();
+usleep(qemu_timer_difference(s->timer, migration_clock) * 1000);
+qemu_mutex_lock_iothread();
  }


Do you really need a timer?  The timer will only run in the migration 
thread, where you should have full control on when to wait.


The BufferedFile code is more general, but you can create one thread per 
BufferedFile (you will only have one).  Then, since you only have one 
timer, calling buffered_rate_tick repeatedly is really all that is done 
in the body of the thread:


while (s->state == MIG_STATE_ACTIVE) {
start_time = qemu_get_clock_ms(rt_clock);
buffered_rate_tick (s->file);

qemu_mutex_unlock_iothread();
usleep ((start_time + 100 - qemu_get_clock_ms(rt_clock)) * 1000);
qemu_mutex_lock_iothread();
}


Perhaps the whole QEMUFile abstraction should be abandoned as the basis 
of QEMUBufferedFile.  It is simply too heavy when you're working in a 
separate thread and you can afford blocking operation.


I also am not sure about the correctness.  Here you removed the delayed 
call to migrate_fd_put_notify, which is what resets freeze_output to 0:



@@ -327,9 +320,7 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
  if (ret == -1)
  ret = -(s->get_error(s));

-if (ret == -EAGAIN) {
-qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
-} else if (ret<  0) {
+if (ret<  0&&  ret != -EAGAIN) {
  if (s->mon) {
  monitor_resume(s->mon);
  }


The call to migrate_fd_put_notify is here:


@@ -396,6 +401,9 @@ void migrate_fd_put_ready(void *opaque)
 }
 s->state = state;
 notifier_list_notify(&migration_state_notifiers);
+} else {
+migrate_fd_wait_for_unfreeze(s);
+qemu_file_put_notify(s->file);
 }
 }



But it is not clear to me what calls migrate_fd_put_ready when the 
output file is frozen.


Finally, here you are busy waiting:


@@ -340,10 +331,34 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
  return ret;
  }

-void migrate_fd_connect(FdMigrationState *s)
+void *migrate_run_timers(void *arg)
  {
+FdMigrationState *s = arg;
  int ret;

+qemu_mutex_lock_iothread();
+ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
+s->mig_state.shared);
+if (ret<  0) {
+DPRINTF("failed, %d\n", ret);
+migrate_fd_error(s);
+return NULL;
+}
+
+migrate_fd_put_ready(s);
+
+while (s->state == MIG_STATE_ACTIVE) {
+qemu_run_timers(migration_clock);
+}


which also convinces me that if you make rate limiting the main purpose 
of the migration thread's main loop, most problems will go away.  You 
can also use select() instead of usleep, so that you fix the problem 
with qemu_file_put_notify.


Paolo


Re: [PATCH v2 04/23] memory: merge adjacent segments of a single memory region

2011-07-26 Thread Paolo Bonzini

On 07/26/2011 01:26 PM, Avi Kivity wrote:

+while (i < view->nr) {
+j = i + 1;
+while (j < view->nr
+   && can_merge(&view->ranges[j-1], &view->ranges[j])) {
+view->ranges[i].addr.size += view->ranges[j].addr.size;
+++j;
+}
+++i;


if (j != i) {


+memmove(&view->ranges[i], &view->ranges[j],
+(view->nr - j) * sizeof(view->ranges[j]));
+view->nr -= j - i;
+}


}


Paolo


Re: [PATCH v2 04/23] memory: merge adjacent segments of a single memory region

2011-07-26 Thread Paolo Bonzini

On 07/26/2011 01:38 PM, Avi Kivity wrote:


if (j != i) {


+memmove(&view->ranges[i], &view->ranges[j],
+(view->nr - j) * sizeof(view->ranges[j]));
+view->nr -= j - i;
+}


}


Seems to work both ways?


Sure, but you're pointlessly memmove-ing memory over itself.  I'm not 
sure how many segments a single memory region will usually have, but 
it's better to be safe.


Paolo


Re: Windows7 crashes inside the VM when starting a certain program

2011-07-28 Thread Paolo Bonzini

On 07/07/2011 07:26 AM, André Weidemann wrote:

Hi,
I am running Windows7 x64 in a VM which crashes after starting a certain
game. Actually there are two games both from the same company, that make
the VM crash after starting them.
Windows crashes right after starting the game. With the 1st game the
screen goes black as usual and the cursor keeps spinning for 3-5 seconds
until Windows crashes. With the second game I get to 3D the login
screen. The game then crashes after logging in.
Windows displays this error message on the first crash:
http://pastebin.com/kMzk9Jif
Windows then finishes writing the crash dump and restarts.
I can reproduce Windows crashing every time I start the game while the
VM keeps running without any problems.
When Windows reboots after the first crash and the game is started
again, the message on the following blue screen changes slightly and
stays the same(except for the addresses) for every following crash:
http://pastebin.com/jVtBc4ZH


The blue screens seem to be for the same exception, 
0xC0000096 (privileged instruction), sometimes in user mode (but in 
a system service, which also causes a blue screen) and sometimes in kernel mode.


Can you open the produced dump in WinDbg and post a disassemble around 
the failing instruction?


Thanks,

Paolo


Re: Windows7 crashes inside the VM when starting a certain program

2011-07-28 Thread Paolo Bonzini

On 07/28/2011 03:21 PM, Avi Kivity wrote:

I haven't used debuggers very much, so I hope I grabbed the correct
lines from the disassembly:
http://pastebin.com/t3sfvmTg


That's the bug check routine.  Can you go up a frame?


Or just do what Gleb suggested.  Open the dump, type "!analyze -v" and 
cut-paste the address from WinDbg's output into the Disassemble window.


Paolo


Re: Windows7 crashes inside the VM when starting a certain program

2011-07-29 Thread Paolo Bonzini

On 07/28/2011 07:44 PM, André Weidemann wrote:

Hi,

On 28.07.2011 15:49, Paolo Bonzini wrote:

On 07/28/2011 03:21 PM, Avi Kivity wrote:

I haven't used debuggers very much, so I hope I grabbed the correct
lines from the disassembly:
http://pastebin.com/t3sfvmTg


That's the bug check routine. Can you go up a frame?


Or just do what Gleb suggested. Open the dump, type "!analyze -v" and
cut-paste the address from WinDbg's output into the Disassemble window.


This is the output of "!analyze -v":
http://pastebin.com/sCZSjr8m

...and this is the output from the disassembly window:
http://pastebin.com/AVZuswkT


Very useful, thanks!

Paolo


Re: [RFC PATCH v2 1/3] separate thread for VM migration

2011-08-01 Thread Paolo Bonzini

On 07/29/2011 10:57 PM, Umesh Deshpande wrote:

This patch creates a separate thread for the guest migration on the source side.

Signed-off-by: Umesh Deshpande


Looks pretty good!

One thing that shows is that the interface separation between 
buffered_file.c and migration.c is pretty weird.  Your patch makes it 
somewhat worse, but it was like this before so it's not your fault.  The 
good thing is that if buffered_file.c uses threads, you can fix a large 
part of this and get even simpler code:


1) there is really just one way to implement migrate_fd_put_notify, and 
with your simplifications it does not belong anymore in migration.c.


2) s->callback is actually not NULL exactly when s->file->frozen_output is 
true, you can remove it as well;


3) buffered_close is messy because it can be called from both the 
iothread (monitor->migrate_fd_cancel->migrate_fd_cleanup->qemu_fclose) 
or the migration thread (after qemu_savevm_state_complete).  But 
buffered_close is actually very similar to your thread function (it does 
flush+wait_for_unfreeze, basically)!  So buffered_close can be simply:


s->closed = 1;
ret = qemu_thread_join(s->thread); /* doesn't exist yet :) */
qemu_free(...);
return ret;

Another nit is that here:


+if (migrate_fd_check_expire()) {
+buffered_rate_tick(s->file);
+}
+
+if (s->state != MIG_STATE_ACTIVE) {
+break;
+}
+
+if (s->callback) {
+migrate_fd_wait_for_unfreeze(s);
+s->callback(s);
+}


you can still have a busy wait.

Putting it all together, you can move the thread function back to 
buffered_file.c like:


while (!s->closed || (!s->has_error && s->buffer_size)) {
if (s->freeze_output) {
qemu_mutex_unlock_iothread();
s->wait_for_unfreeze(s);
qemu_mutex_lock_iothread();
/* This comes from qemu_file_put_notify (via
   buffered_put_buffer---can be simplified a lot too?).  */
s->freeze_output = 0;
/* Test again for cancellation.  */
continue;
}

int64_t current_time = qemu_get_clock_ms(rt_clock);
if (s->expire_time > current_time) {
struct timeval tv = { .tv_sec = 0, .tv_usec = ... };
qemu_mutex_unlock_iothread();
select (0, NULL, NULL, NULL, &tv);
qemu_mutex_lock_iothread();
s->expire_time = qemu_get_clock_ms(rt_clock) + 100;
continue;
}

/* This comes from buffered_rate_tick.  */
s->bytes_xfer = 0;
buffered_flush(s);
if (!s->closed) {
s->put_ready(s->opaque);
}
}

ret = s->close(s->opaque);
...

Does it look sane?

Paolo


Re: [RFC PATCH v2 2/3] fine grained qemu_mutex locking for migration

2011-08-01 Thread Paolo Bonzini

On 07/29/2011 10:57 PM, Umesh Deshpande wrote:

+qemu_mutex_unlock_iothread();

  while (s->state == MIG_STATE_ACTIVE) {
  if (migrate_fd_check_expire()) {
+qemu_mutex_lock_iothread();
  buffered_rate_tick(s->file);
+qemu_mutex_unlock_iothread();
  }

  if (s->state != MIG_STATE_ACTIVE) {
@@ -392,12 +396,11 @@ void migrate_fd_begin(void *arg)

  if (s->callback) {
  migrate_fd_wait_for_unfreeze(s);
+qemu_mutex_lock_iothread();
  s->callback(s);
+qemu_mutex_unlock_iothread();
  }
  }
-
-out:
-qemu_mutex_unlock_iothread();


I think it's clearer to unlock explicitly around the waiting points (see 
review of 1/3).  In fact, I think you're working around the busy wait by 
accessing s->state outside the lock, right?  I don't think this is 
provably safe; moving the knowledge of the thread entirely within 
buffered_file.c also fixes this, because then the lifetimes of the 
thread and the QEMUFile are much clearer.


Thanks,

Paolo


Re: [RFC PATCH v2 1/3] separate thread for VM migration

2011-08-02 Thread Paolo Bonzini

On 08/01/2011 11:00 PM, Umesh Deshpande wrote:



I kept this in migration.c to call qemu_savevm_state_begin. (The way it
is done currently. i.e. to keep access to FdMigrationState in migration.c)
Calling it from buffered_file.c would be inconsistent in that sense. or
we will have to call it from the iothread before spawning the migration
thread.


Right, I missed that.  Perhaps you can call it the first time put_ready 
is called.



Also why is the separation between FdMigrationState and QEMUFileBuffered
is required. Is QEMUFileBuffered designed to use also for things other
than migration?


No, but let's keep it this way for now.  It may be an annoyance, but it 
also helps making a reusable architecture, and it can probably be 
cleaned up substantially with thread support.


Paolo


Re: [RFC PATCH v3 1/4] separate thread for VM migration

2011-08-11 Thread Paolo Bonzini

On 08/11/2011 05:32 PM, Umesh Deshpande wrote:

This patch creates a separate thread for the guest migration on the source side.
migrate_cancel request from the iothread is handled asynchronously. That is,
iothread submits migrate_cancel to the migration thread and returns, while the
migration thread attends this request at the next iteration to terminate its
execution.


Looks pretty good!  I hope you agree. :)  Just one note inside.


Signed-off-by: Umesh Deshpande
---
  buffered_file.c |   85 --
  buffered_file.h |4 ++
  migration.c |   49 ++-
  migration.h |6 
  4 files changed, 82 insertions(+), 62 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 41b42c3..19932b6 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -16,6 +16,8 @@
  #include "qemu-timer.h"
  #include "qemu-char.h"
  #include "buffered_file.h"
+#include "migration.h"
+#include "qemu-thread.h"

  //#define DEBUG_BUFFERED_FILE

@@ -28,13 +30,14 @@ typedef struct QEMUFileBuffered
  void *opaque;
  QEMUFile *file;
  int has_error;
+int closed;
  int freeze_output;
  size_t bytes_xfer;
  size_t xfer_limit;
  uint8_t *buffer;
  size_t buffer_size;
  size_t buffer_capacity;
-QEMUTimer *timer;
+QemuThread thread;
  } QEMUFileBuffered;

  #ifdef DEBUG_BUFFERED_FILE
@@ -155,14 +158,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
  offset = size;
  }

-if (pos == 0&&  size == 0) {
-DPRINTF("file is ready\n");
-if (s->bytes_xfer<= s->xfer_limit) {
-DPRINTF("notifying client\n");
-s->put_ready(s->opaque);
-}
-}
-
  return offset;
  }

@@ -175,20 +170,20 @@ static int buffered_close(void *opaque)

  while (!s->has_error&&  s->buffer_size) {
  buffered_flush(s);
-if (s->freeze_output)
+if (s->freeze_output) {
  s->wait_for_unfreeze(s);
+}
  }


This is racy; you might end up calling buffered_put_buffer twice from 
two different threads.



-ret = s->close(s->opaque);
+s->closed = 1;

-qemu_del_timer(s->timer);
-qemu_free_timer(s->timer);
+ret = s->close(s->opaque);
  qemu_free(s->buffer);
-qemu_free(s);


... similarly, here the migration thread might end up using the buffer. 
 Just set s->closed here and wait for thread completion; the migration 
thread can handle the flushes free the buffer etc.  Let the migration 
thread do as much as possible, it will simplify your life.



  return ret;
  }

+
  static int buffered_rate_limit(void *opaque)
  {
  QEMUFileBuffered *s = opaque;
@@ -228,34 +223,55 @@ static int64_t buffered_get_rate_limit(void *opaque)
  return s->xfer_limit;
  }

-static void buffered_rate_tick(void *opaque)
+static void *migrate_vm(void *opaque)
  {
  QEMUFileBuffered *s = opaque;
+int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
+struct timeval tv = { .tv_sec = 0, .tv_usec = 10};

-if (s->has_error) {
-buffered_close(s);
-return;
-}
+qemu_mutex_lock_iothread();

-qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+while (!s->closed) {


... This can be in fact

while (!s->closed || s->buffer_size)

and that alone will subsume the loop in buffered_close, no?


+if (s->freeze_output) {
+s->wait_for_unfreeze(s);
+s->freeze_output = 0;
+continue;
+}

-if (s->freeze_output)
-return;
+if (s->has_error) {
+break;
+}
+
+current_time = qemu_get_clock_ms(rt_clock);
+if (!s->closed&&  (expire_time>  current_time)) {
+tv.tv_usec = 1000 * (expire_time - current_time);
+select(0, NULL, NULL, NULL,&tv);
+continue;
+}

-s->bytes_xfer = 0;
+s->bytes_xfer = 0;
+buffered_flush(s);

-buffered_flush(s);
+expire_time = qemu_get_clock_ms(rt_clock) + 100;
+s->put_ready(s->opaque);
+}

-/* Add some checks around this */
-s->put_ready(s->opaque);
+if (s->has_error) {
+buffered_close(s);
+}
+qemu_free(s);
+
+qemu_mutex_unlock_iothread();
+
+return NULL;
  }

  QEMUFile *qemu_fopen_ops_buffered(void *opaque,
-  size_t bytes_per_sec,
-  BufferedPutFunc *put_buffer,
-  BufferedPutReadyFunc *put_ready,
-  BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
-  BufferedCloseFunc *close)
+size_t bytes_per_sec,
+BufferedPutFunc *put_buffer,
+BufferedPutReadyFunc *put_ready,
+BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
+BufferedCloseFunc *close)
  {
  QEMUFileBuffered *s;

@@ -267,15 +

Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-11 Thread Paolo Bonzini

diff --git a/buffered_file.c b/buffered_file.c
index b64ada7..5735e18 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -243,7 +243,9 @@ static void *migrate_vm(void *opaque)

  while (!s->closed) {
  if (s->freeze_output) {
+qemu_mutex_unlock_iothread();
  s->wait_for_unfreeze(s);
+qemu_mutex_lock_iothread();
  s->freeze_output = 0;
  continue;
  }
@@ -255,7 +257,9 @@ static void *migrate_vm(void *opaque)
  current_time = qemu_get_clock_ms(rt_clock);
  if (!s->closed&&  (expire_time>  current_time)) {
  tv.tv_usec = 1000 * (expire_time - current_time);
+qemu_mutex_unlock_iothread();
  select(0, NULL, NULL, NULL,&tv);
+qemu_mutex_lock_iothread();
  continue;
  }



I believe these should be already in patch 1 or anyway in a separate patch.


diff --git a/cpus.c b/cpus.c
index de70e02..6090c44 100644
--- a/cpus.c
+++ b/cpus.c
@@ -666,6 +666,7 @@ int qemu_init_main_loop(void)
  qemu_cond_init(&qemu_work_cond);
  qemu_mutex_init(&qemu_fair_mutex);
  qemu_mutex_init(&qemu_global_mutex);
+qemu_mutex_init(&ram_list.mutex);
  qemu_mutex_lock(&qemu_global_mutex);

  qemu_thread_get_self(&io_thread);


These are only executed if CONFIG_IOTHREAD; you can probably move it 
somewhere in exec.c.



@@ -919,6 +920,17 @@ void qemu_mutex_unlock_iothread(void)
  qemu_mutex_unlock(&qemu_global_mutex);
  }

+void qemu_mutex_lock_ramlist(void)
+{
+qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_ramlist(void)
+{
+qemu_mutex_unlock(&ram_list.mutex);
+}
+
+


And these too.


  static int all_vcpus_paused(void)
  {
  CPUState *penv = first_cpu;
diff --git a/exec.c b/exec.c
index 0e2ce57..7bfb36f 100644
--- a/exec.c
+++ b/exec.c
@@ -2972,6 +2972,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
  }
  new_block->length = size;

+qemu_mutex_lock_ramlist();
+
  QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);

  ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty,
@@ -2979,6 +2981,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
  memset(ram_list.phys_dirty + (new_block->offset>>  TARGET_PAGE_BITS),
 0xff, size>>  TARGET_PAGE_BITS);

+qemu_mutex_unlock_ramlist();
+
  if (kvm_enabled())
  kvm_setup_guest_memory(new_block->host, size);

@@ -2996,7 +3000,9 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)

  QLIST_FOREACH(block,&ram_list.blocks, next) {
  if (addr == block->offset) {
+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
+qemu_mutex_unlock_ramlist();
  qemu_free(block);
  return;
  }
@@ -3009,7 +3015,9 @@ void qemu_ram_free(ram_addr_t addr)

  QLIST_FOREACH(block,&ram_list.blocks, next) {
  if (addr == block->offset) {
+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
+qemu_mutex_unlock_ramlist();
  if (block->flags&  RAM_PREALLOC_MASK) {
  ;
  } else if (mem_path) {
@@ -3117,8 +3125,10 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
  if (addr - block->offset<  block->length) {
  /* Move this entry to to start of the list.  */
  if (block != QLIST_FIRST(&ram_list.blocks)) {
+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
  QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+qemu_mutex_unlock_ramlist();


Theoretically qemu_get_ram_ptr should be protected.  The problem is not 
just accessing the ramlist, it is accessing the data underneath it 
before anyone frees it.  Luckily we can set aside that problem for now, 
because qemu_ram_free_from_ptr is only used by device assignment and 
device assignment makes VMs unmigratable.


If we support generic RAM hotplug/hotunplug, we would probably need 
something RCU-like to protect accesses, since protecting all uses of 
qemu_get_ram_ptr is not practical.



  }
  if (xen_mapcache_enabled()) {
  /* We need to check if the requested address is in the RAM
diff --git a/qemu-common.h b/qemu-common.h
index abd7a75..b802883 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -212,6 +212,8 @@ char *qemu_strndup(const char *str, size_t size);

  void qemu_mutex_lock_iothread(void);
  void qemu_mutex_unlock_iothread(void);
+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);

  int qemu_open(const char *name, int flags, ...);
  ssize_t qemu_write_full(int fd, const void *buf, size_t count)


Paolo


Re: [RFC PATCH v3 0/4] Separate thread for VM migration

2011-08-11 Thread Paolo Bonzini

On 08/11/2011 05:32 PM, Umesh Deshpande wrote:

Following patch series deals with VCPU and iothread starvation during the
migration of
a guest. Currently the iothread is responsible for performing the guest
migration. It holds qemu_mutex during the migration and doesn't allow VCPU to
enter the qemu mode and delays its return to the guest. The guest migration,
executed as an iohandler also delays the execution of other iohandlers.
In the following patch series,

The migration has been moved to a separate thread to
reduce the qemu_mutex contention and iohandler starvation.

Umesh Deshpande (4):
   separate thread for VM migration
   synchronous migrate_cancel
   lock to protect memslots
   separate migration bitmap

  arch_init.c |   26 ++
  buffered_file.c |  100 +--
  buffered_file.h |4 ++
  cpu-all.h   |   39 
  cpus.c  |   12 ++
  exec.c  |   74 +
  hw/hw.h |5 ++-
  migration.c |   50 --
  migration.h |6 +++
  qemu-common.h   |2 +
  qemu-thread-posix.c |   10 +
  qemu-thread.h   |1 +
  savevm.c|   31 +++-
  13 files changed, 280 insertions(+), 80 deletions(-)



Looks quite good.  Note that when submitting for inclusion, the last two 
patches should go first, so that you only add the migration thread after 
you're race free.  If there are parts that do not apply before patches 
1/2, that likely means they are in the wrong patch. :)


Thanks,

Paolo


Re: [RFC PATCH v3 1/4] separate thread for VM migration

2011-08-11 Thread Paolo Bonzini

On 08/11/2011 07:36 PM, Umesh Deshpande wrote:



Now, migrate_fd_cleanup, buffured_close is just executed by the
migration thread.
I am not letting iothread call any migration cancellation related
functions. In stead it just submits the request and waits for the
migration thread to terminate itself in the next iteration.
The reason is to avoid the call to qemu_fflush,
qemu_savevm_state_cancel (to carry out migrate_cancel) from iothread
while migration thread is transferring data without holding the locks.


Got it now, thanks for explaining.

Paolo


Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-11 Thread Paolo Bonzini

On 08/11/2011 06:20 PM, Paolo Bonzini wrote:


+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
  QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+qemu_mutex_unlock_ramlist();


Theoretically qemu_get_ram_ptr should be protected.  The problem is not
just accessing the ramlist, it is accessing the data underneath it
before anyone frees it.  Luckily we can set aside that problem for now,
because qemu_ram_free_from_ptr is only used by device assignment and
device assignment makes VMs unmigratable.


Hmm, rethinking about it, all the loops in exec.c should be protected 
from the mutex.  That's not too good because qemu_get_ram_ptr is a hot 
path for TCG.  Perhaps you can also avoid the mutex entirely, and just 
disable the above optimization for most-recently-used-block while 
migration is running.  It's not a complete solution, but it could be 
good enough until we have RAM hot-plug/hot-unplug.


Paolo


Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-15 Thread Paolo Bonzini

On 08/14/2011 11:45 PM, Umesh Deshpande wrote:

That's not too good because qemu_get_ram_ptr is a hot path for TCG.


Looks like qemu_get_ram_ptr isn't called from the source side code of
guest migration.


Right, but since you are accessing the list from both migration and 
non-migration threads, qemu_get_ram_ptr needs to take the ram_list lock. 
 The ram_list and IO thread mutexes together protect the "next" member, 
and you need to hold them both when modifying the list.


See the reply to Marcelo for a proposed solution.

Paolo


Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-15 Thread Paolo Bonzini
On 08/15/2011 12:26 AM, Marcelo Tosatti wrote:
> Actually the previous patchset does not traverse the ramlist without
> qemu_mutex locked, which is safe versus the most-recently-used-block
> optimization.

Actually it does:

 bytes_transferred_last = bytes_transferred;
 bwidth = qemu_get_clock_ns(rt_clock);
 
+if (stage != 3) {
+qemu_mutex_lock_ramlist();
+qemu_mutex_unlock_iothread();
+}
+
 while (!qemu_file_rate_limit(f)) {
 int bytes_sent;
 
 /* ram_save_block does traverse memory.  */
 bytes_sent = ram_save_block(f);
 bytes_transferred += bytes_sent;
 if (bytes_sent == 0) { /* no more blocks */
 break;
 }
 }
 
+if (stage != 3) {
+qemu_mutex_lock_iothread();
+qemu_mutex_unlock_ramlist();
+}
+
 bwidth = qemu_get_clock_ns(rt_clock) - bwidth;
 bwidth = (bytes_transferred - bytes_transferred_last) / bwidth;
 

What Umesh is doing is using "either ramlist mutex or iothread mutex" when
reading the ramlist, and "both" when writing the ramlist; similar to rwlocks
done with a regular mutex per CPU---clever!  So this:

+qemu_mutex_lock_ramlist();
 QLIST_REMOVE(block, next);
 QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+qemu_mutex_unlock_ramlist();

is effectively upgrading the lock from read-side to write-side, assuming that
qemu_get_ram_ptr is never called from the migration thread (which is true).

However, I propose that you put the MRU order in a separate list.  You would
still need two locks: the IO thread lock to protect the new list, and a new
lock to protect the other fields in the ram_list.  For simplicity you may skip
the new lock if you assume that the migration and I/O threads never modify the
list concurrently, which is true.  And more importantly, the MRU and migration
code absolutely do not affect each other, because indeed the migration thread
does not do MRU accesses.
See the attachment for an untested patch.

Paolo
From 8579b821a2c7c4da55a4208c5df3c86e8ce2cc87 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Fri, 12 Aug 2011 13:08:04 +0200
Subject: [PATCH] split MRU ram list

Outside the execution threads the normal, non-MRU-ized order of
the RAM blocks should always be enough.  So manage two separate
lists, which will have separate locking rules.

Signed-off-by: Paolo Bonzini 
---
 cpu-all.h |4 +++-
 exec.c|   16 +++-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index f5c82cd..083d9e6 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -479,8 +479,9 @@ typedef struct RAMBlock {
 ram_addr_t offset;
 ram_addr_t length;
 uint32_t flags;
-char idstr[256];
 QLIST_ENTRY(RAMBlock) next;
+QLIST_ENTRY(RAMBlock) next_mru;
+char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
 int fd;
 #endif
@@ -489,6 +490,7 @@ typedef struct RAMBlock {
 typedef struct RAMList {
 uint8_t *phys_dirty;
 QLIST_HEAD(, RAMBlock) blocks;
+QLIST_HEAD(, RAMBlock) blocks_mru;
 } RAMList;
 extern RAMList ram_list;
 
diff --git a/exec.c b/exec.c
index 253f42c..be0f37e 100644
--- a/exec.c
+++ b/exec.c
@@ -110,7 +110,10 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;
 
-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
+RAMList ram_list = {
+.blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
+.blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};
 
 static MemoryRegion *system_memory;
 
@@ -2972,6 +2975,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 new_block->length = size;
 
 QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
 ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty,
last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2996,6 +3000,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
+QLIST_REMOVE(block, next_mru);
 qemu_free(block);
 return;
 }
@@ -3009,6 +3014,7 @@ void qemu_ram_free(ram_addr_t addr)
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
+QLIST_REMOVE(block, next_mru);
 if (block->flags & RAM_PREALLOC_MASK) {
 ;
 } else if (mem_path) {
@@ -3113,12 +3119,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
 RAMBlock *block;
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
+QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
 if (addr - block->offset < block->length) {

Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-15 Thread Paolo Bonzini

On 08/15/2011 01:27 PM, Umesh Deshpande wrote:

Yes, the mru list patch would obviate the need of holding the ram_list
mutex in qemu_get_ram_ptr.


Feel free to take it and complete it with locking then!


Also, I was planning to protect the whole migration thread with iothread
mutex, and ram_list mutex. (i.e. holding ram_list mutex while sleeping
between two iterations, when we release iothread mutex). This will
prevent the memslot block removal altogether during the migration. Do
you see any problem with this?


No, indeed holding the ram_list mutex throughout migration "fixes" 
the problem of ram_list blocks being removed during migration.  I guess 
whoever implements memory hotplug will find a way around it. :)


Paolo


Re: [RFC PATCH v3 3/4] lock to protect memslots

2011-08-16 Thread Paolo Bonzini

On 08/15/2011 11:15 PM, Paolo Bonzini wrote:

On 08/15/2011 01:27 PM, Umesh Deshpande wrote:

Yes, the mru list patch would obviate the need of holding the ram_list
mutex in qemu_get_ram_ptr.


Feel free to take it and complete it with locking then!


Also, I was planning to protect the whole migration thread with iothread
mutex, and ram_list mutex. (i.e. holding ram_list mutex while sleeping
between two iterations, when we release iothread mutex). This will
prevent the memslot block removal altogether during the migration. Do
you see any problem with this?


No, indeed holding the ram_list mutex throughout migration "fixes"
the problem of ram_list blocks being removed during migration. I guess
whoever implements memory hotplug will find a way around it. :)


BTW, now that I think of it, do you have any ideas on how to make the 
migration thread do block migration too?


Paolo



Re: 0.15: is glib actually needed?

2011-08-16 Thread Paolo Bonzini

On 08/16/2011 03:57 AM, Michael Tokarev wrote:

Each time I build qemu-kvm, my build script complains
that it is needlessly linked against libglib-2.0 without
using any symbols from that library.  So is glib really
needed for qemu-kvm?  How it's different from qemu-0.15?


glib is only needed for the guest agent, or on platforms that do not 
have makecontext/swapcontext (the BSDs).


Paolo


Re: [RFC PATCH v4 2/5] ramlist mutex

2011-08-16 Thread Paolo Bonzini

On 08/16/2011 08:56 PM, Umesh Deshpande wrote:

@@ -3001,8 +3016,10 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)

  QLIST_FOREACH(block,&ram_list.blocks, next) {
  if (addr == block->offset) {
+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
  QLIST_REMOVE(block, next_mru);
+qemu_mutex_unlock_ramlist();
  qemu_free(block);
  return;
  }
@@ -3015,8 +3032,10 @@ void qemu_ram_free(ram_addr_t addr)

  QLIST_FOREACH(block,&ram_list.blocks, next) {
  if (addr == block->offset) {
+qemu_mutex_lock_ramlist();
  QLIST_REMOVE(block, next);
  QLIST_REMOVE(block, next_mru);
+qemu_mutex_unlock_ramlist();
  if (block->flags&  RAM_PREALLOC_MASK) {
  ;
  } else if (mem_path) {


You must protect the whole QLIST_FOREACH.  Otherwise looks good.

Paolo


Re: [RFC PATCH v4 4/5] separate thread for VM migration

2011-08-17 Thread Paolo Bonzini

On 08/16/2011 08:56 PM, Umesh Deshpande wrote:

+qemu_mutex_lock_ramlist();


Taken locks: iothread, ramlist


+qemu_mutex_unlock_iothread();


Taken locks: ramlist


+s->wait_for_unfreeze(s);
+qemu_mutex_lock_iothread();


Taken locks: ramlist, iothread

You'd have a deadlock here if hypothetically you had two migrations at 
the same time.



+qemu_mutex_unlock_ramlist();


But in general, why this locking?  The buffered file need not know 
anything about the ram list and its mutex.  Only ram_save_live needs to 
hold the ramlist lock---starting just before sort_ram_list and ending 
just after the end of stage 3.  That should be part of patch 2.


Actually buffered_file.c should ideally not even take the iothread lock! 
 The code there is only copying data from a private buffer to a file 
descriptor; neither is shared.  It's migrate_fd_put_buffer that should 
take care of locking.  This is an example of why keeping the separation 
of QEMUBufferedFile is a good idea at least for now.


I am still kind of unconvinced about calling qemu_fclose from the 
migration thread.  You still have one instance of cancellation in the 
iothread when migrate_fd_release is called.  Ideally, as soon as 
migration finishes or has an error you could trigger a bottom half that 
closes the file (which in turn joins the thread).  Migration state 
notifiers should also be run only from the iothread.  Failure to do so 
(or in general lack of a policy of what runs where) can lead to very 
difficult bugs.  Not so much hard to debug in this case (we have a 
global lock, so things cannot go _that_ bad), but hard to fix without 
redoing everything.


However, this patch is a good start (with locking fixed).  It will 
take several incremental steps before getting there, including an 
incredible simplification once you take into account that migration can 
block and wait_for_unfreeze can disappear.  In the end it probably 
should be committed as a single patch, but I'm liking the patches more 
and more.


Paolo


Re: [RFC PATCH v3 5/5] Making iothread block for migrate_cancel

2011-08-17 Thread Paolo Bonzini

On 08/16/2011 08:56 PM, Umesh Deshpande wrote:

diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
index 2bd02ef..0d18b35 100644
--- a/qemu-thread-posix.c
+++ b/qemu-thread-posix.c
@@ -115,6 +115,16 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex)
  error_exit(err, __func__);
  }

+void qemu_thread_join(QemuThread thread)


Should take a pointer.

Paolo


Re: [RFC PATCH v4 2/5] ramlist mutex

2011-08-21 Thread Paolo Bonzini

On 08/19/2011 08:20 AM, Umesh Deshpande wrote:

Or, is it okay to convert all the ramblock list traversals in exec.c
(under iothread) to mru traversals, and probably it makes sense as the
original list was also maintained in the mru order, whereas the sequence
of blocks doesn't matter for the migration code. This way we don't have
to acquire the mutex for block list traversals.


I'm not sure... as I said, the MRU list is on a fast path and 
restricting it to that fast path keeps us honest.  Also, the non-MRU 
list is almost never accessed outside the migration thread, so the mutex 
shouldn't be heavily contended anyway.  You can also think about (not 
too clever) ways to keep the mutex unlocked while not doing ram_save_live.


BTW, actually the migration code tries to migrate the largest blocks 
first (because usually all the blocks after the first are small and 
easily migrated during the _complete pass), so the order does somewhat 
matter for migration.


Paolo


Re: Segfault starting vcpu thread

2012-07-11 Thread Paolo Bonzini
On 11/07/2012 14:08, Avi Kivity wrote:
>>>>> specific command line or guest?
>>>> qemu-system-x86_64
>>> Just did the same, but it's all fine here.
>> Ok, I'll debug it.  Probably something stupid like a miscompile.
> Indeed, a simple clean build fixed it up.  Paolo, it looks like
> autodependencies are still broken.

Hmm, I think everything happened while I was away.  I'll take a look.

Paolo


Re: [PATCH] hw/virtio-scsi: Set max_target=0 during vhost-scsi operation

2012-07-11 Thread Paolo Bonzini
On 12/07/2012 07:34, Zhi Yong Wu wrote:
> HI,
> 
> Do we need to maintain one QEMU branch to collect all useful latest
> patches for tcm_vhost support?  You know, those patches will not get
> merged into qemu.git/master.

Never say never, but the answer to your question is yes: please apply
this patch to your vhost-scsi branch and push it to github.

Paolo


Re: [PATCH-v2] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning

2012-07-12 Thread Paolo Bonzini
On 12/07/2012 09:23, James Bottomley wrote:
>> > Cc: Paolo Bonzini 
>> > Cc: Stefan Hajnoczi 
>> > Cc: Zhi Yong Wu 
>> > Cc: Christoph Hellwig 
>> > Cc: Hannes Reinecke 
>> > Cc: James Bottomley 
>> > Signed-off-by: Nicholas Bellinger 
> Was the change so great that it needs re acking?

Not really, but anyway

Acked-by: Paolo Bonzini 

> I assume it also now no longer applies to stable because it will reject?

Yes, stable will have to use v1.

Paolo


Re: [PATCH uq/master 2/9] event_notifier: remove event_notifier_test

2012-07-12 Thread Paolo Bonzini
On 12/07/2012 11:10, Avi Kivity wrote:
> On 07/05/2012 06:16 PM, Paolo Bonzini wrote:
>> This is broken; since the eventfd is used in nonblocking mode there
>> is a race between reading and writing.
>>
> 
>> diff --git a/event_notifier.c b/event_notifier.c
>> index 2b210f4..c339bfe 100644
>> --- a/event_notifier.c
>> +++ b/event_notifier.c
>> @@ -51,18 +51,3 @@ int event_notifier_test_and_clear(EventNotifier *e)
>>  int r = read(e->fd, &value, sizeof(value));
>>  return r == sizeof(value);
>>  }
>> -
>> -int event_notifier_test(EventNotifier *e)
>> -{
>> -uint64_t value;
>> -int r = read(e->fd, &value, sizeof(value));
>> -if (r == sizeof(value)) {
>> -/* restore previous value. */
>> -int s = write(e->fd, &value, sizeof(value));
>> -/* never blocks because we use EFD_SEMAPHORE.
>> - * If we didn't we'd get EAGAIN on overflow
>> - * and we'd have to write code to ignore it. */
>> -assert(s == sizeof(value));
>> -}
>> -return r == sizeof(value);
>> -}
> 
> I don't see the race.  Mind explaining?

The assertion can actually fire; nothing prevents this interleaving,
where a concurrent event_notifier_set sneaks in between the read and
the restoring write:

event_notifier_test()
    read(fd, &value, 8)
        --> event_notifier_set(): write(fd, ..., 8)
    write(fd, &value, 8)

event_notifier_set will always write a 1 and it will take a large amount
of writes to reach overflow :) but that may not be true of other writers
using the same file descriptor.

Then, the comment is wrong in two ways.  First, we do not use
EFD_SEMAPHORE (though even if we did, the only difference is that the
value would always be one).  Second, we cannot write code to ignore
EAGAIN, because then we would have lost the value.

With blocking I/O things would not be much better, because then
event_notifier_test() might block on the write.  That would be quite
surprising.

If we cared, we could implement the function more easily and correctly 
with poll(), checking for POLLIN in the revents.  But I don't see a 
sensible use case for it anyway.

Paolo


Re: [PATCH uq/master 2/9] event_notifier: remove event_notifier_test

2012-07-12 Thread Paolo Bonzini
On 12/07/2012 13:04, Avi Kivity wrote:
> Right, it's useless.  I'll adjust the comment (and the whitespace fix)
> and apply.

Ok, thanks very much!

Paolo



[PATCH 0/2] virtio-scsi spec: event improvements

2012-07-16 Thread Paolo Bonzini
This makes some changes to the virtio-scsi event specification, so that
it is now possible to use virtio-scsi events in the implementation of
the QEMU "block_resize" command.

Thanks to Cong Meng for finally implementing virtio-scsi hotplug, which
made me look at block_resize again!

Paolo Bonzini (2):
  virtio-scsi spec: unify event structs
  virtio-scsi spec: add configuration change event

 virtio-spec.lyx |  164 +--
 1 file changed, 160 insertions(+), 4 deletions(-)

-- 
1.7.10.4



[PATCH 1/2] virtio-scsi spec: unify event structs

2012-07-16 Thread Paolo Bonzini
All currently defined event structs have the same fields.  Simplify the
driver by enforcing this also for future structs.

Signed-off-by: Paolo Bonzini 
---
 virtio-spec.lyx |   69 +++
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 905e619..f8b214b 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -8207,7 +8207,20 @@ struct virtio_scsi_event {
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440791
+
 ...
+\change_inserted 1531152142 1342440791
+u8  lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1342440791
+
+u32 reason;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -8221,16 +8234,32 @@ struct virtio_scsi_event {
 \end_layout
 
 \begin_layout Standard
-If bit 31 is set in the event field, the device failed to report an event
- due to missing buffers.
+If bit 31 is set in the 
+\series bold
+event
+\series default
+ field, the device failed to report an event due to missing buffers.
  In this case, the driver should poll the logical units for unit attention
  conditions, and/or do whatever form of bus scan is appropriate for the
  guest operating system.
 \end_layout
 
 \begin_layout Standard
-Other data that the device writes to the buffer depends on the contents
- of the event field.
+
+\change_deleted 1531152142 1342440830
+Other data that the device writes to the buffer
+\change_inserted 1531152142 1342440839
+The meaning of the 
+\series bold
+reason
+\series default
+ field
+\change_unchanged
+ depends on the contents of the 
+\series bold
+event
+\series default
+ field.
  The following events are defined:
 \end_layout
 
@@ -8312,36 +8341,50 @@ status open
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 struct virtio_scsi_event_reset {
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 // Write-only part
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 u32 event;
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 u8  lun[8];
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 u32 reason;
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 }
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440799
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -8542,40 +8585,58 @@ status open
 \begin_layout Plain Layout
 
 #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+\change_deleted 1531152142 1342440854
+
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 struct virtio_scsi_event_an {
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 // Write-only part
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 u32 event;
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 u8  lun[8];
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 u32 reason;
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 1531152142 1342440854
+
 }
+\change_unchanged
+
 \end_layout
 
 \end_inset
-- 
1.7.10.4




[PATCH 2/2] virtio-scsi spec: add configuration change event

2012-07-16 Thread Paolo Bonzini
This adds an event for changes to LUN parameters, for example capacity.  These
are reported in virtio-blk via configuration changes, and we want a similar
functionality in virtio-scsi too.

There is no list of supported parameter changes, instead we just refer to
the list of sense codes in the SCSI specification.

This event will usually be serviced in one of three ways: 1) call an OS
service to revalidate the disk, either always or only for some specific
sense codes; 2) somehow pass the sense directly to the upper-level driver;
3) inject a TEST UNIT READY command into the upper-level device, so that
the OS will see the unit attention code and react.  Of course a mix of
the three is also possible, depending on how the driver writer prefers
to have his layering violations served.

Signed-off-by: Paolo Bonzini 
---
 virtio-spec.lyx |   95 +++
 1 file changed, 95 insertions(+)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index f8b214b..8d2ac9a 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -6995,6 +6995,21 @@ VIRTIO_SCSI_F_HOTPLUG
 
 (1) The host should enable hot-plug/hot-unplug of new LUNs and targets on
  the SCSI bus.
+\change_inserted 1531152142 1342440342
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1342440768
+VIRTIO_SCSI_F_CHANGE
+\begin_inset space ~
+\end_inset
+
+(2) The host will report changes to LUN parameters via a VIRTIO_SCSI_T_PARAM_CHA
+NGE event.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -8673,6 +8688,86 @@ reason
 \begin_layout Standard
 When dropped events are reported, the driver should poll for asynchronous
  events manually using SCSI commands.
+\change_inserted 1531152142 1342439104
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1342440778
+LUN
+\begin_inset space ~
+\end_inset
+
+parameter
+\begin_inset space ~
+\end_inset
+
+change 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1342440783
+
+#define VIRTIO_SCSI_T_PARAM_CHANGE  3
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1342440882
+By sending this event, the device signals that the configuration parameters
+ (for example the capacity) of a logical unit have changed.
+ The 
+\series bold
+event
+\series default
+ field is set to VIRTIO_SCSI_T_PARAM_CHANGE.
+ The 
+\series bold
+lun
+\series default
+ field addresses a logical unit in the SCSI host.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1342440916
+The same event is also reported as a unit attention condition.
+ The 
+\series bold
+reason
+\series default
+ field contains the additional sense code and additional sense code qualifier,
+ respectively in bits 0..7 and 8..15.
+ For example, a change in capacity will be reported as asc 0x2a, ascq 0x09
+ (CAPACITY DATA HAS CHANGED).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1342442803
+For MMC devices (inquiry type 5) there would be some overlap between this
+ event and the asynchronous notification event.
+ For simplicity, as of this version of the specification the host must
+ never report this event for MMC devices.
 \end_layout
 
 \end_deeper
-- 
1.7.10.4



[PATCH 0/5] virtio-scsi: support block_resize

2012-07-16 Thread Paolo Bonzini
This series adds support for block_resize to virtio-scsi.  Events
are reported via a new event type.  Patches to the spec are on the
list.

Paolo Bonzini (5):
  scsi-disk: removable hard disks support START/STOP
  scsi-disk: report resized disk via sense codes
  scsi: establish precedence levels for unit attention
  scsi: report parameter changes to HBA drivers
  virtio-scsi: report parameter change events

 hw/scsi-bus.c|   67 +-
 hw/scsi-disk.c   |   21 +++--
 hw/scsi.h|5 
 hw/virtio-scsi.c |   16 +
 trace-events |1 +
 5 files changed, 102 insertions(+), 8 deletions(-)

-- 
1.7.10.4



[PATCH 1/5] scsi-disk: removable hard disks support START/STOP

2012-07-16 Thread Paolo Bonzini
Support for START/STOP UNIT right now is limited to CD-ROMs.  This is wrong,
since removable hard disks (in the real world: SD card readers) also support
it in pretty much the same way.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi-disk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index bcec66b..42bae3b 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1251,7 +1251,7 @@ static int scsi_disk_emulate_start_stop(SCSIDiskReq *r)
 bool start = req->cmd.buf[4] & 1;
 bool loej = req->cmd.buf[4] & 2; /* load on start, eject on !start */
 
-if (s->qdev.type == TYPE_ROM && loej) {
+if ((s->features & (1 << SCSI_DISK_F_REMOVABLE)) && loej) {
 if (!start && !s->tray_open && s->tray_locked) {
 scsi_check_condition(r,
  bdrv_is_inserted(s->qdev.conf.bs)
-- 
1.7.10.4




[PATCH 2/5] scsi-disk: report resized disk via sense codes

2012-07-16 Thread Paolo Bonzini
Linux will not use these, but a very similar mechanism will be used to
report the condition via virtio-scsi events.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi-bus.c  |5 +
 hw/scsi-disk.c |   15 +++
 hw/scsi.h  |2 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 8b961f2..2547c50 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1161,6 +1161,11 @@ const struct SCSISense sense_code_LUN_FAILURE = {
 .key = ABORTED_COMMAND, .asc = 0x3e, .ascq = 0x01
 };
 
+/* Unit attention, Capacity data has changed */
+const struct SCSISense sense_code_CAPACITY_CHANGED = {
+.key = UNIT_ATTENTION, .asc = 0x2a, .ascq = 0x09
+};
+
 /* Unit attention, Power on, reset or bus device reset occurred */
 const struct SCSISense sense_code_RESET = {
 .key = UNIT_ATTENTION, .asc = 0x29, .ascq = 0x00
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 42bae3b..0905446 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1863,6 +1863,13 @@ static void scsi_destroy(SCSIDevice *dev)
 blockdev_mark_auto_del(s->qdev.conf.bs);
 }
 
+static void scsi_disk_resize_cb(void *opaque)
+{
+SCSIDiskState *s = opaque;
+
+scsi_device_set_ua(&s->qdev, SENSE_CODE(CAPACITY_CHANGED));
+}
+
 static void scsi_cd_change_media_cb(void *opaque, bool load)
 {
 SCSIDiskState *s = opaque;
@@ -1904,11 +1911,13 @@ static bool scsi_cd_is_medium_locked(void *opaque)
 return ((SCSIDiskState *)opaque)->tray_locked;
 }
 
-static const BlockDevOps scsi_cd_block_ops = {
+static const BlockDevOps scsi_disk_block_ops = {
 .change_media_cb = scsi_cd_change_media_cb,
 .eject_request_cb = scsi_cd_eject_request_cb,
 .is_tray_open = scsi_cd_is_tray_open,
 .is_medium_locked = scsi_cd_is_medium_locked,
+
+.resize_cb = scsi_disk_resize_cb,
 };
 
 static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
@@ -1956,9 +1965,7 @@ static int scsi_initfn(SCSIDevice *dev)
 return -1;
 }
 
-if (s->features & (1 << SCSI_DISK_F_REMOVABLE)) {
-bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_cd_block_ops, s);
-}
+bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_disk_block_ops, s);
 bdrv_set_buffer_alignment(s->qdev.conf.bs, s->qdev.blocksize);
 
 bdrv_iostatus_enable(s->qdev.conf.bs);
diff --git a/hw/scsi.h b/hw/scsi.h
index 47c3b9c..e901350 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -198,6 +198,8 @@ extern const struct SCSISense sense_code_IO_ERROR;
 extern const struct SCSISense sense_code_I_T_NEXUS_LOSS;
 /* Command aborted, Logical Unit failure */
 extern const struct SCSISense sense_code_LUN_FAILURE;
+/* LUN not ready, Capacity data has changed */
+extern const struct SCSISense sense_code_CAPACITY_CHANGED;
 /* LUN not ready, Medium not present */
 extern const struct SCSISense sense_code_UNIT_ATTENTION_NO_MEDIUM;
 /* Unit attention, Power on, reset or bus device reset occurred */
-- 
1.7.10.4




[PATCH 3/5] scsi: establish precedence levels for unit attention

2012-07-16 Thread Paolo Bonzini
When a device is resized, we will report a unit attention condition
for CAPACITY DATA HAS CHANGED.  However, we should ensure that this
condition does not override a more important unit attention condition.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi-bus.c  |   52 +++-
 hw/scsi-disk.c |4 ++--
 hw/scsi.h  |1 +
 trace-events   |1 +
 4 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 2547c50..d5e1fb0 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1530,6 +1530,55 @@ void scsi_req_abort(SCSIRequest *req, int status)
 scsi_req_unref(req);
 }
 
+static int scsi_ua_precedence(SCSISense sense)
+{
+if (sense.key != UNIT_ATTENTION) {
+return INT_MAX;
+}
+if (sense.asc == 0x29 && sense.ascq == 0x04) {
+/* DEVICE INTERNAL RESET goes with POWER ON OCCURRED */
+return 1;
+} else if (sense.asc == 0x3F && sense.ascq == 0x01) {
+/* MICROCODE HAS BEEN CHANGED goes with SCSI BUS RESET OCCURRED */
+return 2;
+} else if (sense.asc == 0x29 && (sense.ascq == 0x05 || sense.ascq == 0x06)) {
+/* These two go with "all others". */
+;
+} else if (sense.asc == 0x29 && sense.ascq <= 0x07) {
+/* POWER ON, RESET OR BUS DEVICE RESET OCCURRED = 0
+ * POWER ON OCCURRED = 1
+ * SCSI BUS RESET OCCURRED = 2
+ * BUS DEVICE RESET FUNCTION OCCURRED = 3
+ * I_T NEXUS LOSS OCCURRED = 7
+ */
+return sense.ascq;
+} else if (sense.asc == 0x2F && sense.ascq == 0x01) {
+/* COMMANDS CLEARED BY POWER LOSS NOTIFICATION  */
+return 8;
+}
+return (sense.asc << 8) | sense.ascq;
+}
+
+void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
+{
+int prec1, prec2;
+if (sense.key != UNIT_ATTENTION) {
+return;
+}
+trace_scsi_device_set_ua(sdev->id, sdev->lun, sense.key,
+ sense.asc, sense.ascq);
+
+/*
+ * Override a pre-existing unit attention condition, except for a more
+ * important reset condition.
+ */
+prec1 = scsi_ua_precedence(sdev->unit_attention);
+prec2 = scsi_ua_precedence(sense);
+if (prec2 < prec1) {
+sdev->unit_attention = sense;
+}
+}
+
 void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
 {
 SCSIRequest *req;
@@ -1538,7 +1587,8 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
 req = QTAILQ_FIRST(&sdev->requests);
 scsi_req_cancel(req);
 }
-sdev->unit_attention = sense;
+
+scsi_device_set_ua(sdev, sense);
 }
 
 static char *scsibus_get_dev_path(DeviceState *dev)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 0905446..fabd0bf 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1886,7 +1886,7 @@ static void scsi_cd_change_media_cb(void *opaque, bool load)
  */
 s->media_changed = load;
 s->tray_open = !load;
-s->qdev.unit_attention = SENSE_CODE(UNIT_ATTENTION_NO_MEDIUM);
+scsi_device_set_ua(&s->qdev, SENSE_CODE(UNIT_ATTENTION_NO_MEDIUM));
 s->media_event = true;
 s->eject_request = false;
 }
@@ -1925,7 +1925,7 @@ static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
 if (s->media_changed) {
 s->media_changed = false;
-s->qdev.unit_attention = SENSE_CODE(MEDIUM_CHANGED);
+scsi_device_set_ua(&s->qdev, SENSE_CODE(MEDIUM_CHANGED));
 }
 }
 
diff --git a/hw/scsi.h b/hw/scsi.h
index e901350..4804be9 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -239,6 +239,7 @@ void scsi_req_abort(SCSIRequest *req, int status);
 void scsi_req_cancel(SCSIRequest *req);
 void scsi_req_retry(SCSIRequest *req);
 void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense);
+void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense);
 int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, int len, bool fixed);
 SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun);
 
diff --git a/trace-events b/trace-events
index 7baa42d..90e9c2a 100644
--- a/trace-events
+++ b/trace-events
@@ -393,6 +393,7 @@ scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "targ
 scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) "target %d lun %d tag %d command %d lba %"PRIu64
 scsi_req_parse_bad(int target, int lun, int tag, int cmd) "target %d lun %d tag %d command %d"
 scsi_req_build_sense(int target, int lun, int tag, int key, int asc, int ascq) "target %d lun %d tag %d key %#02x asc %#02x ascq %#02x"
+scsi_device_set_ua(int target, int lun, int key, int asc, int ascq) "target %d lun %d key %#02x asc %#02x ascq %#02x"
 scsi_report_luns(int target, int lun, int tag) "target %d l

[PATCH 4/5] scsi: report parameter changes to HBA drivers

2012-07-16 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/scsi-bus.c  |   10 ++
 hw/scsi-disk.c |2 +-
 hw/scsi.h  |2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index d5e1fb0..77aa946 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1072,6 +1072,16 @@ int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
 return 0;
 }
 
+void scsi_device_report_change(SCSIDevice *dev, SCSISense sense)
+{
+SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
+
+scsi_device_set_ua(dev, sense);
+if (bus->info->change) {
+bus->info->change(bus, dev, sense);
+}
+}
+
 /*
  * Predefined sense codes
  */
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index fabd0bf..8351de5 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1867,7 +1867,7 @@ static void scsi_disk_resize_cb(void *opaque)
 {
 SCSIDiskState *s = opaque;
 
-scsi_device_set_ua(&s->qdev, SENSE_CODE(CAPACITY_CHANGED));
+scsi_device_report_change(&s->qdev, SENSE_CODE(CAPACITY_CHANGED));
 }
 
 static void scsi_cd_change_media_cb(void *opaque, bool load)
diff --git a/hw/scsi.h b/hw/scsi.h
index 4804be9..380bb89 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -132,6 +132,7 @@ struct SCSIBusInfo {
 void (*cancel)(SCSIRequest *req);
 void (*hotplug)(SCSIBus *bus, SCSIDevice *dev);
 void (*hot_unplug)(SCSIBus *bus, SCSIDevice *dev);
+void (*change)(SCSIBus *bus, SCSIDevice *dev, SCSISense sense);
 QEMUSGList *(*get_sg_list)(SCSIRequest *req);
 
 void (*save_request)(QEMUFile *f, SCSIRequest *req);
@@ -240,6 +241,7 @@ void scsi_req_cancel(SCSIRequest *req);
 void scsi_req_retry(SCSIRequest *req);
 void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense);
 void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense);
+void scsi_device_report_change(SCSIDevice *dev, SCSISense sense);
 int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, int len, bool fixed);
 SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun);
 
-- 
1.7.10.4




[PATCH 5/5] virtio-scsi: report parameter change events

2012-07-16 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/virtio-scsi.c |   16 
 1 file changed, 16 insertions(+)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 83dbabd..80a47d7 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -27,6 +27,7 @@
 /* Feature Bits */
 #define VIRTIO_SCSI_F_INOUT0
 #define VIRTIO_SCSI_F_HOTPLUG  1
+#define VIRTIO_SCSI_F_CHANGE   2
 
 /* Response codes */
 #define VIRTIO_SCSI_S_OK   0
@@ -63,6 +64,7 @@
 #define VIRTIO_SCSI_T_NO_EVENT 0
 #define VIRTIO_SCSI_T_TRANSPORT_RESET  1
 #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+#define VIRTIO_SCSI_T_PARAM_CHANGE 3
 
 /* Reasons for transport reset event */
 #define VIRTIO_SCSI_EVT_RESET_HARD 0
@@ -554,6 +556,7 @@ static uint32_t virtio_scsi_get_features(VirtIODevice *vdev,
  uint32_t requested_features)
 {
 requested_features |= (1UL << VIRTIO_SCSI_F_HOTPLUG);
+requested_features |= (1UL << VIRTIO_SCSI_F_CHANGE);
 return requested_features;
 }
 
@@ -637,6 +640,18 @@ static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
 }
 }
 
+static void virtio_scsi_change(SCSIBus *bus, SCSIDevice *dev, SCSISense sense)
+{
+VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+
+if (((s->vdev.guest_features >> VIRTIO_SCSI_F_CHANGE) & 1) &&
+(s->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+dev->type != TYPE_ROM) {
+virtio_scsi_push_event(s, dev, VIRTIO_SCSI_T_PARAM_CHANGE,
+   sense.asc | (sense.ascq << 8));
+}
+}
+
 static void virtio_scsi_hotplug(SCSIBus *bus, SCSIDevice *dev)
 {
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
@@ -666,6 +681,7 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
 
 .complete = virtio_scsi_command_complete,
 .cancel = virtio_scsi_request_cancelled,
+.change = virtio_scsi_change,
 .hotplug = virtio_scsi_hotplug,
 .hot_unplug = virtio_scsi_hot_unplug,
 .get_sg_list = virtio_scsi_get_sg_list,
-- 
1.7.10.4



Re: [PATCH 0/5] Add vhost-blk support

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 10:29, Asias He ha scritto:
> So, vhost-blk at least saves ~6 syscalls for us in each request. 

Are they really 6?  If I/O is coalesced by a factor of 3, for example
(i.e. each exit processes 3 requests), it's really 2 syscalls per request.

Also, is there anything we can improve?  Perhaps we can modify epoll and
ask it to clear the eventfd for us (would save 2 reads)?  Or
io_getevents (would save 1)?

> I guess you mean qemu here. Yes, in theory, qemu's block layer can be
> improved to achieve similar performance as vhost-blk or kvm tool's
> userspace virito-blk has. But I think it makes no sense to prevent one
> solution becase there is another in theory solution called: we can do
> similar in qemu.

It depends.  Like vhost-scsi, vhost-blk has the problem of a crippled
feature set: no support for block device formats, non-raw protocols,
etc.  This makes it different from vhost-net.

So it begs the question, is it going to be used in production, or just a
useful reference tool?

Paolo


Re: [PATCH 0/5] Add vhost-blk support

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 11:21, Asias He ha scritto:
>> It depends.  Like vhost-scsi, vhost-blk has the problem of a crippled
>> feature set: no support for block device formats, non-raw protocols,
>> etc.  This makes it different from vhost-net.
> 
> Data-plane qemu also has this crippled feature set problem, no?

Yes, but that is just a proof of concept.  We can implement a separate
I/O thread within the QEMU block layer, and add fast paths that resemble
data-plane QEMU, without limiting the feature set.

> Does user always choose to use block devices format like qcow2? What
> if they prefer raw image or raw block device?

If they do, the code should hit fast paths and be fast.  But it should
be automatic, without the need for extra knobs.  aio=thread vs.
aio=native is already one knob too much IMHO.

>> So it begs the question, is it going to be used in production, or just a
>> useful reference tool?
> 
> This should be decided by user, I can not speak for them. What is wrong
> with adding one option for user which they can decide?

Having to explain the relative benefits to the user; having to
support the API; having to handle the transition from one more thing when
something better comes out.

Paolo


Re: [PATCH 0/5] Add vhost-blk support

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 11:45, Michael S. Tsirkin ha scritto:
>> So it begs the question, is it going to be used in production, or just a
>> useful reference tool?
> 
> Sticking to raw already makes virtio-blk faster, doesn't it?
> In that vhost-blk looks to me like just another optimization option.
> Ideally I think user just should not care where do we handle virtio:
> in-kernel or in userspace.  One can imagine it being enabled/disabled
> automatically if none of the features unsupported by it are used.

Ok, that would make more sense.  One difference between vhost-blk and
vhost-net is that for vhost-blk there are also management actions that
would trigger the switch, for example a live snapshot.

So a prerequisite for vhost-blk would be that it is possible to disable
it on the fly while the VM is running, as soon as all in-flight I/O is
completed.

(Note that, however, this is not possible for vhost-scsi, because it
really exposes different hardware to the guest.  It must not happen that
a kernel upgrade or downgrade toggles between userspace SCSI and
vhost-scsi, for example).

>> having to
>> support the API; having to handle transition from one more thing when
>> something better comes out.
> 
> Well this is true for any code. If the limited featureset which
> vhost-blk can accelerate is something many people use, then accelerating
> by 5-15% might outweigh support costs.

It is definitely what people use if they are interested in performance.

Paolo


Re: [PATCH 0/5] Add vhost-blk support

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 12:49, Michael S. Tsirkin ha scritto:
>> Ok, that would make more sense.  One difference between vhost-blk and
>> vhost-net is that for vhost-blk there are also management actions that
>> would trigger the switch, for example a live snapshot.
>> So a prerequisite for vhost-blk would be that it is possible to disable
>> it on the fly while the VM is running, as soon as all in-flight I/O is
>> completed.
> 
> It applies for vhost-net too. For example if you bring link down,
> we switch to userspace. So vhost-net supports this switch on the fly.

Cool.

>> (Note that, however, this is not possible for vhost-scsi, because it
>> really exposes different hardware to the guest.  It must not happen that
>> a kernel upgrade or downgrade toggles between userspace SCSI and
>> vhost-scsi, for example).
> 
> I would say this is not a prerequisite for merging in qemu.
> It might be a required feature for production but it
> is also solvable at the management level.

I'm thinking of the level interrupts here.  You cannot make a change in
the guest, and have it do completely unrelated changes the hardware that
the guest sees.

 having to
 support the API; having to handle transition from one more thing when
 something better comes out.
>>>
>>> Well this is true for any code. If the limited featureset which
>>> vhost-blk can accelerate is something many people use, then accelerating
>>> by 5-15% might outweigh support costs.
>>
>> It is definitely what people use if they are interested in performance.
> 
> In that case it seems to me we should stop using the feature set as
> an argument and focus on whether the extra code is worth the 5-15% gain.
> No one seems to have commented on that so everyone on list thinks that
> aspect is OK?

I would like to see a breakdown of _where_ the 5-15% lies, something
like http://www.linux-kvm.org/page/Virtio/Block/Latency.

> Kernel merge windows is coming up and I would like to see whether
> any of vhost-blk / vhost-scsi is going to be actually used by userspace.
> I guess we could tag it for staging but would be nice to avoid that.

Staging would be fine by me for both vhost-blk and vhost-scsi.

Paolo


Re: [Qemu-devel] [PATCH 2/5] scsi-disk: report resized disk via sense codes

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 13:14, Kevin Wolf ha scritto:
>> > -if (s->features & (1 << SCSI_DISK_F_REMOVABLE)) {
>> > -bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_cd_block_ops, s);
>> > -}
>> > +bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_disk_block_ops, s);
> Are you aware of this code?
> 
> bool bdrv_dev_has_removable_media(BlockDriverState *bs)
> {
> return !bs->dev || (bs->dev_ops && bs->dev_ops->change_media_cb);
> }
> 
> This means that now all SCSI disks have removable media from the
> monitor's point of view.

I remembered there was something similar but I couldn't find it.  I'll
rework the patch to have two separate sets of dev_ops.

Thanks very much!

Paolo



Re: [PATCH 0/5] Add vhost-blk support

2012-07-17 Thread Paolo Bonzini
Il 17/07/2012 14:48, Michael S. Tsirkin ha scritto:
> On Tue, Jul 17, 2012 at 01:03:39PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin  wrote:
 Knowing the answer to that is important before anyone can say whether
 this approach is good or not.

 Stefan
>>>
>>> Why is it?
>>
>> Because there might be a fix to kvmtool which closes the gap.  It
>> would be embarassing if vhost-blk was pushed just because no one
>> looked into what is actually going on.
> 
> Embarrasing to whom? Is someone working on an optimization that
> makes the work in question redundant, with posting just around
> the corner? Then maybe the thing to do is just wait a bit.

Of course there is work going on to make QEMU perform better.  Not sure
about lkvm.

>> And on the flipside, hard evidence of an overhead that cannot be
>> resolved could be good reason to do more vhost devices in the future.
> 
> How can one have hard evidence of an overhead that cannot be resolved?

Since we do have two completely independent userspaces (lkvm and
data-plane QEMU), you can build up some compelling evidence of an
overhead that cannot be resolved in user space.

>> Either way, it's useful to do this before going further.
> 
> I think each work should be discussed on its own merits.  Maybe
> vhost-blk is just well written. So? What is your conclusion?

If it's just that vhost-blk is written well, my conclusion is that lkvm
people should look into improving their virtio-blk userspace.  We take
hints from each other all the time, for example virtio-scsi will have
unlocked kick in 3.6.

Why can't vhost-* just get into staging, and we call it a day?

Paolo


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Paolo Bonzini
Il 18/07/2012 15:42, Anthony Liguori ha scritto:
> If you add support for a new command, you need to provide userspace a
> way to disable this command.  If you change what gets reported for VPD,
> you need to provide userspace a way to make VPD look like what it did in
> a previous version.

The QEMU target is not enforcing this to this level.  We didn't for
CD-ROM ATAPI, and we're not doing it for SCSI.

It may indeed be useful for changes to VPD pages or major features.
However, so far we've never introduced any feature that deserved it.
This is also because OSes typically don't care: they use a small subset
of the features and all the remaining "decorations" are only needed to
be pedantically compliant to the spec.

Paolo


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Paolo Bonzini
Il 18/07/2012 21:12, Anthony Liguori ha scritto:
> Is that true for all OSes?  Linux may handle things gracefully if UNMAP
> starts throwing errors but that doesn't mean that Windows will.

There is so much USB crap (not just removable, think of ATA-to-USB
enclosures) that they have to deal with pretty much everything.

> Windows does this with a points system and I do believe that INQUIRY
> responses from any local disks are included in this tally.

INQUIRY responses (at least vendor/product/type) should not change.

Paolo


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-19 Thread Paolo Bonzini
Il 19/07/2012 09:28, James Bottomley ha scritto:
>> > INQUIRY responses (at least vendor/product/type) should not change.
> INQUIRY responses often change for arrays because a firmware upgrade
> enables new features and new features have to declare themselves,
> usually in the INQUIRY data.  What you mean, I think, is that previously
> exposed features in INQUIRY data, as well as strings
> (vendor/product/type, as you say), shouldn't change, but unexposed data
> (read 0 in the fields) may.

What I meant is that it's unlikely that Windows fingerprinting is using
anything but vendor/product/type, because everything else can change.

Paolo



Re: [QEMU PATCH 0/2] virtio-blk: writeback cache enable improvements

2012-07-23 Thread Paolo Bonzini
Il 03/07/2012 15:20, Paolo Bonzini ha scritto:
> These patches let virtio-blk use the new support for toggling the cache
> mode between writethrough and writeback.
> 
> The first patch introduces a new feature bit and configuration field to
> do this.  The second patch disables writeback caching for guests that do
> not negotiate VIRTIO_BLK_F_WCACHE (meaning that they cannot send flush
> requests), so that limited or older guests are now safe wrt power losses.
> VIRTIO_BLK_F_FLUSH has been introduced in Linux 2.6.32 (in 2009) and was
> backported to RHEL/CentOS 5.6 (in 2010).
> 
> The Windows drivers (which work by emulating SCSI on top of virtio-blk)
> have bugs in this area, which I reported on the Red Hat Bugzilla as
> bugs 837321 and 837324.  With these patches they will suffer a
> performance hit but gain correctness.
> 
> Paolo Bonzini (2):
>   virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
>   virtio-blk: disable write cache if not negotiated

Ping - Anthony, mst?

Paolo




Re: [Qemu-devel] [PATCH 1/5] scsi-disk: removable hard disks support START/STOP

2012-07-23 Thread Paolo Bonzini
Il 23/07/2012 18:44, Blue Swirl ha scritto:
>> > Support for START/STOP UNIT right now is limited to CD-ROMs.  This is wrong,
>> > since removable hard disks (in the real world: SD card readers) also support
>> > it in pretty much the same way.
> I remember vaguely tuning a set of large SCSI hard disks
> (non-removable) so that they all didn't start immediately at the same
> time (which could have burned out the PSU) but only with START UNIT
> command. I think Linux or maybe even the BIOS started the drives
> (nicely in sequence) before accessing the drive.

Yes, all disks do start/stop.  Removable disks support load and eject in
addition.  I'll fix the commit message.

Paolo





Re: [RFC 2/9] virtio-pci: support host notifiers in TCG mode

2012-07-24 Thread Paolo Bonzini
Il 25/07/2012 00:33, Nicholas A. Bellinger ha scritto:
> From: Stefan Hajnoczi 
> 
> Normally host notifiers are only used together with vhost-net in KVM
> mode.  It is occasionally useful to use vhost with TCG mode, mainly for
> testing and development.  This isn't hard to achieve, simply fall back
> to notifying the host notifier manually from qemu if KVM mode is
> disabled.
> 
> Signed-off-by: Stefan Hajnoczi 
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Signed-off-by: Nicholas Bellinger 
> ---
>  hw/virtio-pci.c |   23 ---
>  hw/virtio.c |7 +++
>  2 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 4e03f0b..538eef4 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -249,6 +249,25 @@ void virtio_pci_reset(DeviceState *d)
>  proxy->flags &= ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
>  }
>  
> +static void virtio_pci_queue_notify(VirtIOPCIProxy *proxy, uint32_t n)
> +{
> +VirtQueue *vq;
> +EventNotifier *notifier;
> +
> +if (n >= VIRTIO_PCI_QUEUE_MAX) {
> +return;
> +}
> +
> +vq = virtio_get_queue(proxy->vdev, n);
> +notifier = virtio_queue_get_host_notifier(vq);
> +if (event_notifier_valid(notifier)) {
> +printf("notifying vq %u host notifier from userspace\n", n);

Debug printf.

> +event_notifier_notify(notifier);
> +} else {
> +virtio_queue_notify_vq(vq);
> +}

This can be done directly in virtio_queue_notify, there is nothing
specific to virtio-pci.

> +}
> +
>  static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
> @@ -278,9 +297,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_PCI_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> -}
> +virtio_pci_queue_notify(proxy, val);
>  break;
>  case VIRTIO_PCI_STATUS:
>  if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> diff --git a/hw/virtio.c b/hw/virtio.c
> index d146f86..36a18b5 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -536,6 +536,11 @@ void virtio_reset(void *opaque)
>  vdev->vq[i].signalled_used = 0;
>  vdev->vq[i].signalled_used_valid = false;
>  vdev->vq[i].notification = true;
> +
> +assert(!event_notifier_valid(&vdev->vq[i].guest_notifier));
> +assert(!event_notifier_valid(&vdev->vq[i].host_notifier));
> +vdev->vq[i].guest_notifier = EVENT_NOTIFIER_INITIALIZER;
> +vdev->vq[i].host_notifier = EVENT_NOTIFIER_INITIALIZER;

Given the assertions, the assignments should be no-ops.

>  }
>  }
>  
> @@ -905,6 +910,8 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
>  for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
>  vdev->vq[i].vector = VIRTIO_NO_VECTOR;
>  vdev->vq[i].vdev = vdev;
> +vdev->vq[i].guest_notifier = EVENT_NOTIFIER_INITIALIZER;
> +vdev->vq[i].host_notifier = EVENT_NOTIFIER_INITIALIZER;
>  }
>  
>  vdev->name = name;
> 




Re: [RFC 1/9] notifier: add validity check and notify function

2012-07-24 Thread Paolo Bonzini
Il 25/07/2012 00:33, Nicholas A. Bellinger ha scritto:
> +int event_notifier_notify(EventNotifier *e)
> +{
> +uint64_t value = 1;
> +int r;
> +
> +assert(event_notifier_valid(e));
> +r = write(e->fd, &value, sizeof(value));
> +if (r < 0) {
> +return -errno;
> +}
> +assert(r == sizeof(value));
> +return 0;
> +}

Note we now have event_notifier_set.

> +#define EVENT_NOTIFIER_INITIALIZER ((EventNotifier){ .fd = -1 })

This is problematic when your event notifier is inside a struct, because
it is extremely easy to forget the initializer.  You have to initialize
them yourself.  Also, the right thing to test is not whether the
notifier is initialized; it is whether the notifier is actually checked
in QEMU's select() loop.

So, I would prefer avoiding event_notifier_valid and just use a boolean
(in virtio_queue_set_host_notifier_fd_handler and
virtio_queue_set_guest_notifier_fd_handler) to track whether the
notifiers are in use.

Paolo


Re: [RFC 9/9] vhost-scsi: add -vhost-scsi host device

2012-07-24 Thread Paolo Bonzini
Il 25/07/2012 00:34, Nicholas A. Bellinger ha scritto:
> From: Stefan Hajnoczi 
> 
> This patch adds a new type of host device that drives the vhost_scsi
> device.  The syntax to add vhost-scsi is:
> 
>   qemu -vhost-scsi id=vhost-scsi0,wwpn=...,tpgt=123
> 
> The virtio-scsi emulated device will make use of vhost-scsi to process
> virtio-scsi requests inside the kernel and hand them to the in-kernel
> SCSI target stack.
> 
> Changelog v0 -> v1:
> 
> - Add VHOST_SCSI_SET_ENDPOINT call (stefan)
> - Enable vhost notifiers for multiple queues (Zhi)
> - clear vhost-scsi endpoint on stopped (Zhi)
> - Add CONFIG_VHOST_SCSI for QEMU build configure (nab)
> - Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
> - Add support for VHOST_SCSI_GET_ABI_VERSION ioctl (aliguori + nab)
> 
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Zhi Yong Wu 
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Nicholas Bellinger 
> ---
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/qdev-properties.c |   32 +
>  hw/qdev.h|3 +
>  hw/vhost-scsi.c  |  173 ++
>  hw/vhost-scsi.h  |   50 ++
>  qemu-common.h|1 +
>  qemu-config.c|   16 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  10 files changed, 308 insertions(+), 0 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
> 
> diff --git a/configure b/configure
> index cef0a71..2291e7e 100755
> --- a/configure
> +++ b/configure
> @@ -144,6 +144,7 @@ libattr=""
>  xfs=""
>  
>  vhost_net="no"
> +vhost_scsi="no"
>  kvm="no"
>  gprof="no"
>  debug_tcg="no"
> @@ -489,6 +490,7 @@ Haiku)
>usb="linux"
>kvm="yes"
>vhost_net="yes"
> +  vhost_scsi="yes"
>if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
>  audio_possible_drivers="$audio_possible_drivers fmod"
>fi
> @@ -794,6 +796,10 @@ for opt do
>;;
>--enable-vhost-net) vhost_net="yes"
>;;
> +  --disable-vhost-net) vhost_scsi="no"
> +  ;;
> +  --enable-vhost-net) vhost_scsi="yes"
> +  ;;

Case labels are using vhost-net.

>--disable-opengl) opengl="no"
>;;
>--enable-opengl) opengl="yes"
> @@ -3059,6 +3065,7 @@ echo "posix_madvise $posix_madvise"
>  echo "uuid support  $uuid"
>  echo "libcap-ng support $cap_ng"
>  echo "vhost-net support $vhost_net"
> +echo "vhost-scsi support $vhost_scsi"
>  echo "Trace backend $trace_backend"
>  echo "Trace output file $trace_file-"
>  echo "spice support $spice"
> @@ -3762,6 +3769,9 @@ case "$target_arch2" in
>if test "$vhost_net" = "yes" ; then
>  echo "CONFIG_VHOST_NET=y" >> $config_target_mak
>fi
> +  if test "$vhost_scsi" = "yes" ; then
> +echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
> +  fi
>  fi
>  esac
>  case "$target_arch2" in
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 8327e55..37a0d0e 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -161,6 +161,7 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o virtio-balloon.o virtio-net.o
>  obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
>  obj-$(CONFIG_SOFTMMU) += vhost_net.o
>  obj-$(CONFIG_VHOST_NET) += vhost.o
> +obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
>  obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
>  obj-$(CONFIG_NO_PCI) += pci-stub.o
>  obj-$(CONFIG_VGA) += vga.o
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 24b39e8..0c538d2 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -3,6 +3,7 @@
>  #include "qerror.h"
>  #include "blockdev.h"
>  #include "hw/block-common.h"
> +#include "vhost-scsi.h"
>  
>  void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
>  {
> @@ -685,6 +686,37 @@ PropertyInfo qdev_prop_vlan = {
>  .set   = set_vlan,
>  };
>  
> +/* --- vhost-scsi --- */
> +
> +static int parse_vhost_scsi(DeviceState *dev, Property *prop, const char *str)
> +{
> +VHostSCSI **ptr = qdev_get_prop_ptr(dev, prop);
> +
> +*ptr = find_vhost_scsi(str);
> +if (*ptr == NULL) {
> +return -ENOENT;
>

Re: [RFC 7/9] virtio-scsi: Start/stop vhost

2012-07-25 Thread Paolo Bonzini
Il 25/07/2012 00:34, Nicholas A. Bellinger ha scritto:
> From: Stefan Hajnoczi 
> 
> This patch starts and stops vhost as the virtio device transitions
> through its status phases.  Vhost can only be started once the guest
> reports its driver has successfully initialized, which means the
> virtqueues have been set up by the guest.
> 
> (v2: Squash virtio-scsi: use the vhost-scsi host device from stefan)
> 
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Zhi Yong Wu 
> Cc: Michael S. Tsirkin 
> Cc: Paolo Bonzini 
> Signed-off-by: Nicholas Bellinger 

Hmm, this is not what the patch does... :)

Paolo




Re: [RFC 7/9] virtio-scsi: Start/stop vhost

2012-07-25 Thread Paolo Bonzini
Il 25/07/2012 09:01, Paolo Bonzini ha scritto:
>> > From: Stefan Hajnoczi 
>> > 
>> > This patch starts and stops vhost as the virtio device transitions
>> > through its status phases.  Vhost can only be started once the guest
>> > reports its driver has successfully initialized, which means the
>> > virtqueues have been set up by the guest.
>> > 
>> > (v2: Squash virtio-scsi: use the vhost-scsi host device from stefan)
>> > 
>> > Signed-off-by: Stefan Hajnoczi 
>> > Signed-off-by: Zhi Yong Wu 
>> > Cc: Michael S. Tsirkin 
>> > Cc: Paolo Bonzini 
>> > Signed-off-by: Nicholas Bellinger 
> Hmm, this is not what the patch does... :)

Oops, the above comment was meant for patch 5, which is a one-liner
that can be squashed here.

Anyway there is some problem with the ordering of the patches, because
this patch includes vhost-scsi.h (introduced in patch 9) and patch 5
uses VHostSCSI (defined by vhost-scsi.h).

Paolo


Re: [RFC 6/9] virtio-scsi: Open and initialize /dev/vhost-scsi

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 00:34, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi 
> 
> Begin adding vhost support by opening /dev/vhost-scsi.
> 
> (v2: Drop legacy ->vhost_vqs[] usage)
> 
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Zhi Yong Wu 
> Cc: Michael S. Tsirkin 
> Cc: Paolo Bonzini 
> Signed-off-by: Nicholas Bellinger 
> ---
>  hw/virtio-scsi.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
> index 4a787d3..dea3269 100644
> --- a/hw/virtio-scsi.c
> +++ b/hw/virtio-scsi.c
> @@ -18,6 +18,7 @@
>  #include "virtio-scsi.h"
>  #include 
>  #include 
> +#include "vhost.h"
>  
>  #define VIRTIO_SCSI_VQ_SIZE 128
>  #define VIRTIO_SCSI_CDB_SIZE32
> @@ -137,6 +138,9 @@ typedef struct {
>  VirtQueue *ctrl_vq;
>  VirtQueue *event_vq;
>  VirtQueue *cmd_vqs[0];
> +
> +bool vhost_started;
> +VHostSCSI *vhost_scsi;
>  } VirtIOSCSI;
>  
>  typedef struct VirtIOSCSIReq {
> @@ -456,6 +460,11 @@ static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq *req)
>  virtio_scsi_complete_req(req);
>  }
>  
> +static VirtIOSCSI *to_virtio_scsi(VirtIODevice *vdev)
> +{
> +return (VirtIOSCSI *)vdev;
> +}

This function is unused and, because it is static, it will break
compilation.  Not that it would compile anyway since VHostSCSI is not
defined yet... ;)


>  static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
>  {
>  VirtIOSCSI *s = (VirtIOSCSI *)vdev;
> @@ -605,6 +614,8 @@ VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
>  
>  s->qdev = dev;
>  s->conf = proxyconf;
> +s->vhost_started = false;
> +s->vhost_scsi = proxyconf->vhost_scsi;

No need for this, just use s->conf->vhost_scsi.

Also, please do not register the QEMU SCSI bus if vhost-scsi is active.

Paolo

>  
>  /* TODO set up vdev function pointers */
>  s->vdev.get_config = virtio_scsi_get_config;
> @@ -636,5 +647,6 @@ void virtio_scsi_exit(VirtIODevice *vdev)
>  {
>  VirtIOSCSI *s = (VirtIOSCSI *)vdev;
>  unregister_savevm(s->qdev, "virtio-scsi", s);
> +vhost_dev_cleanup(&s->vhost_scsi);
>  virtio_cleanup(vdev);
>  }
> 




Re: [PATCH 4/5] qom: delay DeviceState's reclaim to main-loop

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 09:03, Stefan Hajnoczi wrote:
> From: Liu Ping Fan 
>
> iohandler/bh/timer may use DeviceState when its refcnt=0,

It's not clear to me how.  The only reference to devices from an
iohandler/bh/timer can be in the opaque.  Now, if you have a
iohandler/bh/timer whose opaque is a DeviceState, you should bump the
refcount before setting it up, and unref after tearing it down.

See for example how hw/scsi-disk.c bumps the refcount of a request
before starting asynchronous I/O and calls unref at the end of the
asynchronous I/O callback.
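
The scsi-disk pattern described above can be sketched in miniature as a
userspace model; the Device struct and device_ref()/device_unref() below
are hypothetical stand-ins for QEMU's DeviceState and
object_ref()/object_unref(), not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DeviceState with a plain refcount. */
typedef struct Device {
    int refcnt;
    int freed;
} Device;

static void device_ref(Device *d)   { d->refcnt++; }
static void device_unref(Device *d) { if (--d->refcnt == 0) d->freed = 1; }

/* Completion callback of a bh/timer/iohandler whose opaque is the
 * device: drop the reference taken when the callback was set up. */
static void io_done(void *opaque)
{
    Device *d = opaque;
    /* ... complete the request ... */
    device_unref(d);    /* matches the ref taken in io_start() */
}

/* Bump the refcount before starting asynchronous I/O, so the device
 * stays alive even if its owner drops its reference meanwhile. */
static void io_start(Device *d)
{
    device_ref(d);
    /* ... submit async I/O; io_done(d) runs later from the main loop ... */
}
```

With this ordering, an unplug that happens while I/O is in flight cannot
free the device under the callback's feet.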

Paolo


Re: [PATCH 4/5] qom: delay DeviceState's reclaim to main-loop

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 10:16, liu ping fan wrote:
> > It's not clear how to me.  The only reference to devices from an
> > iohandler/bh/timer can be in the opaque.  Now, if you have a
> > iohandler/bh/timer whose opaque is a DeviceState, you should bump the
> > refcount before setting it up, and unref after tearing it down.
>
> Yes, I admit refcnt is a good solution, but I think it means that we
> will fix it with each device's bh. And this way seems lazy.

Most of the time the bh/timer/iohandler is created in the init function,
and deleted in the exit function, so in this case the lifecycle is even
easier to manage.

Looking at your patch again, it seems like you're implementing a
poor man's RCU.  So that's fine for a proof-of-concept, but let's replace
it with real RCU sooner or later.

Paolo


Re: [PATCH 1/5] qom: adopt rwlock to protect accessing dev from removing it

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 05:31, Liu Ping Fan wrote:
> From: Liu Ping Fan 
> 
> rwlock:
>   qemu_device_tree_mutex
> 
> rd side:
>   --device_del(destruction of device will be postponed until unplug
> ack from guest),
>   --pci hot-unplug
>   --iteration (qdev_reset_all)
> 
> wr side:
>   --device_add
> 
> Signed-off-by: Liu Ping Fan 
> ---
>  hw/pci-hotplug.c  |4 
>  hw/qdev-monitor.c |   17 -
>  hw/qdev.c |2 ++
>  3 files changed, 22 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
> index e7fb780..b3b88c1 100644
> --- a/hw/pci-hotplug.c
> +++ b/hw/pci-hotplug.c
> @@ -265,9 +265,11 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr)
>  return -1;
>  }
>  
> +qemu_rwlock_rdlock_devtree();

This is not defined anywhere, is a piece missing in the patch?

Paolo



Re: [PATCH 2/5] exec.c: use refcnt to protect device during dispatching

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 10:12, liu ping fan wrote:
>>> >> +qemu_rwlock_rdlock_devtree();
>>> >>  section = phys_page_find(page >> TARGET_PAGE_BITS);
>>> >> +if (!(memory_region_is_ram(section->mr) ||
>>> >> +memory_region_is_romd(section->mr)) && !is_write) {
>>> >> +bk = get_backend(section->mr, addr);
>>> >> +object_ref(bk);
>>> >> +} else if (!memory_region_is_ram(section->mr) && is_write) {
>>> >> +bk = get_backend(section->mr, addr);
>>> >> +object_ref(bk);
>>> >> +}
>>> >> +qemu_rwlock_unlock_devtree();
>>> >>
>>> >>  if (is_write) {
>>> >>  if (!memory_region_is_ram(section->mr)) {
>>> >> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>> >>  io_mem_write(section->mr, addr1, val, 1);
>>> >>  l = 1;
>>> >>  }
>>> >> +object_unref(bk);
>> >
>> > Currently object_ref()/object_unref() are not atomic.  Will you send
> We obey the rule:
>rdlock->search->ref_get,
>wrlock->remove ->ref_put
> So can it cause problems if object_ref()/object_unref() are not atomic?

Yes, two CPUs can perform object_ref at the same time.

You can find a header file for atomic operations here:
https://github.com/bonzini/qemu/commit/atomics.patch
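
For illustration only, an atomic refcount along those lines can be
sketched with C11 atomics; this is an assumption, not the contents of
the linked header or QEMU's actual code:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct Object {
    atomic_int ref;
} Object;

/* Safe against two CPUs incrementing concurrently, unlike plain ref++,
 * which can lose an update between the load and the store. */
static void object_ref(Object *obj)
{
    atomic_fetch_add(&obj->ref, 1);
}

/* Returns 1 when the caller dropped the last reference and the object
 * may be reclaimed. */
static int object_unref(Object *obj)
{
    return atomic_fetch_sub(&obj->ref, 1) == 1;
}
```

The rdlock->search->get / wrlock->remove->put protocol above still needs
these to be atomic, because two readers can race on the increment even
while both hold the read lock.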

Paolo


virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 15:26, Boaz Harrosh wrote:
>>> In SCSI land most LLDs should support chaining just by virtue of using the
>>> for_each_sg macro. That all it takes. Your code above does support it.
>>
>> Yes, it supports it but still has to undo them before passing to virtio.
>>
>> What my LLD does is add a request descriptor in front of the scatterlist
>> that the LLD receives.  I would like to do this with a 2-item
>> scatterlist: one for the request descriptor, and one which is a chain to
>> the original scatterlist.  
> 
> I hate that plan. Why yet override the scatter element yet again with a third
> union of a "request descriptor"

I'm not overriding (or did you mean overloading?) anything, and I think
you're reading too much in my words.

What I am saying is (for a WRITE command):

1) what I get is a scsi_cmnd which contains an N-element scatterlist.

2) virtio-scsi has to build the "packet" that is passed to the hardware
(it does not matter that the hardware is virtual).  This packet (per
virtio-scsi spec) has an N+1-element scatterlist, where the first
element is a request descriptor (struct virtio_scsi_cmd_req), and the
others describe the written data.

3) virtio takes care of converting the "packet" from a scatterlist
(which currently must be a flat one) to the hardware representation.
Here a walk is inevitable, so we don't care about this walk.

4) What I'm doing now: copying (and flattening) the N-element
scatterlist onto the last elements of an N+1 array that I pass to virtio.

      _ _ _ _ _ _
     |_|_|_|_|_|_|      scsi_cmnd scatterlist

      vvv COPY vvv
    _ _ _ _ _ _ _
   |_|_|_|_|_|_|_|      scatterlist passed to virtio
    |
    virtio_scsi_cmd_req

Then I hand off the scatterlist to virtio.  virtio walks it and converts
it to hardware format.

5) What I want to do: create a 2-element scatterlist, the first being
the request descriptor and the second chaining to scsi_cmnd's N-element
scatterlist.

          _ _ _ _ _ _
         |_|_|_|_|_|_|  scsi_cmnd scatterlist
    _ _ /
   |_|C|                scatterlist passed to virtio
    |
    virtio_scsi_cmd_req

Then I hand off the scatterlist to virtio.  virtio will still walk the
scatterlist chain, and convert it to N+1 elements for the hardware to
consume.  Still, removing one walk largely reduces the length of my
critical sections.  I also save some pointer-chasing because the
2-element scatterlist are short-lived and can reside on the stack.
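
The walk virtio would do over such a chain can be modeled in a few lines
of userspace C.  struct sg, sg_next() and sg_count() here are simplified
stand-ins for the kernel's struct scatterlist, sg_next() and for_each_sg
(which encode the chain pointer in page_link bits), not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified scatterlist entry: a chain entry carries no data and
 * instead points at the next fragment. */
struct sg {
    void *buf;
    unsigned len;
    struct sg *chain;   /* non-NULL marks a chain link */
    int end;            /* marks the last entry of the whole list */
};

/* Advance to the next data-carrying entry, transparently following
 * chain links, as the kernel's sg_next() does. */
static struct sg *sg_next(struct sg *s)
{
    if (s->end)
        return NULL;
    s++;
    return s->chain ? s->chain : s;
}

/* Count data entries across chained fragments, the way a for_each_sg
 * walk would visit them. */
static unsigned sg_count(struct sg *s)
{
    unsigned n = 0;
    for (; s; s = sg_next(s))
        n++;
    return n;
}
```

A 2-element on-stack fragment whose second entry chains to the
scsi_cmnd scatterlist is then walked as one N+1-element list, with no
copying or flattening.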


Other details (you can probably skip these):

There is also a response descriptor.  In the case of writes this is the
only element that the hardware will write to, so in the case of writes
the "written by hardware" scatterlist has 1 element only and does not
need chaining.

Reads are entirely symmetric.  The hardware will read the request
descriptor from a 1-element scatterlist, and will write response+data
into an N+1-element scatterlist (the response descriptor precedes the
data that was read).  It can be treated in exactly the same way.  The
N+1-element scatterlist could also become a 2-element scatterlist with
chaining.

>> Except that if I call sg_chain and my
>> architecture does not define ARCH_HAS_SG_CHAIN, it will BUG out.  So I
>> need to keep all the scatterlist allocation and copying crap that I have
>> now within #ifdef, and it will bitrot.
> 
> except that with the correct design you don't call sg_chain you just do:
>   request_descriptor->sg_list = sg;

By the above it should be clear, that request_descriptor is not a
driver-private extension of the scsi_cmnd.  It is something passed to
the hardware.

>>> Well that can be done as well, (If done carefully) It should be easy to add
>>> chained fragments to both the end of a given chain just as at beginning.
>>> It only means that the last element of the appended-to chain moves to
>>> the next fragment and it's place is replaced by a link.
>>
>> But you cannot do that in constant time, can you?  And that means you do
>> not enjoy any benefit in terms of cache misses etc.
> 
> I did not understand "constant time" it is O(0) if that what you meant.

In the worst case it is a linked list, no?  So in the worst case
_finding_ the last element of the appended-to chain is O(n).

Actually, appending to the end is not a problem for virtio-scsi.  But
for example the virtio-blk spec places the response descriptor _after_
the input data.  I think this was a mistake, and I didn't repeat it for
virtio-scsi, but I cited it because in the past Rusty complained that
the long sglist implementation was "SCSI-centric".

>> Also, this assumes that I can modify the appended-to chain.  I'm not
>> sure I can do this?
> 
> Each case it's own. If the appended-to chain is const, yes you'll have
> to reallocate it and copy. Is that your case?

It will be virtio-blk's case, but we can ignore it.

Paolo

Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 17:28, Boaz Harrosh wrote:
>> 1) what I get is a scsi_cmnd which contains an N-element scatterlist.
>>
>> 2) virtio-scsi has to build the "packet" that is passed to the hardware
>> (it does not matter that the hardware is virtual).  This packet (per
>> virtio-scsi spec) has an N+1-element scatterlist, where the first
>> element is a request descriptor (struct virtio_scsi_cmd_req), and the
>> others describe the written data.
> 
> Then "virtio-scsi spec" is crap. It overloads the meaning of
> "struct scatterlist" of the first element in an array. to be a
> "struct virtio_scsi_cmd_req".

What the holy fuck?  The first element simply _points_ to the "struct
virtio_scsi_cmd_req", just like subsequent elements point to the data.

And the protocol of the device is _not_ a struct scatterlist[].  The
virtio _API_ takes that array and converts to a series of physical
address + offset pairs.

> Since you need to change the standard to support chaining then
> it is a good time to fix this.

Perhaps it is a good time for you to read the virtio spec.  You are
making a huge confusion between the LLD->virtio interface and the
virtio->hardware interface.  I'm talking only of the former.

>> 3) virtio takes care of converting the "packet" from a scatterlist
>> (which currently must be a flat one) to the hardware representation.
>> Here a walk is inevitable, so we don't care about this walk.
> 
> "hardware representation" you mean aio or biovec, what ever the
> IO submission path uses at the host end?

No, I mean the way the virtio spec encodes the physical address + offset
pairs.

I stopped reading here.

Paolo


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Paolo Bonzini
On 25/07/2012 21:16, Boaz Harrosh wrote:
> The picture confused me. It looked like the first element is the 
> virtio_scsi_cmd_req
> not an sgilist-element that points to the struct's buffer.
> 
> In that case then yes your plan of making a two-elements fragment that points 
> to the
> original scsi-sglist is perfect. All you have to do is that, and all you have 
> to do
> at virtio is use the sg_for_each macro and you are done.
> 
> You don't need any sglist allocation or reshaping. And you can easily support
> chaining. Looks like order of magnitude more simple then what you do now

It is.

> So what is the problem?

That not all architectures have ARCH_HAS_SG_CHAIN (though all those I
care about do).  So I need to go through all architectures and make sure
they use for_each_sg, or at least to change ARCH_HAS_SG_CHAIN to a
Kconfig define so that dependencies can be expressed properly.

> And BTW you won't need that new __sg_set_page API anymore.

Kind of.

   sg_init_table(sg, 2);
   sg_set_buf(sg[0], req, sizeof(req));
   sg_chain(sg[1], scsi_out(sc));

is still a little bit worse than

   __sg_set_buf(sg[0], req, sizeof(req));
   __sg_chain(sg[1], scsi_out(sc));

Paolo


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-26 Thread Paolo Bonzini
On 25/07/2012 23:04, Boaz Harrosh wrote:
>> That not all architectures have ARCH_HAS_SG_CHAIN (though all those I
>> care about do).  So I need to go through all architectures and make sure
>> they use for_each_sg, or at least to change ARCH_HAS_SG_CHAIN to a
>> Kconfig define so that dependencies can be expressed properly.
> 
> What is actually preventing ARCH_HAS_SG_CHAIN from all these ARCHES?
> is that the DMA drivers not using for_each_sg(). Sounds like easy
> to fix.
> 
> But yes a deep change would convert ARCH_HAS_SG_CHAIN to a Kconfig.
> 
> If you want to be lazy, like me, You might just put a BUILD_BUG_ON
> in code, requesting the user to disable the driver for this ARCH.
> 
> I bet there is more things to do at ARCH to enable virtualization
> then just support ARCH_HAS_SG_CHAIN. Be it just another requirement.

Actually, many arches run just fine with QEMU's binary code translation
(alpha, mips, coldfire).  And since it's already pretty slow, using
virtio to improve performance is nice even in that scenario (which does
not require any kernel change).  But it's just an icing on the cake
indeed.  I can live with a BUILD_BUG_ON or better a "depends on
HAVE_SG_CHAIN" for the time being.

>>> And BTW you won't need that new __sg_set_page API anymore.
>>
>> Kind of.
>>
>>sg_init_table(sg, 2);
>>sg_set_buf(sg[0], req, sizeof(req));
>>sg_chain(sg[1], scsi_out(sc));
>>
>> is still a little bit worse than
>>
>>__sg_set_buf(sg[0], req, sizeof(req));
>>__sg_chain(sg[1], scsi_out(sc));
> 
> 
> I believe they are the same, specially for the
> on the stack 2 elements array. Actually I think
> In both cases you need to at least call sg_init_table()
> once after allocation, No?

Because the scatterlist here would be allocated on the stack, I would
need to call it every time if I used sg_set_buf/sg_chain.

But sg_init_table is really just memset + set SG_MAGIC + sg_mark_end.
If you use the "full-init" APIs as above, you don't need any of those
(sg_chain undoes the flag changes in sg_mark_end and vice versa).

> But please for my sake do not call it __sg_chain. Call it
> something like sg_chain_not_end(). I hate those "__" which
> for god sack means what? 
> (A good name is when I don't have to read the code, an "__"
>  means "fuck you go read the code")

Right, better call them sg_init_buf/sg_init_page/sg_init_chain.

In the meanwhile, we still have a bug to fix, and we need to choose
between Sen Wang's v1 (sg_set_page) or v2 (value assignment).  I'm still
leaning more towards v2, if only because I already tested that one myself.

Paolo


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-26 Thread Paolo Bonzini
On 26/07/2012 09:56, Boaz Harrosh wrote:
>> > In the meanwhile, we still have a bug to fix, and we need to choose
>> > between Sen Wang's v1 (sg_set_page) or v2 (value assignment).  I'm still
>> > leaning more towards v2, if only because I already tested that one myself.
> 
> It's your call, you know what I think ;-)
> 
> I Agree none of them are bugs in current code, they will both work
> just the same.

Cool.  Then, Sen, please fix the commit message in v2 and repost.

> Please CC me on the "convert to sg copy-less" patches, It looks interesting

Sure.

Paolo


Re: [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue

2012-07-26 Thread Paolo Bonzini
On 05/07/2012 13:40, Sasha Levin wrote:
> @@ -275,7 +274,7 @@ static void vm_del_vq(struct virtqueue *vq)
> vring_del_virtqueue(vq);
>  
> /* Select and deactivate the queue */
> -   writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
> +   writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
> writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
>  

This accesses vq after vring_del_virtqueue has freed it.

Paolo



Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-26 Thread Paolo Bonzini
On 26/07/2012 09:58, Paolo Bonzini wrote:
> 
>> > Please CC me on the "convert to sg copy-less" patches, It looks interesting
> Sure.

Well, here is the gist of it (note it won't apply on any public tree,
hence no SoB yet).  It should be split in multiple changesets and you
can make more simplifications on top of it, because
virtio_scsi_target_state is not anymore variable-sized, but that's
secondary.

The patch includes the conversion of virtio_ring.c to use sg_next.
It is a bit ugly because of backwards-compatibility, it can be fixed
by going through all drivers; not that there are many.

I'm still a bit worried of the future of this feature though.  I would
be the first and (because of the non-portability) presumably sole user
for some time.  Someone axing chained sg-lists in the future does not
seem too unlikely.  So in practice I would rather just add to virtio a
function that takes two struct scatterlist ** (or an array of struct
scatterlist * + size pairs).  Converting to sg_chain once it takes off
would be trivial.

Paolo
From 57f8d4a20cbe9b3b25e10cd0595d7ac102fc8f73 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Thu, 26 Jul 2012 09:58:14 +0200
Subject: [PATCH] virtio-scsi: use chained sg_lists

Using chained sg_lists simplifies everything a lot.
The scatterlists we pass to virtio are always of bounded size,
and can be allocated on the stack.  This means we do not need to
take tgt_lock and struct virtio_scsi_target_state does not have
anymore a flexible array at the end, so we can avoid a pointer
access.
---
 drivers/block/virtio_blk.c   |3 +
 drivers/scsi/virtio_scsi.c   |   93 ---
 drivers/virtio/virtio_ring.c |   93 +++
 include/linux/virtio.h   |8 +++
 4 files changed, 114 insertions(+), 83 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 13f7ccb..ef65ea1 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -66,9 +66,6 @@ struct virtio_scsi_target_state {
 	struct virtio_scsi_vq *req_vq;
 
 	atomic_t reqs;
-
-	/* For sglist construction when adding commands to the virtqueue.  */
-	struct scatterlist sg[];
 };
 
 /* Driver instance state */
@@ -362,20 +359,6 @@ static void virtscsi_event_done(struct virtqueue *vq)
 	spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
 };
 
-static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx,
-			 struct scsi_data_buffer *sdb)
-{
-	struct sg_table *table = &sdb->table;
-	struct scatterlist *sg_elem;
-	unsigned int idx = *p_idx;
-	int i;
-
-	for_each_sg(table->sgl, sg_elem, table->nents, i)
-		sg_set_buf(&sg[idx++], sg_virt(sg_elem), sg_elem->length);
-
-	*p_idx = idx;
-}
-
 /**
  * virtscsi_map_cmd - map a scsi_cmd to a virtqueue scatterlist
  * @vscsi	: virtio_scsi state
@@ -384,52 +367,57 @@ static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx,
  * @in_num	: number of write-only elements
  * @req_size	: size of the request buffer
  * @resp_size	: size of the response buffer
- *
- * Called with tgt_lock held.
  */
-static void virtscsi_map_cmd(struct virtio_scsi_target_state *tgt,
-			 struct virtio_scsi_cmd *cmd,
-			 unsigned *out_num, unsigned *in_num,
+static void virtscsi_map_cmd(struct virtio_scsi_cmd *cmd,
+			 struct scatterlist *sg_out,
+			 unsigned *out_num,
+			 struct scatterlist *sg_in,
+			 unsigned *in_num,
 			 size_t req_size, size_t resp_size)
 {
 	struct scsi_cmnd *sc = cmd->sc;
-	struct scatterlist *sg = tgt->sg;
-	unsigned int idx = 0;
+
+	sg_init_table(sg_out, 2);
+	sg_init_table(sg_in, 2);
 
 	/* Request header.  */
-	sg_set_buf(&sg[idx++], &cmd->req, req_size);
+	sg_set_buf(&sg_out[0], &cmd->req, req_size);
+	*out_num = 1;
 
 	/* Data-out buffer.  */
-	if (sc && sc->sc_data_direction != DMA_FROM_DEVICE)
-		virtscsi_map_sgl(sg, &idx, scsi_out(sc));
-
-	*out_num = idx;
+	if (sc && sc->sc_data_direction != DMA_FROM_DEVICE) {
+		struct sg_table *table = &scsi_out(sc)->table;
+		sg_chain(sg_out, 2, table->sgl);
+		*out_num += table->nents;
+	}
 
 	/* Response header.  */
-	sg_set_buf(&sg[idx++], &cmd->resp, resp_size);
+	sg_set_buf(&sg_in[0], &cmd->resp, resp_size);
+	*in_num = 1;
 
 	/* Data-in buffer */
-	if (sc && sc->sc_data_direction != DMA_TO_DEVICE)
-		virtscsi_map_sgl(sg, &idx, scsi_in(sc));
-
-	*in_num = idx - *out_num;
+	if (sc && sc->sc_data_direction != DMA_TO_DEVICE) {
+		struct sg_table *table = &scsi_in(sc)->table;
+		sg_chain(sg_in, 2, table->sgl);
+		*in_num += table->nents;
+	}
 }
 
-static int virtscsi_kick_cmd(struct virtio_scsi_target_state *tgt,
-			 struct virtio_scsi_vq *vq,
+static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 			 struct virtio_scsi_cmd *cmd,
 			 si

[PATCH 2/2] virtio-scsi: support online resizing of disks

2012-07-26 Thread Paolo Bonzini
Support the LUN parameter change event.  Currently, the host fires this event
when the capacity of a disk is changed from the virtual machine monitor.
The resize then appears in the kernel log like this:

  sd 0:0:0:0: [sda] 46137344 512-byte logical blocks: (23.6 GB/22.0 GiB)
  sda: detected capacity change from 22548578304 to 23622320128

Signed-off-by: Paolo Bonzini 
---
 drivers/scsi/virtio_scsi.c  |   31 ++-
 include/linux/virtio_scsi.h |2 ++
 2 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 8b6b927..c1a5e60 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -279,6 +279,31 @@ static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
}
 }
 
+static void virtscsi_handle_param_change(struct virtio_scsi *vscsi,
+struct virtio_scsi_event *event)
+{
+   struct scsi_device *sdev;
+   struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
+   unsigned int target = event->lun[1];
+   unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+   u8 asc = event->reason & 255;
+   u8 ascq = event->reason >> 8;
+
+   sdev = scsi_device_lookup(shost, 0, target, lun);
+   if (!sdev) {
+   pr_err("SCSI device %d 0 %d %d not found\n",
+   shost->host_no, target, lun);
+   return;
+   }
+
+   /* Handle "Parameters changed", "Mode parameters changed", and
+  "Capacity data has changed".  */
+   if (asc == 0x2a && (ascq == 0x00 || ascq == 0x01 || ascq == 0x09))
+   scsi_rescan_device(&sdev->sdev_gendev);
+
+   scsi_device_put(sdev);
+}
+
 static void virtscsi_handle_event(struct work_struct *work)
 {
struct virtio_scsi_event_node *event_node =
@@ -297,6 +322,9 @@ static void virtscsi_handle_event(struct work_struct *work)
case VIRTIO_SCSI_T_TRANSPORT_RESET:
virtscsi_handle_transport_reset(vscsi, event);
break;
+   case VIRTIO_SCSI_T_PARAM_CHANGE:
+   virtscsi_handle_param_change(vscsi, event);
+   break;
default:
pr_err("Unsupport virtio scsi event %x\n", event->event);
}
@@ -737,7 +765,8 @@ static struct virtio_device_id id_table[] = {
 };
 
 static unsigned int features[] = {
-   VIRTIO_SCSI_F_HOTPLUG
+   VIRTIO_SCSI_F_HOTPLUG,
+   VIRTIO_SCSI_F_CHANGE,
 };
 
 static struct virtio_driver virtio_scsi_driver = {
diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index dc8d305..d6b4440 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -72,6 +72,7 @@ struct virtio_scsi_config {
 /* Feature Bits */
 #define VIRTIO_SCSI_F_INOUT0
 #define VIRTIO_SCSI_F_HOTPLUG  1
+#define VIRTIO_SCSI_F_CHANGE   2
 
 /* Response codes */
 #define VIRTIO_SCSI_S_OK   0
@@ -108,6 +109,7 @@ struct virtio_scsi_config {
 #define VIRTIO_SCSI_T_NO_EVENT 0
 #define VIRTIO_SCSI_T_TRANSPORT_RESET  1
 #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+#define VIRTIO_SCSI_T_PARAM_CHANGE 3
 
 /* Reasons of transport reset event */
 #define VIRTIO_SCSI_EVT_RESET_HARD 0
-- 
1.7.1



[PATCH 0/2] virtio-scsi fixes for 3.6

2012-07-26 Thread Paolo Bonzini
James,

patch 1 fixes scanning of LUNs whose number is greater than 255.  QEMU
passes a max_lun of 16383 (because it uses SAM numbering) but in Linux
it must become 32768 (because LUNs above 255 are "relocated" to 16640).
Patch 2 is a resubmission of the patch for online resizing of virtio-scsi
LUNs, which needs to be rebased.

LUNs above 255 now work for all of scanning, hotplug, hotunplug and
resize.
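
The numbers above follow from the SAM "flat space" LUN addressing
format.  A small sketch of the mapping, mimicking what Linux's
scsilun_to_int() does for a single-level flat-format LUN (the helper
name flat_lun_to_int is made up for illustration):

```c
#include <assert.h>

/* Encode a LUN in SAM flat-space format (address method 01b in the
 * top bits of the first byte) and collapse the two bytes into the u32
 * the Linux SCSI layer uses as its LUN number. */
static unsigned int flat_lun_to_int(unsigned int lun)
{
    unsigned char b0 = 0x40 | ((lun >> 8) & 0x3f);  /* method 01b + high bits */
    unsigned char b1 = lun & 0xff;                  /* low byte */
    return (b0 << 8) | b1;
}
```

So flat LUN 256 lands at 16640 and the largest flat LUN, 16383, lands
at 32767; hence max_lun must be 32768 rather than 16384.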

Thanks,

Paolo

Paolo Bonzini (2):
  virtio-scsi: fix LUNs greater than 255
  virtio-scsi: support online resizing of disks

 drivers/scsi/virtio_scsi.c  |   37 +++--
 include/linux/virtio_scsi.h |2 ++
 2 files changed, 37 insertions(+), 2 deletions(-)



[PATCH 1/2] virtio-scsi: fix LUNs greater than 255

2012-07-26 Thread Paolo Bonzini
virtio-scsi needs to report LUNs greater than 256 using the "flat"
format.  Because the Linux SCSI layer just maps the SCSI LUN to
a u32, without any parsing, these end up in the range from 16640
to 32767.  Fix max_lun to account for the possibility that logical
unit numbers are encoded with the "flat" format.

Cc: 
Signed-off-by: Paolo Bonzini 
---
 drivers/scsi/virtio_scsi.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c7030fb..8b6b927 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -677,7 +677,11 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;
-   shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
+
+   /* LUNs > 256 are reported with format 1, so they go in the range
+* 16640-32767.
+*/
+   shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1 + 0x4000;
shost->max_id = num_targets;
shost->max_channel = 0;
shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
-- 
1.7.1




Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-27 Thread Paolo Bonzini
On 27/07/2012 08:27, Rusty Russell wrote:
>> > +int virtqueue_add_buf_sg(struct virtqueue *_vq,
>> > +   struct scatterlist *sg_out,
>> > +   unsigned int out,
>> > +   struct scatterlist *sg_in,
>> > +   unsigned int in,
>> > +   void *data,
>> > +   gfp_t gfp)
> The point of chained scatterlists is they're self-terminated, so the
> in & out counts should be calculated.
> 
> Counting them is not *that* bad, since we're about to read them all
> anyway.
> 
> (Yes, the chained scatterlist stuff is complete crack, but I lost that
> debate years ago.)
> 
> Here's my variant.  Networking, console and block seem OK, at least
> (ie. it booted!).

I hate the for loops, even though we're indeed about to read all the
scatterlists anyway... all they do is lengthen critical sections.  Also,
being the first user of chained scatterlists doesn't exactly give me warm
fuzzies.

I think it's simpler if we provide an API to add individual buffers to
the virtqueue, so that you can do multiple virtqueue_add_buf_more
(whatever) before kicking the virtqueue.  The idea is that I can still
use indirect buffers for the scatterlists that come from the block layer
or from an skb, but I will use direct buffers for the request/response
descriptors.  The direct buffers are always a small number (usually 2),
so you can balance the effect by making the virtqueue bigger.  And for
small reads and writes, you save a kmalloc on a very hot path.

(BTW, scatterlists will have separate entries for each page; we do not
need this in virtio buffers.  Collapsing physically adjacent entries
will speed up QEMU and will also help avoid indirect buffers.)

Paolo





Re: [net-next RFC V5 3/5] virtio: intorduce an API to set affinity for a virtqueue

2012-07-27 Thread Paolo Bonzini
On 05/07/2012 12:29, Jason Wang wrote:
> Sometimes, a virtio device needs to configure an irq affinity hint to maximize
> the performance. Instead of just exposing the irq of a virtqueue, this patch
> introduces an API to set the affinity for a virtqueue.
> 
> The API is best-effort; the affinity hint may not be set as expected due to
> platform support, irq sharing or irq type. Currently, only the PCI method is
> implemented, and we set the affinity according to:
> 
> - if the device uses INTx, we just ignore the request
> - if the device has a per-vq vector, we force the affinity hint
> - if the virtqueues share an MSI, make the affinity the OR over all
>   affinities requested
> 
> Signed-off-by: Jason Wang 

Hmm, I don't see any benefit from this patch, I need to use
irq_set_affinity (which however is not exported) to actually bind IRQs
to CPUs.  Example:

with irq_set_affinity_hint:
 43:   89  107  100   97   PCI-MSI-edge   virtio0-request
 44:  178  195  268  199   PCI-MSI-edge   virtio0-request
 45:   97  100   97  155   PCI-MSI-edge   virtio0-request
 46:  234  261  213  218   PCI-MSI-edge   virtio0-request

with irq_set_affinity:
 43:  721    0    0    1   PCI-MSI-edge   virtio0-request
 44:    0  746    0    1   PCI-MSI-edge   virtio0-request
 45:    0    0  658    0   PCI-MSI-edge   virtio0-request
 46:    0    0    1  547   PCI-MSI-edge   virtio0-request

I gathered these quickly after boot, but real benchmarks show the same
behavior, and performance gets actually worse with virtio-scsi
multiqueue+irq_set_affinity_hint than with irq_set_affinity.

I also tried adding IRQ_NO_BALANCING, but the only effect is that I
cannot set the affinity.

The queue steering algorithm I use in virtio-scsi is extremely simple
and based on your tx code.  See how my nice pinning is destroyed:

# taskset -c 0 dd if=/dev/sda bs=1M count=1000 of=/dev/null iflag=direct
# cat /proc/interrupts
 43:  2690 2709 2691 2696   PCI-MSI-edge  virtio0-request
 44:   109  122  199  124   PCI-MSI-edge  virtio0-request
 45:   170  183  170  237   PCI-MSI-edge  virtio0-request
 46:   143  166  125  125   PCI-MSI-edge  virtio0-request

All my requests come from CPU#0 and thus go to the first virtqueue, but
the interrupts are serviced all over the place.

Did you set the affinity manually in your experiments, or perhaps there
is a difference between scsi and networking... (interrupt mitigation?)

Paolo


  1   2   3   4   5   6   7   8   9   10   >