date:20140716


Il 16/07/2014 10:40, Alexander Graf ha scritto:





I think that there are two other ways to do it.


And what about the chardev way? Your silence in this regard is
misterious ;-)


The main problem with chardevs is that they are completely asynchronous. 
 So they may not be a good match for your use case.


(In fact, this is a problem for the i2c-over-chardev and especially 
spi-over-chardev ideas I shot out yesterday.  For i2c-over-chardev one 
could add support to clock stretching in QEMU, but SPI is entirely 
synchronous).


Note that -qmp and -qtest both use chardevs internally to connect the 
external program with QEMU.


Paolo



1) [...]
2) [...]


Thanks a lot for your suggestions. I will play a bit and come back when
I have some code to show.

[Qemu-devel] [PULL for-2.1] virtio-serial: bugfix for virtconsole port unplug

Hi,

This is a small bugfix to prevent port unplug causing port 0 getting
unreserved for virtconsole use.

Patch has been on-list for a few days and has been reviewed.

Please pull.


The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a:

  Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-serial.git for-2.1

for you to fetch changes up to 57d84cf35302fe51789c18354bf09a521bb603df:

  virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug 
(2014-07-16 14:32:40 +0530)


Amit Shah (1):
  virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug

 hw/char/virtio-serial-bus.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)


Amit

[Qemu-devel] [PULL for-2.1] migration: detect section renames in vmstate static checker script

Hi,

A minor enhancement to the static checker script that detects section
renames.  One such rename was added recently (ICH9-LPC -> ICH9 LPC).

This doesn't affect the qemu code, and enhances a test program, so I
recommend we pull it for 2.1.  Patch has been on list for a few days.

The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a:

  Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git for-2.1

for you to fetch changes up to 79fe16c0489ca658f53796206067a551fc915ba2:

  vmstate static checker: detect section renames (2014-07-16 14:29:34 +0530)


Amit Shah (1):
  vmstate static checker: detect section renames

 scripts/vmstate-static-checker.py | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)


Amit

[Qemu-devel] [PULL for-2.1] virtio-rng: Fix abort on invalid input

Hi,

This patch returns an error instead of aborting, which is desirable
not just for cmdline invocation, but prevents an abort in case of
device hotplug.

Patch is small, and reviewed on-list.

Please pull,

The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a:

  Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-rng.git for-2.1

for you to fetch changes up to 9ef6be93250e46d35062c84d5c75c7cb515dc27c:

  virtio-rng: Add human-readable error message for negative max-bytes parameter 
(2014-07-16 14:25:29 +0530)


John Snow (1):
  virtio-rng: Add human-readable error message for negative max-bytes 
parameter

 hw/virtio/virtio-rng.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Amit

Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging

2014-07-16 Thread Alex Bennée


Joakim Tjernlund writes:

> Alexander Graf  wrote on 2014/07/12 02:39:21:
>> 
>> 
>> On 11.07.14 20:22, Peter Maydell wrote:
>> > On 11 July 2014 19:15, Joakim Tjernlund 
>  wrote:
>> >> Peter Maydell  wrote on 2014/07/11 
> 19:14:25:
>> >>> On 11 July 2014 16:18, Joakim Tjernlund 
> 
>> >> wrote:
>>  ppc logs every type of Invalid instruction. This generates a lot
>> >>> Rather than just deleting this EXCP_DUMP, I would suggest
>> >>> changing the EXCP_DUMP macro so  it only does anything
>> >>> if the user has passed the "-d int" debug logging flag:
>> >> I don't think ppc wants that. They want unconditionally
>> >> debug on to get relevant bug reports. This one is getting in the
>> >> way of normal operations so I think it should be deleted.
>> > If the PPC maintainers want that behaviour then they need
>> > to defend it. No other architecture's linux-user code spews
>> > junk to stderr for exceptions, and PPC shouldn't either.
>> > The debug log switches are exactly for allowing us to
>> > say "please turn on debug logging" when bugs are reported,
>> > and those are what we should use.
>> 
>> I agree - and it's how we behave in system emulation mode already :).
>> 
>> What do the other platforms do on illegal instructions during user mode? 
>
>> Any way we can get consistency across the board?
>
> In this case it is not an illegal insn, it is a valid insn but not just
> for the QEMU emulated CPU.

On aarch64 we do:

#define unsupported_encoding(s, insn)\
do { \
qemu_log_mask(LOG_UNIMP, \
  "%s:%d: unsupported instruction encoding 0x%08x "  \
  "at pc=%016" PRIx64 "\n",  \
  __FILE__, __LINE__, insn, s->pc - 4);  \
unallocated_encoding(s); \
} while (0);

So we signal it's unimplemented before falling through to the signal
generating code.

-- 
Alex Bennée

Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 07/07/2014 16:35, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonz...@redhat.com) wrote:
> >>Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >>>From: "Dr. David Alan Gilbert" 
> >>>
> >>>Postcopy needs to have two migration streams loading concurrently;
> >>>one from memory (with the device state) and the other from the fd
> >>>with the memory transactions.
> >>
> >>Can you explain this?
> >>
> >>I would have though the order is
> >>
> >>precopy RAM and everything
> >>prepare postcopy RAM ("sent && dirty" bitmap)
> >>finish precopy non-RAM
> >>finish devices
> >>postcopy RAM
> >>
> >>Why do you need to have all the packaging stuff and a separate memory-based
> >>migration stream for devices?  I'm sure I'm missing something. :)
> >
> >The thing you're missing is the details of 'finish devices'.
> >The device emulation may access guest memory as part of loading it's
> >state, so you can't successfully complete 'finish devices' without
> >having the 'postcopy RAM' available to provide pages.
> 
> I see.  Can you document the flow (preferrably as a reply to this email
> _and_ in docs/ when you send v2 of the code :))?

I thought I documented enough in the docs/migration.txt stuff in the last
patch (see the 'Postcopy states' section); however lets see if I the following
is better:

Postcopy stream

Loading of device data may cause the device emulation to access guest RAM
that may trigger faults that have to be resolved by the source, as such
the migration stream has to be able to respond with page data *during* the
device load, and hence the device data has to be read from the stream completely
before the device load begins to free the stream up.  This is acheived by
'packaging' the device data into a blob that's read in one go.

Source behaviour

Until postcopy is entered the migration stream is identical to normal postcopy,
except for the addition of a 'postcopy advise' command at the beginning to
let the destination know that postcopy might happen.  When postcopy starts
the source sends the page discard data and then forms the 'package' containing:

   Command: 'postcopy ram listen'
   The device state
  A series of sections, identical to the precopy streams device state stream
  containing everything except postcopiable devices (i.e. RAM)
   Command: 'postcopy ram run'

The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
contents are formatted in the same way as the main migration stream.

Destination behaviour

Initially the destination looks the same as precopy, with a single thread
reading the migration stream; the 'postcopy advise' and 'discard' commands
are processed to change the way RAM is managed, but don't affect the stream
processing.

--
1  2   3 4 5  6   7
main -DISCARD-CMD_PACKAGED ( LISTEN  DEVICE DEVICE DEVICE RUN )
thread |   |
   | (page request)   
   |\___
   v\
listen thread: --- page -- page -- page -- page -- page --

   a   bc
--

On receipt of CMD_PACKAGED (1)
   All the data associated with the package - the ( ... ) section in the
diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
recurses into qemu_loadvm_state_main to process the contents of the package (2)
which contains commands (3,6) and devices (4...)

On receipt of 'postcopy ram listen' - 3 -(i.e. the 1st command in the package)
a new thread (a) is started that takes over servicing the migration stream,
while the main thread carries on loading the package.   It loads normal
background page data (b) but if during a device load a fault happens (5) the
returned page (c) is loaded by the listen thread allowing the main threads
device load to carry on.

The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the 
destination
CPUs start running.
At the end of the CMD_PACKAGED (7) the main thread returns to normal running 
behaviour
and is no longer used by migration, while the listen thread carries
on servicing page data until the end of migration.

Is that any better?

Dave
P.S. I know of at least one bug in this code at the moment, it happens
on a VM that doesn't have many dirty pages where all the pages are
transmitted, and hence the listen thread finishes, before the main thread
gets to 'run'.

> From my cursory read of the code it is something like this on the source:
> 
> finish precopy non-RAM
> start RAM postcopy
> for each device
> pack up data
> send it to destination
> 
> and on the destination:
> 
>

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >From: "Dr. David Alan Gilbert" 
> >
> >Postcopy needs a method to send messages from the destination back to
> >the source, this is the 'return path'.
> >



> >+/* Give a QEMUFile* off the same socket but data in the opposite
> >+ * direction.
> >+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
> >+ * touch read fd's status, so we dup the fd just to keep settings
> >+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
> >+ */
> >+static QEMUFile *socket_dup_return_path(void *opaque)
> >+{
> >+QEMUFileSocket *qfs = opaque;
> >+int revfd;
> >+bool this_is_read;
> >+QEMUFile *result;
> >+
> >+/* If it's already open, return it */
> >+if (qfs->file->return_path) {
> >+return qfs->file->return_path;
> 
> Wouldn't this leave a dangling file descriptor if you call
> socket_dup_return_path twice, and then close the original QEMUFile?

Hmm - how?

> >+}
> >+
> >+if (qemu_file_get_error(qfs->file)) {
> >+/* If the forward file is in error, don't try and open a return */
> >+return NULL;
> >+}
> >+
> >+/* I don't think there's a better way to tell which direction 'this' is 
> >*/
> >+this_is_read = qfs->file->ops->get_buffer != NULL;
> >+
> >+revfd = dup(qfs->fd);
> >+if (revfd == -1) {
> >+error_report("Error duplicating fd for return path: %s",
> >+  strerror(errno));
> >+return NULL;
> >+}
> >+
> >+qemu_set_nonblock(revfd);
> 
> Blocking/nonblocking is per-file *description*, not descriptor.  So you're
> making the original QEMUFile nonblocking as well.  Can you explain why this
> is needed before I reach the meat of the patch series?

Yes, I went through that a few times until I got that it was per-entity not
the fd itself, it still makes life easier for the rest of the QEMUFile
code to have a separate fd for it (hence still worth the dup).

> In other words, can you draw a table with source/dest and read/write, and
> whether it should be blocking or non-blocking?

Sure; the non-blocking ness is mostly on the source side;
modifying the table in the docs patch a little:

  Source side
 Forward path - written by migration thread
: It's OK for this to be blocking, but we end up with it being
  non-blocking, and modify the socket code to emulate blocking.

 Return path  - opened by main thread, read by fd_handler on main thread
: Must be non-blocking so as not to block the main thread while
  waiting for a partially sent command.

  Destination side
 Forward path - read by main thread
 Return path  - opened by main thread, written by main thread AND postcopy
thread (protected by rp_mutex)

 I think I'm OK with both these being blocking.

Dave

> 
> Paolo
> 
> >+result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
> >+qfs->file->return_path = result;
> >+
> >+if (result) {
> >+/* We are the reverse path of our reverse path (although I don't
> >+   expect this to be used, it would stop another dup if it was */
> >+result->return_path = qfs->file;
> >+} else {
> >+close(revfd);
> >+}
> >+
> >+return result;
> >+}
> >+
> > static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int 
> > iovcnt,
> > int64_t pos)
> > {
> >@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
> > }
> >
> > static const QEMUFileOps socket_read_ops = {
> >-.get_fd = socket_get_fd,
> >-.get_buffer = socket_get_buffer,
> >-.close =  socket_close
> >+.get_fd  = socket_get_fd,
> >+.get_buffer  = socket_get_buffer,
> >+.close   = socket_close,
> >+.get_return_path = socket_dup_return_path
> > };
> >
> > static const QEMUFileOps socket_write_ops = {
> >-.get_fd = socket_get_fd,
> >-.writev_buffer = socket_writev_buffer,
> >-.close =  socket_close
> >+.get_fd  = socket_get_fd,
> >+.writev_buffer   = socket_writev_buffer,
> >+.close   = socket_close,
> >+.get_return_path = socket_dup_return_path
> > };
> >
> >+/*
> >+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
> >+ * NULL if not available
> >+ */
> >+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
> >+{
> >+if (!f->ops->get_return_path) {
> >+return NULL;
> >+}
> >+return f->ops->get_return_path(f->opaque);
> >+}
> >+
> > bool qemu_file_mode_is_not_valid(const char *mode)
> > {
> > if (mode == NULL ||
> >
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets


Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:


+
+/* If it's already open, return it */
+if (qfs->file->return_path) {
+return qfs->file->return_path;


Wouldn't this leave a dangling file descriptor if you call
socket_dup_return_path twice, and then close the original QEMUFile?


Hmm - how?


The problem is that there is no reference count on QEMUFile, so if you do

  f1 = open_return_path(f0);
  f2 = open_return_path(f0);
  /* now f1 == f2 */
  qemu_fclose(f1);
  /* now f2 is dangling */

The remark about "closing the original QEMUFile" is also related to this 
part:


if (result) {
/* We are the reverse path of our reverse path (although I don't
   expect this to be used, it would stop another dup if it was /
result->return_path = qfs->file;

which has a similar bug

  f1 = open_return_path(f0);
  f2 = open_return_path(f1);
  /* now f2 == f0 */
  qemu_fclose(f0);
  /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization.



  Source side
 Forward path - written by migration thread
: It's OK for this to be blocking, but we end up with it being
  non-blocking, and modify the socket code to emulate blocking.


This likely has a performance impact though.  The first migration thread 
code drop from Juan already improved throughput a lot, even if it kept 
the iothread all the time and only converted from nonblocking writes to 
blocking.



 Return path  - opened by main thread, read by fd_handler on main thread
: Must be non-blocking so as not to block the main thread while
  waiting for a partially sent command.


Why can't you handle this in the migration thread (or a new postcopy 
thread on the source side)?  Then it can stay blocking.



  Destination side
 Forward path - read by main thread


This must be nonblocking so that the monitor keeps responding.


 Return path  - opened by main thread, written by main thread AND postcopy
thread (protected by rp_mutex)


When does the main thread needs to write?

If it doesn't need that, you can just switch to blocking when you 
process the listen command (i.e. when the postcopy thread starts).


Paolo


 I think I'm OK with both these being blocking.

Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap

2014-07-16 Thread Wangkai (Kevin,C)



> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Tuesday, July 15, 2014 11:00 PM
> To: Wangkai (Kevin,C)
> Cc: Stefan Hajnoczi; Lee yang; qemu-devel@nongnu.org;
> aligu...@amazon.com
> Subject: Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking
> on tap
> 
> On Mon, Jul 14, 2014 at 10:44:58AM +, Wangkai (Kevin,C) wrote:
> > Here the detail network:
> >
> > ++
> > | The host add tap1 and eth10 to bridge 'br1'| +-
> ---+
> > | ++ | |
> send  |
> > | |   VM  eth1-+-tap1 --- bridge --- eth10 --+-+
> > | | packets|
> > | ++ | |
> |
> > ++
> > ++ ++
> >
> > Qemu start vm by virtio, use tap interface, option is:
> > -net nic,vlan=101, model=virtio -net
> > tap,vlan=101,ifname=tap1,script=no,downscript=no
> 
> Use the newer -netdev/-device syntax to get offload support and
> slightly better performance:
> 
> -netdev tap,id=tap0,ifname=tap1,script=no,downscript=no \ -device
> virtio-net-pci,netdev=tap0
> 
> > And add tap1 and eth10 to bridge br1 in the host:
> > Brctl addif br1 tap1
> > Brctl addif br1 eth10
> >
> > total recv 505387 time 2000925 us:
> > means call tap_send once dealing 505387 packets, the packet payload
> > was 300 bytes, and time use for tap_send() was 2,000,925
> > micro-seconds, time was measured by record time stamp at function
> tap_send() start and end.
> >
> > We just test the performance of VM.
> 
> That is 150 MB of incoming packets in a single tap_send().  Network rx
> queues are maybe a few 1000 packets so I wonder what is going on here.
> 
> Maybe more packets are arriving while QEMU is reading them and we keep
> looping.  That's strange though because the virtio-net rx virtqueue
> should fill up (it only has 256 entries).
> 
> Can you investigate more and find out exactly what is going on?  It's
> not clear yet that adding a budget is the solution or just hiding a
> deeper problem.
> 
> Stefan
[Wangkai (Kevin,C)] 


Hi Stefan,

Yes, I have one machine keep sending packets 300M/second, about 1M packets 
per second, I am checking the code, you mean virtqueue has only 256 entries,
is this member keep the value? 
vq->vring.num

I will check and add some debug info to check this problem.

I have got the most time taking by the virtio_net_receive()
virtio_net_receive
qemu_deliver_packet
qemu_net_queue_send
net_hub_port_receive
qemu_deliver_packet
qemu_net_queue_send
tap_send
qemu_iohandler_poll
main_loop_wait
main_loop
main

regards
Wangkai

[Qemu-devel] [Bug 1185228] Re: with 'monitor pty', it needs to flush pts device after sending command to it

This is not a bug.  What happens is that QEMU echoes the command you
type onto the PTY. If you do not do a "cat /dev/pts/8", the PTY's buffer
gets full and QEMU stops reading from the PTY until it is emptied.

It doesn't happen if you use QMP.

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1185228

Title:
  with 'monitor pty', it needs to flush pts device after sending command
  to it

Status in QEMU:
  Invalid

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux
  kvm.git Commit: 660696d1d16a71e15549ce1bf74953be1592bcd3
  qemu.git Commit: 64afc2b4d48fb21e085517c38a59a3f61a11283c
  Host Kernel Version:3.9.0-rc3
  Hardware: SandyBridge

  Bug detailed description:
  --
  when creating guest with parameter "-monitor pty", then send command (e.g. 
migrate, device_add) to the pts device.  If I don't flush the pts device (e.g. 
using cat /dev/pts/8) to flush the pts device, the monitor command doesn't take 
effect. I know some old qemu (e.g. 1.3) didn't have this issue.

  
  create KVM guest:
  [root@vt-snb9 ~]# qemu-system-x86_64 -enable-kvm -m 1024 -smp 2  -net none 
/root/cathy/rhel6u4.qcow -monitor pty
  char device redirected to /dev/pts/8 (label compat_monitor0)
  VNC server running on `::1:5900'

  hot plug a PCI device:
  [root@vt-snb9 ~]# echo "device_add pci-assign,host=05:10.0,id=nic" >/dev/pts/8
   (here, the device cannot hot plug to the guest.)

  if you run command "cat /dev/pts/8", the device will hot plug to the
  guest.

  hot remove the device:
  [root@vt-snb9 ~]# echo "device_del nic">/dev/pts/8
  (here, the device cannot hot remove from the guest.)

  if you run command "cat /dev/pts/8" the device can hot remove from the
  guest


  Reproduce steps:
  
  1. qemu-system-x86_64 -enable-kvm -m 1024 -smp 2  -net none 
/root/cathy/rhel6u4.qcow -monitor pty
  2. echo "device_add pci-assign,host=05:10.0,id=nic" >/dev/pts/8
  (or using 'migrate' command in qemu monitor redirection device /dev/pts/8 )

  
  Current result:
  
  device can not hot plug or hot remove from the guest before exec 'cat 
/dev/pts/8'.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1185228/+subscriptions

[Qemu-devel] [PATCH 0/2] virtio-serial: Prevent addition of ports with same name

Hi,

The 2nd patch prevents adding ports to the system with conflicts in
the 'name' parameter.  This is done by going through all the ports in
the system, including ports on other virtio-serial devices.

The first patch prepares for this by creating a linked list of all
available virtio-serial devices, so they can be walked over to check
their ports' names.

Please review.

Amit Shah (2):
  virtio-serial: create a linked list of all active devices
  virtio-serial: search for duplicate port names before adding new ports

 hw/char/virtio-serial-bus.c   | 32 
 include/hw/virtio/virtio-serial.h |  2 ++
 2 files changed, 34 insertions(+)

-- 
1.9.3

[Qemu-devel] [PATCH 1/2] virtio-serial: create a linked list of all active devices

To ensure two virtserialports don't get added to the system with the
same 'name' parameter, we need to access all the ports on all the
devices added, and compare the names.

We currently don't have a list of all VirtIOSerial devices added to the
system.  This commit adds a simple linked list in which devices are put
when they're initialized, and removed when they go away.

Signed-off-by: Amit Shah 
---
 hw/char/virtio-serial-bus.c   | 10 ++
 include/hw/virtio/virtio-serial.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 07bebc0..8c26f4e 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -26,6 +26,10 @@
 #include "hw/virtio/virtio-serial.h"
 #include "hw/virtio/virtio-access.h"
 
+struct VirtIOSerialDevices {
+QLIST_HEAD(, VirtIOSerial) devices;
+} vserdevices;
+
 static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id)
 {
 VirtIOSerialPort *port;
@@ -975,6 +979,8 @@ static void virtio_serial_device_realize(DeviceState *dev, 
Error **errp)
  */
 register_savevm(dev, "virtio-console", -1, 3, virtio_serial_save,
 virtio_serial_load, vser);
+
+QLIST_INSERT_HEAD(&vserdevices.devices, vser, next);
 }
 
 static void virtio_serial_port_class_init(ObjectClass *klass, void *data)
@@ -1003,6 +1009,8 @@ static void virtio_serial_device_unrealize(DeviceState 
*dev, Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOSerial *vser = VIRTIO_SERIAL(dev);
 
+QLIST_REMOVE(vser, next);
+
 unregister_savevm(dev, "virtio-console", vser);
 
 g_free(vser->ivqs);
@@ -1027,6 +1035,8 @@ static void virtio_serial_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
 
+QLIST_INIT(&vserdevices.devices);
+
 dc->props = virtio_serial_properties;
 set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
 vdc->realize = virtio_serial_device_realize;
diff --git a/include/hw/virtio/virtio-serial.h 
b/include/hw/virtio/virtio-serial.h
index 4746312..a679e54 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -202,6 +202,8 @@ struct VirtIOSerial {
 
 QTAILQ_HEAD(, VirtIOSerialPort) ports;
 
+QLIST_ENTRY(VirtIOSerial) next;
+
 /* bitmap for identifying active ports */
 uint32_t *ports_map;
 
-- 
1.9.3

[Qemu-devel] [PATCH 2/2] virtio-serial: search for duplicate port names before adding new ports

Before adding new ports to VirtIOSerial devices, check if there's a
conflict in the 'name' parameter.  This ensures two virtserialports with
identical names are not initialized.

Reported-by: 
Signed-off-by: Amit Shah 
---
 hw/char/virtio-serial-bus.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 8c26f4e..76e4228 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -56,6 +56,22 @@ static VirtIOSerialPort *find_port_by_vq(VirtIOSerial *vser, 
VirtQueue *vq)
 return NULL;
 }
 
+static VirtIOSerialPort *find_port_by_name(char *name)
+{
+VirtIOSerial *vser;
+
+QLIST_FOREACH(vser, &vserdevices.devices, next) {
+VirtIOSerialPort *port;
+
+QTAILQ_FOREACH(port, &vser->ports, next) {
+if (!strcmp(port->name, name)) {
+return port;
+}
+}
+}
+return NULL;
+}
+
 static bool use_multiport(VirtIOSerial *vser)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(vser);
@@ -847,6 +863,12 @@ static void virtser_port_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+if (find_port_by_name(port->name)) {
+error_setg(errp, "virtio-serial-bus: A port already exists by name %s",
+   port->name);
+return;
+}
+
 if (port->id == VIRTIO_CONSOLE_BAD_ID) {
 if (plugging_port0) {
 port->id = 0;
-- 
1.9.3

Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper

2014-07-16 Thread Joakim Tjernlund

Riku Voipio  wrote on 2014/07/16 08:54:45:
> 
> On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote:
> > Riku Voipio  wrote on 2014/07/15 16:12:26:
> > > On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote:
> > > > Alexander Graf  wrote on 2014/07/14 17:21:33:
> > > > > On 14.07.14 16:38, Joakim Tjernlund wrote:
> > > > > > The popular binfmt-wrapper patch adds an additional
> > > > > > executable which mangle argv suitable for binfmt flag P.
> > > > > > In a chroot you need the both (statically linked) qemu-$arch
> > > > > > and qemu-$arch-binfmt-wrapper. This is sub optimal and a
> > > > > > better approach is to recognize the -binfmt-wrapper extension
> > > > > > within linux-user(qemu-$arch) and mangle argv there.
> > > > > > This just produces on executable which can be either copied to
> > > > > > the chroot or bind mounted with the appropriate 
-binfmt-wrapper
> > > > > > suffix.
> > > > > >
> > > > > > Signed-off-by: Joakim Tjernlund 

> > > > > 
> > > > > Please make sure to CC Riku on patches like this - he is the 
> > linux-user 
> > > > > maintainer.
> > > > 
> > > > Doesn't he read the devel list? Anyhow CC:ed
> > > 
> > > I do - but CC gets directly to my inbox while qemu-devel goes to an
> > > folder.
> > > 
> > > I take from this discussion, that this patch has been superceded by 
the
> > > Patch at: http://patchwork.ozlabs.org/patch/369770/ ?
> > 
> > BTW, any chance qemu binfmt could fixup the ps output from within a 
> > container:
> >  jocke-ppc2 ~ # ps uaxww
> > USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME 
COMMAND
> > root 1  0.1  0.0 4138016 7600 ?Ss   17:02   0:00 
> > /usr/bin/qemu-ppc /sbin/init /sbin/init
> > root79  0.0  0.0 4138016 5792 ?Ss   17:02   0:00 
> > /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
> > root   293  0.0  0.0 4137952 4072 ?Ss   17:02   0:00 
> > /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x 
hostname:jocke-ppc2 
> > --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh 
> > --pidfile=/var/run/udhcpc-eth0.pid
> > root   334  0.3  0.0 4138016 5964 tty1 Ss+  17:02   0:00 
> > /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
> > root   335  3.1  0.0 4138048 7064 console  Ss   17:02   0:00 
> > /usr/bin/qemu-ppc /bin/login /bin/login -- root
> > root   336  3.3  0.0 4138016 9764 console  S17:02   0:00 
> > /usr/bin/qemu-ppc /bin/bash -bash
> > root   340  0.0  0.0 4138016 6336 ?R+   Jul10   0:00 
/bin/ps 
> > ps uaxww
> > 
> > As you can see, qemu-ppc is visible. 
> 
> This isn't something binfmt could do. ps uses /proc/$pid/exe to map the
> right binary. However, qemu already fixes /proc/self/exe to point to
> right file - as you can see from the last line where "ps uaxww" doesn't
> have qemu shown. So it would be possible to make do_open() in syscall.c 
do
> similar mapping for /proc/$pid/exe. 
> 
> Exactly how and when the qemu processes should be hidden within qemu
> needs careful thought thou. A naive approach would check if
> /proc/$pid/exe points to same binary as /proc/self/exe, hiding at least
> same architecture qemu processes.

Took a look at do_open(), what horror to read what you do there.
How on earth is this supposed to work?
You don't separate normal qemu invocation from binfmt, only adjust
"self" stuff, always only remove one entry(binfmt P flag is 2 entries)

Reasonably this hiding should only be performed when started by binfmt?
What are the other uses for ?

 Jocke

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
> >>>
> >>>+
> >>>+/* If it's already open, return it */
> >>>+if (qfs->file->return_path) {
> >>>+return qfs->file->return_path;
> >>
> >>Wouldn't this leave a dangling file descriptor if you call
> >>socket_dup_return_path twice, and then close the original QEMUFile?
> >
> >Hmm - how?
> 
> The problem is that there is no reference count on QEMUFile, so if you do
> 
>   f1 = open_return_path(f0);
>   f2 = open_return_path(f0);
>   /* now f1 == f2 */
>   qemu_fclose(f1);
>   /* now f2 is dangling */

I think from the way I'm using it, I can remove the optimisation, but I do
need to check.

I'm not too sure what your worry is about 'f2' in this case; I guess the caller
needs to know that it should only close the return path once - is that
your worry?

> The remark about "closing the original QEMUFile" is also related to this
> part:
> 
> if (result) {
> /* We are the reverse path of our reverse path (although I don't
>expect this to be used, it would stop another dup if it was /
> result->return_path = qfs->file;
> 
> which has a similar bug
> 
>   f1 = open_return_path(f0);
>   f2 = open_return_path(f1);
>   /* now f2 == f0 */
>   qemu_fclose(f0);
>   /* now f2 is dangling */
> 
> If this is correct, the simplest fix is to drop the optimization.

I'm more nervous about dropping that one, because the current scheme
does provide a clean way of finding the forward path if you've got the
reverse (although I don't think I make use of it).

> >  Source side
> > Forward path - written by migration thread
> >: It's OK for this to be blocking, but we end up with it being
> >  non-blocking, and modify the socket code to emulate blocking.
> 
> This likely has a performance impact though.  The first migration thread
> code drop from Juan already improved throughput a lot, even if it kept the
> iothread all the time and only converted from nonblocking writes to
> blocking.

Can you give some more reasoning as to why you think this will hit the
performance so much; I thought the output buffers were quite big anyway.

> > Return path  - opened by main thread, read by fd_handler on main thread
> >: Must be non-blocking so as not to block the main thread while
> >  waiting for a partially sent command.
> 
> Why can't you handle this in the migration thread (or a new postcopy thread
> on the source side)?  Then it can stay blocking.

Handling it within the migration thread would make it much more complicated
(which would be bad since it's already complex enough);

A return path thread on the source side, hmm yes that could do it  - especially
since the migration thread is already a separate thread from the main thread
handling this and thus already needs locking paraphernalia.

> >  Destination side
> > Forward path - read by main thread
> 
> This must be nonblocking so that the monitor keeps responding.

Interesting, I suspect none of the code in there is set up for that at the
moment, so how does that work during migration at the moment?

Actually, I see I missed something here; this should be:

   Destination side
 Forward path - read by main thread, and listener thread (see the
 separate mail that described that listner thread)

and that means that once postcopy is going (and the listener thread started)
it can't block the monitor.

> > Return path  - opened by main thread, written by main thread AND 
> > postcopy
> >thread (protected by rp_mutex)
> 
> When does the main thread needs to write?

Not much; the only things the main thread currently responds to are the
ReqAck (ping like) requests; those are turning out to be very useful during 
debug;
I've also got the ability for the destination to send a migration result back 
to the
source which seems useful to be able to 'fail' early.

> If it doesn't need that, you can just switch to blocking when you process
> the listen command (i.e. when the postcopy thread starts).

Why don't I just do it anyway? Prior to postcopy starting we're in the same
situation as we're in with precopy today, so can already get mainblock 
threading.

Dave

> Paolo
> 
> > I think I'm OK with both these being blocking.
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Marcelo Tosatti

On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti  wrote:
> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini  wrote:
> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >> >
> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti 
> >> >> wrote:
> >> >>>
> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >> 
> >>  On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov 
> >>  wrote:
> >> >
> >> > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah 
> >> > wrote:
> >> >>
> >> >> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >> >>>
> >> >>> Hello,
> >> >>>
> >> >>> the issue is not specific to the iothread code because generic
> >> >>> virtio-blk also hangs up:
> >> >>
> >> >>
> >> >> Do you know which version works well?  If you could bisect, that'll
> >> >> help a lot.
> >> >>
> >> >> Thanks,
> >> >> Amit
> >> >
> >> >
> >> > Hi,
> >> >
> >> > 2.0 works definitely well. I`ll try to finish bisection today, though
> >> > every step takes about 10 minutes to complete.
> >> 
> >> 
> >>  Yay! It is even outside of virtio-blk.
> >> 
> >>  commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >>  Author: Marcelo Tosatti 
> >>  Date:   Tue Jun 3 13:34:48 2014 -0300
> >> 
> >>  kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >>  calculation
> >> 
> >>  Ensure proper env->tsc value for kvmclock_current_nsec 
> >>  calculation.
> >> 
> >>  Reported-by: Marcin Gibuła 
> >>  Cc: qemu-sta...@nongnu.org
> >>  Signed-off-by: Marcelo Tosatti 
> >>  Signed-off-by: Paolo Bonzini 
> >> >>>
> >> >>>
> >> >>> Andrey,
> >> >>>
> >> >>> Can you please provide instructions on how to create reproducible
> >> >>> environment?
> >> >>>
> >> >>> The following patch is equivalent to the original patch, for
> >> >>> the purposes of fixing the kvmclock problem.
> >> >>>
> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >> >>> experiencing.
> >> >>>
> >> >>>
> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >> >>> index 272a88a..feb5fc5 100644
> >> >>> --- a/hw/i386/kvm/clock.c
> >> >>> +++ b/hw/i386/kvm/clock.c
> >> >>> @@ -17,7 +17,6 @@
> >> >>>  #include "qemu/host-utils.h"
> >> >>>  #include "sysemu/sysemu.h"
> >> >>>  #include "sysemu/kvm.h"
> >> >>> -#include "sysemu/cpus.h"
> >> >>>  #include "hw/sysbus.h"
> >> >>>  #include "hw/kvm/clock.h"
> >> >>>
> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState 
> >> >>> *s)
> >> >>>
> >> >>>  cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >> >>>
> >> >>> -assert(time.tsc_timestamp <= migration_tsc);
> >> >>>  delta = migration_tsc - time.tsc_timestamp;
> >> >>>  if (time.tsc_shift < 0) {
> >> >>>  delta >>= -time.tsc_shift;
> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >> >>> int running,
> >> >>>  if (s->clock_valid) {
> >> >>>  return;
> >> >>>  }
> >> >>> -
> >> >>> -cpu_synchronize_all_states();
> >> >>>  ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >> >>>  if (ret < 0) {
> >> >>>  fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >> >>> strerror(ret));
> >> >>> diff --git a/migration.c b/migration.c
> >> >>> index 8d675b3..34f2325 100644
> >> >>> --- a/migration.c
> >> >>> +++ b/migration.c
> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >> >>>  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >> >>>  old_vm_running = runstate_is_running();
> >> >>>
> >> >>> +cpu_synchronize_all_states();
> >> >>>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >> >>>  if (ret >= 0) {
> >> >>>  qemu_file_set_rate_limit(s->file, INT64_MAX);
> >> >
> >> >
> >> > It could also be useful to apply the above patch _and_ revert
> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >> >
> >> > Paolo
> >>
> >> Yes, it solved the issue for me! (it took much time to check because
> >> most of country` debian mirrors went inconsistent by some reason)
> >>
> >> Also trivial addition:
> >>
> >> diff --git a/migration.c b/migration.c
> >> index 34f2325..65d1c88 100644
> >> --- a/migration.c
> >> +++ b/migration.c
> >> @@ -25,6 +25,7 @@
> >>  #include "qemu/thread.h"
> >>  #include "qmp-commands.h"
> >>  #include "trace.h"
> >> +#include "sysemu/cpus.h"
> >
> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
> >
> > That is, test with a stock qemu.git tree and the patch sent today,
> > on this thread, to move cpu_synchronize_al

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Marcelo Tosatti

On Wed, Jul 16, 2014 at 09:35:16AM +0200, Marcin Gibuła wrote:
> >Andrey,
> >
> >Can you please provide instructions on how to create reproducible
> >environment?
> >
> >The following patch is equivalent to the original patch, for
> >the purposes of fixing the kvmclock problem.
> >
> >Perhaps it becomes easier to spot the reason for the hang you are
> >experiencing.
> 
> Marcelo,
> 
> the original reason for patch adding cpu_synchronize_all_states()
> there was because this bug affected non-migration operations as well
> -
> http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html.
> 
> Won't moving it only to migration code break these things again?

Yes - its just for debug purposes.

Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging

2014-07-16 Thread Peter Maydell

On 16 July 2014 10:17, Alex Bennée  wrote:
> On aarch64 we do:
>
> #define unsupported_encoding(s, insn)\
> do { \
> qemu_log_mask(LOG_UNIMP, \
>   "%s:%d: unsupported instruction encoding 0x%08x "  \
>   "at pc=%016" PRIx64 "\n",  \
>   __FILE__, __LINE__, insn, s->pc - 4);  \
> unallocated_encoding(s); \
> } while (0);
>
> So we signal it's unimplemented before falling through to the signal
> generating code.

Yes, but that macro is only for instructions which exist but which
QEMU doesn't implement, not for instructions which exist and
which we do have an implementation of but which we generate an
exception for because they aren't present on this particular CPU.
That's a separate issue from whether you might want to optionally
log every SIGILL a process generates.

thanks
-- PMM

[Qemu-devel] [RFC PATCH V4 3/6] icount: Make icount_time_shift available everywhere

icount_time_shift is used for calculting the delay
qemu has to sleep in order to synchronise the
host and guest clocks. Therefore, we need it in
cpu-exec.c.

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 
Signed-off-by: Paolo Bonzini 
---
 cpus.c| 8 ++--
 include/qemu-common.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 34cc4c8..de0a8b2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -105,8 +105,12 @@ static bool all_cpu_threads_idle(void)
 /* Compensate for varying guest execution speed.  */
 static int64_t qemu_icount_bias;
 static int64_t vm_clock_warp_start;
-/* Conversion factor from emulated instructions to virtual clock ticks.  */
-static int icount_time_shift;
+/* Conversion factor from emulated instructions to virtual clock ticks.
+ * icount_time_shift is defined as extern in include/qemu-common.h because
+ * it is used (in cpu-exec.c) for calculating the delay for sleeping
+ * qemu in order to align the host and virtual clocks.
+ */
+int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
 #define MAX_ICOUNT_SHIFT 10
 
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 860bb15..d47aa02 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -109,6 +109,7 @@ static inline char *realpath(const char *path, char 
*resolved_path)
 void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
 extern int icount_align_option;
+extern int icount_time_shift;
 
 #include "qemu/osdep.h"
 #include "qemu/bswap.h"
-- 
2.0.0.rc2

[Qemu-devel] [RFC PATCH V4 1/6] icount: Add QemuOpts for icount

Make icount parameter use QemuOpts style options in order
to easily add other suboptions.

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 
Signed-off-by: Paolo Bonzini 
---
 cpus.c| 10 +-
 include/qemu-common.h |  3 ++-
 qemu-options.hx   |  4 ++--
 qtest.c   | 13 +++--
 vl.c  | 35 ---
 5 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/cpus.c b/cpus.c
index 5e7f2cf..dcca96a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -440,13 +440,21 @@ static const VMStateDescription vmstate_timers = {
 }
 };
 
-void configure_icount(const char *option)
+void configure_icount(QemuOpts *opts, Error **errp)
 {
+const char *option;
+
 seqlock_init(&timers_state.vm_clock_seqlock, NULL);
 vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+option = qemu_opt_get(opts, "shift");
 if (!option) {
 return;
 }
+/* When using -icount shift, the shift option will be
+   misinterpreted as a boolean */
+if (strcmp(option, "on") == 0 || strcmp(option, "off") == 0) {
+error_setg(errp, "The shift option must be a number or auto");
+}
 
 icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
   icount_warp_rt, NULL);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ae76197..cc346ec 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include "glib-compat.h"
+#include "qemu/option.h"
 
 #ifdef _WIN32
 #include "sysemu/os-win32.h"
@@ -105,7 +106,7 @@ static inline char *realpath(const char *path, char 
*resolved_path)
 #endif
 
 /* icount */
-void configure_icount(const char *option);
+void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
 
 #include "qemu/osdep.h"
diff --git a/qemu-options.hx b/qemu-options.hx
index 9e54686..143def4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3011,11 +3011,11 @@ re-inject them.
 ETEXI
 
 DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
-"-icount [N|auto]\n" \
+"-icount [shift=N|auto]\n" \
 "enable virtual instruction counter with 2^N clock ticks 
per\n" \
 "instruction\n", QEMU_ARCH_ALL)
 STEXI
-@item -icount [@var{N}|auto]
+@item -icount [shift=@var{N}|auto]
 @findex -icount
 Enable virtual instruction counter.  The virtual cpu will execute one
 instruction every 2^@var{N} ns of virtual time.  If @code{auto} is specified
diff --git a/qtest.c b/qtest.c
index 04a6dc1..ef0d991 100644
--- a/qtest.c
+++ b/qtest.c
@@ -19,6 +19,9 @@
 #include "hw/irq.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/cpus.h"
+#include "qemu/config-file.h"
+#include "qemu/option.h"
+#include "qemu/error-report.h"
 
 #define MAX_IRQ 256
 
@@ -509,10 +512,16 @@ static void qtest_event(void *opaque, int event)
 }
 }
 
-int qtest_init_accel(MachineClass *mc)
+static void configure_qtest_icount(const char *options)
 {
-configure_icount("0");
+QemuOpts *opts  = qemu_opts_parse(qemu_find_opts("icount"), options, 1);
+configure_icount(opts, &error_abort);
+qemu_opts_del(opts);
+}
 
+int qtest_init_accel(MachineClass *mc)
+{
+configure_qtest_icount("0");
 return 0;
 }
 
diff --git a/vl.c b/vl.c
index 41ddcd2..103027f 100644
--- a/vl.c
+++ b/vl.c
@@ -537,6 +537,20 @@ static QemuOptsList qemu_mem_opts = {
 },
 };
 
+static QemuOptsList qemu_icount_opts = {
+.name = "icount",
+.implied_opt_name = "shift",
+.merge_lists = true,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_icount_opts.head),
+.desc = {
+{
+.name = "shift",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
 /**
  * Get machine options
  *
@@ -2896,13 +2910,12 @@ int main(int argc, char **argv, char **envp)
 {
 int i;
 int snapshot, linux_boot;
-const char *icount_option = NULL;
 const char *initrd_filename;
 const char *kernel_filename, *kernel_cmdline;
 const char *boot_order;
 DisplayState *ds;
 int cyls, heads, secs, translation;
-QemuOpts *hda_opts = NULL, *opts, *machine_opts;
+QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
 QemuOptsList *olist;
 int optind;
 const char *optarg;
@@ -2967,6 +2980,7 @@ int main(int argc, char **argv, char **envp)
 qemu_add_opts(&qemu_msg_opts);
 qemu_add_opts(&qemu_name_opts);
 qemu_add_opts(&qemu_numa_opts);
+qemu_add_opts(&qemu_icount_opts);
 
 runstate_init();
 
@@ -3817,7 +3831,11 @@ int main(int argc, char **argv, char **envp)
 }
 break;
 case QEMU_OPTION_icount:
-icount_option = optarg;
+icount_opts = qemu_opts_parse(qemu_find_opts("icount"),
+  optarg, 1);
+if (!icount_opts) {
+exit(1);
+}

[Qemu-devel] [RFC PATCH V4 4/6] cpu_exec: Add sleeping algorithm

The goal is to sleep qemu whenever the guest clock
is in advance compared to the host clock (we use
the monotonic clocks). The amount of time to sleep
is calculated in the execution loop in cpu_exec.

At first, we tried to approximate at each for loop the real time elapsed
while searching for a TB (generating or retrieving from cache) and
executing it. We would then approximate the virtual time corresponding
to the number of virtual instructions executed. The difference between
these 2 values would allow us to know if the guest is in advance or delayed.
However, the function used for measuring the real time
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive.
We had an added overhead of 13% of the total run time.

Therefore, we modified the algorithm and only take into account the
difference between the 2 clocks at the begining of the cpu_exec function.
During the for loop we try to reduce the advance of the guest only by
computing the virtual time elapsed and sleeping if necessary. The overhead
is thus reduced to 3%. Even though this method still has a noticeable
overhead, it no longer is a bottleneck in trying to achieve a better
guest frequency for which the guest clock is faster than the host one.

As for the the alignement of the 2 clocks, with the first algorithm
the guest clock was oscillating between -1 and 1ms compared to the host clock.
Using the second algorithm we notice that the guest is 5ms behind the host, 
which
is still acceptable for our use case.

The tests where conducted using fio and stress. The host machine in an i5 CPU at
3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm 
versatile-pb
built with buildroot.

Currently, on our test machine, the lowest icount we can achieve that is 
suitable for
aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) 
are
slower than the cpu tests (using stress).

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 
Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c | 97 ++
 1 file changed, 97 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 38e5f02..9ed6ac1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -22,6 +22,90 @@
 #include "tcg.h"
 #include "qemu/atomic.h"
 #include "sysemu/qtest.h"
+#include "qemu/timer.h"
+
+/* -icount align implementation. */
+
+typedef struct SyncClocks {
+int64_t diff_clk;
+int64_t original_instr_counter;
+} SyncClocks;
+
+#if !defined(CONFIG_USER_ONLY)
+/* Allow the guest to have a max 3ms advance.
+ * The difference between the 2 clocks could therefore
+ * oscillate around 0.
+ */
+#define VM_CLOCK_ADVANCE 300
+
+static int64_t delay_host(int64_t diff_clk)
+{
+if (diff_clk > VM_CLOCK_ADVANCE) {
+#ifndef _WIN32
+struct timespec sleep_delay, rem_delay;
+sleep_delay.tv_sec = diff_clk / 10LL;
+sleep_delay.tv_nsec = diff_clk % 10LL;
+if (nanosleep(&sleep_delay, &rem_delay) < 0) {
+diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 10LL;
+diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec;
+} else {
+diff_clk = 0;
+}
+#else
+Sleep(diff_clk / SCALE_MS);
+diff_clk = 0;
+#endif
+}
+return diff_clk;
+}
+
+static int64_t instr_to_vtime(int64_t instr_counter, const CPUState *cpu)
+{
+int64_t instr_exec_time;
+instr_exec_time = instr_counter -
+  (cpu->icount_extra +
+   cpu->icount_decr.u16.low);
+instr_exec_time = instr_exec_time << icount_time_shift;
+
+return instr_exec_time;
+}
+
+static void align_clocks(SyncClocks *sc, const CPUState *cpu)
+{
+if (!icount_align_option) {
+return;
+}
+sc->diff_clk += instr_to_vtime(sc->original_instr_counter, cpu);
+sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
+sc->diff_clk = delay_host(sc->diff_clk);
+}
+
+static void init_delay_params(SyncClocks *sc,
+  const CPUState *cpu)
+{
+static int64_t clocks_offset;
+if (!icount_align_option) {
+return;
+}
+int64_t realtime_clock_value, virtual_clock_value;
+realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+/* Compute offset between the 2 clocks. */
+if (!clocks_offset) {
+clocks_offset = realtime_clock_value - virtual_clock_value;
+}
+sc->diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset;
+sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
+}
+#else
+static void align_clocks(SyncClocks *sc, const CPUState *cpu)
+{
+}
+
+static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
+{
+}
+#endif /* CONFIG USER ONLY */
 
 void cpu_loop_exit(CPUState *cpu)
 {
@@ -227,6 +311,8 @@ int cpu_exec(CPUArchState *env)
 TranslationBlock *tb;
 uint8_t *tc_ptr;

[Qemu-devel] [RFC PATCH V4 5/6] cpu_exec: Print to console if the guest is late

If the align option is enabled, we print to the user whenever
the guest clock is behind the host clock in order for he/she
to have a hint about the actual performance. The maximum
print interval is 2s and we limit the number of messages to 100.
If desired, this can be changed in cpu-exec.c

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 

Conflicts:
cpu-exec.c
---
 cpu-exec.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 9ed6ac1..b62df15 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 typedef struct SyncClocks {
 int64_t diff_clk;
 int64_t original_instr_counter;
+int64_t realtime_clock;
 } SyncClocks;
 
 #if !defined(CONFIG_USER_ONLY)
@@ -37,6 +38,9 @@ typedef struct SyncClocks {
  * oscillate around 0.
  */
 #define VM_CLOCK_ADVANCE 300
+#define THRESHOLD_REDUCE 1.5
+#define MAX_DELAY_PRINT_RATE 2
+#define MAX_NB_PRINTS 100
 
 static int64_t delay_host(int64_t diff_clk)
 {
@@ -80,6 +84,28 @@ static void align_clocks(SyncClocks *sc, const CPUState *cpu)
 sc->diff_clk = delay_host(sc->diff_clk);
 }
 
+static void print_delay(const SyncClocks *sc)
+{
+static float threshold_delay;
+static int64_t last_realtime_clock;
+static int nb_prints;
+
+if (icount_align_option &&
+(sc->realtime_clock - last_realtime_clock) / 10LL
+>= MAX_DELAY_PRINT_RATE && nb_prints < MAX_NB_PRINTS) {
+if ((-sc->diff_clk / (float)10LL > threshold_delay) ||
+(-sc->diff_clk / (float)10LL <
+ (threshold_delay - THRESHOLD_REDUCE))) {
+threshold_delay = (-sc->diff_clk / 10LL) + 1;
+printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
+   threshold_delay - 1,
+   threshold_delay);
+nb_prints++;
+last_realtime_clock = sc->realtime_clock;
+}
+}
+}
+
 static void init_delay_params(SyncClocks *sc,
   const CPUState *cpu)
 {
@@ -87,15 +113,18 @@ static void init_delay_params(SyncClocks *sc,
 if (!icount_align_option) {
 return;
 }
-int64_t realtime_clock_value, virtual_clock_value;
-realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 /* Compute offset between the 2 clocks. */
 if (!clocks_offset) {
-clocks_offset = realtime_clock_value - virtual_clock_value;
+clocks_offset = sc->realtime_clock -
+qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
-sc->diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset;
+sc->diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
+   sc->realtime_clock + clocks_offset;
 sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
+/* Print every 2s max if the guest is late. We limit the number
+   of printed messages to NB_PRINT_MAX(currently 100) */
+print_delay(sc);
 }
 #else
 static void align_clocks(SyncClocks *sc, const CPUState *cpu)
-- 
2.0.0.rc2

[Qemu-devel] [RFC PATCH V4 2/6] icount: Add align option to icount

The align option is used for activating the align algorithm
in order to synchronise the host clock and the guest clock.

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 
---
 cpus.c| 19 ---
 include/qemu-common.h |  1 +
 qemu-options.hx   | 15 +--
 vl.c  |  4 
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/cpus.c b/cpus.c
index dcca96a..34cc4c8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -443,25 +443,30 @@ static const VMStateDescription vmstate_timers = {
 void configure_icount(QemuOpts *opts, Error **errp)
 {
 const char *option;
+char *rem_str = NULL;
 
 seqlock_init(&timers_state.vm_clock_seqlock, NULL);
 vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
 option = qemu_opt_get(opts, "shift");
 if (!option) {
+if (qemu_opt_get(opts, "align") != NULL) {
+error_setg(errp, "Please specify shift option when using align");
+}
 return;
 }
-/* When using -icount shift, the shift option will be
-   misinterpreted as a boolean */
-if (strcmp(option, "on") == 0 || strcmp(option, "off") == 0) {
-error_setg(errp, "The shift option must be a number or auto");
-}
-
+icount_align_option = qemu_opt_get_bool(opts, "align", false);
 icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
   icount_warp_rt, NULL);
 if (strcmp(option, "auto") != 0) {
-icount_time_shift = strtol(option, NULL, 0);
+errno = 0;
+icount_time_shift = strtol(option, &rem_str, 0);
+if (errno != 0 || *rem_str != '\0' || !strlen(option)) {
+error_setg(errp, "icount: Invalid shift value");
+}
 use_icount = 1;
 return;
+} else if (icount_align_option) {
+error_setg(errp, "shift=auto and align=on are incompatible");
 }
 
 use_icount = 2;
diff --git a/include/qemu-common.h b/include/qemu-common.h
index cc346ec..860bb15 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -108,6 +108,7 @@ static inline char *realpath(const char *path, char 
*resolved_path)
 /* icount */
 void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
+extern int icount_align_option;
 
 #include "qemu/osdep.h"
 #include "qemu/bswap.h"
diff --git a/qemu-options.hx b/qemu-options.hx
index 143def4..379d932 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3011,9 +3011,9 @@ re-inject them.
 ETEXI
 
 DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
-"-icount [shift=N|auto]\n" \
+"-icount [shift=N|auto][,align=on|off]\n" \
 "enable virtual instruction counter with 2^N clock ticks 
per\n" \
-"instruction\n", QEMU_ARCH_ALL)
+"instruction and enable aligning the host and virtual 
clocks\n", QEMU_ARCH_ALL)
 STEXI
 @item -icount [shift=@var{N}|auto]
 @findex -icount
@@ -3026,6 +3026,17 @@ Note that while this option can give deterministic 
behavior, it does not
 provide cycle accurate emulation.  Modern CPUs contain superscalar out of
 order cores with complex cache hierarchies.  The number of instructions
 executed often has little or no correlation with actual performance.
+
+@option{align=on} will activate the delay algorithm which will try to
+to synchronise the host clock and the virtual clock. The goal is to
+have a guest running at the real frequency imposed by the shift option.
+Whenever the guest clock is behind the host clock and if
+@option{align=on} is specified then we print a messsage to the user
+to inform about the delay.
+Currently this option does not work when @option{shift} is @code{auto}.
+Note: The sync algorithm will work for those shift values for which
+the guest clock runs ahead of the host clock. Typically this happens
+when the shift value is high (how high depends on the host machine).
 ETEXI
 
 DEF("watchdog", HAS_ARG, QEMU_OPTION_watchdog, \
diff --git a/vl.c b/vl.c
index 103027f..d270070 100644
--- a/vl.c
+++ b/vl.c
@@ -183,6 +183,7 @@ uint8_t *boot_splash_filedata;
 size_t boot_splash_filedata_size;
 uint8_t qemu_extra_params_fw[2];
 
+int icount_align_option;
 typedef struct FWBootEntry FWBootEntry;
 
 struct FWBootEntry {
@@ -546,6 +547,9 @@ static QemuOptsList qemu_icount_opts = {
 {
 .name = "shift",
 .type = QEMU_OPT_STRING,
+}, {
+.name = "align",
+.type = QEMU_OPT_BOOL,
 },
 { /* end of list */ }
 },
-- 
2.0.0.rc2

[Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'

Show in 'info jit' the current delay between the host clock
and the guest clock. In addition, print the maximum advance
and delay of the guest compared to the host.

Signed-off-by: Sebastian Tanase 
Tested-by: Camille Bégué 
---
 cpu-exec.c| 19 ++-
 cpus.c| 17 +
 include/qemu-common.h |  5 +
 monitor.c |  1 +
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b62df15..055e0e3 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -109,19 +109,28 @@ static void print_delay(const SyncClocks *sc)
 static void init_delay_params(SyncClocks *sc,
   const CPUState *cpu)
 {
-static int64_t clocks_offset;
-if (!icount_align_option) {
-return;
+static int64_t realtime_clock_value;
+if (icount_align_option || !realtime_clock_value) {
+realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
-sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 /* Compute offset between the 2 clocks. */
 if (!clocks_offset) {
-clocks_offset = sc->realtime_clock -
+clocks_offset = realtime_clock_value -
 qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
+if (!icount_align_option) {
+return;
+}
+sc->realtime_clock = realtime_clock_value;
 sc->diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
sc->realtime_clock + clocks_offset;
 sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
+if (sc->diff_clk < max_delay) {
+max_delay = sc->diff_clk;
+}
+if (sc->diff_clk > max_advance) {
+max_advance = sc->diff_clk;
+}
 /* Print every 2s max if the guest is late. We limit the number
of printed messages to NB_PRINT_MAX(currently 100) */
 print_delay(sc);
diff --git a/cpus.c b/cpus.c
index de0a8b2..3c3fb64 100644
--- a/cpus.c
+++ b/cpus.c
@@ -64,6 +64,9 @@
 #endif /* CONFIG_LINUX */
 
 static CPUState *next_cpu;
+int64_t clocks_offset;
+int64_t max_delay;
+int64_t max_advance;
 
 bool cpu_is_stopped(CPUState *cpu)
 {
@@ -1504,3 +1507,17 @@ void qmp_inject_nmi(Error **errp)
 error_set(errp, QERR_UNSUPPORTED);
 #endif
 }
+
+void dump_drift_info(FILE *f, fprintf_function cpu_fprintf)
+{
+cpu_fprintf(f, "Host - Guest clock  %ld(ms)\n",
+(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset -
+ qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL))/SCALE_MS);
+if (icount_align_option) {
+cpu_fprintf(f, "Max guest delay %ld(ms)\n", -max_delay/SCALE_MS);
+cpu_fprintf(f, "Max guest advance   %ld(ms)\n", max_advance/SCALE_MS);
+} else {
+cpu_fprintf(f, "Max guest delay NA\n");
+cpu_fprintf(f, "Max guest advance   NA\n");
+}
+}
diff --git a/include/qemu-common.h b/include/qemu-common.h
index d47aa02..fbc1bca 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -110,6 +110,11 @@ void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
 extern int icount_align_option;
 extern int icount_time_shift;
+extern int64_t clocks_offset;
+/* drift information for info jit command */
+extern int64_t max_delay;
+extern int64_t max_advance;
+void dump_drift_info(FILE *f, fprintf_function cpu_fprintf);
 
 #include "qemu/osdep.h"
 #include "qemu/bswap.h"
diff --git a/monitor.c b/monitor.c
index 5bc70a6..cdbaa60 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1047,6 +1047,7 @@ static void do_info_registers(Monitor *mon, const QDict 
*qdict)
 static void do_info_jit(Monitor *mon, const QDict *qdict)
 {
 dump_exec_info((FILE *)mon, monitor_fprintf);
+dump_drift_info((FILE *)mon, monitor_fprintf);
 }
 
 static void do_info_history(Monitor *mon, const QDict *qdict)
-- 
2.0.0.rc2

[Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks

The icount option already implemented in QEMU allows the guest to run at a 
theoretical
frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch is 
to have a
real guest frequency close to the one imposed by using the icount option.

The main idea behind the algorithm is that we compare the virtual monotonic 
clock and the
host monotonic clock. For big icounts (on our test machine, an i5 CPU @ 
3.10GHz, icounts
starting at 6) the guest clock will be ahead of the host clock. In this case, 
we try to
sleep QEMU for the difference between the 2 clocks. Therefore, the guest would 
have
executed for a period almost equally to the one imposed by icount. We should 
point out
that the algorithm works only for those icounts that allow the guest clock to 
be in front
of the host clock.

The first patch adds support for QemuOpts for the 'icount' parameter. It also 
adds a
suboption called 'shift' that will hold the value for 'icount'. Therefore we 
now have
-icount shift=N|auto or -icount N|auto.

The second patch adds the 'align' suboption for icount.

The third patch exports 'icount_time_shift' so that it can be used in places 
other than
cpus.c; we need it in cpu-exec.c for calculating for how long we want QEMU to 
sleep.

The forth patch implements the algorithm used for calculating the delay we want 
to sleep.
It uses the number of instructions executed by the virtual cpu and also 
'icount_time_shift'.

The fifth patch prints to the console whenever the guest clock runs behind the 
host
clock. The fastest printing speed is every 2 seconds, and we only print if the 
align option
is enabled. We also have a limit to 100 printed messages.

The sixth patch adds information about the difference between the host and 
guest clocks
(taking into account the offset) in the 'info jit' command. We also print the 
maximum
delay and advance of the guest clock compared to the host clock.

v3 -> v4

* Add better error handling for 'strtol' in patch 2
* Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4
* Remove function pointers from patches 4 and 5

Sebastian Tanase (6):
  icount: Add QemuOpts for icount
  icount: Add align option to icount
  icount: Make icount_time_shift available everywhere
  cpu_exec: Add sleeping algorithm
  cpu_exec: Print to console if the guest is late
  monitor: Add drift info to 'info jit'

 cpu-exec.c| 135 ++
 cpus.c|  44 ++--
 include/qemu-common.h |  10 +++-
 monitor.c |   1 +
 qemu-options.hx   |  17 +--
 qtest.c   |  13 -
 vl.c  |  39 ---
 7 files changed, 241 insertions(+), 18 deletions(-)

-- 
2.0.0.rc2

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets


Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto:

* Paolo Bonzini (pbonz...@redhat.com) wrote:

Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:


+
+/* If it's already open, return it */
+if (qfs->file->return_path) {
+return qfs->file->return_path;


Wouldn't this leave a dangling file descriptor if you call
socket_dup_return_path twice, and then close the original QEMUFile?


Hmm - how?


The problem is that there is no reference count on QEMUFile, so if you do

  f1 = open_return_path(f0);
  f2 = open_return_path(f0);
  /* now f1 == f2 */
  qemu_fclose(f1);
  /* now f2 is dangling */


I think from the way I'm using it, I can remove the optimisation, but I do
need to check.

I'm not too sure what your worry is about 'f2' in this case; I guess the caller
needs to know that it should only close the return path once - is that
your worry?


Yes.  The API is not well encapsulated; a "random" caller of 
open_return_path does not know (and cannot know) whether it should close 
the returned file or not.



I'm more nervous about dropping that one, because the current scheme
does provide a clean way of finding the forward path if you've got the
reverse (although I don't think I make use of it).


If it isn't used, why keep it?


 Source side
Forward path - written by migration thread
   : It's OK for this to be blocking, but we end up with it being
 non-blocking, and modify the socket code to emulate blocking.


This likely has a performance impact though.  The first migration thread
code drop from Juan already improved throughput a lot, even if it kept the
iothread all the time and only converted from nonblocking writes to
blocking.


Can you give some more reasoning as to why you think this will hit the
performance so much; I thought the output buffers were quite big anyway.


I don't really know, it's

Return path  - opened by main thread, read by fd_handler on main thread
   : Must be non-blocking so as not to block the main thread while
 waiting for a partially sent command.


Why can't you handle this in the migration thread (or a new postcopy thread
on the source side)?  Then it can stay blocking.


Handling it within the migration thread would make it much more complicated
(which would be bad since it's already complex enough);


Ok.  I'm not sure why it is more complicated since migration is 
essentially two-phase, one where the source drives it and one where the 
source just waits for requests, but I'll trust you on this. :)



 Destination side
Forward path - read by main thread


This must be nonblocking so that the monitor keeps responding.


Interesting, I suspect none of the code in there is set up for that at the
moment, so how does that work during migration at the moment?


It sure is. :)

On the destination side, migration is done in a coroutine (see 
process_incoming_migration) so it's all transparent.  Only 
socket_get_buffer has to know about this:


len = qemu_recv(s->fd, buf, size, 0);
if (len != -1) {
break;
}
if (socket_error() == EAGAIN) {
yield_until_fd_readable(s->fd);
} else if (socket_error() != EINTR) {
break;
}

If the socket is put in blocking mode recv will never return EAGAIN, so 
this code will only run if the socket is nonblocking.



Actually, I see I missed something here; this should be:

   Destination side
 Forward path - read by main thread, and listener thread (see the
 separate mail that described that listner thread)

and that means that once postcopy is going (and the listener thread started)
it can't block the monitor.


Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) 
once it gets its hands on the QEMUFile.



Return path  - opened by main thread, written by main thread AND postcopy
   thread (protected by rp_mutex)


When does the main thread needs to write?


Not much; the only things the main thread currently responds to are the
ReqAck (ping like) requests; those are turning out to be very useful during 
debug;
I've also got the ability for the destination to send a migration result back 
to the
source which seems useful to be able to 'fail' early.


Why can't this be done in the listener thread?  (Thus transforming it 
into a more general postcopy migration thread; later we could even 
change incoming migration from a coroutine to a thread).



If it doesn't need that, you can just switch to blocking when you process
the listen command (i.e. when the postcopy thread starts).


Why don't I just do it anyway? Prior to postcopy starting we're in the same
situation as we're in with precopy today, so can already get mainblock 
threading.


See above for the explanation.

Paolo

[Qemu-devel] [PATCH for-2.2 0/5] scsi: enable passthrough of vendor-specific commands

Right now scsi-generic is parsing the CDB, in order to compute
the expected number of bytes to be transferred.  This is necessary
if DMA is done by the HBA via scsi_req_data, but it prevents executing
vendor-specific commands via scsi-generic because we don't know how
to parse them.

If DMA is delegated to the SCSI layer via get_sg_list, we know in
advance how many bytes the guest will want to receive and we can pass
the information straight from the guest to SG_IO.  In this case, it is
unnecessary to parse the CDB to get the same information.  scsi-disk needs
it to detect underruns and overruns, but scsi-generic and scsi-block can
just ask the HBA about the transfer direction and size.

This series introduces a new parse_cdb callback in both the device and
the HBA.  The latter is called by scsi_bus_parse_cdb, which devices can
call for passthrough requests in their implementation of parse_cdb.

Tamuki-san, can you please test if these patches are okay for your
usecase?

Paolo

Paolo Bonzini (5):
  scsi-bus: prepare scsi_req_new for introduction of parse_cdb
  scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
  scsi-block: extract scsi_block_is_passthrough
  scsi-block, scsi-generic: implement parse_cdb
  virtio-scsi: implement parse_cdb

 hw/scsi/scsi-bus.c | 68 ++
 hw/scsi/scsi-disk.c| 52 +-
 hw/scsi/scsi-generic.c |  7 ++
 hw/scsi/virtio-scsi.c  | 24 ++
 include/hw/scsi/scsi.h |  7 ++
 5 files changed, 124 insertions(+), 34 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough

This will be used for both scsi_block_new_request and the scsi-block
implementation of parse_cdb.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 38 ++
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d47ecd6..81b7276 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev)
 return scsi_initfn(&s->qdev);
 }
 
-static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
-   uint32_t lun, uint8_t *buf,
-   void *hba_private)
+static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
 {
-SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
-
 switch (buf[0]) {
 case READ_6:
 case READ_10:
@@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, 
uint32_t tag,
 case WRITE_VERIFY_12:
 case WRITE_VERIFY_16:
 /* If we are not using O_DIRECT, we might read stale data from the
-* host cache if writes were made using other commands than these
-* ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
-* O_DIRECT everything must go through SG_IO.
+ * host cache if writes were made using other commands than these
+ * ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
+ * O_DIRECT everything must go through SG_IO.
  */
 if (!(bdrv_get_flags(s->qdev.conf.bs) & BDRV_O_NOCACHE)) {
 break;
@@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice 
*d, uint32_t tag,
  * just make scsi-block operate the same as scsi-generic for them.
  */
 if (s->qdev.type != TYPE_ROM) {
-return scsi_req_alloc(&scsi_disk_dma_reqops, &s->qdev, tag, lun,
-  hba_private);
+return false;
 }
+break;
+
+default:
+break;
 }
 
-return scsi_req_alloc(&scsi_generic_req_ops, &s->qdev, tag, lun,
-  hba_private);
+return true;
+}
+
+
+static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
+   uint32_t lun, uint8_t *buf,
+   void *hba_private)
+{
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
+
+if (scsi_block_is_passthrough(s, buf)) {
+return scsi_req_alloc(&scsi_generic_req_ops, &s->qdev, tag, lun,
+  hba_private);
+} else {
+return scsi_req_alloc(&scsi_disk_dma_reqops, &s->qdev, tag, lun,
+  hba_private);
+}
 }
 #endif
 
-- 
1.9.3

[Qemu-devel] [Bug 1342704] [NEW] error: Crash of qemu-img/qemu-io on the qcow2 image with large values in 'incompatible features' field

Public bug reported:

qemu-io and qemu-img fails with an assertion (see below) at attempt to
interact with the qcow2 image having large values in the 'incompatible
features' header field.

   util/error.c:34: error_set: Assertion `*errp == ((void *)0)' failed.


The backtrace file and the test image can be found in the attachment. The 
backtraces are for the next command: 

  qemu-img check -f qcow2 test_image


The image was generated by the qcow2 image fuzzer.

qemu.git head: 5a7348045091a2bc15

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "Backtraces and the test image"
   
https://bugs.launchpad.net/bugs/1342704/+attachment/4153944/+files/artifacts.tar.gz

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1342704

Title:
  error: Crash of qemu-img/qemu-io on the qcow2 image with large values
  in 'incompatible features' field

Status in QEMU:
  New

Bug description:
  qemu-io and qemu-img fails with an assertion (see below) at attempt to
  interact with the qcow2 image having large values in the 'incompatible
  features' header field.

 util/error.c:34: error_set: Assertion `*errp == ((void *)0)'
  failed.

  
  The backtrace file and the test image can be found in the attachment. The 
backtraces are for the next command: 

qemu-img check -f qcow2 test_image

  
  The image was generated by the qcow2 image fuzzer.

  qemu.git head: 5a7348045091a2bc15

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1342704/+subscriptions

[Qemu-devel] [PATCH 2/5] scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo

These callbacks will let devices do their own request parsing, or
defer it to the bus.  If the bus does not provide an implementation,
in turn, fall back to the default parsing routine.

Swap the first two arguments to scsi_req_parse, and rename it to
scsi_req_parse_cdb, for consistency.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 25 ++---
 include/hw/scsi/scsi.h |  6 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bf7ba8c..54fbf29 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,7 @@
 
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -54,6 +54,18 @@ static void scsi_device_destroy(SCSIDevice *s)
 }
 }
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+   void *hba_private)
+{
+SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
+
+if (bus->info->parse_cdb) {
+return bus->info->parse_cdb(dev, cmd, buf, hba_private);
+} else {
+return scsi_req_parse_cdb(dev, cmd, buf);
+}
+}
+
 static SCSIRequest *scsi_device_alloc_req(SCSIDevice *s, uint32_t tag, 
uint32_t lun,
   uint8_t *buf, void *hba_private)
 {
@@ -562,8 +574,10 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
uint32_t lun,
 {
 SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
 const SCSIReqOps *ops;
+SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(d);
 SCSIRequest *req;
 SCSICommand cmd;
+int ret;
 
 if ((d->unit_attention.key == UNIT_ATTENTION ||
  bus->unit_attention.key == UNIT_ATTENTION) &&
@@ -586,8 +600,13 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
uint32_t lun,
 ops = NULL;
 }
 
+if (ops != NULL || !sc->parse_cdb) {
+ret = scsi_req_parse_cdb(d, &cmd, buf);
+} else {
+ret = sc->parse_cdb(d, &cmd, buf, hba_private);
+}
 
-if (scsi_req_parse(&cmd, d, buf) != 0) {
+if (ret != 0) {
 trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
 req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
 } else {
@@ -1188,7 +1207,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
 return lba;
 }
 
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
 int rc;
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 1adb549..4a0b860 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -76,6 +76,8 @@ typedef struct SCSIDeviceClass {
 DeviceClass parent_class;
 int (*init)(SCSIDevice *dev);
 void (*destroy)(SCSIDevice *s);
+int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+ void *hba_private);
 SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun,
   uint8_t *buf, void *hba_private);
 void (*unit_attention_reported)(SCSIDevice *s);
@@ -131,6 +133,8 @@ struct SCSIReqOps {
 struct SCSIBusInfo {
 int tcq;
 int max_channel, max_target, max_lun;
+int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+ void *hba_private);
 void (*transfer_data)(SCSIRequest *req, uint32_t arg);
 void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
 void (*cancel)(SCSIRequest *req);
@@ -244,6 +248,8 @@ void scsi_req_free(SCSIRequest *req);
 SCSIRequest *scsi_req_ref(SCSIRequest *req);
 void scsi_req_unref(SCSIRequest *req);
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+   void *hba_private);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3

[Qemu-devel] [PATCH 1/5] scsi-bus: prepare scsi_req_new for introduction of parse_cdb

The per-SCSIDevice parse_cdb callback must not be called if the
request will go through special SCSIReqOps, so detect the special
cases early enough.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4341754..bf7ba8c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -561,9 +561,32 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
uint32_t lun,
   uint8_t *buf, void *hba_private)
 {
 SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
+const SCSIReqOps *ops;
 SCSIRequest *req;
 SCSICommand cmd;
 
+if ((d->unit_attention.key == UNIT_ATTENTION ||
+ bus->unit_attention.key == UNIT_ATTENTION) &&
+(buf[0] != INQUIRY &&
+ buf[0] != REPORT_LUNS &&
+ buf[0] != GET_CONFIGURATION &&
+ buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
+
+ /*
+  * If we already have a pending unit attention condition,
+  * report this one before triggering another one.
+  */
+ !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
+ops = &reqops_unit_attention;
+} else if (lun != d->lun ||
+   buf[0] == REPORT_LUNS ||
+   (buf[0] == REQUEST_SENSE && d->sense_len)) {
+ops = &reqops_target_command;
+} else {
+ops = NULL;
+}
+
+
 if (scsi_req_parse(&cmd, d, buf) != 0) {
 trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
 req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
@@ -577,25 +600,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
uint32_t lun,
 
 if (cmd.xfer > INT32_MAX) {
 req = scsi_req_alloc(&reqops_invalid_field, d, tag, lun, 
hba_private);
-} else if ((d->unit_attention.key == UNIT_ATTENTION ||
-   bus->unit_attention.key == UNIT_ATTENTION) &&
-  (buf[0] != INQUIRY &&
-   buf[0] != REPORT_LUNS &&
-   buf[0] != GET_CONFIGURATION &&
-   buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
-
-   /*
-* If we already have a pending unit attention condition,
-* report this one before triggering another one.
-*/
-   !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
-req = scsi_req_alloc(&reqops_unit_attention, d, tag, lun,
- hba_private);
-} else if (lun != d->lun ||
-   buf[0] == REPORT_LUNS ||
-   (buf[0] == REQUEST_SENSE && d->sense_len)) {
-req = scsi_req_alloc(&reqops_target_command, d, tag, lun,
- hba_private);
+} else if (ops) {
+req = scsi_req_alloc(ops, d, tag, lun, hba_private);
 } else {
 req = scsi_device_alloc_req(d, tag, lun, buf, hba_private);
 }
-- 
1.9.3

[Qemu-devel] [PATCH 4/5] scsi-block, scsi-generic: implement parse_cdb

The callback lets the bus provide the direction and transfer count
for passthrough commands, enabling passthrough of vendor-specific
commands.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c |  3 +--
 hw/scsi/scsi-disk.c| 14 ++
 hw/scsi/scsi-generic.c |  7 +++
 include/hw/scsi/scsi.h |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 54fbf29..6167bea 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,6 @@
 
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -1207,7 +1206,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
 return lba;
 }
 
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
 int rc;
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 81b7276..d55521d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2564,6 +2564,19 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice 
*d, uint32_t tag,
   hba_private);
 }
 }
+
+static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
+  uint8_t *buf, void *hba_private)
+{
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
+
+if (scsi_block_is_passthrough(s, buf)) {
+return scsi_bus_parse_cdb(&s->qdev, cmd, buf, hba_private);
+} else {
+return scsi_req_parse_cdb(&s->qdev, cmd, buf);
+}
+}
+
 #endif
 
 #define DEFINE_SCSI_DISK_PROPERTIES()\
@@ -2672,6 +2685,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, 
void *data)
 sc->init = scsi_block_initfn;
 sc->destroy  = scsi_destroy;
 sc->alloc_req= scsi_block_new_request;
+sc->parse_cdb= scsi_block_parse_cdb;
 dc->fw_name = "disk";
 dc->desc = "SCSI block device passthrough";
 dc->reset = scsi_disk_reset;
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 3733d2c..0b2ff90 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -490,6 +490,12 @@ static Property scsi_generic_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_generic_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+  uint8_t *buf, void *hba_private)
+{
+return scsi_bus_parse_cdb(dev, cmd, buf, hba_private);
+}
+
 static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -498,6 +504,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, 
void *data)
 sc->init = scsi_generic_initfn;
 sc->destroy  = scsi_destroy;
 sc->alloc_req= scsi_new_request;
+sc->parse_cdb= scsi_generic_parse_cdb;
 dc->fw_name = "disk";
 dc->desc = "pass through generic scsi device (/dev/sg*)";
 dc->reset = scsi_generic_reset;
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 4a0b860..a7a28e6 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -250,6 +250,7 @@ void scsi_req_unref(SCSIRequest *req);
 
 int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
void *hba_private);
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3

[Qemu-devel] [PATCH 5/5] virtio-scsi: implement parse_cdb

Enable passthrough of vendor-specific commands.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/virtio-scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0eb069a..e8f4c0c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -406,6 +406,29 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
uint32_t status,
 virtio_scsi_complete_cmd_req(req);
 }
 
+static int virtio_scsi_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+ uint8_t *buf, void *hba_private)
+{
+VirtIOSCSIReq *req = hba_private;
+
+cmd->lba = -1;
+cmd->len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE);
+memcpy(cmd->buf, buf, cmd->len);
+
+/* Extract the direction and mode directly from the request, for
+ * host device passthrough.
+ */
+cmd->xfer = req->qsgl.size;
+if (cmd->xfer == 0) {
+cmd->mode = SCSI_XFER_NONE;
+} else if (iov_size(req->elem.in_sg, req->elem.in_num) > req->resp_size) {
+cmd->mode = SCSI_XFER_FROM_DEV;
+} else {
+cmd->mode = SCSI_XFER_TO_DEV;
+}
+return 0;
+}
+
 static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r)
 {
 VirtIOSCSIReq *req = r->hba_private;
@@ -658,6 +681,7 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
 .change = virtio_scsi_change,
 .hotplug = virtio_scsi_hotplug,
 .hot_unplug = virtio_scsi_hot_unplug,
+.parse_cdb = virtio_scsi_parse_cdb,
 .get_sg_list = virtio_scsi_get_sg_list,
 .save_request = virtio_scsi_save_request,
 .load_request = virtio_scsi_load_request,
-- 
1.9.3

Re: [Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'


Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

-static int64_t clocks_offset;
-if (!icount_align_option) {
-return;
+static int64_t realtime_clock_value;


Does this really need to be static?


+if (icount_align_option || !realtime_clock_value) {
+realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
-sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 /* Compute offset between the 2 clocks. */
 if (!clocks_offset) {
-clocks_offset = sc->realtime_clock -
+clocks_offset = realtime_clock_value -
 qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }


Isn't clocks_offset roughly the same as -timers_state.cpu_clock_offset? 
 If so, you could be some simplification in the code.  Feel free to 
move the TimersState struct definition to include/sysemu/cpus.h and make 
timers_state public.



+cpu_fprintf(f, "Host - Guest clock  %ld(ms)\n",
+(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset -
+ qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL))/SCALE_MS);


I think this is (cpu_get_clock() - cpu_get_icount()) / SCALE_MS.

Paolo

Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks


Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

v3 -> v4

* Add better error handling for 'strtol' in patch 2
* Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4
* Remove function pointers from patches 4 and 5


Hi Sebastian, I think we're getting really close.

I asked a question about clocks_offset; I think it's not necessary to 
compute it in cpu-exec.c because the same information is available 
elsewhere.  It is also probably not necessary to make it public.  Can 
you please check this?  Once this is sorted out, the patch should be 
ready for inclusion in 2.2.


Thanks for your effort!

Paolo

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Andrey Korolyov

On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti  wrote:
> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti  wrote:
>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini  
>> >> wrote:
>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>> >> >
>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti 
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> >> 
>> >>  On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov 
>> >>  wrote:
>> >> >
>> >> > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah 
>> >> > wrote:
>> >> >>
>> >> >> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>>
>> >> >>> the issue is not specific to the iothread code because generic
>> >> >>> virtio-blk also hangs up:
>> >> >>
>> >> >>
>> >> >> Do you know which version works well?  If you could bisect, that'll
>> >> >> help a lot.
>> >> >>
>> >> >> Thanks,
>> >> >> Amit
>> >> >
>> >> >
>> >> > Hi,
>> >> >
>> >> > 2.0 works definitely well. I`ll try to finish bisection today, 
>> >> > though
>> >> > every step takes about 10 minutes to complete.
>> >> 
>> >> 
>> >>  Yay! It is even outside of virtio-blk.
>> >> 
>> >>  commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> >>  Author: Marcelo Tosatti 
>> >>  Date:   Tue Jun 3 13:34:48 2014 -0300
>> >> 
>> >>  kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>> >>  calculation
>> >> 
>> >>  Ensure proper env->tsc value for kvmclock_current_nsec 
>> >>  calculation.
>> >> 
>> >>  Reported-by: Marcin Gibuła 
>> >>  Cc: qemu-sta...@nongnu.org
>> >>  Signed-off-by: Marcelo Tosatti 
>> >>  Signed-off-by: Paolo Bonzini 
>> >> >>>
>> >> >>>
>> >> >>> Andrey,
>> >> >>>
>> >> >>> Can you please provide instructions on how to create reproducible
>> >> >>> environment?
>> >> >>>
>> >> >>> The following patch is equivalent to the original patch, for
>> >> >>> the purposes of fixing the kvmclock problem.
>> >> >>>
>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>> >> >>> experiencing.
>> >> >>>
>> >> >>>
>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>> >> >>> index 272a88a..feb5fc5 100644
>> >> >>> --- a/hw/i386/kvm/clock.c
>> >> >>> +++ b/hw/i386/kvm/clock.c
>> >> >>> @@ -17,7 +17,6 @@
>> >> >>>  #include "qemu/host-utils.h"
>> >> >>>  #include "sysemu/sysemu.h"
>> >> >>>  #include "sysemu/kvm.h"
>> >> >>> -#include "sysemu/cpus.h"
>> >> >>>  #include "hw/sysbus.h"
>> >> >>>  #include "hw/kvm/clock.h"
>> >> >>>
>> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState 
>> >> >>> *s)
>> >> >>>
>> >> >>>  cpu_physical_memory_read(kvmclock_struct_pa, &time, 
>> >> >>> sizeof(time));
>> >> >>>
>> >> >>> -assert(time.tsc_timestamp <= migration_tsc);
>> >> >>>  delta = migration_tsc - time.tsc_timestamp;
>> >> >>>  if (time.tsc_shift < 0) {
>> >> >>>  delta >>= -time.tsc_shift;
>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>> >> >>> int running,
>> >> >>>  if (s->clock_valid) {
>> >> >>>  return;
>> >> >>>  }
>> >> >>> -
>> >> >>> -cpu_synchronize_all_states();
>> >> >>>  ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>> >> >>>  if (ret < 0) {
>> >> >>>  fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>> >> >>> strerror(ret));
>> >> >>> diff --git a/migration.c b/migration.c
>> >> >>> index 8d675b3..34f2325 100644
>> >> >>> --- a/migration.c
>> >> >>> +++ b/migration.c
>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>> >> >>>  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> >> >>>  old_vm_running = runstate_is_running();
>> >> >>>
>> >> >>> +cpu_synchronize_all_states();
>> >> >>>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> >> >>>  if (ret >= 0) {
>> >> >>>  qemu_file_set_rate_limit(s->file, INT64_MAX);
>> >> >
>> >> >
>> >> > It could also be useful to apply the above patch _and_ revert
>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>> >> >
>> >> > Paolo
>> >>
>> >> Yes, it solved the issue for me! (it took much time to check because
>> >> most of country` debian mirrors went inconsistent by some reason)
>> >>
>> >> Also trivial addition:
>> >>
>> >> diff --git a/migration.c b/migration.c
>> >> index 34f2325..65d1c88 100644
>> >> --- a/migration.c
>> >> +++ b/migration.c
>> >> @@ -25,6 +25,7 @@
>> >>  #include "qemu/thread.h"
>> >>  #include "qmp-commands.h"
>> >>  #include "trace.h"
>> >> +#inc

Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks

2014-07-16 Thread Luiz Capitulino

On Wed, 16 Jul 2014 14:18:00 +0200
Sebastian Tanase  wrote:

> The icount option already implemented in QEMU allows the guest to run at a 
> theoretical
> frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch 
> is to have a
> real guest frequency close to the one imposed by using the icount option.

Is this really an RFC?

[Qemu-devel] [Bug 1342686] [NEW] Windows 95 setup hangs

2014-07-16 Thread Piotr Gliźniewicz

Public bug reported:

Windows 95 (first version, not 95A or OSR2) setup hangs on QEMU version 2.0.0.
It was compiled from the sources on Fedora 20.
Setup starts without problems, it goes through the first phase and then hangs
on the "Getting ready to run Windows 95 for the first time..." screen
(http://www.guidebookgallery.org/screenshots/win95#win95startupshutdownsplash1correctaspect).

Steps to reproduce:
qemu-img create -f qcow2 win95.img 2G
qemu -L . -machine isapc -cpu pentium -m 32 -vga std -soundhw sb16 -hda
win95.img -cdrom POLWIN95.ISO -fda BOOT95A.IMA -boot a
after this select default options everywhere. When it asks to remove the boot
floppy do "eject floppy0" and confirm.
It displays the "Getting ready to run Windows 95 for the first time..." and
hangs.

The same behavior can be observed on 2.1.0-rc2. On 1.7.1 It hangs
immediately after this screen, it hangs on displaying a underscore
cursor.

I managed to complete the setup on QEMU 1.6.2.

** Affects: qemu
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1342686

Title:
Windows 95 setup hangs

Status in QEMU:
New

Bug description:
Windows 95 (first version, not 95A or OSR2) setup hangs on QEMU version
2.0.0. It was compiled from the sources on Fedora 20.
Setup starts without problems, it goes through the first phase and then hangs
on the "Getting ready to run Windows 95 for the first time..." screen
(http://www.guidebookgallery.org/screenshots/win95#win95startupshutdownsplash1correctaspect).

The same behavior can be observed on 2.1.0-rc2. On 1.7.1 It hangs
immediately after this screen, it hangs on displaying a underscore
cursor.

I managed to complete the setup on QEMU 1.6.2.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1342686/+subscriptions

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-07-16 Thread Konrad Rzeszutek Wilk

On Thu, Jul 03, 2014 at 11:27:40PM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 03, 2014 at 12:09:28PM -0700, Jesse Barnes wrote:
> > On Thu, 3 Jul 2014 14:26:12 -0400
> > Konrad Rzeszutek Wilk  wrote:
> > 
> > > On Thu, Jul 03, 2014 at 10:32:12AM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 02, 2014 at 12:23:37PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Jul 02, 2014 at 04:50:15PM +0200, Paolo Bonzini wrote:
> > > > > > Il 02/07/2014 16:00, Konrad Rzeszutek Wilk ha scritto:
> > > > > > >With this long thread I lost a bit context about the challenges
> > > > > > >that exists. But let me try summarizing it here - which will 
> > > > > > >hopefully
> > > > > > >get some consensus.
> > > > > > >
> > > > > > >1). Fix IGD hardware to not use Southbridge magic addresses.
> > > > > > >We can moan and moan but I doubt it is going to change.
> > > > > > 
> > > > > > There are two problems:
> > > > > > 
> > > > > > - Northbridge (i.e. MCH i.e. PCI host bridge) configuration space 
> > > > > > addresses
> > > > > 
> > > > > Right. So in  drivers/gpu/drm/i915/i915_dma.c:
> > > > > 1135 #define MCHBAR_I915 0x44 
> > > > >
> > > > > 1136 #define MCHBAR_I965 0x48 
> > > > > 
> > > > > 1147 int reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 : 
> > > > > MCHBAR_I915;
> > > > > 1152 if (INTEL_INFO(dev)->gen >= 4)   
> > > > >
> > > > > 1153 pci_read_config_dword(dev_priv->bridge_dev, reg 
> > > > > + 4, &temp_hi); 
> > > > > 1154 pci_read_config_dword(dev_priv->bridge_dev, reg, 
> > > > > &temp_lo); 
> > > > > 1155 mchbar_addr = ((u64)temp_hi << 32) | temp_lo;
> > > > > 
> > > > > 
> > > > > and
> > > > > 
> > > > > 1139 #define DEVEN_REG 0x54   
> > > > >
> > > > > 
> > > > > 1193 int mchbar_reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 
> > > > > : MCHBAR_I915; 
> > > > > 1202 if (IS_I915G(dev) || IS_I915GM(dev)) {   
> > > > >
> > > > > 1203 pci_read_config_dword(dev_priv->bridge_dev, 
> > > > > DEVEN_REG, &temp);  
> > > > > 1204 enabled = !!(temp & DEVEN_MCHBAR_EN);
> > > > >
> > > > > 1205 } else { 
> > > > >
> > > > > 1206 pci_read_config_dword(dev_priv->bridge_dev, 
> > > > > mchbar_reg, &temp); 
> > > > > 1207 enabled = temp & 1;  
> > > > >
> > > > > 1208 }
> > > > > > 
> > > > > > - Southbridge (i.e. PCH i.e. ISA bridge) vendor/device ID; some 
> > > > > > versions of
> > > > > > the driver identify it by class, some versions identify it by slot 
> > > > > > (1f.0).
> > > > > 
> > > > > Right, So in  drivers/gpu/drm/i915/i915_drv.c the giant 
> > > > > intel_detect_pch
> > > > > which sets the pch_type based on :
> > > > > 
> > > > >  432 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
> > > > >
> > > > >  433 unsigned short id = pch->device & 
> > > > > INTEL_PCH_DEVICE_ID_MASK;
> > > > >  434 dev_priv->pch_id = id;   
> > > > >
> > > > >  435  
> > > > >
> > > > >  436 if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) 
> > > > > { 
> > > > > 
> > > > > It checks for 0x3b00, 0x1c00, 0x1e00, 0x8c00 and 0x9c00.
> > > > > The INTEL_PCH_DEVICE_ID_MASK is 0xff00
> > > > > > 
> > > > > > To solve the first, make a new machine type, PIIX4-based, and pass 
> > > > > > through
> > > > > > the registers you need.  The patch must document _exactly_ why the 
> > > > > > registers
> > > > > > are safe to pass.  If they are not reserved on PIIX4, the patch must
> > > > > > document what the same offsets mean on PIIX4, and why it's sensible 
> > > > > > to
> > > > > > assume that firmware for virtual machine will not read/write them.  
> > > > > > Bonus
> > > > > > point for also documenting the same for Q35.
> > > > > 
> > > > > OK. They look to be related to setting up an MBAR , but I don't 
> > > > > understand
> > > > > why it is needed. Hopefully some of the i915 folks CC-ed here can 
> > > > > answer.
> > > > 
> > > > In particular, I think setting up MBAR should move out of i915 and 
> > > > become
> > > > the bridge driver tweak on linux and windows.
> > > 
> > > That is an excellent idea.
> > > 
> > > However I am still curious to _why_ it has to be done in the first place.
> > 
> > The display part of the GPU is partly on the PCH, and it's possible to
> > mix & match them a bit, so we have this detection code to figure things
> > o

[Qemu-devel] [PATCH 0/2] Add SP810 to Versatile Express boards

2014-07-16 Thread Fabian Aggeler

Hi,

The Versatile Express emulation in QEMU currently does not
have a model of the SP810 used in real hardware. The registers
provided by this System Controller can be used to set the
frequency of the SP804 timers. On newer releases of the board
the SP804 is set to 32kHz by default and has to be increased
by writing to the SCCTRL. See: 
http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/038696.html

Since QEMU sets the SP804 timers to 1MHz by default this should
be reflected in the SCCTRL register. These two patches add a
model of the SP804 to the vexpress-boards and sets the SCCTRL 
register so that software that queries this register behaves 
as expected.

Feedback is greatly appreciated!

Best,
Fabian

Fabian Aggeler (2):
  hw/misc/arm_sp810: Create SP810 device
  hw/arm/vexpress: add SP810 to the vexpress

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/vexpress.c   |  11 ++-
 hw/misc/Makefile.objs   |   1 +
 hw/misc/arm_sp810.c | 184 
 4 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/arm_sp810.c

-- 
1.8.3.2

[Qemu-devel] [PATCH 1/2] hw/misc/arm_sp810: Create SP810 device

2014-07-16 Thread Fabian Aggeler

This adds a device model for the PrimeXsys System Controller (SP810)
which is present in the Versatile Express motherboards. It is
so far read-only but allows to set the SCCTRL register.

Signed-off-by: Fabian Aggeler 
---
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/Makefile.objs   |   1 +
 hw/misc/arm_sp810.c | 184 
 3 files changed, 186 insertions(+)
 create mode 100644 hw/misc/arm_sp810.c

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index f3513fa..67ba99a 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -55,6 +55,7 @@ CONFIG_PL181=y
 CONFIG_PL190=y
 CONFIG_PL310=y
 CONFIG_PL330=y
+CONFIG_SP810=y
 CONFIG_CADENCE=y
 CONFIG_XGMAC=y
 CONFIG_EXYNOS4=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..49d023b 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o
 common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o
 common-obj-$(CONFIG_A9SCU) += a9scu.o
 common-obj-$(CONFIG_ARM11SCU) += arm11scu.o
+common-obj-$(CONFIG_SP810) += arm_sp810.o
 
 # PKUnity SoC devices
 common-obj-$(CONFIG_PUV3) += puv3_pm.o
diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c
new file mode 100644
index 000..82cbcf0
--- /dev/null
+++ b/hw/misc/arm_sp810.c
@@ -0,0 +1,184 @@
+/*
+ * ARM PrimeXsys System Controller (SP810)
+ *
+ * Copyright (C) 2014 Fabian Aggeler 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+
+#define LOCK_VALUE 0xa05f
+
+#define TYPE_ARM_SP810 "arm_sp810"
+#define ARM_SP810(obj) OBJECT_CHECK(arm_sp810_state, (obj), TYPE_ARM_SP810)
+
+typedef struct {
+SysBusDevice parent_obj;
+MemoryRegion iomem;
+
+uint32_t sc_ctrl; /* SCCTRL */
+} arm_sp810_state;
+
+#define TIMEREN0SEL (1 << 15)
+#define TIMEREN1SEL (1 << 17)
+#define TIMEREN2SEL (1 << 19)
+#define TIMEREN3SEL (1 << 21)
+
+static const VMStateDescription vmstate_arm_sysctl = {
+.name = "arm_sp810",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(sc_ctrl, arm_sp810_state),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size)
+{
+arm_sp810_state *s = (arm_sp810_state *)opaque;
+
+switch (offset) {
+case 0x000: /* SCCTRL */
+return s->sc_ctrl;
+case 0x004: /* SCSYSSTAT */
+case 0x008: /* SCIMCTRL */
+case 0x00C: /* SCIMSTAT */
+case 0x010: /* SCXTALCTRL */
+case 0x014: /* SCPLLCTRL */
+case 0x018: /* SCPLLFCTRL */
+case 0x01C: /* SCPERCTRL0 */
+case 0x020: /* SCPERCTRL1 */
+case 0x024: /* SCPEREN */
+case 0x028: /* SCPERDIS */
+case 0x02C: /* SCPERCLKEN */
+case 0x030: /* SCPERSTAT */
+case 0xEE0: /* SCSYSID0 */
+case 0xEE4: /* SCSYSID1 */
+case 0xEE8: /* SCSYSID2 */
+case 0xEEC: /* SCSYSID3 */
+case 0xF00: /* SCITCR */
+case 0xF04: /* SCITIR0 */
+case 0xF08: /* SCITIR1 */
+case 0xF0C: /* SCITOR */
+case 0xF10: /* SCCNTCTRL */
+case 0xF14: /* SCCNTDATA */
+case 0xF18: /* SCCNTSTEP */
+case 0xFE0: /* SCPERIPHID0 */
+case 0xFE4: /* SCPERIPHID1 */
+case 0xFE8: /* SCPERIPHID2 */
+case 0xFEC: /* SCPERIPHID3 */
+case 0xFF0: /* SCPCELLID0 */
+case 0xFF4: /* SCPCELLID1 */
+case 0xFF8: /* SCPCELLID2 */
+case 0xFFC: /* SCPCELLID3 */
+default:
+qemu_log_mask(LOG_UNIMP,
+"arm_sp810_read: Register not yet implemented: "
+"0x%x\n",
+(int)offset);
+return 0;
+}
+}
+
+
+
+static void arm_sp810_write(void *opaque, hwaddr offset,
+ uint64_t val, unsigned size)
+{
+switch (offset) {
+case 0x000: /* SCCTRL */
+case 0x004: /* SCSYSSTAT */
+case 0x008: /* SCIMCTRL */
+case 0x00C: /* SCIMSTAT */
+case 0x010: /* SCXTALCTRL */
+case 0x014: /* SCPLLCTRL */
+case 0x018: /* SCPLLFCTRL */
+case 0x01C: /* SCPERCTRL0 */
+case 0x020: /* SCPERCTRL1 */
+case 0x024: /* SCPEREN */
+case 0x028: /* SCPERDIS */
+case 0x02C: /* SCPERCLKEN */
+case 0x030: /* SCPERSTAT */
+case 0xEE0: /* SCSYSID0 */
+case 0xEE4: /* SCSYSID1 */
+case 0xEE8: /* SCSYSID2 */
+case 0xEEC: /* SCSYSID3 */
+case 0xF00: /* SCITCR */
+case 0xF04: /* SCITIR0 */
+case 0xF08: /* SCITIR1 */
+case 0xF0C: /* SCITOR

[Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress

2014-07-16 Thread Fabian Aggeler

The SP810, which is present in the Versatile Express motherboards,
allows to set the timing reference to either REFCLK or TIMCLK.
QEMU currently sets the SP804 timer to 1MHz by default. To reflect
this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and
TimerEn3Sel of the system control register (SCCTRL) to TIMCLK (1).

Signed-off-by: Fabian Aggeler 
---
 hw/arm/vexpress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index a88732c..b96c3fd 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, 
const char *name,
 static void vexpress_common_init(VEDBoardInfo *daughterboard,
  MachineState *machine)
 {
-DeviceState *dev, *sysctl, *pl041;
+DeviceState *dev, *sysctl, *pl041, *sp810;
 qemu_irq pic[64];
 uint32_t sys_id;
 DriveInfo *dinfo;
@@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo 
*daughterboard,
 qdev_init_nofail(sysctl);
 sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
 
-/* VE_SP810: not modelled */
+/* VE_SP810 */
+sp810 = qdev_create(NULL, "arm_sp810");
+/* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
+qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
+  | (1 << 19) | (1 << 21));
+qdev_init_nofail(sp810);
+sysbus_mmio_map(SYS_BUS_DEVICE(sp810), 0, map[VE_SP810]);
+
 /* VE_SERIALPCI: not modelled */
 
 pl041 = qdev_create(NULL, "pl041");
-- 
1.8.3.2

Re: [Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress

2014-07-16 Thread Alex Bennée


Fabian Aggeler writes:

> The SP810, which is present in the Versatile Express motherboards,
> allows to set the timing reference to either REFCLK or TIMCLK.
> QEMU currently sets the SP804 timer to 1MHz by default. To reflect
> this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and
> TimerEn3Sel of the system control register (SCCTRL) to TIMCLK (1).
>
> Signed-off-by: Fabian Aggeler 
> ---
>  hw/arm/vexpress.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
> index a88732c..b96c3fd 100644
> --- a/hw/arm/vexpress.c
> +++ b/hw/arm/vexpress.c
> @@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, 
> const char *name,
>  static void vexpress_common_init(VEDBoardInfo *daughterboard,
>   MachineState *machine)
>  {
> -DeviceState *dev, *sysctl, *pl041;
> +DeviceState *dev, *sysctl, *pl041, *sp810;
>  qemu_irq pic[64];
>  uint32_t sys_id;
>  DriveInfo *dinfo;
> @@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo 
> *daughterboard,
>  qdev_init_nofail(sysctl);
>  sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
>  
> -/* VE_SP810: not modelled */
> +/* VE_SP810 */
> +sp810 = qdev_create(NULL, "arm_sp810");
> +/* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
> +qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
> +  | (1 << 19) | (1 << 21));


Could the #defines in the first patch be moved into a header and used
here rather than manually  setting these bits?

-- 
Alex Bennée

[Qemu-devel] [PATCH for-2.1 0/2] raw-posix: Fail gracefully if no working alignment is found

Kevin Wolf (2):
  block: Add Error argument to bdrv_refresh_limits()
  raw-posix: Fail gracefully if no working alignment is found

 block.c   | 33 +++--
 block/iscsi.c |  3 +--
 block/qcow2.c |  4 +---
 block/qed.c   |  4 +---
 block/raw-posix.c | 39 ---
 block/raw_bsd.c   |  3 +--
 block/stream.c|  2 +-
 block/vmdk.c  |  4 +---
 include/block/block.h |  2 +-
 include/block/block_int.h |  2 +-
 10 files changed, 59 insertions(+), 37 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()

Signed-off-by: Kevin Wolf 
---
 block.c   | 33 +++--
 block/iscsi.c |  3 +--
 block/qcow2.c |  4 +---
 block/qed.c   |  4 +---
 block/raw-posix.c |  4 +---
 block/raw_bsd.c   |  3 +--
 block/stream.c|  2 +-
 block/vmdk.c  |  4 +---
 include/block/block.h |  2 +-
 include/block/block_int.h |  2 +-
 10 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index 3e252a2..8cf519b 100644
--- a/block.c
+++ b/block.c
@@ -508,19 +508,24 @@ int bdrv_create_file(const char *filename, QemuOpts 
*opts, Error **errp)
 return ret;
 }
 
-int bdrv_refresh_limits(BlockDriverState *bs)
+void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
+Error *local_err = NULL;
 
 memset(&bs->bl, 0, sizeof(bs->bl));
 
 if (!drv) {
-return 0;
+return;
 }
 
 /* Take some limits from the children as a default */
 if (bs->file) {
-bdrv_refresh_limits(bs->file);
+bdrv_refresh_limits(bs->file, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 bs->bl.opt_transfer_length = bs->file->bl.opt_transfer_length;
 bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
 } else {
@@ -528,7 +533,11 @@ int bdrv_refresh_limits(BlockDriverState *bs)
 }
 
 if (bs->backing_hd) {
-bdrv_refresh_limits(bs->backing_hd);
+bdrv_refresh_limits(bs->backing_hd, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 bs->bl.opt_transfer_length =
 MAX(bs->bl.opt_transfer_length,
 bs->backing_hd->bl.opt_transfer_length);
@@ -539,10 +548,8 @@ int bdrv_refresh_limits(BlockDriverState *bs)
 
 /* Then let the driver override it */
 if (drv->bdrv_refresh_limits) {
-return drv->bdrv_refresh_limits(bs);
+drv->bdrv_refresh_limits(bs, errp);
 }
-
-return 0;
 }
 
 /*
@@ -993,7 +1000,13 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 goto free_and_fail;
 }
 
-bdrv_refresh_limits(bs);
+bdrv_refresh_limits(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto free_and_fail;
+}
+
 assert(bdrv_opt_mem_align(bs) != 0);
 assert((bs->request_alignment != 0) || bs->sg);
 return 0;
@@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd)
 bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT,
 bs->backing_blocker);
 out:
-bdrv_refresh_limits(bs);
+bdrv_refresh_limits(bs, NULL);
 }
 
 /*
@@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
   BDRV_O_CACHE_WB);
 reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
 
-bdrv_refresh_limits(reopen_state->bs);
+bdrv_refresh_limits(reopen_state->bs, NULL);
 }
 
 /*
diff --git a/block/iscsi.c b/block/iscsi.c
index f3e83e2..a7bb697 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1450,7 +1450,7 @@ static void iscsi_close(BlockDriverState *bs)
 memset(iscsilun, 0, sizeof(IscsiLun));
 }
 
-static int iscsi_refresh_limits(BlockDriverState *bs)
+static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 IscsiLun *iscsilun = bs->opaque;
 
@@ -1475,7 +1475,6 @@ static int iscsi_refresh_limits(BlockDriverState *bs)
 }
 bs->bl.opt_transfer_length = sector_lun2qemu(iscsilun->bl.opt_xfer_len,
  iscsilun);
-return 0;
 }
 
 /* Since iscsi_open() ignores bdrv_flags, there is nothing to do here in
diff --git a/block/qcow2.c b/block/qcow2.c
index b0faa69..445ead4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -855,13 +855,11 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 return ret;
 }
 
-static int qcow2_refresh_limits(BlockDriverState *bs)
+static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVQcowState *s = bs->opaque;
 
 bs->bl.write_zeroes_alignment = s->cluster_sectors;
-
-return 0;
 }
 
 static int qcow2_set_key(BlockDriverState *bs, const char *key)
diff --git a/block/qed.c b/block/qed.c
index cd4872b..7944832 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -528,13 +528,11 @@ out:
 return ret;
 }
 
-static int bdrv_qed_refresh_limits(BlockDriverState *bs)
+static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVQEDState *s = bs->opaque;
 
 bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
-
-return 0;
 }
 
 /* We have nothing to do for QED reopen, stubs just return
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2bcc73d..ef497b2 100644
--- a/blo

[Qemu-devel] [PATCH 2/2] raw-posix: Fail gracefully if no working alignment is found

If qemu couldn't find out what O_DIRECT alignment to use with a given
file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c
and confuse users. This adds a more descriptive error message for such
cases.

Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index ef497b2..8e9758e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -221,7 +221,7 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
-static void raw_probe_alignment(BlockDriverState *bs)
+static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 char *buf;
@@ -240,24 +240,24 @@ static void raw_probe_alignment(BlockDriverState *bs)
 s->buf_align = 0;
 
 #ifdef BLKSSZGET
-if (ioctl(s->fd, BLKSSZGET, §or_size) >= 0) {
+if (ioctl(fd, BLKSSZGET, §or_size) >= 0) {
 bs->request_alignment = sector_size;
 }
 #endif
 #ifdef DKIOCGETBLOCKSIZE
-if (ioctl(s->fd, DKIOCGETBLOCKSIZE, §or_size) >= 0) {
+if (ioctl(fd, DKIOCGETBLOCKSIZE, §or_size) >= 0) {
 bs->request_alignment = sector_size;
 }
 #endif
 #ifdef DIOCGSECTORSIZE
-if (ioctl(s->fd, DIOCGSECTORSIZE, §or_size) >= 0) {
+if (ioctl(fd, DIOCGSECTORSIZE, §or_size) >= 0) {
 bs->request_alignment = sector_size;
 }
 #endif
 #ifdef CONFIG_XFS
 if (s->is_xfs) {
 struct dioattr da;
-if (xfsctl(NULL, s->fd, XFS_IOC_DIOINFO, &da) >= 0) {
+if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) >= 0) {
 bs->request_alignment = da.d_miniosz;
 /* The kernel returns wrong information for d_mem */
 /* s->buf_align = da.d_mem; */
@@ -270,7 +270,7 @@ static void raw_probe_alignment(BlockDriverState *bs)
 size_t align;
 buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
 for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
-if (pread(s->fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
+if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
 s->buf_align = align;
 break;
 }
@@ -282,13 +282,18 @@ static void raw_probe_alignment(BlockDriverState *bs)
 size_t align;
 buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
 for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
-if (pread(s->fd, buf, align, 0) >= 0) {
+if (pread(fd, buf, align, 0) >= 0) {
 bs->request_alignment = align;
 break;
 }
 }
 qemu_vfree(buf);
 }
+
+if (!s->buf_align || !bs->request_alignment) {
+error_setg(errp, "Could not find working O_DIRECT alignment. "
+ "Try cache.direct=off.");
+}
 }
 
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
@@ -505,6 +510,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 BDRVRawState *s;
 BDRVRawReopenState *raw_s;
 int ret = 0;
+Error *local_err = NULL;
 
 assert(state != NULL);
 assert(state->bs != NULL);
@@ -577,6 +583,19 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 ret = -1;
 }
 }
+
+/* Fail already reopen_prepare() if we can't get a working O_DIRECT
+ * alignment with the new fd. */
+if (raw_s->fd != -1) {
+raw_probe_alignment(state->bs, raw_s->fd, &local_err);
+if (local_err) {
+qemu_close(raw_s->fd);
+raw_s->fd = -1;
+error_propagate(errp, local_err);
+ret = -EINVAL;
+}
+}
+
 return ret;
 }
 
@@ -619,7 +638,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 {
 BDRVRawState *s = bs->opaque;
 
-raw_probe_alignment(bs);
+raw_probe_alignment(bs, s->fd, errp);
 bs->bl.opt_mem_alignment = s->buf_align;
 }
 
-- 
1.8.3.1

[Qemu-devel] [PATCH qemu] i386, linux-headers: Add support for kvm_get_rng_seed

2014-07-16 Thread Andy Lutomirski

This updates x86's kvm_para.h for the feature bit definition and
target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski 
---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c| 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
 "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
 "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
-NULL, NULL, NULL, NULL,
+"kvm_get_rng_seed", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
 (1 << KVM_FEATURE_PV_EOI) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_GET_RNG_SEED),
 [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };
 
-- 
1.9.3

Re: [Qemu-devel] [musl] Re: Bogus struct stat64 for qemu-microblaze (user emulation)?

2014-07-16 Thread Rich Felker

On Wed, Jul 16, 2014 at 09:36:23AM +0100, Peter Maydell wrote:
> On 16 July 2014 05:02, Rich Felker  wrote:
> > The qemu-microblaze definition of struct stat64 seems to mismatch the
> > kernel definition, which is using asm-generic/stat.h. See:
> >
> > http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469
> > http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949
> >
> > This seems to be causing a truncated-to-32-bit inode number to be
> > stored in the location where st_ino should reside, and a spurious copy
> > of the inode number to be written in a unused slot at the end of the
> > structure.
> 
> Sounds quite plausible -- we've had issues with other archs
> not having correct stat struct definitions in QEMU. I don't
> suppose anybody's done much testing of the microblaze
> linux-user code.

The bug seems to have been introduced here.

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=a523eb06ec3fb2f4f4f4d362bb23704811d11379

I'm CC'ing the author/committer in case he has any input on why he did
this.

> > Is my analysis correct? Stefan Kristiansson and I found this while
> > working on the or1k port of musl libc, where it seems our structure
> > for the existing microblaze port is wrongly aligned with the qemu
> > definition rather than the definition the real kernel is using. Before
> > I try correcting this on our side, I want to make sure we're working
> > with the right version.
> 
> I would definitely trust the kernel definition, not QEMU's!

Yes.

Rich

Re: [Qemu-devel] [PATCH qemu] i386, linux-headers: Add support for kvm_get_rng_seed


Il 16/07/2014 17:52, Andy Lutomirski ha scritto:

This updates x86's kvm_para.h for the feature bit definition and
target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski 


Thanks, looks good---assuming the kernel side will make it into 3.17, 
I'll sync the headers once 3.17 is released and then apply the patch. 
As mentioned in kvm@ someone will have to add the pc-2.2 machine type too.


Paolo


---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c| 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05

 struct kvm_steal_time {
__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
 "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
 "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
-NULL, NULL, NULL, NULL,
+"kvm_get_rng_seed", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
 (1 << KVM_FEATURE_PV_EOI) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_GET_RNG_SEED),
 [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };

[Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)

2014-07-16 Thread Stefan Weil

Am 16.07.2014 02:26, schrieb Michael Roth:
> Hello,
>
> On behalf of the QEMU Team, I'd like to announce the availability of the
> third release candidate for the QEMU 2.1 release.  This release is meant
> for testing purposes and should not be used in a production environment.
>
>   http://wiki.qemu.org/download/qemu-2.1.0-rc2.tar.bz2
>
> You can help improve the quality of the QEMU 2.1 release by testing this
> release and reporting bugs on Launchpad:
>
>   https://bugs.launchpad.net/qemu/

Hi,

a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the
requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of
view, this change results in a regression for Debian and similar Linux
distributions: QEMU no longer includes iscsi features.

In this special case, the current Debian stable includes QEMU packages
with libiscsi 1.4, but new builds won't include it because that version
is now too old. Debian testing includes a brand new libiscsi, but it
does not include libiscsi.pc, so pkg-config won't know that it is
available and configure will disable libiscsi. I have a patch which
fixes this, so QEMU for Debian testing could include libiscsi again.

Is a feature regression like this one acceptable? Do we need additional
testing (maybe run the build bots with --enable-xxx, so builds fail when
xxx no longer works)?

Regards
Stefan

[Qemu-devel] [PATCH RFC 00/14] dataplane: performance optimization and multi virtqueue

Hi,

These patches bring up below 4 changes:

- introduce selective coroutine bypass mechanism
for improving performance of virtio-blk dataplane with
raw format image

- introduce object allocation pool and apply it to
virtio-blk dataplane for improving its performance

- linux-aio changes: fixing for cases of -EAGAIN and partial
completion, increase max events to 256, and remove one unuseful
fields in 'struct qemu_laiocb'

- support multi virtqueue for virtio-blk dataplane

The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1],
and the 3.17 linux kernel[2] will support the feature in virtio-blk driver.
For those who wants to play the stuff, the kernel side patche can be found
in either Jens's block tree[3] or linux-next[4].

Below fio script running from VM is used for test improvement of these patches:

[global]
direct=1
size=128G
bsrange=4k-4k
timeout=120
numjobs=${JOBS}
ioengine=libaio
iodepth=64
filename=/dev/vdc
group_reporting=1

[f]
rw=randread

One quadcore VM is created in below two hosts to run above fio test:

- laptop(quadcore: 2 physical cores, 2 threads per physical core)
- server(16cores: 8 physical cores, 2 threads per physical core)

Follows the test result on throughput improvement(IOPS) with
this patchset(2 virtqueues per virito-blk device) against QEMU
2.1.0-rc1:

---
| VM in server host | VM in laptop host
---
JOBS=2  | +21%  | +30%
---
JOBS=4  | +84%  | +85%
---

>From above result, we can see both scalability and performance
get improved a lot.

After commit 580b6b2aa2(dataplane: use the QEMU block
layer for I/O), average time for submiting one single
request has been increased a lot, as my trace, the average
time taken for submiting one request has been doubled even
though block plug&unplug mechanism is introduced to
ease its effect. That is why this patchset introduces
selective coroutine bypass mechanism and object allocation
pool for saving the time first. Based on QEMU 2.0, only
single virtio-blk dataplane multi virtqueue patch can get
much better improvement than current result[5].

TODO:
- optimize block layer for linux aio, so that
more time can be saved for submitting request
- support more than one aio-context for improving
virtio-blk performance

 async.c |1 +
 block.c |  129 ++-
 block/linux-aio.c   |   93 +++-
 block/raw-posix.c   |   34 ++
 hw/block/dataplane/virtio-blk.c |  221 ++-
 hw/block/virtio-blk.c   |   30 +-
 include/block/aio.h |   13 +++
 include/block/block.h   |9 ++
 include/block/coroutine.h   |7 ++
 include/block/coroutine_int.h   |5 +
 include/hw/virtio/virtio-blk.h  |   15 +++
 include/qemu/obj_pool.h |   64 
 qemu-coroutine-lock.c   |4 +-
 qemu-coroutine.c|   33 ++
 14 files changed, 553 insertions(+), 105 deletions(-)



[1], http://marc.info/?l=linux-api&m=140486843317107&w=2
[2], http://marc.info/?l=linux-api&m=140418368421229&w=2
[3], http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ 
#for-3.17/drivers
[4], https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
[5], http://marc.info/?l=linux-api&m=140377573830230&w=2

Thanks,
--
Ming Lei

[Qemu-devel] [PATCH RFC 03/14] block: support to bypass qemu coroutinue

This patch adds bypass mode support for the coroutinue
in bdrv_co_aio_rw_vector(), which is in the fast path
of lots of block device, especially for virtio-blk dataplane.

Signed-off-by: Ming Lei 
---
 block.c |  129 +++
 1 file changed, 105 insertions(+), 24 deletions(-)

diff --git a/block.c b/block.c
index 128a14f..db7ba37 100644
--- a/block.c
+++ b/block.c
@@ -55,6 +55,21 @@ struct BdrvDirtyBitmap {
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+typedef struct CoroutineIOCompletion {
+Coroutine *coroutine;
+int ret;
+bool bypass;
+QEMUIOVector *bounced_iov;
+} CoroutineIOCompletion;
+
+typedef struct BlockDriverAIOCBCoroutine {
+BlockDriverAIOCB common;
+BlockRequest req;
+bool is_write;
+bool *done;
+QEMUBH *bh;
+} BlockDriverAIOCBCoroutine;
+
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
 #define COROUTINE_POOL_RESERVATION 64 /* number of coroutines to reserve */
@@ -122,6 +137,21 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+static CoroutineIOCompletion *bdrv_get_co_io_comp(BlockDriverAIOCBCoroutine
+  *acb)
+{
+return (CoroutineIOCompletion *)((void *)acb +
+   sizeof(BlockDriverAIOCBCoroutine));
+}
+
+static BlockDriverAIOCBCoroutine *bdrv_get_aio_co(CoroutineIOCompletion *co)
+{
+assert(co->bypass);
+
+return (BlockDriverAIOCBCoroutine *)((void *)co -
+   sizeof(BlockDriverAIOCBCoroutine));
+}
+
 /* throttling disk I/O limits */
 void bdrv_set_io_limits(BlockDriverState *bs,
 ThrottleConfig *cfg)
@@ -3074,7 +3104,16 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 ret = drv->bdrv_co_readv(bs, sector_num, local_sectors,
  &local_qiov);
 
-qemu_iovec_destroy(&local_qiov);
+
+if (qemu_coroutine_self_bypassed()) {
+CoroutineIOCompletion *pco = bdrv_get_co_io_comp(
+ (BlockDriverAIOCBCoroutine *)
+ qemu_coroutine_get_var());
+pco->bounced_iov = g_malloc(sizeof(QEMUIOVector));
+*pco->bounced_iov = local_qiov;
+} else {
+qemu_iovec_destroy(&local_qiov);
+}
 } else {
 ret = 0;
 }
@@ -4652,15 +4691,6 @@ static BlockDriverAIOCB 
*bdrv_aio_writev_em(BlockDriverState *bs,
 return bdrv_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 }
 
-
-typedef struct BlockDriverAIOCBCoroutine {
-BlockDriverAIOCB common;
-BlockRequest req;
-bool is_write;
-bool *done;
-QEMUBH* bh;
-} BlockDriverAIOCBCoroutine;
-
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
 AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
@@ -4679,6 +4709,12 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
 .cancel = bdrv_aio_co_cancel_em,
 };
 
+static const AIOCBInfo bdrv_em_co_bypass_aiocb_info = {
+.aiocb_size = sizeof(BlockDriverAIOCBCoroutine) +
+  sizeof(CoroutineIOCompletion),
+.cancel = bdrv_aio_co_cancel_em,
+};
+
 static void bdrv_co_em_bh(void *opaque)
 {
 BlockDriverAIOCBCoroutine *acb = opaque;
@@ -4698,6 +4734,12 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
 BlockDriverAIOCBCoroutine *acb = opaque;
 BlockDriverState *bs = acb->common.bs;
+bool bypass = qemu_coroutine_self_bypassed();
+
+if (bypass) {
+qemu_coroutine_set_var(acb);
+memset(bdrv_get_co_io_comp(acb), 0, sizeof(CoroutineIOCompletion));
+}
 
 if (!acb->is_write) {
 acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
@@ -4707,8 +4749,10 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
 }
 
-acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
-qemu_bh_schedule(acb->bh);
+if (!bypass) {
+acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
+qemu_bh_schedule(acb->bh);
+}
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
@@ -4722,8 +4766,18 @@ static BlockDriverAIOCB 
*bdrv_co_aio_rw_vector(BlockDriverState *bs,
 {
 Coroutine *co;
 BlockDriverAIOCBCoroutine *acb;
+const AIOCBInfo *aiocb_info;
+bool bypass;
 
-acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
+if (qemu_aio_get_bypass_co(bdrv_get_aio_context(bs))) {
+aiocb_info = &bdrv_em_co_bypass_aiocb_info;
+bypass = true;
+} else {
+aiocb_info = &bdrv_em_co_aiocb_info;
+bypass = false;
+}
+
+acb = qemu_aio_get(aiocb_info, bs, cb, opaque);
 acb->req.sector = sector_num;
 acb->req.nb_sectors = nb_sectors;
 acb->req.qiov = qiov

[Qemu-devel] [PATCH RFC 01/14] qemu coroutine: support bypass mode

This patch introduces several APIs for supporting bypass qemu coroutine
in case of being not necessary and for performance's sake.

Signed-off-by: Ming Lei 
---
 include/block/coroutine.h |7 +++
 include/block/coroutine_int.h |5 +
 qemu-coroutine-lock.c |4 ++--
 qemu-coroutine.c  |   33 +
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index b9b7f48..9bd64da 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -234,4 +234,11 @@ void coroutine_fn yield_until_fd_readable(int fd);
  */
 void qemu_coroutine_adjust_pool_size(int n);
 
+/* qemu coroutine bypass APIs */
+void qemu_coroutine_set_bypass(bool bypass);
+bool qemu_coroutine_bypassed(Coroutine *self);
+bool qemu_coroutine_self_bypassed(void);
+void qemu_coroutine_set_var(void *var);
+void *qemu_coroutine_get_var(void);
+
 #endif /* QEMU_COROUTINE_H */
diff --git a/include/block/coroutine_int.h b/include/block/coroutine_int.h
index f133d65..106d0b2 100644
--- a/include/block/coroutine_int.h
+++ b/include/block/coroutine_int.h
@@ -39,6 +39,11 @@ struct Coroutine {
 Coroutine *caller;
 QSLIST_ENTRY(Coroutine) pool_next;
 
+bool bypass;
+
+/* only used in bypass mode */
+void *opaque;
+
 /* Coroutines that should be woken up when we yield or terminate */
 QTAILQ_HEAD(, Coroutine) co_queue_wakeup;
 QTAILQ_ENTRY(Coroutine) co_queue_next;
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index e4860ae..7c69ff6 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -82,13 +82,13 @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool 
single)
 
 bool coroutine_fn qemu_co_queue_next(CoQueue *queue)
 {
-assert(qemu_in_coroutine());
+assert(qemu_in_coroutine() || qemu_coroutine_self_bypassed());
 return qemu_co_queue_do_restart(queue, true);
 }
 
 void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
 {
-assert(qemu_in_coroutine());
+assert(qemu_in_coroutine() || qemu_coroutine_self_bypassed());
 qemu_co_queue_do_restart(queue, false);
 }
 
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index bd574aa..324f5ad 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -157,3 +157,36 @@ void qemu_coroutine_adjust_pool_size(int n)
 
 qemu_mutex_unlock(&pool_lock);
 }
+
+void qemu_coroutine_set_bypass(bool bypass)
+{
+Coroutine *self = qemu_coroutine_self();
+
+self->bypass = bypass;
+}
+
+bool qemu_coroutine_bypassed(Coroutine *self)
+{
+return self->bypass;
+}
+
+bool qemu_coroutine_self_bypassed(void)
+{
+Coroutine *self = qemu_coroutine_self();
+
+return qemu_coroutine_bypassed(self);
+}
+
+void qemu_coroutine_set_var(void *var)
+{
+Coroutine *self = qemu_coroutine_self();
+
+self->opaque = var;
+}
+
+void *qemu_coroutine_get_var(void)
+{
+Coroutine *self = qemu_coroutine_self();
+
+return self->opaque;
+}
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 10/14] linux-aio: remove 'node' from 'struct qemu_laiocb'

No one uses the 'node' field any more, so remove it
from 'struct qemu_laiocb', and this can save 16byte
for the struct on 64bit arch.

Signed-off-by: Ming Lei 
---
 block/linux-aio.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index c06a57d..337f879 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -35,7 +35,6 @@ struct qemu_laiocb {
 size_t nbytes;
 QEMUIOVector *qiov;
 bool is_read;
-QLIST_ENTRY(qemu_laiocb) node;
 };
 
 typedef struct {
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 02/14] qemu aio: prepare for supporting selective bypass coroutine

If device thinks that it isn't necessary to apply coroutine
in its performance sensitive path, it can call
qemu_aio_set_bypass_co(false) to bypass the coroutine which
has supported bypass mode and just call the function directly.

One example is virtio-blk dataplane.

Signed-off-by: Ming Lei 
---
 async.c |1 +
 include/block/aio.h |   13 +
 2 files changed, 14 insertions(+)

diff --git a/async.c b/async.c
index 34af0b2..251a074 100644
--- a/async.c
+++ b/async.c
@@ -293,6 +293,7 @@ AioContext *aio_context_new(void)
(EventNotifierHandler *)
event_notifier_test_and_clear);
 timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
+qemu_aio_set_bypass_co(ctx, false);
 
 return ctx;
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index c23de3c..48d827e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -87,6 +87,9 @@ struct AioContext {
 
 /* TimerLists for calling timers - one per clock type */
 QEMUTimerListGroup tlg;
+
+/* support selective bypass coroutine */
+bool bypass_co;
 };
 
 /* Used internally to synchronize aio_poll against qemu_bh_schedule.  */
@@ -303,4 +306,14 @@ static inline void aio_timer_init(AioContext *ctx,
 timer_init(ts, ctx->tlg.tl[type], scale, cb, opaque);
 }
 
+static inline void qemu_aio_set_bypass_co(AioContext *ctx, bool bypass)
+{
+ctx->bypass_co = bypass;
+}
+
+static inline bool qemu_aio_get_bypass_co(AioContext *ctx)
+{
+return ctx->bypass_co;
+}
+
 #endif
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 14/14] dataplane: virtio-blk: support mutlti virtqueue

This patch supports to handle host notify from multi
virt queues, but still process/submit io in the
one iothread.

Currently this patch brings up below improvement
with two virtqueues(against single virtqueue):

---
| VM in server host | VM in laptop host
---
JOBS=2  | +8%   | -11%
---
JOBS=4  | +64%  | +29%
---

The reason is that commit 580b6b2aa2(dataplane: use the QEMU
block layer for I/O) causes average submitting time for single
request doubled, so io thread performance get decreased.

Based on QEMU 2.0, only this single patch can achieve
very good improvement:

http://marc.info/?l=linux-api&m=140377573830230&w=2

So hope QEMU block layer can get optimized for linux aio,
or maybe a fast path is needed for linux aio.

Signed-off-by: Ming Lei 
---
 hw/block/dataplane/virtio-blk.c |  209 ---
 include/hw/virtio/virtio-blk.h  |1 +
 2 files changed, 153 insertions(+), 57 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 828fe99..bd66274 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -26,6 +26,11 @@
 
 #define REQ_POOL_SZ 128
 
+typedef struct {
+EventNotifier notifier;
+VirtIOBlockDataPlane *s;
+} VirtIOBlockNotifier;
+
 struct VirtIOBlockDataPlane {
 bool started;
 bool starting;
@@ -35,9 +40,10 @@ struct VirtIOBlockDataPlane {
 VirtIOBlkConf *blk;
 
 VirtIODevice *vdev;
-Vring vring;/* virtqueue vring */
-EventNotifier *guest_notifier;  /* irq */
-QEMUBH *bh; /* bh for guest notification */
+Vring *vring;/* virtqueue vring */
+EventNotifier **guest_notifier;  /* irq */
+uint64_t   pending_guest_notifier;  /* pending guest notifer for vq */
+QEMUBH *bh;  /* bh for guest notification */
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,7 +53,9 @@ struct VirtIOBlockDataPlane {
 IOThread *iothread;
 IOThread internal_iothread_obj;
 AioContext *ctx;
-EventNotifier host_notifier;/* doorbell */
+VirtIOBlockNotifier *host_notifier; /* doorbell */
+uint64_t   pending_host_notifier;   /* pending host notifer for vq */
+QEMUBH *host_notifier_bh;   /* for handle host notifier */
 
 /* Operation blocker on BDS */
 Error *blocker;
@@ -60,20 +68,26 @@ struct VirtIOBlockDataPlane {
 };
 
 /* Raise an interrupt to signal guest, if necessary */
-static void notify_guest(VirtIOBlockDataPlane *s)
+static void notify_guest(VirtIOBlockDataPlane *s, unsigned int qid)
 {
-if (!vring_should_notify(s->vdev, &s->vring)) {
-return;
+if (vring_should_notify(s->vdev, &s->vring[qid])) {
+event_notifier_set(s->guest_notifier[qid]);
 }
-
-event_notifier_set(s->guest_notifier);
 }
 
 static void notify_guest_bh(void *opaque)
 {
 VirtIOBlockDataPlane *s = opaque;
+unsigned int qid;
+uint64_t pending = s->pending_guest_notifier;
+
+s->pending_guest_notifier = 0;
 
-notify_guest(s);
+while ((qid = ffsl(pending))) {
+qid--;
+notify_guest(s, qid);
+pending &= ~(1 << qid);
+}
 }
 
 static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
@@ -81,7 +95,7 @@ static void complete_request_vring(VirtIOBlockReq *req, 
unsigned char status)
 VirtIOBlockDataPlane *s = req->dev->dataplane;
 stb_p(&req->in->status, status);
 
-vring_push(&req->dev->dataplane->vring, &req->elem,
+vring_push(&s->vring[req->qid], &req->elem,
req->qiov.size + sizeof(*req->in));
 
 /* Suppress notification to guest by BH and its scheduled
@@ -90,17 +104,15 @@ static void complete_request_vring(VirtIOBlockReq *req, 
unsigned char status)
  * executed in dataplane aio context even after it is
  * stopped, so needn't worry about notification loss with BH.
  */
+assert(req->qid < 64);
+s->pending_guest_notifier |= (1 << req->qid);
 qemu_bh_schedule(s->bh);
 }
 
-static void handle_notify(EventNotifier *e)
+static void process_vq_notify(VirtIOBlockDataPlane *s, unsigned short qid)
 {
-VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-   host_notifier);
 VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-event_notifier_test_and_clear(&s->host_notifier);
-bdrv_io_plug(s->blk->conf.bs);
 for (;;) {
 MultiReqBuffer mrb = {
 .num_writes = 0,
@@ -108,12 +120,13 @@ static void handle_notify(EventNotifier *e)
 in

[Qemu-devel] [PATCH RFC 08/14] linux-aio: fix submit aio as a batch

In the enqueue path, we can't complete request, otherwise
"Co-routine re-entered recursively" may be caused, so this
patch fixes the issue with below ideas:

- for -EAGAIN or partial completion, retry the submission by
an introduced event handler
- for part of completion, also update the io queue
- for other failure, return the failure if in enqueue path,
otherwise, abort all queued I/O

Signed-off-by: Ming Lei 
---
 block/linux-aio.c |   90 -
 1 file changed, 68 insertions(+), 22 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 7ac7e8c..5eb9c92 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -51,6 +51,7 @@ struct qemu_laio_state {
 
 /* io queue for submit at batch */
 LaioQueue io_q;
+EventNotifier retry;  /* handle -EAGAIN and partial completion */
 };
 
 static inline ssize_t io_event_ret(struct io_event *ev)
@@ -154,45 +155,80 @@ static void ioq_init(LaioQueue *io_q)
 io_q->plugged = 0;
 }
 
-static int ioq_submit(struct qemu_laio_state *s)
+static void abort_queue(struct qemu_laio_state *s)
+{
+int i;
+for (i = 0; i < s->io_q.idx; i++) {
+struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+  struct qemu_laiocb,
+  iocb);
+laiocb->ret = -EIO;
+qemu_laio_process_completion(s, laiocb);
+}
+}
+
+static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
 {
 int ret, i = 0;
 int len = s->io_q.idx;
+int j = 0;
 
-do {
-ret = io_submit(s->ctx, len, s->io_q.iocbs);
-} while (i++ < 3 && ret == -EAGAIN);
+if (!len) {
+return 0;
+}
 
-/* empty io queue */
-s->io_q.idx = 0;
+ret = io_submit(s->ctx, len, s->io_q.iocbs);
+if (ret == -EAGAIN) {
+event_notifier_set(&s->retry);
+return 0;
+} else if (ret < 0) {
+if (enqueue) {
+return ret;
+}
 
-if (ret < 0) {
-i = 0;
-} else {
-i = ret;
+/* in non-queue path, all IOs have to be completed */
+abort_queue(s);
+ret = len;
+} else if (ret == 0) {
+goto out;
 }
 
-for (; i < len; i++) {
-struct qemu_laiocb *laiocb =
-container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
-
-laiocb->ret = (ret < 0) ? ret : -EIO;
-qemu_laio_process_completion(s, laiocb);
+for (i = ret; i < len; i++) {
+s->io_q.iocbs[j++] = s->io_q.iocbs[i];
 }
+
+ out:
+/* update io queue */
+s->io_q.idx -= ret;
+
 return ret;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static void ioq_submit_retry(EventNotifier *e)
+{
+struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, retry);
+
+event_notifier_test_and_clear(e);
+ioq_submit(s, false);
+}
+
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
 unsigned int idx = s->io_q.idx;
 
+if (unlikely(idx == s->io_q.size)) {
+return -1;
+}
+
 s->io_q.iocbs[idx++] = iocb;
 s->io_q.idx = idx;
 
-/* submit immediately if queue is full */
-if (idx == s->io_q.size) {
-ioq_submit(s);
+/* submit immediately if queue depth is above 2/3 */
+if (idx > s->io_q.size * 2 / 3) {
+return ioq_submit(s, true);
 }
+
+return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -214,7 +250,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, 
bool unplug)
 }
 
 if (s->io_q.idx > 0) {
-ret = ioq_submit(s);
+ret = ioq_submit(s, false);
 }
 
 return ret;
@@ -258,7 +294,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void 
*aio_ctx, int fd,
 goto out_free_aiocb;
 }
 } else {
-ioq_enqueue(s, iocbs);
+if (ioq_enqueue(s, iocbs) < 0) {
+goto out_free_aiocb;
+}
 }
 return &laiocb->common;
 
@@ -272,6 +310,7 @@ void laio_detach_aio_context(void *s_, AioContext 
*old_context)
 struct qemu_laio_state *s = s_;
 
 aio_set_event_notifier(old_context, &s->e, NULL);
+aio_set_event_notifier(old_context, &s->retry, NULL);
 }
 
 void laio_attach_aio_context(void *s_, AioContext *new_context)
@@ -279,6 +318,7 @@ void laio_attach_aio_context(void *s_, AioContext 
*new_context)
 struct qemu_laio_state *s = s_;
 
 aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+aio_set_event_notifier(new_context, &s->retry, ioq_submit_retry);
 }
 
 void *laio_init(void)
@@ -295,9 +335,14 @@ void *laio_init(void)
 }
 
 ioq_init(&s->io_q);
+if (event_notifier_init(&s->retry, false) < 0) {
+goto out_notifer_init;
+}
 
 return s;
 
+out_notifer_init:
+io_destroy(s->ctx);
 out_close_efd:
 event_notifier_cleanup(&s->e);
 out_free_state:
@@ -310,6 +355,7

[Qemu-devel] [PATCH RFC 12/14] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ

Prepare for supporting mutli vqs per virtio-blk device.

Signed-off-by: Ming Lei 
---
 include/hw/virtio/virtio-blk.h |8 
 1 file changed, 8 insertions(+)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 45f8894..ad70c9a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,6 +42,12 @@
 #define VIRTIO_BLK_F_TOPOLOGY   10  /* Topology information is available */
 #define VIRTIO_BLK_F_CONFIG_WCE 11  /* write cache configurable */
 
+/*
+ * support multi vqs, and virtio_blk_config.num_queues is only
+ * available when this feature is enabled
+ */
+#define VIRTIO_BLK_F_MQ12
+
 #define VIRTIO_BLK_ID_BYTES 20  /* ID string length */
 
 struct virtio_blk_config
@@ -58,6 +64,8 @@ struct virtio_blk_config
 uint16_t min_io_size;
 uint32_t opt_io_size;
 uint8_t wce;
+uint8_t unused;
+uint16_t num_queues;   /* must be at the end */
 } QEMU_PACKED;
 
 /* These two define direction. */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 04/14] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used"

This reverts commit 76ef2cf5493a215efc351f48ae7094d6c183fcac.

Reintroduce the helper of raw_get_aio_fd() for enabling
coroutinue bypass mode in case of raw image.

Signed-off-by: Ming Lei 
---
 block/raw-posix.c |   34 ++
 include/block/block.h |9 +
 2 files changed, 43 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2bcc73d..98b9626 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2419,6 +2419,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+BDRVRawState *s;
+
+if (!bs->drv) {
+return -ENOMEDIUM;
+}
+
+if (bs->drv == bdrv_find_format("raw")) {
+bs = bs->file;
+}
+
+/* raw-posix has several protocols so just check for raw_aio_readv */
+if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
+return -ENOTSUP;
+}
+
+s = bs->opaque;
+if (!s->use_aio) {
+return -ENOTSUP;
+}
+return s->fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
 /*
diff --git a/include/block/block.h b/include/block/block.h
index 32d3676..8d15693 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -482,6 +482,15 @@ void bdrv_op_block_all(BlockDriverState *bs, Error 
*reason);
 void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason);
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
 BDRV_ACCT_READ,
 BDRV_ACCT_WRITE,
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 05/14] dataplane: enable selective bypassing coroutine

This patch enables selective bypassing for the
coroutine in bdrv_co_aio_rw_vector() if the image
format is raw.

With this patch, ~16% thoughput improvement can
be observed in my laptop based VM test, and ~7%
improvement is observed in the VM based on server.

Signed-off-by: Ming Lei 
---
 hw/block/dataplane/virtio-blk.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d6ba65c..2093e4a 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -28,6 +28,7 @@ struct VirtIOBlockDataPlane {
 bool started;
 bool starting;
 bool stopping;
+bool raw_format;
 
 VirtIOBlkConf *blk;
 
@@ -193,6 +194,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *blk,
 error_setg(&s->blocker, "block device is in use by data plane");
 bdrv_op_block_all(blk->conf.bs, s->blocker);
 
+s->raw_format = (raw_get_aio_fd(blk->conf.bs) >= 0);
+
 *dataplane = s;
 }
 
@@ -262,6 +265,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 /* Kick right away to begin processing requests already in vring */
 event_notifier_set(virtio_queue_get_host_notifier(vq));
 
+if (s->raw_format) {
+qemu_aio_set_bypass_co(s->ctx, true);
+}
+
 /* Get this show started by hooking up our callbacks */
 aio_context_acquire(s->ctx);
 aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
@@ -291,6 +298,9 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
 aio_context_release(s->ctx);
 
+if (s->raw_format) {
+qemu_aio_set_bypass_co(s->ctx, false);
+}
 /* Sync vring state back to virtqueue so that non-dataplane request
  * processing can continue when we disable the host notifier below.
  */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 13/14] hw/block/virtio-blk: create num_queues vqs if dataplane is enabled

Now we only support multi vqs for dataplane case.

Signed-off-by: Ming Lei 
---
 hw/block/virtio-blk.c  |   16 +++-
 include/hw/virtio/virtio-blk.h |3 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ab99156..160b021 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -556,6 +556,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 blkcfg.physical_block_exp = get_physical_block_exp(s->conf);
 blkcfg.alignment_offset = 0;
 blkcfg.wce = bdrv_enable_write_cache(s->bs);
+stw_p(&blkcfg.num_queues, s->blk.num_queues);
 memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
 }
 
@@ -590,6 +591,12 @@ static uint32_t virtio_blk_get_features(VirtIODevice 
*vdev, uint32_t features)
 if (bdrv_is_read_only(s->bs))
 features |= 1 << VIRTIO_BLK_F_RO;
 
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+if (s->blk.num_queues > 1) {
+features |= 1 << VIRTIO_BLK_F_MQ;
+}
+#endif
+
 return features;
 }
 
@@ -739,8 +746,13 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
 Error *err = NULL;
 #endif
+int i;
 static int virtio_blk_id;
 
+#ifndef CONFIG_VIRTIO_BLK_DATA_PLANE
+blk->num_queues = 1;
+#endif
+
 if (!blk->conf.bs) {
 error_setg(errp, "drive property not set");
 return;
@@ -765,7 +777,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->rq = NULL;
 s->sector_mask = (s->conf->logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
-s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+for (i = 0; i < blk->num_queues; i++)
+s->vqs[i] = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+s->vq = s->vqs[0];
 s->complete_request = virtio_blk_complete_request;
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
 virtio_blk_data_plane_create(vdev, blk, &s->dataplane, &err);
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index ad70c9a..91489b0 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -50,6 +50,8 @@
 
 #define VIRTIO_BLK_ID_BYTES 20  /* ID string length */
 
+#define VIRTIO_BLK_MAX_VQS 16  /* max virtio queues supported now */
+
 struct virtio_blk_config
 {
 uint64_t capacity;
@@ -132,6 +134,7 @@ typedef struct VirtIOBlock {
 VirtIODevice parent_obj;
 BlockDriverState *bs;
 VirtQueue *vq;
+VirtQueue *vqs[VIRTIO_BLK_MAX_VQS];
 void *rq;
 QEMUBH *bh;
 BlockConf *conf;
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 06/14] qemu/obj_pool.h: introduce object allocation pool

This patch introduces object allocation pool for speeding up
object allocation in fast path.

Signed-off-by: Ming Lei 
---
 include/qemu/obj_pool.h |   64 +++
 1 file changed, 64 insertions(+)
 create mode 100644 include/qemu/obj_pool.h

diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
new file mode 100644
index 000..94b5f49
--- /dev/null
+++ b/include/qemu/obj_pool.h
@@ -0,0 +1,64 @@
+#ifndef QEMU_OBJ_POOL_HEAD
+#define QEMU_OBJ_POOL_HEAD
+
+typedef struct {
+unsigned int size;
+unsigned int cnt;
+
+void **free_obj;
+int free_idx;
+
+char *objs;
+} ObjPool;
+
+static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
+ unsigned int obj_size, unsigned cnt)
+{
+int i;
+
+op->objs = (char *)objs_buf;
+op->free_obj = free_objs;
+op->size = obj_size;
+op->cnt = cnt;
+
+for (i = 0; i < op->cnt; i++) {
+op->free_obj[i] = (void *)&op->objs[i * op->size];
+}
+op->free_idx = op->cnt;
+}
+
+static inline void *obj_pool_get(ObjPool *op)
+{
+void *obj;
+
+if (!op) {
+return NULL;
+}
+
+if (op->free_idx <= 0) {
+return NULL;
+}
+
+obj = op->free_obj[--op->free_idx];
+return obj;
+}
+
+static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
+{
+return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
+   (unsigned long)obj <=
+   (unsigned long)&op->objs[(op->cnt - 1) * op->size];
+}
+
+static inline void obj_pool_put(ObjPool *op, void *obj)
+{
+if (!op || !obj_pool_has_obj(op, obj)) {
+return;
+}
+
+assert(op->free_idx < op->cnt);
+
+op->free_obj[op->free_idx++] = obj;
+}
+
+#endif
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 07/14] dataplane: use object pool to speed up allocation for virtio blk request

g_slice_new(VirtIOBlockReq), its free pair and access the instance
is a bit slow since sizeof(VirtIOBlockReq) takes more than 40KB,
so use object pool to speed up its allocation and release.

With this patch, ~20% thoughput improvement can
be observed in my laptop based VM test, and ~5%
improvement is observed in the VM based on server.

Signed-off-by: Ming Lei 
---
 hw/block/dataplane/virtio-blk.c |   12 
 hw/block/virtio-blk.c   |   13 +++--
 include/hw/virtio/virtio-blk.h  |2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2093e4a..828fe99 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -24,6 +24,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "qom/object_interfaces.h"
 
+#define REQ_POOL_SZ 128
+
 struct VirtIOBlockDataPlane {
 bool started;
 bool starting;
@@ -51,6 +53,10 @@ struct VirtIOBlockDataPlane {
 Error *blocker;
 void (*saved_complete_request)(struct VirtIOBlockReq *req,
unsigned char status);
+
+VirtIOBlockReq  reqs[REQ_POOL_SZ];
+void *free_reqs[REQ_POOL_SZ];
+ObjPool  req_pool;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -238,6 +244,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 return;
 }
 
+vblk->obj_pool = &s->req_pool;
+obj_pool_init(vblk->obj_pool, s->reqs, s->free_reqs,
+  sizeof(VirtIOBlockReq), REQ_POOL_SZ);
+
 /* Set up guest notifier (irq) */
 if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
 fprintf(stderr, "virtio-blk failed to set guest notifier, "
@@ -298,6 +308,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
 aio_context_release(s->ctx);
 
+vblk->obj_pool = NULL;
+
 if (s->raw_format) {
 qemu_aio_set_bypass_co(s->ctx, false);
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..2a11bc4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -31,7 +31,11 @@
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
+VirtIOBlockReq *req = obj_pool_get(s->obj_pool);
+
+if (!req) {
+req = g_slice_new(VirtIOBlockReq);
+}
 req->dev = s;
 req->qiov.size = 0;
 req->next = NULL;
@@ -41,7 +45,11 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 void virtio_blk_free_request(VirtIOBlockReq *req)
 {
 if (req) {
-g_slice_free(VirtIOBlockReq, req);
+if (obj_pool_has_obj(req->dev->obj_pool, req)) {
+obj_pool_put(req->dev->obj_pool, req);
+} else {
+g_slice_free(VirtIOBlockReq, req);
+}
 }
 }
 
@@ -801,6 +809,7 @@ static void virtio_blk_instance_init(Object *obj)
 {
 VirtIOBlock *s = VIRTIO_BLK(obj);
 
+s->obj_pool = NULL;
 object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
  (Object **)&s->blk.iothread,
  qdev_prop_allow_set_link_before_realize,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index afb7b8d..49ac234 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -18,6 +18,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "block/block.h"
+#include "qemu/obj_pool.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
 #define VIRTIO_BLK(obj) \
@@ -135,6 +136,7 @@ typedef struct VirtIOBlock {
 Notifier migration_state_notifier;
 struct VirtIOBlockDataPlane *dataplane;
 #endif
+ObjPool *obj_pool;
 } VirtIOBlock;
 
 typedef struct MultiReqBuffer {
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 09/14] linux-aio: increase max event to 256

This patch increases max event to 256 for the comming
virtio-blk multi virtqueue support.

Signed-off-by: Ming Lei 
---
 block/linux-aio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 5eb9c92..c06a57d 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -23,7 +23,7 @@
  *  than this we will get EAGAIN from io_submit which is communicated to
  *  the guest as an I/O error.
  */
-#define MAX_EVENTS 128
+#define MAX_EVENTS 256
 
 #define MAX_QUEUED_IO  128
 
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 11/14] hw/virtio-pci: introduce num_queues property

This patch introduces the parameter of 'num_queues', and
prepare for supporting mutli vqs of virtio-blk.

Signed-off-by: Ming Lei 
---
 hw/block/virtio-blk.c  |1 +
 include/hw/virtio/virtio-blk.h |1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2a11bc4..ab99156 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -826,6 +826,7 @@ static Property virtio_blk_properties[] = {
 #endif
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
 DEFINE_PROP_BIT("x-data-plane", VirtIOBlock, blk.data_plane, 0, false),
+DEFINE_PROP_UINT32("num_queues", VirtIOBlock, blk.num_queues, 1),
 #endif
 DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 49ac234..45f8894 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -114,6 +114,7 @@ struct VirtIOBlkConf
 uint32_t scsi;
 uint32_t config_wce;
 uint32_t data_plane;
+uint32_t num_queues;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.7.9.5

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)

2014-07-16 Thread Peter Maydell

On 16 July 2014 17:28, Stefan Weil  wrote:
> a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the
> requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of
> view, this change results in a regression for Debian and similar Linux
> distributions: QEMU no longer includes iscsi features.
>
> In this special case, the current Debian stable includes QEMU packages
> with libiscsi 1.4, but new builds won't include it because that version
> is now too old. Debian testing includes a brand new libiscsi, but it
> does not include libiscsi.pc, so pkg-config won't know that it is
> available and configure will disable libiscsi. I have a patch which
> fixes this, so QEMU for Debian testing could include libiscsi again.
>
> Is a feature regression like this one acceptable? Do we need additional
> testing (maybe run the build bots with --enable-xxx, so builds fail when
> xxx no longer works)?

In general, we should try to avoid feature regressions, but it's
going to be a case-by-case thing. In this particular instance,
upstream libiscsi don't recommend pre-1.9 for production use
(as the commit message documents), and I don't think we would
be doing our users any favours by allowing them to build something
that's likely to be broken. We should of course flag up this sort of
"minimum version of our dependencies has been bumped" info in
the release notes.

I think that "does QEMU still build with all the features we need on
our distro?" has to be testing done by the people who package and
maintain QEMU for each distro -- they're the only people who know
which configurations options they enable and which baseline versions
of their distro they still care about packaging mainline QEMU for.

thanks
-- PMM

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases


Il 16/07/2014 18:28, Stefan Weil ha scritto:

Debian testing includes a brand new libiscsi, but it
does not include libiscsi.pc, so pkg-config won't know that it is
available and configure will disable libiscsi.


That's a packaging bug.


I have a patch which
fixes this, so QEMU for Debian testing could include libiscsi again.

Is a feature regression like this one acceptable? Do we need additional
testing (maybe run the build bots with --enable-xxx, so builds fail when
xxx no longer works)?


As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit 
message, this was intentional.  I was reluctant to do it, but ultimately 
Peter Lieven convinced me that it isn't just about using fancy new APIs; 
libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0 
requires a patch to avoid segfaults, and many more if you want to run it 
on ARM).


Paolo

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonz...@redhat.com) wrote:
> >>Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
> >
> >+
> >+/* If it's already open, return it */
> >+if (qfs->file->return_path) {
> >+return qfs->file->return_path;
> 
> Wouldn't this leave a dangling file descriptor if you call
> socket_dup_return_path twice, and then close the original QEMUFile?
> >>>
> >>>Hmm - how?
> >>
> >>The problem is that there is no reference count on QEMUFile, so if you do
> >>
> >>  f1 = open_return_path(f0);
> >>  f2 = open_return_path(f0);
> >>  /* now f1 == f2 */
> >>  qemu_fclose(f1);
> >>  /* now f2 is dangling */
> >
> >I think from the way I'm using it, I can remove the optimisation, but I do
> >need to check.
> >
> >I'm not too sure what your worry is about 'f2' in this case; I guess the 
> >caller
> >needs to know that it should only close the return path once - is that
> >your worry?
> 
> Yes.  The API is not well encapsulated; a "random" caller of
> open_return_path does not know (and cannot know) whether it should close the
> returned file or not.

OK, then yes I'll give that a go taking out those optimisations.

> >I'm more nervous about dropping that one, because the current scheme
> >does provide a clean way of finding the forward path if you've got the
> >reverse (although I don't think I make use of it).
> 
> If it isn't used, why keep it?

It just felt pleasently symmetric being able to get the forward path
by asking for the return path on the return path; but I can remove it.

> >>> Source side
> >>>Forward path - written by migration thread
> >>>   : It's OK for this to be blocking, but we end up with it being
> >>> non-blocking, and modify the socket code to emulate blocking.
> >>
> >>This likely has a performance impact though.  The first migration thread
> >>code drop from Juan already improved throughput a lot, even if it kept the
> >>iothread all the time and only converted from nonblocking writes to
> >>blocking.
> >
> >Can you give some more reasoning as to why you think this will hit the
> >performance so much; I thought the output buffers were quite big anyway.
> 
> I don't really know, it's
> >>>Return path  - opened by main thread, read by fd_handler on main thread
> >>>   : Must be non-blocking so as not to block the main thread while
> >>> waiting for a partially sent command.
> >>
> >>Why can't you handle this in the migration thread (or a new postcopy thread
> >>on the source side)?  Then it can stay blocking.
> >
> >Handling it within the migration thread would make it much more complicated
> >(which would be bad since it's already complex enough);
> 
> Ok.  I'm not sure why it is more complicated since migration is essentially
> two-phase, one where the source drives it and one where the source just
> waits for requests, but I'll trust you on this. :)

It's not as clean a split like that; during the postcopy phase we still do the 
linear
page scan to send pages before they're requested, so the main migration thread
code keeps going.
(There's an 'interesting' balance here, send too many linear pages and they get
in the way of the postcopy requests and increase the latency, but sending them
means that you get a lot of the pages without having to request them which is 0
latency)

> >>> Destination side
> >>>Forward path - read by main thread
> >>
> >>This must be nonblocking so that the monitor keeps responding.
> >
> >Interesting, I suspect none of the code in there is set up for that at the
> >moment, so how does that work during migration at the moment?
> 
> It sure is. :)

Oh so it is; I missed the 'qemu_set_nonblock(fd);' in process_incoming_migration

> On the destination side, migration is done in a coroutine (see
> process_incoming_migration) so it's all transparent.  Only socket_get_buffer
> has to know about this:
> 
> len = qemu_recv(s->fd, buf, size, 0);
> if (len != -1) {
> break;
> }
> if (socket_error() == EAGAIN) {
> yield_until_fd_readable(s->fd);
> } else if (socket_error() != EINTR) {
> break;
> }
> 
> If the socket is put in blocking mode recv will never return EAGAIN, so this
> code will only run if the socket is nonblocking.

OK, yes.

> >Actually, I see I missed something here; this should be:
> >
> >   Destination side
> > Forward path - read by main thread, and listener thread (see the
> > separate mail that described that listner thread)
> >
> >and that means that once postcopy is going (and the listener thread started)
> >it can't block the monitor.
> 
> Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once
> it gets its hands on the QEMUFile.
> 
> >>>Return path  - opened by main thread, written by main thread AND 
> >>> postcopy

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread Stefan Weil

Am 16.07.2014 18:49, schrieb Paolo Bonzini:
> Il 16/07/2014 18:28, Stefan Weil ha scritto:
>> Debian testing includes a brand new libiscsi, but it
>> does not include libiscsi.pc, so pkg-config won't know that it is
>> available and configure will disable libiscsi.
> 
> That's a packaging bug.

CC'ing Michael as he is the Debian maintainer of this package and
Aurélien who maintains QEMU for Debian.

Michael, should I send a Debian bug report for libiscsi-dev? Would an
update of libiscsi for Debian stable be reasonable if versions older
than 1.9 are too buggy to be used? I must admit that I'm a little bit
surprised because iSCSI support worked for me quite well the last time I
used it with Debian wheezy.

Regards
Stefan

> 
>> I have a patch which
>> fixes this, so QEMU for Debian testing could include libiscsi again.
>>
>> Is a feature regression like this one acceptable? Do we need additional
>> testing (maybe run the build bots with --enable-xxx, so builds fail when
>> xxx no longer works)?
> 
> As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit
> message, this was intentional.  I was reluctant to do it, but ultimately
> Peter Lieven convinced me that it isn't just about using fancy new APIs;
> libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0
> requires a patch to avoid segfaults, and many more if you want to run it
> on ARM).
> 
> Paolo

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread ronnie sahlberg

On Wed, Jul 16, 2014 at 10:11 AM, Stefan Weil  wrote:
> Am 16.07.2014 18:49, schrieb Paolo Bonzini:
>> Il 16/07/2014 18:28, Stefan Weil ha scritto:
>>> Debian testing includes a brand new libiscsi, but it
>>> does not include libiscsi.pc, so pkg-config won't know that it is
>>> available and configure will disable libiscsi.
>>
>> That's a packaging bug.
>
> CC'ing Michael as he is the Debian maintainer of this package and
> Aurélien who maintains QEMU for Debian.
>
> Michael, should I send a Debian bug report for libiscsi-dev? Would an
> update of libiscsi for Debian stable be reasonable if versions older
> than 1.9 are too buggy to be used?

If you ask debian to upgrade. Could you ask them to wait and upgrade after I
have release the next version, hopefully if all goes well, at the end
of this week?

It contains new functionality, thanks to plieven, to better handle
cases where active/passive storage arrays
perform failover.


> I must admit that I'm a little bit
> surprised because iSCSI support worked for me quite well the last time I
> used it with Debian wheezy.

I think, and plieven please correct me if I am wrong, earlier version
would work reasonably well for basic use
but there were bugs and gaps in functionality that made it ill suited
for enterprise environments.

>
> Regards
> Stefan
>
>>
>>> I have a patch which
>>> fixes this, so QEMU for Debian testing could include libiscsi again.
>>>
>>> Is a feature regression like this one acceptable? Do we need additional
>>> testing (maybe run the build bots with --enable-xxx, so builds fail when
>>> xxx no longer works)?
>>
>> As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit
>> message, this was intentional.  I was reluctant to do it, but ultimately
>> Peter Lieven convinced me that it isn't just about using fancy new APIs;
>> libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0
>> requires a patch to avoid segfaults, and many more if you want to run it
>> on ARM).
>>
>> Paolo
>
>

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread Michael Tokarev

16.07.2014 21:11, Stefan Weil wrote:
> Am 16.07.2014 18:49, schrieb Paolo Bonzini:
>> Il 16/07/2014 18:28, Stefan Weil ha scritto:
>>> Debian testing includes a brand new libiscsi, but it
>>> does not include libiscsi.pc, so pkg-config won't know that it is
>>> available and configure will disable libiscsi.
>>
>> That's a packaging bug.
> 
> CC'ing Michael as he is the Debian maintainer of this package and
> Aurélien who maintains QEMU for Debian.

Actually I maintain both for several years.

> Michael, should I send a Debian bug report for libiscsi-dev? Would an
> update of libiscsi for Debian stable be reasonable if versions older
> than 1.9 are too buggy to be used? I must admit that I'm a little bit
> surprised because iSCSI support worked for me quite well the last time I
> used it with Debian wheezy.

The bug is indeed in libiscsi-dev debian package in testing/jessie, I
forgot to include the .pc file.  Since so far, qemu is the only user of
libiscsi in debian, and since current version of qemu in debian testing
builds with thie libiscsi-dev, the bug went unnoticed.  I'll update the
libiscsi package in a moment, to include the .pc file.  I haven't read
whole thread yet, but I assume that new (2.1-tobe) qemu needs a .pc
file for libiscsi.

More recent libiscsi will _not_ be available directly in debian stable
(wheezy), but I'll provide one in a form of a backport shortly too,
because of exactly this issue.  I'm not sure yet if I'll provide a
backport of libiscsi before doing qemu-2.1 backport, however.

Thank you for the heads-up!

And no, there is no need to tweak qemu for this, the new version
requiriment is here for a reason.

/mjt

>>> I have a patch which
>>> fixes this, so QEMU for Debian testing could include libiscsi again.
>>>
>>> Is a feature regression like this one acceptable? Do we need additional
>>> testing (maybe run the build bots with --enable-xxx, so builds fail when
>>> xxx no longer works)?
>>
>> As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit
>> message, this was intentional.  I was reluctant to do it, but ultimately
>> Peter Lieven convinced me that it isn't just about using fancy new APIs;
>> libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0
>> requires a patch to avoid segfaults, and many more if you want to run it
>> on ARM).
>>
>> Paolo
>

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread Michael Tokarev

16.07.2014 21:23, ronnie sahlberg wrote:

> If you ask debian to upgrade. Could you ask them to wait and upgrade after I
> have release the next version, hopefully if all goes well, at the end
> of this week?

There's no problem in updating now to fix missing .pc file and to update
next week to include a new version.

Thanks,

/mjt

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread Peter Lieven

Am 16.07.2014 19:23, schrieb ronnie sahlberg:
> On Wed, Jul 16, 2014 at 10:11 AM, Stefan Weil  wrote:
>> Am 16.07.2014 18:49, schrieb Paolo Bonzini:
>>> Il 16/07/2014 18:28, Stefan Weil ha scritto:
 Debian testing includes a brand new libiscsi, but it
 does not include libiscsi.pc, so pkg-config won't know that it is
 available and configure will disable libiscsi.
>>> That's a packaging bug.
>> CC'ing Michael as he is the Debian maintainer of this package and
>> Aurélien who maintains QEMU for Debian.
>>
>> Michael, should I send a Debian bug report for libiscsi-dev? Would an
>> update of libiscsi for Debian stable be reasonable if versions older
>> than 1.9 are too buggy to be used?
> If you ask debian to upgrade. Could you ask them to wait and upgrade after I
> have release the next version, hopefully if all goes well, at the end
> of this week?

If someone from Ubuntu reads here, this is also for you. Your latest
LTS release does also contain 1.4.0.

>
> It contains new functionality, thanks to plieven, to better handle
> cases where active/passive storage arrays
> perform failover.

Yes and it fixes yet another bug in serial logic. Not a big one, but it
could cause protocol errors.

>
>
>> I must admit that I'm a little bit
>> surprised because iSCSI support worked for me quite well the last time I
>> used it with Debian wheezy.
> I think, and plieven please correct me if I am wrong, earlier version
> would work reasonably well for basic use
> but there were bugs and gaps in functionality that made it ill suited
> for enterprise environments.

It depends how you define basic use. It causes severe protocol violations 
especially
on reconnects and it will simply stop sending PDUs if CmdSN wraps from 
0x to 0x0.
Thats ok for try and mount an iSCSI volume and see if it works, but its
not usable for production use enterprise or not.

I mentioned the most critical bugs in the commit message to the libiscsi
version bump to 1.8.0 (which Paolo later increased to 1.9.0 to make the
BUSY handling possible).

Peter

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-16 Thread Peter Lieven

Am 16.07.2014 18:46, schrieb Peter Maydell:
> On 16 July 2014 17:28, Stefan Weil  wrote:
>> a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the
>> requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of
>> view, this change results in a regression for Debian and similar Linux
>> distributions: QEMU no longer includes iscsi features.
>>
>> In this special case, the current Debian stable includes QEMU packages
>> with libiscsi 1.4, but new builds won't include it because that version
>> is now too old. Debian testing includes a brand new libiscsi, but it
>> does not include libiscsi.pc, so pkg-config won't know that it is
>> available and configure will disable libiscsi. I have a patch which
>> fixes this, so QEMU for Debian testing could include libiscsi again.
>>
>> Is a feature regression like this one acceptable? Do we need additional
>> testing (maybe run the build bots with --enable-xxx, so builds fail when
>> xxx no longer works)?
> In general, we should try to avoid feature regressions, but it's
> going to be a case-by-case thing. In this particular instance,
> upstream libiscsi don't recommend pre-1.9 for production use
> (as the commit message documents), and I don't think we would
> be doing our users any favours by allowing them to build something
> that's likely to be broken. We should of course flag up this sort of
> "minimum version of our dependencies has been bumped" info in
> the release notes.

I will update the Wiki.

Peter

>
> I think that "does QEMU still build with all the features we need on
> our distro?" has to be testing done by the people who package and
> maintain QEMU for each distro -- they're the only people who know
> which configurations options they enable and which baseline versions
> of their distro they still care about packaging mainline QEMU for.
>
> thanks
> -- PMM

Re: [Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()

2014-07-16 Thread Eric Blake

On 07/16/2014 09:48 AM, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 33 +++--
>  block/iscsi.c |  3 +--
>  block/qcow2.c |  4 +---
>  block/qed.c   |  4 +---
>  block/raw-posix.c |  4 +---
>  block/raw_bsd.c   |  3 +--
>  block/stream.c|  2 +-
>  block/vmdk.c  |  4 +---
>  include/block/block.h |  2 +-
>  include/block/block_int.h |  2 +-
>  10 files changed, 32 insertions(+), 29 deletions(-)
> 

>  /* Take some limits from the children as a default */
>  if (bs->file) {
> -bdrv_refresh_limits(bs->file);
> +bdrv_refresh_limits(bs->file, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}

Nice that you are no longer ignoring failure from the child.


> @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
> BlockDriverState *backing_hd)
>  bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT,
>  bs->backing_blocker);
>  out:
> -bdrv_refresh_limits(bs);
> +bdrv_refresh_limits(bs, NULL);
>  }
>  
>  /*
> @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>BDRV_O_CACHE_WB);
>  reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
>  
> -bdrv_refresh_limits(reopen_state->bs);
> +bdrv_refresh_limits(reopen_state->bs, NULL);

> +++ b/block/stream.c
> @@ -76,7 +76,7 @@ static void close_unused_images(BlockDriverState *top, 
> BlockDriverState *base,
>  bdrv_unref(unused);
>  }
>  
> -bdrv_refresh_limits(top);
> +bdrv_refresh_limits(top, NULL);
>  }
>  

Should these three callers be concerned about failure?  If so, would
&error_abort be better than NULL?  But as for this patch, you are
preserving existing semantics, so you could save it for a later patch.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/2] raw-posix: Fail gracefully if no working alignment is found

2014-07-16 Thread Eric Blake

On 07/16/2014 09:48 AM, Kevin Wolf wrote:
> If qemu couldn't find out what O_DIRECT alignment to use with a given
> file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c
> and confuse users. This adds a more descriptive error message for such
> cases.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/raw-posix.c | 35 +++
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Andrey Korolyov

On Wed, Jul 16, 2014 at 5:24 PM, Andrey Korolyov  wrote:
> On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti  wrote:
>> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti  
>>> wrote:
>>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini  
>>> >> wrote:
>>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>>> >> >
>>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>> >> 
>>> >>  On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov 
>>> >>  wrote:
>>> >> >
>>> >> > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah 
>>> >> > wrote:
>>> >> >>
>>> >> >> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>> >> >>>
>>> >> >>> Hello,
>>> >> >>>
>>> >> >>> the issue is not specific to the iothread code because generic
>>> >> >>> virtio-blk also hangs up:
>>> >> >>
>>> >> >>
>>> >> >> Do you know which version works well?  If you could bisect, 
>>> >> >> that'll
>>> >> >> help a lot.
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Amit
>>> >> >
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > 2.0 works definitely well. I`ll try to finish bisection today, 
>>> >> > though
>>> >> > every step takes about 10 minutes to complete.
>>> >> 
>>> >> 
>>> >>  Yay! It is even outside of virtio-blk.
>>> >> 
>>> >>  commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>> >>  Author: Marcelo Tosatti 
>>> >>  Date:   Tue Jun 3 13:34:48 2014 -0300
>>> >> 
>>> >>  kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>>> >>  calculation
>>> >> 
>>> >>  Ensure proper env->tsc value for kvmclock_current_nsec 
>>> >>  calculation.
>>> >> 
>>> >>  Reported-by: Marcin Gibuła 
>>> >>  Cc: qemu-sta...@nongnu.org
>>> >>  Signed-off-by: Marcelo Tosatti 
>>> >>  Signed-off-by: Paolo Bonzini 
>>> >> >>>
>>> >> >>>
>>> >> >>> Andrey,
>>> >> >>>
>>> >> >>> Can you please provide instructions on how to create reproducible
>>> >> >>> environment?
>>> >> >>>
>>> >> >>> The following patch is equivalent to the original patch, for
>>> >> >>> the purposes of fixing the kvmclock problem.
>>> >> >>>
>>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>>> >> >>> experiencing.
>>> >> >>>
>>> >> >>>
>>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>>> >> >>> index 272a88a..feb5fc5 100644
>>> >> >>> --- a/hw/i386/kvm/clock.c
>>> >> >>> +++ b/hw/i386/kvm/clock.c
>>> >> >>> @@ -17,7 +17,6 @@
>>> >> >>>  #include "qemu/host-utils.h"
>>> >> >>>  #include "sysemu/sysemu.h"
>>> >> >>>  #include "sysemu/kvm.h"
>>> >> >>> -#include "sysemu/cpus.h"
>>> >> >>>  #include "hw/sysbus.h"
>>> >> >>>  #include "hw/kvm/clock.h"
>>> >> >>>
>>> >> >>> @@ -66,7 +65,6 @@ static uint64_t 
>>> >> >>> kvmclock_current_nsec(KVMClockState *s)
>>> >> >>>
>>> >> >>>  cpu_physical_memory_read(kvmclock_struct_pa, &time, 
>>> >> >>> sizeof(time));
>>> >> >>>
>>> >> >>> -assert(time.tsc_timestamp <= migration_tsc);
>>> >> >>>  delta = migration_tsc - time.tsc_timestamp;
>>> >> >>>  if (time.tsc_shift < 0) {
>>> >> >>>  delta >>= -time.tsc_shift;
>>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void 
>>> >> >>> *opaque,
>>> >> >>> int running,
>>> >> >>>  if (s->clock_valid) {
>>> >> >>>  return;
>>> >> >>>  }
>>> >> >>> -
>>> >> >>> -cpu_synchronize_all_states();
>>> >> >>>  ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>>> >> >>>  if (ret < 0) {
>>> >> >>>  fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>>> >> >>> strerror(ret));
>>> >> >>> diff --git a/migration.c b/migration.c
>>> >> >>> index 8d675b3..34f2325 100644
>>> >> >>> --- a/migration.c
>>> >> >>> +++ b/migration.c
>>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>>> >> >>>  
>>> >> >>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>> >> >>>  old_vm_running = runstate_is_running();
>>> >> >>>
>>> >> >>> +cpu_synchronize_all_states();
>>> >> >>>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>> >> >>>  if (ret >= 0) {
>>> >> >>>  qemu_file_set_rate_limit(s->file, INT64_MAX);
>>> >> >
>>> >> >
>>> >> > It could also be useful to apply the above patch _and_ revert
>>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>>> >> >
>>> >> > Paolo
>>> >>
>>> >> Yes, it solved the issue for me! (it took much time to check because
>>> >> most of country` debian mirrors went inconsistent by some reason)
>>> >>
>>> >> Also trivial addition:
>>> >>
>>> >> diff --git a/migra

Re: [Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()

Am 16.07.2014 um 20:07 hat Eric Blake geschrieben:
> On 07/16/2014 09:48 AM, Kevin Wolf wrote:
> > @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
> > BlockDriverState *backing_hd)
> >  bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT,
> >  bs->backing_blocker);
> >  out:
> > -bdrv_refresh_limits(bs);
> > +bdrv_refresh_limits(bs, NULL);
> >  }
> >  
> >  /*
> > @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
> >BDRV_O_CACHE_WB);
> >  reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
> >  
> > -bdrv_refresh_limits(reopen_state->bs);
> > +bdrv_refresh_limits(reopen_state->bs, NULL);
> 
> > +++ b/block/stream.c
> > @@ -76,7 +76,7 @@ static void close_unused_images(BlockDriverState *top, 
> > BlockDriverState *base,
> >  bdrv_unref(unused);
> >  }
> >  
> > -bdrv_refresh_limits(top);
> > +bdrv_refresh_limits(top, NULL);
> >  }
> >  
> 
> Should these three callers be concerned about failure?  If so, would
> &error_abort be better than NULL?  But as for this patch, you are
> preserving existing semantics, so you could save it for a later patch.

They probably should, but I couldn't figure out what they should to on
failure. Aborting qemu because a single block device fails isn't nice.
&error_abort has similar semantics as assert() for me ("This can't ever
happen"), and this is definitely not the case here as I/O errors can
happen anytime.

If anything, it's similar to a qcow2 image that has detected corruption
and therefore made the BDS unusable. But setting bs->drv = NULL within a
single block driver is already tricky (and so it was buggy at first),
doing it in block.c with BDSes of any driver sounds even worse.

This is the reason why I kept NULL here, even if it's not completely
right.

Kevin

pgphNT7w32VvT.pgp
Description: PGP signature

[Qemu-devel] [Bug 1338563] Re: README refers to a non-extant file

2014-07-16 Thread Daniel U. Thibault

** Changed in: qemu
   Status: Invalid => Opinion

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1338563

Title:
  README refers to a non-extant file

Status in QEMU:
  Opinion

Bug description:
  The current stable QEMU release (1.4.2-89400a8) README consists of a
  single line telling the new user to "read the documentation in qemu-
  doc.html or on http://wiki.qemu.org";.  The distribution includes no
  qemu-doc.html, just a qemu-doc.texi.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1338563/+subscriptions

[Qemu-devel] [PATCH V3 2/5] runner: Tool for fuzz tests execution

The purpose of the test runner is to prepare test environment (e.g. create a
work directory, a test image, etc), execute the program under test with
parameters, indicate a test failure if the program was killed during test
execution and collect core dumps, logs and other test artifacts.

The test runner doesn't depend on image format or a program will be tested, so
it can be used with any external image generator and program under test.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/runner/runner.py | 370 
 1 file changed, 370 insertions(+)
 create mode 100755 tests/image-fuzzer/runner/runner.py

diff --git a/tests/image-fuzzer/runner/runner.py 
b/tests/image-fuzzer/runner/runner.py
new file mode 100755
index 000..f6fe8d1
--- /dev/null
+++ b/tests/image-fuzzer/runner/runner.py
@@ -0,0 +1,370 @@
+#!/usr/bin/env python
+
+# Tool for running fuzz tests
+#
+# Copyright (C) 2014 Maria Kustova 
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import sys, os, signal
+import subprocess
+import random
+from itertools import count
+from shutil import rmtree
+import getopt
+try:
+import json
+except ImportError:
+try:
+import simplejson as json
+except ImportError:
+print "Warning: Module for JSON processing is not found.\n" + \
+"'--config' and '--command' options are not supported."
+import resource
+resource.setrlimit(resource.RLIMIT_CORE, (-1, -1))
+
+
+def multilog(msg, *output):
+""" Write an object to all of specified file descriptors
+"""
+
+for fd in output:
+fd.write(msg)
+fd.flush()
+
+
+def str_signal(sig):
+""" Convert a numeric value of a system signal to the string one
+defined by the current operational system
+"""
+
+for k, v in signal.__dict__.items():
+if v == sig:
+return k
+
+
+class TestException(Exception):
+"""Exception for errors risen by TestEnv objects"""
+pass
+
+
+class TestEnv(object):
+""" Trivial test object
+
+The class sets up test environment, generates backing and test images
+and executes application under tests with specified arguments and a test
+image provided.
+All logs are collected.
+Summary log will contain short descriptions and statuses of tests in
+a run.
+Test log will include application (e.g. 'qemu-img') logs besides info sent
+to the summary log.
+"""
+
+def __init__(self, test_id, seed, work_dir, run_log,
+ cleanup=True, log_all=False):
+"""Set test environment in a specified work directory.
+
+Path to qemu-img and qemu-io will be retrieved from 'QEMU_IMG' and
+'QEMU_IO' environment variables
+"""
+if seed is not None:
+self.seed = seed
+else:
+self.seed = str(hash(random.randint(0, sys.maxint)))
+random.seed(self.seed)
+
+self.init_path = os.getcwd()
+self.work_dir = work_dir
+self.current_dir = os.path.join(work_dir, 'test-' + test_id)
+self.qemu_img = \
+os.environ.get('QEMU_IMG', 'qemu-img')\
+  .strip().split(' ')
+self.qemu_io = \
+   os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ')
+self.commands = [['qemu-img', 'check', '-f', 'qcow2', '$test_img'],
+ ['qemu-img', 'info', '-f', 'qcow2', '$test_img'],
+ ['qemu-io', '$test_img', '-c', 'read $off $len'],
+ ['qemu-io', '$test_img', '-c', 'write $off $len'],
+ ['qemu-io', '$test_img', '-c',
+  'aio_read $off $len'],
+ ['qemu-io', '$test_img', '-c',
+  'aio_write $off $len'],
+ ['qemu-io', '$test_img', '-c', 'flush'],
+ ['qemu-io', '$test_img', '-c',
+  'discard $off $len'],
+ ['qemu-io', '$test_img', '-c',
+  'truncate $off']]
+for fmt in ['raw', 'vmdk', 'vdi', 'cow', 'qcow2', 'file',
+'qed', 'vpc']:
+self.commands.append(
+ ['qemu-img', 'convert', '-f', 'qcow2', '-O', fmt,
+  '$test_img', 'converted_image.' + fmt])
+
+t

[Qemu-devel] [PATCH V3 0/5] tests: Add the image fuzzer with qcow2 support

This patch series introduces the image fuzzer, a tool for stability and
reliability testing.
Its approach is to run large amount of tests in background. During every test a
program (e.g. qemu-img) is called to read or modify an invalid test image.
A test image has valid inner structure defined by its format specification with
some fields having random invalid values.

Patch 1 contains documentation for the image fuzzer, patch 2 is the test runner
and remaining ones relate to the image generator for qcow2 format.

This patch series was created for the 'block-next' branch.

v2 -> v3:
* the runner.py takes binaries under test from environment variables only
* the runner.py accepts JSON with a list of commands under test
* the runner.py has default list of commands under test
* *.ini files for fuzzer configuration were replaced by JSON
* the runner creates backing files of different formats
* the image generator supports backing file name
* the header extensions are generated dependently on available free space
* the specification reflects changes mentioned above
* the specification has the copyright header (based on Eric Blake comments)

Maria Kustova (5):
  docs: Specification for the image fuzzer
  runner: Tool for fuzz tests execution
  fuzz: Fuzzing functions for qcow2 images
  layout: Generator of fuzzed qcow2 images
  package: Public API for image-fuzzer/runner/runner.py

 tests/image-fuzzer/docs/image-fuzzer.txt | 239 
 tests/image-fuzzer/qcow2/__init__.py |   1 +
 tests/image-fuzzer/qcow2/fuzz.py | 336 
 tests/image-fuzzer/qcow2/layout.py   | 359 ++
 tests/image-fuzzer/runner/runner.py  | 370 +++
 5 files changed, 1305 insertions(+)
 create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt
 create mode 100644 tests/image-fuzzer/qcow2/__init__.py
 create mode 100644 tests/image-fuzzer/qcow2/fuzz.py
 create mode 100644 tests/image-fuzzer/qcow2/layout.py
 create mode 100755 tests/image-fuzzer/runner/runner.py

-- 
1.9.3

[Qemu-devel] [PATCH V3 1/5] docs: Specification for the image fuzzer

'Overall fuzzer requirements' chapter contains the current product vision and
features done and to be done. This chapter is still in progress.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/docs/image-fuzzer.txt | 239 +++
 1 file changed, 239 insertions(+)
 create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt

diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt 
b/tests/image-fuzzer/docs/image-fuzzer.txt
new file mode 100644
index 000..c1594a0
--- /dev/null
+++ b/tests/image-fuzzer/docs/image-fuzzer.txt
@@ -0,0 +1,239 @@
+# Specification for the fuzz testing tool
+#
+# Copyright (C) 2014 Maria Kustova 
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+
+Image fuzzer
+
+
+Description
+---
+
+The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img
+by providing to them randomly corrupted images.
+Test images are generated from scratch and have valid inner structure with some
+elements, e.g. L1/L2 tables, having random invalid values.
+
+
+Test runner
+---
+
+The test runner generates test images, executes tests utilizing generated
+images, indicates their results and collects all test related artifacts (logs,
+core dumps, test images).
+The test means execution of all available commands under test with the same
+generated test image.
+By default, the test runner generates new tests and executes them until
+keyboard interruption. But if a test seed is specified via the '--seed' runner
+parameter, then only one test with this seed will be executed, after its finish
+the runner will exit.
+
+The runner uses an external image fuzzer to generate test images. An image
+generator should be specified as a mandatory parameter of the test runner.
+Details about interactions between the runner and fuzzers see "Module
+interfaces".
+
+The runner activates generation of core dumps during test executions, but it
+assumes that core dumps will be generated in the current working directory.
+For comprehensive test results, please, set up your test environment
+properly.
+
+Paths to binaries under test (SUTs) qemu-img and qemu-io are retrieved from
+environment variables. If the environment check fails the runner will
+use SUTs installed in system paths.
+qemu-img is required for creation of backing files, so it's mandatory to set
+the related environment variable if it's not installed in the system path.
+For details about environment variables see qemu-iotests/check.
+
+The runner accepts via the '--config' argument JSON objects with lists of
+fields expected to be fuzzed, e.g.
+
+   '[["feature_name_table"], ["header", "l1_table_offset"]]'
+
+Each sublist can be have one or two strings defining image structure elements.
+In the latter case a parent element should be placed on the first position,
+and a field name on the second one.
+
+The runner accepts lists of commands under test as JSON objects via
+the '--command' argument. Each command is a list containing a SUT and all its
+arguments, e.g.
+
+   runner.py -c '[["qemu-io", "$test_img", "-c", "write $off $len"]]'
+ /tmp/test ../qcow2
+
+For variable arguments next aliases can be used:
+- $test_img for a fuzzed img
+- $off for an offset in the fuzzed image
+- $len for a data size
+
+Values for last two aliases will be generated based on the size of the virtal
+disk in the fuzzed image.
+In case when no commands are specified the runner will execute commands from
+the default list:
+- qemu-img check
+- qemu-img info
+- qemu-img convert
+- qemu-io -c read
+- qemu-io -c write
+- qemu-io -c aio_read
+- qemu-io -c aio_write
+- qemu-io -c flush
+- qemu-io -c discard
+- qemu-io -c truncate
+
+
+Qcow2 image generator
+-
+
+The 'qcow2' generator is a Python package providing 'create_image' method as
+a single public API. See details in 'Test runner/image fuzzer' in 'Module
+interfaces'.
+
+Qcow2 contains two submodules: fuzz.py and layout.py.
+
+'fuzz.py' contains all fuzzing functions, one per image field. It's assumed
+that after code analysis every field will have own constraints for its value.
+For now only universal potentially dangerous values are used, e.g. type limits
+for integers or unsafe symbols as '%s' for strings. For bitmasks random amount
+of bits are set to ones. All fuzzed values are checked on non-e

[Qemu-devel] [PATCH V3 5/5] package: Public API for image-fuzzer/runner/runner.py

__init__.py provides the public API required by the test runner

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/qcow2/__init__.py | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tests/image-fuzzer/qcow2/__init__.py

diff --git a/tests/image-fuzzer/qcow2/__init__.py 
b/tests/image-fuzzer/qcow2/__init__.py
new file mode 100644
index 000..e2ebe19
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/__init__.py
@@ -0,0 +1 @@
+from layout import create_image
-- 
1.9.3

[Qemu-devel] [PATCH V3 4/5] layout: Generator of fuzzed qcow2 images

The layout submodule of the qcow2 package creates a random valid image,
randomly selects some amount of its fields, fuzzes them and write the fuzzed
image to the file. Fuzzing process can be controlled by external configuration.

Now only a header, header extensions and a backing file name are generated.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/qcow2/layout.py | 359 +
 1 file changed, 359 insertions(+)
 create mode 100644 tests/image-fuzzer/qcow2/layout.py

diff --git a/tests/image-fuzzer/qcow2/layout.py 
b/tests/image-fuzzer/qcow2/layout.py
new file mode 100644
index 000..f475639
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/layout.py
@@ -0,0 +1,359 @@
+# Generator of fuzzed qcow2 images
+#
+# Copyright (C) 2014 Maria Kustova 
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import random
+import struct
+import fuzz
+
+MAX_IMAGE_SIZE = 10*2**20
+# Standard sizes
+UINT32_S = 4
+UINT64_S = 8
+
+
+class Field(object):
+"""Atomic image element (field)
+
+The class represents an image field as quadruple of a data format
+of value necessary for its packing to binary form, an offset from
+the beginning of the image, a value and a name.
+
+The field can be iterated as a list [format, offset, value]
+"""
+__slots__ = ('fmt', 'offset', 'value', 'name')
+
+def __init__(self, fmt, offset, val, name):
+self.fmt = fmt
+self.offset = offset
+self.value = val
+self.name = name
+
+def __iter__(self):
+return (x for x in [self.fmt, self.offset, self.value])
+
+def __repr__(self):
+return "Field(fmt='%s', offset=%d, value=%s, name=%s)" % \
+(self.fmt, self.offset, str(self.value), self.name)
+
+
+class FieldsList(object):
+"""List of fields
+
+The class allows access to a field in the list by its name and joins
+several list in one via in-place addition
+"""
+def __init__(self, meta_data=None):
+if meta_data is None:
+self.data = []
+else:
+self.data = [Field(f[0], f[1], f[2], f[3])
+ for f in meta_data]
+
+def __getitem__(self, name):
+return [x for x in self.data if x.name == name]
+
+def __iter__(self):
+return (x for x in self.data)
+
+def __iadd__(self, other):
+self.data += other.data
+return self
+
+def __len__(self):
+return len(self.data)
+
+
+class Image(object):
+""" Qcow2 image object
+
+This class allows to create qcow2 images with random valid structures and
+values, fuzz them via external qcow2.fuzz module and write the result to
+a file
+"""
+@staticmethod
+def _size_params():
+"""Generate a random image size aligned to a random correct
+cluster size
+"""
+cluster_bits = random.randrange(9, 21)
+cluster_size = 1 << cluster_bits
+img_size = random.randrange(0, MAX_IMAGE_SIZE + 1,
+cluster_size)
+return [cluster_bits, img_size]
+
+@staticmethod
+def _header(cluster_bits, img_size, backing_file_name=None):
+"""Generate a random valid header"""
+meta_header = [
+['>4s', 0, "QFI\xfb", 'magic'],
+['>I', 4, random.randint(2, 3), 'version'],
+['>Q', 8, 0, 'backing_file_offset'],
+['>I', 16, 0, 'backing_file_size'],
+['>I', 20, cluster_bits, 'cluster_bits'],
+['>Q', 24, img_size, 'size'],
+['>I', 32, 0, 'crypt_method'],
+['>I', 36, 0, 'l1_size'],
+['>Q', 40, 0, 'l1_table_offset'],
+['>Q', 48, 0, 'refcount_table_offset'],
+['>I', 56, 0, 'refcount_table_clusters'],
+['>I', 60, 0, 'nb_snapshots'],
+['>Q', 64, 0, 'snapshots_offset'],
+['>Q', 72, 0, 'incompatible_features'],
+['>Q', 80, 0, 'compatible_features'],
+['>Q', 88, 0, 'autoclear_features'],
+# Only refcount_order = 4 is supported by current (07.2014)
+# implementaation of QEMU
+['>I', 96, 4, 'refcount_order'],
+['>I', 100, 0, 'header_length']
+]
+v_header = FieldsList(meta_header)
+
+if v_header['version'][0].value == 2:
+v_header['header_length'][0].value = 72
+

[Qemu-devel] [PATCH V3 3/5] fuzz: Fuzzing functions for qcow2 images