[Qemu-devel] [PATCH] quorum: don't share qiov

2015-01-30 Thread Wen Congyang
If the child touches qiov->iov, it will cause unexpected results.

Signed-off-by: Wen Congyang 
---
 block/quorum.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index cdc026c..ef0c1e9 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -165,6 +165,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 qemu_vfree(acb->qcrs[i].buf);
 qemu_iovec_destroy(&acb->qcrs[i].qiov);
 }
+} else {
+for (i = 0; i <= acb->child_iter; i++) {
+qemu_iovec_destroy(&acb->qcrs[i].qiov);
+}
 }
 
 g_free(acb->qcrs);
@@ -709,8 +713,12 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
   cb, opaque);
 int i;
 
+acb->child_iter = s->num_children - 1;
 for (i = 0; i < s->num_children; i++) {
-acb->qcrs[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
+qemu_iovec_concat(&acb->qcrs[i].qiov, acb->qiov, 0, acb->qiov->size);
+acb->qcrs[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num,
+ &acb->qcrs[i].qiov,
  nb_sectors, &quorum_aio_cb,
  &acb->qcrs[i]);
 }
-- 
2.1.0



Re: [Qemu-devel] -device xen-platform crashes

2015-01-30 Thread Markus Armbruster
Stefano Stabellini  writes:

> On Thu, 29 Jan 2015, Markus Armbruster wrote:
>> Stefano Stabellini  writes:
>> 
>> > On Thu, 29 Jan 2015, Markus Armbruster wrote:
>> >> Reproducer: qemu -nodefaults -S -display none -device xen-platform
>> >> 
>> >> Yes, xen-platform makes no sense without Xen, but it shouldn't crash.
>> >
>> > Is it just a matter of doing the following?
>> >
>> > diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
>> > index 28b324a..40ae1f3 100644
>> > --- a/hw/i386/xen/xen_platform.c
>> > +++ b/hw/i386/xen/xen_platform.c
>> > @@ -177,6 +177,10 @@ static void platform_fixed_ioport_writeb(void 
>> > *opaque, uint32_t addr, uint32_t v
>> >  {
>> >  PCIXenPlatformState *s = opaque;
>> >  
>> > +if (!xen_enabled()) {
>> > +return;
>> > +}
>> > +
>> >  switch (addr) {
>> >  case 0: /* Platform flags */ {
>> >  hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
>> 
>> Fixes the crash for me.
>> 
>> Should Xen-only devices fail to realize when !xen_enabled()?
>  
> Accelerators are configured after device registration unfortunately.

Realization happens even later, doesn't it?

With this patch:

@@ -389,6 +393,11 @@ static int xen_platform_initfn(PCIDevice *dev)
 PCIXenPlatformState *d = XEN_PLATFORM(dev);
 uint8_t *pci_conf;

+if (!xen_enabled()) {
+error_report("Nope");
+return -1;
+}
+
 pci_conf = dev->config;

 pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_IO | 
PCI_COMMAND_MEMORY);

I get:

$ upstream-qemu -nodefaults -S -display none -device xen-platform
upstream-qemu: -device xen-platform: Nope
upstream-qemu: -device xen-platform: Device initialization failed.
upstream-qemu: -device xen-platform: Device 'xen-platform' could not be 
initialized
[Exit 1 ]

Note that error_report() is a bit problematic in init methods.  The best
solution is to convert the device to realize (requires my "pci: Partial
conversion to realize" series) and use error_setg().



Re: [Qemu-devel] [RFC PATCH v1 07/13] spapr: Start all the threads of CPU core when core is hotplugged

2015-01-30 Thread Bharata B Rao
On Thu, Jan 29, 2015 at 12:36:55PM +1100, David Gibson wrote:
> On Thu, Jan 08, 2015 at 11:40:14AM +0530, Bharata B Rao wrote:
> > PowerPC kernel adds or removes CPUs in core granularity and hence
> > onlines/offlines all the SMT threads of a core during hot plug/unplug.
> > Support this notion by starting all SMT threads of a core when a core
> > is hotplugged.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/spapr.c | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index a293a59..4347471 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1376,6 +1376,8 @@ static void spapr_drc_reset(void *opaque)
> >  }
> >  }
> >  
> > +static const char *current_cpu_model;
> 
> More new global variables?  Please don't.

Sure, I should be able to get the model name from the type name
of the object as Igor suggested earlier.

Regards,
Bharata.




Re: [Qemu-devel] [RFC PATCH v1 09/13] spapr: CPU hot unplug support

2015-01-30 Thread Bharata B Rao
On Thu, Jan 29, 2015 at 12:39:58PM +1100, David Gibson wrote:
> On Thu, Jan 08, 2015 at 11:40:16AM +0530, Bharata B Rao wrote:
> > Support hot removal of CPU for sPAPR guests.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/spapr.c | 43 +++
> >  1 file changed, 43 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 4347471..ec793b1 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1908,6 +1908,22 @@ static void spapr_cpu_hotplug_add(DeviceState *dev, 
> > CPUState *cs)
> >  drck->attach(drc, dev, fdt, offset, false);
> >  }
> >  
> > +static void spapr_cpu_release(DeviceState *dev, void *opaque)
> > +{
> > +/* Release vCPU */
> 
> Um.. should this actually do something?

Actual vCPU removal code in the next patch, but as you commented on the
next patch, I will clear generic and ppc parts into different patches
next time.

Regards,
Bharata.




Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-01-30 Thread Gonglei
On 2015/1/30 15:46, Markus Armbruster wrote:

> Gonglei  writes:
> 
>> On 2015/1/30 0:03, Alexander Graf wrote:
>>
>>>
>>>
>>> On 29.01.15 14:29, arei.gong...@huawei.com wrote:
 From: Gonglei 

 If boot order is invaild or is set failed,
 exit qemu.

 Signed-off-by: Gonglei 
>>>
>>> Do we really want to kill the machine only because the boot device
>>> string doesn't validate?
>>>
>>
>> Not all of the situation. If people want to change boot order by qmp/hmp
>> command, it just report an error, please see do_boot_set(). But if the boot
>> order is set in qemu command line, it will exit qemu if the boot device 
>> string
>> is invalidate, as this patch's situation, which follow the original 
>> processing
>> way (commit ef3adf68).
> 
> I think Alex isn't concerned about the monitor command, but what happens
> when boot order "once" is reset to "order" on system reset.
> 
> -boot errors should have been detected during command line processing
> (strongly preferred) or initial startup (acceptable).  Detecting

Yes, and it had done it just like that, please see main() of vl.c. So, actually
it wouldn't fail in the check of restore_boot_order function's calling.
The only possible fails will happen to call boot_set_handler(). Take
x86 pc machine example, set_boot_dev() callback  may return errors.

> configuration errors during operation is nasty.  In cases where we can't
> avoid it (and I'm not sure this is one), we need to consider very
> carefully whether the error should be fatal.

Indeed, maybe we only need to set boot order failed  and report
an error message in this scenario, do you agree?

Regards,
-Gonglei




Re: [Qemu-devel] [RFC PATCH v1 10/13] cpus, spapr: reclaim allocated vCPU objects

2015-01-30 Thread Bharata B Rao
On Thu, Jan 29, 2015 at 12:48:39PM +1100, David Gibson wrote:
> On Thu, Jan 08, 2015 at 11:40:17AM +0530, Bharata B Rao wrote:
> > From: Gu Zheng 
> 
> This needs a commit message, it's not at all clear from the 1-line 
> description.

Borrowed patch, but I should have put in a description.

> > diff --git a/kvm-all.c b/kvm-all.c
> > index 18cc6b4..6f543ce 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -71,6 +71,12 @@ typedef struct KVMSlot
> >  
> >  typedef struct kvm_dirty_log KVMDirtyLog;
> >  
> > +struct KVMParkedVcpu {
> > +unsigned long vcpu_id;
> > +int kvm_fd;
> > +QLIST_ENTRY(KVMParkedVcpu) node;
> > +};
> > +
> >  struct KVMState
> >  {
> >  AccelState parent_obj;
> > @@ -107,6 +113,7 @@ struct KVMState
> >  QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) 
> > msi_hashtab[KVM_MSI_HASHTAB_SIZE];
> >  bool direct_msi;
> >  #endif
> > +QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
> >  };
> >  
> >  #define TYPE_KVM_ACCEL ACCEL_CLASS_NAME("kvm")
> > @@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, 
> > KVMSlot *slot)
> >  return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> >  }
> >  
> > +int kvm_destroy_vcpu(CPUState *cpu)
> > +{
> > +KVMState *s = kvm_state;
> > +long mmap_size;
> > +struct KVMParkedVcpu *vcpu = NULL;
> > +int ret = 0;
> > +
> > +DPRINTF("kvm_destroy_vcpu\n");
> > +
> > +mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
> > +if (mmap_size < 0) {
> > +ret = mmap_size;
> > +DPRINTF("kvm_destroy_vcpu failed\n");
> > +goto err;
> > +}
> > +
> > +ret = munmap(cpu->kvm_run, mmap_size);
> > +if (ret < 0) {
> > +goto err;
> > +}
> > +
> > +vcpu = g_malloc0(sizeof(*vcpu));
> > +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
> > +vcpu->kvm_fd = cpu->kvm_fd;
> > +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
> 
> What's the reason for parking vcpus rather than removing / recreating
> them at the kvm level?

Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
correctly, certain work arounds have to be employed to allow reuse of
vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
proposed workaround is to park the vcpu fd in userspace during cpu unplug
and reuse it later during next hotplug.

Some details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

Regards,
Bharata.




Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> * Eric Blake (ebl...@redhat.com) wrote:
>> On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
>> > * Eric Blake (ebl...@redhat.com) wrote:
>> > > On 01/29/2015 08:54 AM, Dr. David Alan Gilbert wrote:
>> > > >> The idea of a QMP command to trigger incoming migration looks
>> > > >> reasonable.  We can probably use a qapi union for a nicer syntax,
>> > > >> something like:
>> > > >>
>> > > >> {"execute": "migrate-incoming", "arguments": {
>> > > >>   "type": "tcp", "port": 44 } }
>> > > >> vs.
>> > > >> {"execute": "migrate-incoming", "arguments": {
>> > > >>   "type": "fd", "fd": 0 } }
>> > > >> vs.
>> > > >> {"execute": "migrate-incoming", "arguments": {
>> > > >>   "type": "exec", "command": [ "cat", "/path/to/file" ] } }
>> > > >>
>> > > >> and so forth.
>> > > >
>> > > > Compared to just taking a URI argument that Dan suggested, that's 
>> > > > quite a
>> > > > bit of rework to do the reworking of each transport which is pretty
>> > > > trivial.
>> > >
>> > > Yes, but getting the interface right means that adding future extensions
>> > > will be easier, with less string parsing hacks.

We have a general rule for QMP: no syntax embedded in string arguments,
use JSON.

>> > I guess so, but I still have to maintain the -incoming string interface
>> > and an HMP equivalent of whatever we come up with here.

The HMP equivalent may or may not be needed.  If we decide we want it,
reusing the command line's parser there probably makes more sense than
inventing yet another syntax.

>> > So what would the .args_type look like in qmp-commands.hx;
>> > something like this?
>> >
>> >   .args-type = "type:s,port:-i,host:-s,command:-s"
>>
>> No, it would be more like the blockdev-add interface, where one command
>> accepts a dictionary object containing a union of valid values, where
>> the set of valid values is determined by the discriminator field.
>> .args_type = "options:q".

Note that blockdev-add has wraps its arguments rather inelegantly: it
takes a single argument 'options' of union type 'BlockdevOptions'.
Because of that, you have to write

"arguments": { "options" : { ... } }

instead of just

"arguments": { ... }

I'd love to get that cleaned up, but Kevin is already worrying about
backwards compatibility.  He has a point in theory, because we neglected
to mark blockdev-add as unstable.  I have a point in practice, because
blockdev-add hasn't been usable for real work (as some of our poor users
discovered the hard way) due to numerous restrictions we're still busy
lifting.  Anyway, I digressed, back to the topic at hand.

> What causes the parser to generate a 'BlockdevOptions' as opposed to any
> standard options type for the parameter of qmp_blockdev_add?

qmp-commands.hx is a relict.  It's still around because we still haven't
completed the conversion of QMP to QAPI.  We suck :)

The QAPI schema defines QMP commands.  The marshalling / unmarshalling
code mapping between QMP the text protocol and actual QMP command
handlers written in C gets generated from the schema.

docs/qapi-code-gen.txt explains this.  A much improved version is
sitting in Eric's queue[*].  Perhaps Eric can provide a pointer to his
current version.

qmp-commands.hx duplicates some of the schema information, partly in
dumbed down form, and adds a bit more.


[*] Sorry Eric, could not resist poking you again :)



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Markus Armbruster
Eric Blake  writes:

> On 01/29/2015 08:15 AM, Daniel P. Berrange wrote:
>
>>> ./bin/qemu-system-x86_64 -nographic -nodefaults -qmp-command
>> {"execute": "migrate-set-capabilities",
>> "arguments":{"capabilities":[{"capability":"xbzrle","state":true}]}}'
>> -qmp-command '{"execute": "query-migrate-capabilities"}' -incoming
>> tcp::444
>> 
>> I'm unclear how we'd easily deal with the response from commands
>> invoked this way, to get replies and/or errors. Also, it might
>> be the case that we need to conditionally run certain commands
>> depending on the result of earlier commands.
>> 
>> Wouldn't it make more sense to simply add a 'migrate_incoming' QMP
>> command, and stop using -incoming altogether, so we just have normal
>> QMP access ?
>
> I agree - shoving more into the command line is the wrong direction;
> full power is better obtained by making the command line be the minimal
> needed to get into QMP control, and let QMP kick things off.

Seconded.

>> 
>> eg,
>> 
>> #  qemu-system-x86_64 device args...  -S
>> (qmp) arbitrary QMP commands ..
>> (qmp) {"execute":"migrate-incoming", "arguments": { "uri": "tcp::44" }}
>
> The idea of a QMP command to trigger incoming migration looks
> reasonable.  We can probably use a qapi union for a nicer syntax,
> something like:
>
> {"execute": "migrate-incoming", "arguments": {
>   "type": "tcp", "port": 44 } }
> vs.
> {"execute": "migrate-incoming", "arguments": {
>   "type": "fd", "fd": 0 } }
> vs.
> {"execute": "migrate-incoming", "arguments": {
>   "type": "exec", "command": [ "cat", "/path/to/file" ] } }
>
> and so forth.

Yup.



[Qemu-devel] [PATCH 6/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes

2015-01-30 Thread Denis V. Lunev
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported.
Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems
and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more
mature.

The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due
to the following reasons:
- FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes
  sparse. In order to retain original functionality we must allocate
  disk space afterwards. This is done using fallocate(0) call
- fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing
  if called above already allocated areas of the file, i.e. the content
  will not be zeroed

This should increase the performance a bit for not-so-modern kernels.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 1c88ad8..7b42f37 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -967,6 +967,25 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+if (s->has_discard && s->has_fallocate) {
+int ret = do_fallocate(s->fd,
+   FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0) {
+ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+} else if (ret != -ENOTSUP) {
+return ret;
+} else {
+s->has_discard = false;
+}
+}
+#endif
+
 #ifdef CONFIG_FALLOCATE
 if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
 int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
-- 
1.9.1




[Qemu-devel] [PATCH 4/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev
This efficiently writes zeroes on Linux if the kernel is capable enough.
FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not
including file expansion.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 15 +--
 configure | 19 +++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d3910fd..5a777e7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,7 +60,7 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 #include 
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -902,7 +902,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -954,6 +954,17 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE_ZERO_RANGE
+if (s->has_write_zeroes) {
+int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_write_zeroes = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
diff --git a/configure b/configure
index f185dd0..e00e03a 100755
--- a/configure
+++ b/configure
@@ -3335,6 +3335,22 @@ if compile_prog "" "" ; then
   fallocate_punch_hole=yes
 fi
 
+# check that fallocate supports range zeroing inside the file
+fallocate_zero_range=no
+cat > $TMPC << EOF
+#include 
+#include 
+
+int main(void)
+{
+fallocate(0, FALLOC_FL_ZERO_RANGE, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  fallocate_zero_range=yes
+fi
+
 # check for posix_fallocate
 posix_fallocate=no
 cat > $TMPC << EOF
@@ -4567,6 +4583,9 @@ fi
 if test "$fallocate_punch_hole" = "yes" ; then
   echo "CONFIG_FALLOCATE_PUNCH_HOLE=y" >> $config_host_mak
 fi
+if test "$fallocate_zero_range" = "yes" ; then
+  echo "CONFIG_FALLOCATE_ZERO_RANGE=y" >> $config_host_mak
+fi
 if test "$posix_fallocate" = "yes" ; then
   echo "CONFIG_POSIX_FALLOCATE=y" >> $config_host_mak
 fi
-- 
1.9.1




[Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Denis V. Lunev
The pattern
do {
if (fallocate(s->fd, mode, offset, len) == 0) {
return 0;
}
} while (errno == EINTR);
ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
 return err;
 }
 
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);
 #endif
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev
There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply call
fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+bool has_fallocate;
 bool needs_alignment;
 } BDRVRawState;
 
@@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 if (S_ISREG(st.st_mode)) {
 s->discard_zeroes = true;
+s->has_fallocate = true;
 }
 if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -902,7 +904,7 @@ static int translate_err(int err)
 return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || 
defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
 do {
@@ -965,6 +967,16 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 }
 #endif
 
+#ifdef CONFIG_FALLOCATE
+if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
+int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+if (ret == 0 || ret != -ENOTSUP) {
+return ret;
+}
+s->has_fallocate = false;
+}
+#endif
+
 return -ENOTSUP;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH v6 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c

2015-01-30 Thread Denis V. Lunev
I have performed several tests with non-aligned fallocate calls and
in all cases (with non-aligned fallocates) Linux performs fine, i.e.
areas are zeroed correctly. Checks were made on
Linux 3.16.0-28-generic #38-Ubuntu SMP

This should seriously increase performance of bdrv_write_zeroes

Changes from v5:
- changed order between patches 5/6
- check has_fallocate in FALLOC_FL_PUNCH_HOLE branch of the code
- minor comment tweaks as pointed out by Max Reitz

Changes from v4:
- comments to patches are improved by Max Reitz suggestions
- replaced 'return ret' with 'return -ENOTSUP' handle_aiocb_write_zeroes
  The idea is that we can reach that point only if original ret was
  equal to -ENOTSUP
- added has_fallocate flag to indicate that fallocate is working for
  given BDS
- checked file length with bdrv_getlength

Changes from v3:
- dropped original patch 1, equivalent stuff was merged already
- reordered patches as suggested by Fam
- fixes spelling errors as suggested by Fam
- fixed not initialized value in handle_aiocb_write_zeroes
- fixed wrong error processing from do_fallocate(FALLOC_FL_ZERO_RANGE)

Changes from v2:
- added Peter Lieven to CC
- added CONFIG_FALLOCATE check to call do_fallocate in patch 7
- dropped patch 1 as NACK-ed
- added processing of very large data areas in bdrv_co_write_zeroes (new
  patch 1)
- set bl.max_write_zeroes to INT_MAX in raw-posix.c for regular files
  (new patch 8)

Signed-off-by: Denis V. Lunev 
CC: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 




[Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-01-30 Thread Denis V. Lunev
fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct stat st;
+
+if (fstat(s->fd, &st) < 0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+return;
+}
+bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -600,6 +614,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* Fail already reopen_prepare() if we can't get a working O_DIRECT
  * alignment with the new fd. */
 if (raw_s->fd != -1) {
+raw_probe_max_write_zeroes(state->bs);
 raw_probe_alignment(state->bs, raw_s->fd, &local_err);
 if (local_err) {
 qemu_close(raw_s->fd);
@@ -653,6 +668,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.opt_mem_alignment = s->buf_align;
+
+raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
-- 
1.9.1




[Qemu-devel] [PATCH 1/7] block/raw-posix: create translate_err helper to merge errno values

2015-01-30 Thread Denis V. Lunev
actually the code
if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
ret == -ENOTTY) {
ret = -ENOTSUP;
}
is present twice and will be added a couple more times. Create helper
for this.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..24300d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -893,6 +893,15 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, 
uint64_t bytes)
 }
 #endif
 
+static int translate_err(int err)
+{
+if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
+err == -ENOTTY) {
+err = -ENOTSUP;
+}
+return err;
+}
+
 static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -921,10 +930,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
@@ -968,10 +976,9 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 #endif
 }
 
-if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
-ret == -ENOTTY) {
+ret = translate_err(ret);
+if (ret == -ENOTSUP) {
 s->has_discard = false;
-ret = -ENOTSUP;
 }
 return ret;
 }
-- 
1.9.1




[Qemu-devel] [PATCH 3/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit

2015-01-30 Thread Denis V. Lunev
move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.

Please note, that xfs_code has been moved before checking for
s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside.
This makes code a bit more consistent.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
 block/raw-posix.c | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2aa268a..d3910fd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -914,41 +914,49 @@ static int do_fallocate(int fd, int mode, off_t offset, 
off_t len)
 }
 #endif
 
-static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 {
-int ret = -EOPNOTSUPP;
+int ret = -ENOTSUP;
 BDRVRawState *s = aiocb->bs->opaque;
 
-if (s->has_write_zeroes == 0) {
+if (!s->has_write_zeroes) {
 return -ENOTSUP;
 }
 
-if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKZEROOUT
-do {
-uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
-if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
-#endif
-} else {
-#ifdef CONFIG_XFS
-if (s->is_xfs) {
-return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+do {
+uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
+if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
+return 0;
 }
+} while (errno == EINTR);
+
+ret = translate_err(-errno);
 #endif
-}
 
-ret = translate_err(ret);
 if (ret == -ENOTSUP) {
 s->has_write_zeroes = false;
 }
 return ret;
 }
 
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+BDRVRawState *s = aiocb->bs->opaque;
+
+if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+return handle_aiocb_write_zeroes_block(aiocb);
+}
+
+#ifdef CONFIG_XFS
+if (s->is_xfs) {
+return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+}
+#endif
+
+return -ENOTSUP;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
-- 
1.9.1




Re: [Qemu-devel] [PATCH 1/7] block/raw-posix: create translate_err helper to merge errno values

2015-01-30 Thread Peter Lieven
Am 30.01.2015 um 09:42 schrieb Denis V. Lunev:
> actually the code
> if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
> ret == -ENOTTY) {
> ret = -ENOTSUP;
> }
> is present twice and will be added a couple more times. Create helper
> for this.
>
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Max Reitz 
> CC: Kevin Wolf 
> CC: Stefan Hajnoczi 
> CC: Peter Lieven 
> CC: Fam Zheng 
> ---
>  block/raw-posix.c | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index e51293a..24300d0 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -893,6 +893,15 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, 
> uint64_t bytes)
>  }
>  #endif
>  
> +static int translate_err(int err)
> +{
> +if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
> +err == -ENOTTY) {
> +err = -ENOTSUP;
> +}
> +return err;
> +}
> +
>  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  {
>  int ret = -EOPNOTSUPP;
> @@ -921,10 +930,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
> *aiocb)
>  #endif
>  }
>  
> -if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
> -ret == -ENOTTY) {
> +ret = translate_err(ret);
> +if (ret == -ENOTSUP) {
>  s->has_write_zeroes = false;
> -ret = -ENOTSUP;
>  }
>  return ret;
>  }
> @@ -968,10 +976,9 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData 
> *aiocb)
>  #endif
>  }
>  
> -if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
> -ret == -ENOTTY) {
> +ret = translate_err(ret);
> +if (ret == -ENOTSUP) {
>  s->has_discard = false;
> -ret = -ENOTSUP;
>  }
>  return ret;
>  }

Reviewed-by: Peter Lieven 



Re: [Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Peter Lieven
Am 30.01.2015 um 09:42 schrieb Denis V. Lunev:
> The pattern
> do {
> if (fallocate(s->fd, mode, offset, len) == 0) {
> return 0;
> }
> } while (errno == EINTR);
> ret = translate_err(-errno);
> will be commonly useful in next patches. Create helper for it.
>
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Max Reitz 
> CC: Kevin Wolf 
> CC: Stefan Hajnoczi 
> CC: Peter Lieven 
> CC: Fam Zheng 
> ---
>  block/raw-posix.c | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 24300d0..2aa268a 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -902,6 +902,18 @@ static int translate_err(int err)
>  return err;
>  }
>  
> +#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
> +static int do_fallocate(int fd, int mode, off_t offset, off_t len)
> +{
> +do {
> +if (fallocate(fd, mode, offset, len) == 0) {
> +return 0;
> +}
> +} while (errno == EINTR);
> +return translate_err(-errno);
> +}
> +#endif
> +
>  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  {
>  int ret = -EOPNOTSUPP;
> @@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData 
> *aiocb)
>  #endif
>  
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> -do {
> -if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> -  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
> -return 0;
> -}
> -} while (errno == EINTR);
> -
> -ret = -errno;
> +ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +   aiocb->aio_offset, aiocb->aio_nbytes);

You also introduce the conversion via translate_err here. Is that ok?

Peter




Re: [Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Denis V. Lunev

On 30/01/15 11:47, Peter Lieven wrote:

Am 30.01.2015 um 09:42 schrieb Denis V. Lunev:

The pattern
 do {
 if (fallocate(s->fd, mode, offset, len) == 0) {
 return 0;
 }
 } while (errno == EINTR);
 ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Peter Lieven 
CC: Fam Zheng 
---
  block/raw-posix.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 24300d0..2aa268a 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -902,6 +902,18 @@ static int translate_err(int err)
  return err;
  }
  
+#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)

+static int do_fallocate(int fd, int mode, off_t offset, off_t len)
+{
+do {
+if (fallocate(fd, mode, offset, len) == 0) {
+return 0;
+}
+} while (errno == EINTR);
+return translate_err(-errno);
+}
+#endif
+
  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
  {
  int ret = -EOPNOTSUPP;
@@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
  #endif
  
  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE

-do {
-if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
-return 0;
-}
-} while (errno == EINTR);
-
-ret = -errno;
+ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+   aiocb->aio_offset, aiocb->aio_nbytes);

You also introduce the conversion via translate_err here. Is that ok?

Peter


yes, this is redundant but harmless



Re: [Qemu-devel] [PATCH 2/7] block/raw-posix: create do_fallocate helper

2015-01-30 Thread Peter Lieven
Am 30.01.2015 um 09:49 schrieb Denis V. Lunev:
> On 30/01/15 11:47, Peter Lieven wrote:
>> Am 30.01.2015 um 09:42 schrieb Denis V. Lunev:
>>> The pattern
>>>  do {
>>>  if (fallocate(s->fd, mode, offset, len) == 0) {
>>>  return 0;
>>>  }
>>>  } while (errno == EINTR);
>>>  ret = translate_err(-errno);
>>> will be commonly useful in next patches. Create helper for it.
>>>
>>> Signed-off-by: Denis V. Lunev 
>>> Reviewed-by: Max Reitz 
>>> CC: Kevin Wolf 
>>> CC: Stefan Hajnoczi 
>>> CC: Peter Lieven 
>>> CC: Fam Zheng 
>>> ---
>>>   block/raw-posix.c | 22 ++
>>>   1 file changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index 24300d0..2aa268a 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -902,6 +902,18 @@ static int translate_err(int err)
>>>   return err;
>>>   }
>>>   +#if defined(CONFIG_FALLOCATE_PUNCH_HOLE)
>>> +static int do_fallocate(int fd, int mode, off_t offset, off_t len)
>>> +{
>>> +do {
>>> +if (fallocate(fd, mode, offset, len) == 0) {
>>> +return 0;
>>> +}
>>> +} while (errno == EINTR);
>>> +return translate_err(-errno);
>>> +}
>>> +#endif
>>> +
>>>   static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>>>   {
>>>   int ret = -EOPNOTSUPP;
>>> @@ -965,14 +977,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData 
>>> *aiocb)
>>>   #endif
>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>> -do {
>>> -if (fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | 
>>> FALLOC_FL_KEEP_SIZE,
>>> -  aiocb->aio_offset, aiocb->aio_nbytes) == 0) {
>>> -return 0;
>>> -}
>>> -} while (errno == EINTR);
>>> -
>>> -ret = -errno;
>>> +ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | 
>>> FALLOC_FL_KEEP_SIZE,
>>> +   aiocb->aio_offset, aiocb->aio_nbytes);
>> You also introduce the conversion via translate_err here. Is that ok?
>>
>> Peter
>>
> yes, this is redundant but harmless

Ok, then please add:

Reviewed-by: Peter Lieven 






Re: [Qemu-devel] [RFC PATCH v1 05/13] spapr: Support ibm, lrdr-capacity device tree property

2015-01-30 Thread Bharata B Rao
On Thu, Jan 22, 2015 at 03:55:40PM -0600, Michael Roth wrote:
> Quoting Bharata B Rao (2015-01-08 00:10:12)
> > Add support for ibm,lrdr-capacity since this is needed by the guest
> > kernel to know about the possible hot-pluggable CPUs and Memory.
> > 
> > Define minimum hotpluggable memory size as 256MB and start storing maximum
> > possible memory for the guest in sPAPREnvironment.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/spapr.c |  3 ++-
> >  hw/ppc/spapr_rtas.c| 28 ++--
> >  include/hw/ppc/spapr.h |  6 --
> >  3 files changed, 32 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index f49b0fa..515d770 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -775,7 +775,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >  }
> > 
> >  /* RTAS */
> > -ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size);
> > +ret = spapr_rtas_device_tree_setup(spapr, fdt, rtas_addr, rtas_size);
> >  if (ret < 0) {
> >  fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
> >  }
> > @@ -1473,6 +1473,7 @@ static void ppc_spapr_init(MachineState *machine)
> > 
> >  /* allocate RAM */
> >  spapr->ram_limit = ram_size;
> > +spapr->maxram_limit = machine->maxram_size;
> >  memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
> >   spapr->ram_limit);
> >  memory_region_add_subregion(sysmem, 0, ram);
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index d847f45..e8a0f21 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -29,6 +29,7 @@
> >  #include "sysemu/char.h"
> >  #include "hw/qdev.h"
> >  #include "sysemu/device_tree.h"
> > +#include "sysemu/cpus.h"
> > 
> >  #include "hw/ppc/spapr.h"
> >  #include "hw/ppc/spapr_vio.h"
> > @@ -551,11 +552,12 @@ void spapr_rtas_register(int token, const char *name, 
> > spapr_rtas_fn fn)
> >  rtas_table[token].fn = fn;
> >  }
> > 
> > -int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
> > - hwaddr rtas_size)
> > +int spapr_rtas_device_tree_setup(sPAPREnvironment *spapr, void *fdt,
> > + hwaddr rtas_addr, hwaddr rtas_size)
> >  {
> >  int ret;
> >  int i;
> > +uint32_t lrdr_capacity[5];
> > 
> >  ret = fdt_add_mem_rsv(fdt, rtas_addr, rtas_size);
> >  if (ret < 0) {
> > @@ -604,6 +606,28 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
> > rtas_addr,
> >  }
> > 
> >  }
> > +
> > +ret = qemu_fdt_setprop_cell(fdt, "/rtas", "#address-cells", 0x2);
> > +if (ret < 0) {
> > +fprintf(stderr, "Couldn't add #address-cells rtas property\n");
> > +}
> > +
> > +ret = qemu_fdt_setprop_cell(fdt, "/rtas", "#size-cells", 0x2);
> > +if (ret < 0) {
> > +fprintf(stderr, "Couldn't add #size-cells rtas property\n");
> > +}
> > +
> > +lrdr_capacity[0] = cpu_to_be32(spapr->maxram_limit >> 32);
> > +lrdr_capacity[1] = cpu_to_be32(spapr->maxram_limit & 0x);
> > +lrdr_capacity[2] = 0;
> > +lrdr_capacity[3] = cpu_to_be32(SPAPR_MIN_MEMORY_BLOCK_SIZE);
> > +lrdr_capacity[4] = cpu_to_be32(max_cpus/smp_threads);
> > +ret = qemu_fdt_setprop(fdt, "/rtas", "ibm,lrdr-capacity", 
> > lrdr_capacity,
> > + sizeof(lrdr_capacity));
> > +if (ret < 0) {
> > +fprintf(stderr, "Couldn't add ibm,lrdr-capacity rtas property\n");
> > +}
> > +
> 
> The property seems simple enough, but would be worthwhile to add a description
> of how/when it's used in a new section of docs/specs/ppc-spapr-hotplug.txt to
> keep the documentation complete

Sure.

> 
> >  return 0;
> >  }
> > 
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 831db6b..ae8b4e1 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -18,6 +18,7 @@ typedef struct sPAPREnvironment {
> >  XICSState *icp;
> > 
> >  hwaddr ram_limit;
> > +hwaddr maxram_limit;
> >  void *htab;
> >  uint32_t htab_shift;
> >  hwaddr rma_size;
> > @@ -444,8 +445,8 @@ void spapr_rtas_register(int token, const char *name, 
> > spapr_rtas_fn fn);
> >  target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >   uint32_t token, uint32_t nargs, target_ulong 
> > args,
> >   uint32_t nret, target_ulong rets);
> > -int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
> > - hwaddr rtas_size);
> > +int spapr_rtas_device_tree_setup(sPAPREnvironment *spapr, void *fdt,
> > + hwaddr rtas_addr, hwaddr rtas_size);
> > 
> >  #define SPAPR_TCE_PAGE_SHIFT   12
> >  #define SPAPR_TCE_PAGE_SIZE(1ULL << SPAPR_TCE_PAGE_SHIFT)
> > @@ -479,6 +480,7 @@ struct sPAPRTCETable {
> >  };
> > 
> >  #define TIMEBASE_

Re: [Qemu-devel] [PATCH 7/8] qmp: print dirty bitmap

2015-01-30 Thread Vladimir Sementsov-Ogievskiy
is it better to add qmp_query_dirty_bitmap with underlying 
bdrv_query_dirty_bitmap, or to modify (add dirty regions information) 
existing qmp_query_block/qmp_query_dirty_bitmapS?


Best regards,
Vladimir

On 27.01.2015 18:53, Eric Blake wrote:

On 01/13/2015 10:02 AM, Vladimir Sementsov-Ogievskiy wrote:

Adds qmp and hmp commands to print dirty bitmap. This is needed only for
testing persistent dirty bitmap feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
+++ b/block.c
@@ -5445,6 +5445,39 @@ int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
  return res;
  }
  
+void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)

+{
+unsigned long a = 0, b = 0;
+
+printf("bitmap '%s'\n", bitmap->name ? bitmap->name : "no name");
+printf("enabled: %s\n", bitmap->enabled ? "true" : "false");
+printf("size: %" PRId64 "\n", bitmap->size);
+printf("granularity: %" PRId64 "\n", bitmap->granularity);
+printf("dirty regions begin:\n");
+++ b/blockdev.c
@@ -2079,6 +2079,19 @@ void qmp_block_dirty_bitmap_add(const char *node_ref, 
const char *name,
  aio_context_release(aio_context);
  }
  
+void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,

+  Error **errp)
+{
+BdrvDirtyBitmap *bitmap;
+
+bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
+if (!bitmap) {
+return;
+}
+
+bdrv_print_dirty_bitmap(bitmap);

Eww.  This assumes that stdout is usable.  But that is not the case for
QMP commands.  It would be better to package up the output in a
structured format and return it to the caller, and let the caller decide
how to print it.


+++ b/qapi/block-core.json
@@ -982,6 +982,9 @@
  {'command': 'block-dirty-bitmap-disable',
'data': 'BlockDirtyBitmap' }
  
+{'command': 'block-dirty-bitmap-print',

+  'data': 'BlockDirtyBitmap' }

Missing documentation, if we even want this command.  As mentioned
above, this should return ALL of the information necessary for the
client to then decide how to print the information, rather than directly
printing information to stdout itself.  And it might be better to name
this command in the 'query-' namespace.






Re: [Qemu-devel] [PATCH RFC 00/10] pci: Partial conversion to realize

2015-01-30 Thread Markus Armbruster
Markus Armbruster  writes:

> Markus Armbruster  writes:
>
>> Andreas Färber  writes:
>>
>>> Hi Markus,
>>>
>>> Am 28.10.2014 um 08:35 schrieb Markus Armbruster:
 While discussing Gonglei's "[PATCH v2 00/19] usb: convert device init
 to realize", Paolo called the PCI conversion job "Gargantuan".  This
 series attempts to crack it into manageable jobs.
>>>
>>> Thanks for giving this a stab! What kept me from diving into the PCI
>>> converstion was that I first invested into qtests for the non-default
>>> PCI devices. How many of the converted devices are actually covered in
>>> qtest?
>>
>> Can't tell offhand, but I can find out.
>
> The vast majority of my conversions are utterly trivial [PATCH 03]:
> change return type to void, rename, put into ->realize instead of
> ->init.  I could enumerate the devices so changed and look for qtests,
> but I feel it's rather pointless busywork.  But if you should insist...
>
> The remaining conversions are all very, very simple:
>
> * PATCH 05: "pcnet"
>
>   Utterly trivial after trivial PATCH 04 changed a helper's return type
>   to void.
>
>   There's pcnet-test.c, which basically tests "-device pcnet doesn't
>   explode right away".
>
> * PATCH 06: "pci-serial", "pci-serial-2x", "pci-serial-4x"
>
>   Two error paths trivially converted from qerror_report_err() to
>   error_propagate().
>
>   Not covered in qtest as far as I can tell.
>
> * PATCH 07: "ich9-ahci"
>
>   One pci_add_capability() replaced by pci_add_capability2().  The
>   former is a wrapper around the latter which additionally passes any
>   error to error_report(), then frees it.  Straightforward conversion of
>   the error path to error_setg().
>
>   Not covered in qtest as far as I can tell.
>
> * PATCH 08: "cirrus-vga"
>
>   One error path trivially converted from error_report() to
>   error_setg().
>
>   There's display-vga-test.c, which tests "-device cirrus-vga doesn't
>   explode right away".
>
> * PATCH 09: "qxl-vga", "qxl"
>
>   Two error paths trivially converted from error_report() to
>   error_setg().
>
>   Not covered in qtest as far as I can tell.
>
> * PATCH 10: "kvm-pci-assign"
>
>   Two error paths trivially converted from qerror_report_err() to
>   error_propagate().
>
>   Not covered in qtest as far as I can tell.
>
> I'm prepared to drop conversions you consider risky without test
> coverage (although I have a hard time seeing risks, considering how
> silly-simple my conversions are).  But I'd really like to get the core
> changes plus some examples in.

This series passes all tests added by my "qtest: Generic PCI device
test" series.  That series covers realizing every PCI device, except for

* blacklisted devices, and

* devices not included in the x86 targets (because the test runs only
  for the x86 targets, just like all of $(check-qtest-pci-y)).

Devices in the above list of not entirely trivial conversions that
aren't covered:

* "kvm-pci-assign" is blacklisted because it requires a suitable host
  device to pass through.

* "qxl" is blacklisted because "-device qxl" crashes.  However,
  "qxl-vga" executes the patched code as well, and *is* covered.

Andreas, if you object to any of my conversions without additional test
coverage, please enumerate the devices whose conversion you want me to
drop.



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Dr. David Alan Gilbert
* Markus Armbruster (arm...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  writes:
> 
> > * Eric Blake (ebl...@redhat.com) wrote:
> >> On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
> >> > * Eric Blake (ebl...@redhat.com) wrote:
> >> > > On 01/29/2015 08:54 AM, Dr. David Alan Gilbert wrote:
> >> > > >> The idea of a QMP command to trigger incoming migration looks
> >> > > >> reasonable.  We can probably use a qapi union for a nicer syntax,
> >> > > >> something like:
> >> > > >>
> >> > > >> {"execute": "migrate-incoming", "arguments": {
> >> > > >>   "type": "tcp", "port": 44 } }
> >> > > >> vs.
> >> > > >> {"execute": "migrate-incoming", "arguments": {
> >> > > >>   "type": "fd", "fd": 0 } }
> >> > > >> vs.
> >> > > >> {"execute": "migrate-incoming", "arguments": {
> >> > > >>   "type": "exec", "command": [ "cat", "/path/to/file" ] } }
> >> > > >>
> >> > > >> and so forth.
> >> > > >
> >> > > > Compared to just taking a URI argument that Dan suggested, that's 
> >> > > > quite a
> >> > > > bit of rework to do the reworking of each transport which is pretty
> >> > > > trivial.
> >> > >
> >> > > Yes, but getting the interface right means that adding future 
> >> > > extensions
> >> > > will be easier, with less string parsing hacks.
> 
> We have a general rule for QMP: no syntax embedded in string arguments,
> use JSON.
> 
> >> > I guess so, but I still have to maintain the -incoming string interface
> >> > and an HMP equivalent of whatever we come up with here.
> 
> The HMP equivalent may or may not be needed.  If we decide we want it,

I treat HMP as important as QMP, I don't break it or lose functionality on it.

> reusing the command line's parser there probably makes more sense than
> inventing yet another syntax.
> 
> >> > So what would the .args_type look like in qmp-commands.hx;
> >> > something like this?
> >> >
> >> >   .args-type = "type:s,port:-i,host:-s,command:-s"
> >>
> >> No, it would be more like the blockdev-add interface, where one command
> >> accepts a dictionary object containing a union of valid values, where
> >> the set of valid values is determined by the discriminator field.
> >> .args_type = "options:q".
> 
> Note that blockdev-add has wraps its arguments rather inelegantly: it
> takes a single argument 'options' of union type 'BlockdevOptions'.
> Because of that, you have to write
> 
> "arguments": { "options" : { ... } }
> 
> instead of just
> 
> "arguments": { ... }
> 
> I'd love to get that cleaned up, but Kevin is already worrying about
> backwards compatibility.  He has a point in theory, because we neglected
> to mark blockdev-add as unstable.  I have a point in practice, because
> blockdev-add hasn't been usable for real work (as some of our poor users
> discovered the hard way) due to numerous restrictions we're still busy
> lifting.  Anyway, I digressed, back to the topic at hand.
> 
> > What causes the parser to generate a 'BlockdevOptions' as opposed to any
> > standard options type for the parameter of qmp_blockdev_add?
> 
> qmp-commands.hx is a relict.  It's still around because we still haven't
> completed the conversion of QMP to QAPI.  We suck :)
> 
> The QAPI schema defines QMP commands.  The marshalling / unmarshalling
> code mapping between QMP the text protocol and actual QMP command
> handlers written in C gets generated from the schema.
> 
> docs/qapi-code-gen.txt explains this.  A much improved version is
> sitting in Eric's queue[*].  Perhaps Eric can provide a pointer to his
> current version.
> 
> qmp-commands.hx duplicates some of the schema information, partly in
> dumbed down form, and adds a bit more.

OK, to summarise how I'm hearing things so far:
  a) We want this as a QMP command
  b) With nice structured json arguments
  c) But the QMP parser is broken and the example that Eric wants me
 to follow isn't pretty.

Am I missing something from that? Because at the moment I seem to
be walking into a minefield of QMP parsers to add a simple bit
of migration functionality.

Dave

> 
> 
> [*] Sorry Eric, could not resist poking you again :)
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2 16/17] ahci: Recompute cur_cmd on migrate post load

2015-01-30 Thread Paolo Bonzini


On 17/12/2014 02:36, John Snow wrote:
> When the AHCI HBA device is migrated, all of the information that
> led to the request being created is stored in the AHCIDevice
> structures, except for pointers into guest data where return
> information needs to be stored.
> 
> The "cur_cmd" field is usually responsible for this.
> 
> To rebuild the cur_cmd pointer post-migration, we can utilize
> the busy_slot index to figure out where the command header
> we are still processing is.
> 
> This allows a machine in a halted state from rerror=stop or
> werror=stop to be migrated and resume operations without issue.
> 
> Signed-off-by: John Snow 

Reviewed-by: Paolo Bonzini 


> ---
>  hw/ide/ahci.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index c153228..8078d3e 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1373,6 +1373,10 @@ static int ahci_state_post_load(void *opaque, int 
> version_id)
>   */
>  if (ad->busy_slot == -1) {
>  check_cmd(s, i);
> +} else {
> +/* We are in the middle of a command, and may need to access
> + * the command header in guest memory again. */
> +ad->cur_cmd = &((AHCICmdHdr *)ad->lst)[ad->busy_slot];
>  }
>  }
>  
> 



Re: [Qemu-devel] [PATCH v2 17/17] qtest/ide: Test flush / retry for ISA and PCI

2015-01-30 Thread Paolo Bonzini


On 17/12/2014 02:36, John Snow wrote:
> This patch adds tests for werror and rerror functionality
> for the PCI and ISA ide buses.
> 
> Tests for the AHCI device are to be included at a later
> date after requisite patches have been merged upstream
> to support needed functionality by the tests.
> 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: John Snow 
> ---
>  tests/ide-test.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index 29f4039..b28a302 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -118,7 +118,6 @@ static void ide_test_start(const char *cmdline_fmt, ...)
>  va_end(ap);
>  
>  qtest_start(cmdline);
> -qtest_irq_intercept_in(global_qtest, "ioapic");
>  guest_malloc = pc_alloc_init();
>  
>  g_free(cmdline);
> @@ -388,6 +387,7 @@ static void test_bmdma_setup(void)
>  "-drive file=%s,if=ide,serial=%s,cache=writeback,format=raw "
>  "-global ide-hd.ver=%s",
>  tmp_path, "testdisk", "version");
> +qtest_irq_intercept_in(global_qtest, "ioapic");
>  }
>  
>  static void test_bmdma_teardown(void)
> @@ -516,7 +516,7 @@ static void prepare_blkdebug_script(const char *debug_fn, 
> const char *event)
>  g_assert(ret == 0);
>  }
>  
> -static void test_retry_flush(void)
> +static void test_retry_flush(const char *machine)
>  {
>  uint8_t data;
>  const char *s;
> @@ -580,6 +580,16 @@ static void test_flush_nodev(void)
>  ide_test_quit();
>  }
>  
> +static void test_pci_retry_flush(const char *machine)
> +{
> +test_retry_flush("pc");
> +}
> +
> +static void test_isa_retry_flush(const char *machine)
> +{
> +test_retry_flush("isapc");
> +}
> +
>  int main(int argc, char **argv)
>  {
>  const char *arch = qtest_get_arch();
> @@ -617,9 +627,9 @@ int main(int argc, char **argv)
>  qtest_add_func("/ide/bmdma/teardown", test_bmdma_teardown);
>  
>  qtest_add_func("/ide/flush", test_flush);
> -qtest_add_func("/ide/flush_nodev", test_flush_nodev);
> -
> -qtest_add_func("/ide/retry/flush", test_retry_flush);
> +qtest_add_func("/ide/flush/nodev", test_flush_nodev);
> +qtest_add_func("/ide/flush/retry_pci", test_pci_retry_flush);
> +qtest_add_func("/ide/flush/retry_isa", test_isa_retry_flush);
>  
>  ret = g_test_run();
>  
> 

Quite different from my original version, so:

Reviewed-by: Paolo Bonzini 



Re: [Qemu-devel] [PATCH v2 00/17] ide: rerror and werror support for IDE and AHCI

2015-01-30 Thread Paolo Bonzini


On 30/01/2015 01:44, John Snow wrote:
> Post-holiday bump that this is sitting out there, awaiting love.
> If this gets merged, we should be able to enable Q35 migration soon,
> which would be nice.

Not sure how valuable my opinion is as the author of around 85% of the
series...  So I reviewed the rest.

Paolo



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote:
> On 01/29/2015 01:21 PM, Dr. David Alan Gilbert wrote:
> > * Eric Blake (ebl...@redhat.com) wrote:
> >> On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
> >>>
> >>> So what would the .args_type look like in qmp-commands.hx; something like 
> >>> this?
> >>>
> >>>   .args-type = "type:s,port:-i,host:-s,command:-s"
> >>
> >> No, it would be more like the blockdev-add interface, where one command
> >> accepts a dictionary object containing a union of valid values, where
> >> the set of valid values is determined by the discriminator field.
> >> .args_type = "options:q".
> > 
> > What causes the parser to generate a 'BlockdevOptions' as opposed to any
> > standard options type for the parameter of qmp_blockdev_add?
> 
> Kevin Wolf has the most experience here, as he was the one that figured
> out how to correlate command line and QMP as part of adding blockdev-add.

OK, this is getting more complicated than I'd expected; how about a simpler
suggestion.

The current suggestion is:
   Modify  -incoming to take   'pause' as an argument
   Add migrate-incoming command that takes parsed URI

New suggestion:
   Modify -incoming to take a pause: prefix (e.g. -incoming pause:tcp:host:port 
)
   Add migrate-incoming-start command (takes no arguments).

It seems simpler.

Dave

> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2 14/17] ahci: Migrate IDEStatus

2015-01-30 Thread Paolo Bonzini


On 17/12/2014 02:36, John Snow wrote:
> Amazingly, we weren't doing this before.
> 
> Make sure we migrate the IDEState structure that belongs to
> the AHCIDevice.IDEBus structure during migrations.
> 
> No version numbering changes because AHCI is not officially
> migratable (and we can all see with good reason why) so we
> do not impact any official builds by altering the stream and
> leaving it at version 1.
> 
> This fixes the rerror=stop/werror=stop test case where we wish
> to migrate a halted job. Previously, the error code would not
> migrate, so even if the job completed successfully, AHCI would
> report an error because it would still have the placeholder
> error code from initialization time.
> 
> Signed-off-by: John Snow 

Reviewed-by: Paolo Bonzini 

> ---
>  hw/ide/ahci.c | 1 +
>  hw/ide/internal.h | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index bc6d5ce..3f4fc77 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1321,6 +1321,7 @@ static const VMStateDescription vmstate_ahci_device = {
>  .version_id = 1,
>  .fields = (VMStateField[]) {
>  VMSTATE_IDE_BUS(port, AHCIDevice),
> +VMSTATE_IDE_DRIVE(port.ifs[0], AHCIDevice),
>  VMSTATE_UINT32(port_state, AHCIDevice),
>  VMSTATE_UINT32(finished, AHCIDevice),
>  VMSTATE_UINT32(port_regs.lst_addr, AHCIDevice),
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 0beba43..c278dea 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -524,6 +524,9 @@ extern const VMStateDescription vmstate_ide_drive;
>  #define VMSTATE_IDE_DRIVES(_field, _state) \
>  VMSTATE_STRUCT_ARRAY(_field, _state, 2, 3, vmstate_ide_drive, IDEState)
>  
> +#define VMSTATE_IDE_DRIVE(_field, _state) \
> +VMSTATE_STRUCT(_field, _state, 1, vmstate_ide_drive, IDEState)
> +
>  void ide_bus_reset(IDEBus *bus);
>  int64_t ide_get_sector(IDEState *s);
>  void ide_set_sector(IDEState *s, int64_t sector_num);
> 



Re: [Qemu-devel] [PATCH] target-arm: Squash input denormals in FRECPS and FRSQRTS

2015-01-30 Thread Laurent Desnogues
On Thu, Jan 29, 2015 at 8:31 PM, Peter Maydell  wrote:
> The helper functions for FRECPS and FRSQRTS have special case
> handling that includes checks for zero inputs, so squash input
> denormals if necessary before those checks. This fixes incorrect
> output when the FPCR DZ bit is set to enable squashing of input
> denormals.
>
> Signed-off-by: Peter Maydell 

Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
> A quick eyeball of helper-a64.c suggests that these are the only
> other insns we needed to fix, and a risu test of these insns
> confirms that (a) they're buggy and (b) this patch fixes them.
> I haven't done an exhaustive coverage test of the whole instruction
> set with the DZ bit set, though...
>
>  target-arm/helper-a64.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index ebd9247..8aa40e9 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -229,6 +229,9 @@ float32 HELPER(recpsf_f32)(float32 a, float32 b, void 
> *fpstp)
>  {
>  float_status *fpst = fpstp;
>
> +a = float32_squash_input_denormal(a, fpst);
> +b = float32_squash_input_denormal(b, fpst);
> +
>  a = float32_chs(a);
>  if ((float32_is_infinity(a) && float32_is_zero(b)) ||
>  (float32_is_infinity(b) && float32_is_zero(a))) {
> @@ -241,6 +244,9 @@ float64 HELPER(recpsf_f64)(float64 a, float64 b, void 
> *fpstp)
>  {
>  float_status *fpst = fpstp;
>
> +a = float64_squash_input_denormal(a, fpst);
> +b = float64_squash_input_denormal(b, fpst);
> +
>  a = float64_chs(a);
>  if ((float64_is_infinity(a) && float64_is_zero(b)) ||
>  (float64_is_infinity(b) && float64_is_zero(a))) {
> @@ -253,6 +259,9 @@ float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, void 
> *fpstp)
>  {
>  float_status *fpst = fpstp;
>
> +a = float32_squash_input_denormal(a, fpst);
> +b = float32_squash_input_denormal(b, fpst);
> +
>  a = float32_chs(a);
>  if ((float32_is_infinity(a) && float32_is_zero(b)) ||
>  (float32_is_infinity(b) && float32_is_zero(a))) {
> @@ -265,6 +274,9 @@ float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, void 
> *fpstp)
>  {
>  float_status *fpst = fpstp;
>
> +a = float64_squash_input_denormal(a, fpst);
> +b = float64_squash_input_denormal(b, fpst);
> +
>  a = float64_chs(a);
>  if ((float64_is_infinity(a) && float64_is_zero(b)) ||
>  (float64_is_infinity(b) && float64_is_zero(a))) {
> --
> 1.9.1
>
>



Re: [Qemu-devel] [RFC 0/1] Incoming migration vs early monitor commands

2015-01-30 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> 
> 
> On 29/01/2015 16:06, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > The attached patch allows you to execute QMP commands from the command
> > line prior to -incoming or loadvm.
> 
> What about doing the opposite: if you specify -S, and thus the runstate
> is PRELAUNCH, allow starting incoming migration from the monitor through
> a new QMP command or a new option to the "migrate" command?

Yep, see the other half of the thread.

Dave
> 
> Paolo
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> * Markus Armbruster (arm...@redhat.com) wrote:
>> "Dr. David Alan Gilbert"  writes:
>> 
>> > * Eric Blake (ebl...@redhat.com) wrote:
>> >> On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
>> >> > * Eric Blake (ebl...@redhat.com) wrote:
>> >> > > On 01/29/2015 08:54 AM, Dr. David Alan Gilbert wrote:
>> >> > > >> The idea of a QMP command to trigger incoming migration looks
>> >> > > >> reasonable.  We can probably use a qapi union for a nicer syntax,
>> >> > > >> something like:
>> >> > > >>
>> >> > > >> {"execute": "migrate-incoming", "arguments": {
>> >> > > >>   "type": "tcp", "port": 44 } }
>> >> > > >> vs.
>> >> > > >> {"execute": "migrate-incoming", "arguments": {
>> >> > > >>   "type": "fd", "fd": 0 } }
>> >> > > >> vs.
>> >> > > >> {"execute": "migrate-incoming", "arguments": {
>> >> > > >>   "type": "exec", "command": [ "cat", "/path/to/file" ] } }
>> >> > > >>
>> >> > > >> and so forth.
>> >> > > >
>> >> > > > Compared to just taking a URI argument that Dan suggested,
>> >> > > > that's quite a
>> >> > > > bit of rework to do the reworking of each transport which is pretty
>> >> > > > trivial.
>> >> > >
>> >> > > Yes, but getting the interface right means that adding future
>> >> > > extensions
>> >> > > will be easier, with less string parsing hacks.
>> 
>> We have a general rule for QMP: no syntax embedded in string arguments,
>> use JSON.
>> 
>> >> > I guess so, but I still have to maintain the -incoming string interface
>> >> > and an HMP equivalent of whatever we come up with here.
>> 
>> The HMP equivalent may or may not be needed.  If we decide we want it,
>
> I treat HMP as important as QMP, I don't break it or lose functionality on it.
>
>> reusing the command line's parser there probably makes more sense than
>> inventing yet another syntax.
>> 
>> >> > So what would the .args_type look like in qmp-commands.hx;
>> >> > something like this?
>> >> >
>> >> >   .args-type = "type:s,port:-i,host:-s,command:-s"
>> >>
>> >> No, it would be more like the blockdev-add interface, where one command
>> >> accepts a dictionary object containing a union of valid values, where
>> >> the set of valid values is determined by the discriminator field.
>> >> .args_type = "options:q".
>> 
>> Note that blockdev-add has wraps its arguments rather inelegantly: it
>> takes a single argument 'options' of union type 'BlockdevOptions'.
>> Because of that, you have to write
>> 
>> "arguments": { "options" : { ... } }
>> 
>> instead of just
>> 
>> "arguments": { ... }
>> 
>> I'd love to get that cleaned up, but Kevin is already worrying about
>> backwards compatibility.  He has a point in theory, because we neglected
>> to mark blockdev-add as unstable.  I have a point in practice, because
>> blockdev-add hasn't been usable for real work (as some of our poor users
>> discovered the hard way) due to numerous restrictions we're still busy
>> lifting.  Anyway, I digressed, back to the topic at hand.
>> 
>> > What causes the parser to generate a 'BlockdevOptions' as opposed to any
>> > standard options type for the parameter of qmp_blockdev_add?
>> 
>> qmp-commands.hx is a relict.  It's still around because we still haven't
>> completed the conversion of QMP to QAPI.  We suck :)
>> 
>> The QAPI schema defines QMP commands.  The marshalling / unmarshalling
>> code mapping between QMP the text protocol and actual QMP command
>> handlers written in C gets generated from the schema.
>> 
>> docs/qapi-code-gen.txt explains this.  A much improved version is
>> sitting in Eric's queue[*].  Perhaps Eric can provide a pointer to his
>> current version.
>> 
>> qmp-commands.hx duplicates some of the schema information, partly in
>> dumbed down form, and adds a bit more.
>
> OK, to summarise how I'm hearing things so far:
>   a) We want this as a QMP command
>   b) With nice structured json arguments
>   c) But the QMP parser is broken and the example that Eric wants me
>  to follow isn't pretty.
>
> Am I missing something from that? Because at the moment I seem to
> be walking into a minefield of QMP parsers to add a simple bit
> of migration functionality.

There is just one QMP parser.  I wouldn't call it "broken".  It parses
fine.  It's use by blockdev-add is inelegant.

Would you like me to prototype a "migrate-incoming" command for you that
does nothing?



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Paolo Bonzini


On 30/01/2015 10:38, Dr. David Alan Gilbert wrote:
> * Eric Blake (ebl...@redhat.com) wrote:
>> On 01/29/2015 01:21 PM, Dr. David Alan Gilbert wrote:
>>> * Eric Blake (ebl...@redhat.com) wrote:
 On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
>
> So what would the .args_type look like in qmp-commands.hx; something like 
> this?
>
>   .args-type = "type:s,port:-i,host:-s,command:-s"

 No, it would be more like the blockdev-add interface, where one command
 accepts a dictionary object containing a union of valid values, where
 the set of valid values is determined by the discriminator field.
 .args_type = "options:q".
>>>
>>> What causes the parser to generate a 'BlockdevOptions' as opposed to any
>>> standard options type for the parameter of qmp_blockdev_add?
>>
>> Kevin Wolf has the most experience here, as he was the one that figured
>> out how to correlate command line and QMP as part of adding blockdev-add.
> 
> OK, this is getting more complicated than I'd expected; how about a simpler
> suggestion.
> 
> The current suggestion is:
>Modify  -incoming to take   'pause' as an argument

-S is just the same.

>Add migrate-incoming command that takes parsed URI
> 
> New suggestion:
>Modify -incoming to take a pause: prefix (e.g. -incoming 
> pause:tcp:host:port )
>Add migrate-incoming-start command (takes no arguments).

Another suggestion:

Add incoming argument to the existing URI-based command migrate.

Later on, if ever, add start-migration command that takes structured
options, make HMP and QMP migrate wrappers for start-migration.

Paolo

> It seems simpler.
> 
> Dave
> 
>>
>> -- 
>> Eric Blake   eblake redhat com+1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
>>
> 
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Daniel P. Berrange
On Fri, Jan 30, 2015 at 09:38:50AM +, Dr. David Alan Gilbert wrote:
> * Eric Blake (ebl...@redhat.com) wrote:
> > On 01/29/2015 01:21 PM, Dr. David Alan Gilbert wrote:
> > > * Eric Blake (ebl...@redhat.com) wrote:
> > >> On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
> > >>>
> > >>> So what would the .args_type look like in qmp-commands.hx; something 
> > >>> like this?
> > >>>
> > >>>   .args-type = "type:s,port:-i,host:-s,command:-s"
> > >>
> > >> No, it would be more like the blockdev-add interface, where one command
> > >> accepts a dictionary object containing a union of valid values, where
> > >> the set of valid values is determined by the discriminator field.
> > >> .args_type = "options:q".
> > > 
> > > What causes the parser to generate a 'BlockdevOptions' as opposed to any
> > > standard options type for the parameter of qmp_blockdev_add?
> > 
> > Kevin Wolf has the most experience here, as he was the one that figured
> > out how to correlate command line and QMP as part of adding blockdev-add.
> 
> OK, this is getting more complicated than I'd expected; how about a simpler
> suggestion.
> 
> The current suggestion is:
>Modify  -incoming to take   'pause' as an argument
>Add migrate-incoming command that takes parsed URI

FWIW, the existing source side  'migrate' command already expects URI string,
so personally I'm fine with the dest side 'migrate-incoming' also taking
a URI string. I don't think we need to explode the URI into its pieces. URI
is a well defined data format we should feel free to use when appropriate.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Paolo Bonzini


On 29/01/2015 16:22, Dr. David Alan Gilbert wrote:
> I'm a bit worried about whether starting an incoming migrate afterwards is
> different in any subtle way.  I can see there are a handful of devices that
> have 'runstate_check(RUN_STATE_INMIGRATE)' calls in, and thus I'm not sure
> that starting a paused VM and then flipping to RUN_STATE_INMIGRATE would
> be quite the same.

Ouch, you're right.  Especially BDRV_O_INCOMING is scary.

Paolo



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Daniel P. Berrange
On Thu, Jan 29, 2015 at 08:05:50PM +, Peter Maydell wrote:
> On 29 January 2015 at 19:47, Laszlo Ersek  wrote:
> > On 01/29/15 20:12, Laszlo Ersek wrote:
> >> If the guest kernel changed its "assignment strategy" at some point, but
> >> earlier it used to match the comment (and the code), then whichever way
> >> we shape the comment will be wrong for the other kernel strategy. :) So,
> >> in that case this code is probably best left undisturbed.
> >>
> >> I'll try to dig out some kernel commit for more evidence.
> >
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3
> 
> Thanks for digging that up -- I was half-thinking we might
> just have got it wrong two years ago when we wrote the code :-)
> 
> We should probably update the comment to say (a) what we were
> trying to do (b) that we don't want to change it now because
> it would break existing setups.

While it is clear there is no solution that works correctly with all
kernels, I hate to think that we're going to stick with an ordering
that is clearly wrong for modern kernels, forever going forward. The
aarch64 world is only just starting out, so on balance I think we
should optimize for the future rather than the past, since that gives
right behaviour for orders of magnitude more people in the long term.

Also can we start using a versioned machine type for ARM, and make the
new machine type have the correct ordering for current kernels.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> 
> 
> On 30/01/2015 10:38, Dr. David Alan Gilbert wrote:
> > * Eric Blake (ebl...@redhat.com) wrote:
> >> On 01/29/2015 01:21 PM, Dr. David Alan Gilbert wrote:
> >>> * Eric Blake (ebl...@redhat.com) wrote:
>  On 01/29/2015 09:28 AM, Dr. David Alan Gilbert wrote:
> >
> > So what would the .args_type look like in qmp-commands.hx; something 
> > like this?
> >
> >   .args-type = "type:s,port:-i,host:-s,command:-s"
> 
>  No, it would be more like the blockdev-add interface, where one command
>  accepts a dictionary object containing a union of valid values, where
>  the set of valid values is determined by the discriminator field.
>  .args_type = "options:q".
> >>>
> >>> What causes the parser to generate a 'BlockdevOptions' as opposed to any
> >>> standard options type for the parameter of qmp_blockdev_add?
> >>
> >> Kevin Wolf has the most experience here, as he was the one that figured
> >> out how to correlate command line and QMP as part of adding blockdev-add.
> > 
> > OK, this is getting more complicated than I'd expected; how about a simpler
> > suggestion.
> > 
> > The current suggestion is:
> >Modify  -incoming to take   'pause' as an argument
> 
> -S is just the same.

No it's not; it goes into a different state; see previous part of thread;
it's the difference between RUN_STATE_PAUSED and RUN_STATE_INMIGRATE.

> >Add migrate-incoming command that takes parsed URI
> > 
> > New suggestion:
> >Modify -incoming to take a pause: prefix (e.g. -incoming 
> > pause:tcp:host:port )
> >Add migrate-incoming-start command (takes no arguments).
> 
> Another suggestion:
> 
> Add incoming argument to the existing URI-based command migrate.

Yep, that's also possible if others are happy with it.

> Later on, if ever, add start-migration command that takes structured
> options, make HMP and QMP migrate wrappers for start-migration.

Dave

> Paolo
> 
> > It seems simpler.
> > 
> > Dave
> > 
> >>
> >> -- 
> >> Eric Blake   eblake redhat com+1-919-301-3266
> >> Libvirt virtualization library http://libvirt.org
> >>
> > 
> > 
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > 
> > 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> 
> 
> On 29/01/2015 16:22, Dr. David Alan Gilbert wrote:
> > I'm a bit worried about whether starting an incoming migrate afterwards is
> > different in any subtle way.  I can see there are a handful of devices that
> > have 'runstate_check(RUN_STATE_INMIGRATE)' calls in, and thus I'm not sure
> > that starting a paused VM and then flipping to RUN_STATE_INMIGRATE would
> > be quite the same.
> 
> Ouch, you're right.  Especially BDRV_O_INCOMING is scary.

Yes; I've not got a clue what it's effect is, but given it's glued into
the block code I'm sure that breaking it will be subtle and nasty.  I'm
also reasonably sure it's probably going to do something different in the loadvm
case which is probably unintended.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Laszlo Ersek
On 01/30/15 10:54, Daniel P. Berrange wrote:
> On Thu, Jan 29, 2015 at 08:05:50PM +, Peter Maydell wrote:
>> On 29 January 2015 at 19:47, Laszlo Ersek  wrote:
>>> On 01/29/15 20:12, Laszlo Ersek wrote:
 If the guest kernel changed its "assignment strategy" at some point, but
 earlier it used to match the comment (and the code), then whichever way
 we shape the comment will be wrong for the other kernel strategy. :) So,
 in that case this code is probably best left undisturbed.

 I'll try to dig out some kernel commit for more evidence.
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=70161ff3
>>
>> Thanks for digging that up -- I was half-thinking we might
>> just have got it wrong two years ago when we wrote the code :-)
>>
>> We should probably update the comment to say (a) what we were
>> trying to do (b) that we don't want to change it now because
>> it would break existing setups.
> 
> While it is clear there is no solution that works correctly with all
> kernels, I hate to think that we're going to stick with an ordering
> that is clearly wrong for modern kernels, forever going forward. The
> aarch64 world is only just starting out, so on balance I think we
> should optimize for the future rather than the past, since that gives
> right behaviour for orders of magnitude more people in the long term.
> 
> Also can we start using a versioned machine type for ARM, and make the
> new machine type have the correct ordering for current kernels.

Wei recently posted a patch that introduced versioning for machvirt. (I
didn't see the patch, only know about it.) If Peter agrees, I guess both
Wei's patch and mine could be applied. (Although, mine should be
reworked so that it affect only the new machtype.)

Thanks
Laszlo



Re: [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT

2015-01-30 Thread Paul Durrant
> -Original Message-
> From: Don Slutz [mailto:dsl...@verizon.com]
> Sent: 29 January 2015 19:41
> To: Paul Durrant; Don Slutz; qemu-devel@nongnu.org; Stefano Stabellini
> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini;
> xen-devel
> Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
> 
> >> On 01/29/15 07:09, Paul Durrant wrote:
> ...
> >> Given that IIRC you are using a new dedicated IOREQ type, I
> >> think there needs to be something that allows an emulator to
> >> register for this IOREQ type. How about adding a new type to
> >> those defined for HVMOP_map_io_range_to_ioreq_server for your
> >> case? (In your case the start and end values in the hypercall
> >> would be meaningless but it could be used to steer
> >> hvm_select_ioreq_server() into sending all emulation requests or
> >> your new type to QEMU.
> >>
> 
> This is an interesting idea.  Will need to spend more time on it.
> 
> 
> >> Actually such a mechanism could be used
> >> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
> >> patches, they are going nowhere. Upstream QEMU (as default) used
> >> to ignore them anyway, which is why I didn't bother with such a
> >> patch to Xen before but since you now need one maybe you could
> >> add that too?
> >>
> 
> I think it would not be that hard.  Will look into adding it.
> 
> 
> Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
> It looks like to me that if a vpcu (like 0) needs to wait for the
> 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
> and return as if the ioreq is done.  What am I missing?
> 

hvm_do_resume() walks the ioreq server list looking at the IOREQ state in the 
shared page of each server in turn. If no IOREQ was sent to that server then 
then state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately 
so the outer loop in hvm_do_resume() will move on to the next server. If a 
state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the 
relevant event channel until the state transitions to IORESP_READY. The IOREQ 
is then completed and the loop moves on to the next server.
Normally an IOREQ would only be directed at one server and indeed IOREQs that 
are issued for emulation requests (i.e. when io_state != HVMIO_none) fall into 
this category but there is one case of a broadcast IOREQ, which is the 
INVALIDATE IOREQ (sent to tell emulators to invalidate any mappings of guest 
memory they may have cached) and that is why hvm_do_resume() has to iterate 
over all servers.

Does that make sense?

  Paul

>-Don Slutz




Re: [Qemu-devel] [PATCH v11 00/13] block: Incremental backup series

2015-01-30 Thread Vladimir Sementsov-Ogievskiy

About added functions for BdrvDirtyBitmap:

some functions has needless BlockDriverState* parameter, and others - 
doesn't:


with needless *bs:
bdrv_dirty_bitmap_make_anon
bdrv_dirty_bitmap_granularity
bdrv_clear_dirty_bitmap

without *bs:
bdrv_dirty_bitmap_enabled
bdrv_disable_dirty_bitmap
bdrv_enable_dirty_bitmap

I think, to be consistent, onlly one of two ways should be chosen:
1) all bdrv_dirty_bitmap_* functions has "BlockDriverState* bs" 
parameter, maybe needless
2) only bdrv_dirty_bitmap_* function that need "BlockDriverState* bs" 
parameter has it


Personally, I prefer the second one.

Best regards,
Vladimir

On 12.01.2015 19:30, John Snow wrote:

Welcome to version 11. I hope you are enjoying our regular newsletter.

This patchset enables the in-memory part of the incremental backup
feature. A patchset by Vladimir Sementsov-Ogievskiy enables the
migration of in-memory dirty bitmaps, and a future patchset will
enable the storage and retrieval of dirty bitmaps to and from permanent
storage.

Enough changes have been made that most Reviewed-By lines from
previous iterations have been removed. (Sorry!)

This series was originally authored by Fam Zheng;
his cover letter is included below.

~John Snow

=

This is the in memory part of the incremental backup feature.

With the added commands, we can create a bitmap on a block
backend, from which point of time all the writes are tracked by
the bitmap, marking sectors as dirty. Later, we call drive-backup
and pass the bitmap to it, to do an incremental backup.

See the last patch which adds some tests for this use case.

Fam

=

For convenience, this patchset is available on github:
 https://github.com/jnsnow/qemu/commits/dbm-backup

v11:

  - Instead of copying BdrvDirtyBitmaps and keeping a pointer to the
object we were copied from, we instead "freeze" a bitmap in-place
without copying it. On success, we thaw and delete the bitmap.
On failure, we merge the bitmap with a "successor," which is an
anonymous bitmap attached as a child that records writes for us
for the duration of the backup operation.

This means that incremental backups can NEVER BE RETRIED in a
deterministic fashion. If an incremental backup fails on a set
of dirty sectors {x}, and a new set of dirty sectors {y} are
introduced during the backup, then any possible retry action
on an incremental backup can only operate on {x,y}. There is no
way to get an incremental backup "as it would have been."

So, the failure mode for incremental backup is to try again,
and the resulting image will simply be a differential from the
last successful dirty bitmap backup.

  - Removed hbitmap_copy and bdrv_dirty_bitmap_copy.

  - Added a small fixup patch:
- Update all granularity fields to be uint64_t.
- Update documentation around BdrvDirtyBitmap structure.

  - Modified transactions to obey frozen attribute of BdrvDirtyBitmaps.

  - Added frozen attribute to the info query.

v10 (Overview):

  - I've included one of Vladimir's bitmap fixes as patch #1.

  - QMP commands and transactions are now protected via
aio_context functions.

  - QMP commands use "node-ref" as a parameter name now. Reasoning
is thus: Either a "device name" or a "node name" can be used to
reference a BDS, which is the key item we are actually acting
on with these bitmap commands. Thus, I refer to this unified
parameter as a "Node Reference," or "node-ref."

We could also argue for "backend-ref" or "device-ref" for the
reverse semantics: where we accept a unified parameter, but we
intend to resolve it to the BlockBackend instead of resolving
the parameter given to the BlockDriverState.

Or, we could use "reference" for both cases, and to match the
existing BlockdevRef command.

  - All QMP commands added are now per-node via a unified parameter
name. Though backup only operates on "devices," you are free to
create bitmaps for any arbitrary node you wish. It is obviously
only useful for the root node, currently.

  - Bitmap Sync modes (CONSUME, RESET) are removed. See below
(changelog, RFC questions) for more details.

  - Code to recover the bitmap after a failure has been added,
but I have some major questions about drive_backup guarantees.
See RFC questions.

v10 (Detailed Changelog):

(1/13) New Patch:
  - Included Vladimir Sementsov-Ogievskiy's patch that clarifies
the semantics of the bitmap primitives.

(3/13):
  - Edited function names for consistency (Stefanha)
  - Removed compile-time constants (Stefanha)
  - Acquire aio_context for bitmap add/remove (Stefanha)
  - Documented optional granularity for bitmap-add in
qmp-commands.hx file (Eric Blake)
  - block_dirty_bitmap_lookup moved forward to this patch to allow
more re-use. (S

Re: [Qemu-devel] [PATCH v2 09/11] target-arm: Use mmu_idx in get_phys_addr()

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 02:03, Edgar E. Iglesias  wrote:
> On Thu, Jan 29, 2015 at 06:55:15PM +, Peter Maydell wrote:
>> Now we have the mmu_idx in get_phys_addr(), use it correctly to
>> determine the behaviour of virtual to physical address translations,
>> rather than using just an is_user flag and the current CPU state.
>>
>> Some TODO comments have been added to indicate where changes will
>> need to be made to add EL2 and 64-bit EL3 support.
>> -/* This will go away when we handle mmu_idx properly here */
>> -int is_user = (mmu_idx == ARMMMUIdx_S12NSE0 ||
>> -   mmu_idx == ARMMMUIdx_S1SE0 ||
>> -   mmu_idx == ARMMMUIdx_S1NSE0);
>> +if (mmu_idx == ARMMMUIdx_S12NSE0 || mmu_idx == ARMMMUIdx_S12NSE1) {
>> +/* TODO: when we support EL2 we should here call ourselves 
>> recursively
>> + * to do the stage 1 and then stage 2 translations. The ldl_phys
>> + * calls for stage 1 will also need changing.
>> + * For non-EL2 CPUs a stage1+stage2 translation is just stage 1.
>> + */
>> +assert(!arm_feature(env, ARM_FEATURE_EL2));
>> +mmu_idx += ARMMMUIdx_S1NSE0;
>
> I'm not sure I understand this. Did you mean the following?
> mmu_idx = ARMMMUIdx_S1NSE0;

No. This code is handling "we asked for a stage1+2 EL0 or EL1
lookup but we don't have EL2". In this case these degrade
to the equivalent stage-1-only lookups:
 S12NSE0 -> S1NSE0
 S12NSE1 -> S1NSE1
We're relying on S12NSE0 being zero and the E0/E1 indexes
being consecutive on both sides.

> Maybe you can relax the assert to check for FEATURE_EL2 and hcr_el2 & HCR_VM ?
> And not change the mmu_idx.

The assert is here to say "if you want to implement EL2 there
is work to do here". For EL2, this is going to look
something like:
if (arm_feature(env, ARM_FEATURE_EL2 && (hcr_el2 & HCR_VM)) {
/* stage 2 exists and is enabled */
hwaddr ipa;
get_phys_addr(env, addr, &ipa, ..., stage 1 mmuidx, ...);
handle stage 1 faults;
get_phys_addr(env, ipa, &physaddr, , stage 2 mmuidx, ...);
handle stage 2 faults;
combine protection etc info from stage 1 and stage 2;
return final physaddr for combined lookup;
}

That's quite a bit of extra code, so it's deferred til we
actually implement EL2, and in the meantime we assert as a
marker for "if you hit this you need to implement all that".

-- PMM



Re: [Qemu-devel] [PATCH v2 2/4] pci: Add generic PCIe host bridge

2015-01-30 Thread Paolo Bonzini


On 29/01/2015 15:25, Peter Maydell wrote:
> > Ah, right, I forgot about that. Unfortunately I can't reserve PCI IDs in
> > the Red Hat PCI range.
> >
> > Michael, what's the process here?
>
> Last time this came up I think Paolo said the official registration
> was getting a patch into QEMU master that added a line to the
> list in pci.h :-) [ie QEMU is the master source for these IDs]

Yeah, rocker was the first case of someone not a QEMU developer needing
an id.  So let's make QEMU the master source.

Paolo



Re: [Qemu-devel] QEMU and Real Time OS

2015-01-30 Thread Marc Marí
El Fri, 30 Jan 2015 08:37:47 +0100
Jan Kiszka  escribió:
> On 2015-01-30 00:06, Paolo Bonzini wrote:
> > 
> > 
> > On 29/01/2015 20:37, Marc Marí wrote:
> >> Is this an expected behaviour? I can't see why.
> >>
> >> I'd like to know if there is a certain reason why it doesn't work.
> >> Or if it should work and the problem is too much I/O overhead. Or
> >> any other hint to understand it.
> > 
> > It is due to latencies in the host.  You need at least to use
> > preempt-rt kernels in the host as well.
> 
> That alone won't help much. You also need to fine-tune the guest to
> avoid running into QEMU locks that continuously synchronizes the guest
> on things like VGA or disk I/O emulation.
> 
> When using KVM, thus being able to run VCPUs widely independent of
> each other and the device models, you need to push cyclictest on an
> isolated second virtual CPU of the guest. Luiz and Marcelo can
> probably confirm this based on their ongoing experiments.
> 
> With TCG, we would first of all have to make it true SMP and
> independent of the I/O device lock. That's what Frederic is working
> on [1].
> 
> Jan
> 
> [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406
> 

Thanks for the answers. I think I'm stuck with ARM926, which I think is
not prepared for SMP. I'll have to look if I can use Cortex for my
experiments.

I'll continue interested with the improvements for RT on TCG, but for
the moment I'll go to work on real harware, even though is easier to
run and debug on an emulator.

Thanks
Marc



Re: [Qemu-devel] [RFC 1/1] Execute arbitrary QMP commands from command line

2015-01-30 Thread Paolo Bonzini


On 30/01/2015 11:02, Dr. David Alan Gilbert wrote:
> Yes; I've not got a clue what it's effect is, but given it's glued into
> the block code I'm sure that breaking it will be subtle and nasty.  I'm
> also reasonably sure it's probably going to do something different in the 
> loadvm
> case which is probably unintended.

It avoids metadata writes that would be concurrent with the source's.

loadvm is okay because it doesn't have a concurrently running source,
but it doesn't hurt either.

Paolo



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 09:54, Daniel P. Berrange  wrote:
> While it is clear there is no solution that works correctly with all
> kernels, I hate to think that we're going to stick with an ordering
> that is clearly wrong for modern kernels, forever going forward. The
> aarch64 world is only just starting out, so on balance I think we
> should optimize for the future rather than the past, since that gives
> right behaviour for orders of magnitude more people in the long term.

Yeah, I agree it's awkward. But I hate breaking people's
working setups, and we have no guarantee the kernel won't
change again in the future.

You could try asking the kernel folk to revert that patch on
the basis that it breaks things...

> Also can we start using a versioned machine type for ARM, and make the
> new machine type have the correct ordering for current kernels.

No, not yet. We are miles and miles and miles away from being
able to do cross-version migration correctly, and adding version
machine types is accepting a huge maintenance burden from now
extending indefinitely into the future. (It only works for x86,
as I understand it, because RedHat do a lot of testing of various
migration scenarios including cross-version, and even then we miss
things. Nobody has as yet suggested they're doing or even planning
to do that level of testing on the ARM boards. I'm not going to
add versioned boards until it becomes absolutely required. Having
working within-version migration would be a start.

-- PMM



Re: [Qemu-devel] QEMU and Real Time OS

2015-01-30 Thread Frederic Konrad

On 30/01/2015 11:26, Marc Marí wrote:

El Fri, 30 Jan 2015 08:37:47 +0100
Jan Kiszka  escribió:

On 2015-01-30 00:06, Paolo Bonzini wrote:


On 29/01/2015 20:37, Marc Marí wrote:

Is this an expected behaviour? I can't see why.

I'd like to know if there is a certain reason why it doesn't work.
Or if it should work and the problem is too much I/O overhead. Or
any other hint to understand it.

It is due to latencies in the host.  You need at least to use
preempt-rt kernels in the host as well.

That alone won't help much. You also need to fine-tune the guest to
avoid running into QEMU locks that continuously synchronizes the guest
on things like VGA or disk I/O emulation.

When using KVM, thus being able to run VCPUs widely independent of
each other and the device models, you need to push cyclictest on an
isolated second virtual CPU of the guest. Luiz and Marcelo can
probably confirm this based on their ongoing experiments.

With TCG, we would first of all have to make it true SMP and
independent of the I/O device lock. That's what Frederic is working
on [1].

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406


Thanks for the answers. I think I'm stuck with ARM926, which I think is
not prepared for SMP. I'll have to look if I can use Cortex for my
experiments.

I'll continue interested with the improvements for RT on TCG, but for
the moment I'll go to work on real harware, even though is easier to
run and debug on an emulator.

Thanks
Marc

Hi Marc,

I think the important point here is "TCG thread independent of the I/O 
device lock".

I need it for multithread TCG but that doesn't mean you need an SMP guest
platform for that.

Fred




Re: [Qemu-devel] QEMU and Real Time OS

2015-01-30 Thread Marc Marí
El Fri, 30 Jan 2015 11:36:37 +0100
Frederic Konrad  escribió:
> On 30/01/2015 11:26, Marc Marí wrote:
> > El Fri, 30 Jan 2015 08:37:47 +0100
> > Jan Kiszka  escribió:
> >> On 2015-01-30 00:06, Paolo Bonzini wrote:
> >>>
> >>> On 29/01/2015 20:37, Marc Marí wrote:
>  Is this an expected behaviour? I can't see why.
> 
>  I'd like to know if there is a certain reason why it doesn't
>  work. Or if it should work and the problem is too much I/O
>  overhead. Or any other hint to understand it.
> >>> It is due to latencies in the host.  You need at least to use
> >>> preempt-rt kernels in the host as well.
> >> That alone won't help much. You also need to fine-tune the guest to
> >> avoid running into QEMU locks that continuously synchronizes the
> >> guest on things like VGA or disk I/O emulation.
> >>
> >> When using KVM, thus being able to run VCPUs widely independent of
> >> each other and the device models, you need to push cyclictest on an
> >> isolated second virtual CPU of the guest. Luiz and Marcelo can
> >> probably confirm this based on their ongoing experiments.
> >>
> >> With TCG, we would first of all have to make it true SMP and
> >> independent of the I/O device lock. That's what Frederic is working
> >> on [1].
> >>
> >> Jan
> >>
> >> [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406
> >>
> > Thanks for the answers. I think I'm stuck with ARM926, which I
> > think is not prepared for SMP. I'll have to look if I can use
> > Cortex for my experiments.
> >
> > I'll continue interested with the improvements for RT on TCG, but
> > for the moment I'll go to work on real harware, even though is
> > easier to run and debug on an emulator.
> >
> > Thanks
> > Marc
> Hi Marc,
> 
> I think the important point here is "TCG thread independent of the
> I/O device lock".
> I need it for multithread TCG but that doesn't mean you need an SMP
> guest platform for that.
> 
> Fred
> 

I read too fast, sorry. What has to be multicore is the host, so the
I/O can run independently of the TCG. But the guest can be multi- or
uni-core.

Marc



Re: [Qemu-devel] [PATCH v2 00/11] target-arm: handle mmu_idx/translation regimes properly

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 01:36, Edgar E. Iglesias  wrote:
> IIRC, last time the dedicated S-EL0 and S-EL1 MMU idx came up the
> discussion went around flushing the qemu tlbs when switching between
> S/NS. Having the dedicated MMU-idx is faster but for Aarch64 I think
> we would need logic in at least the TTBRx access handlers to make use
> of the dedicated secure MMU idx as Aarch64 secure monitors need to
> reprogram the MMU when world switching.
>
> Another thing around the ARMMMUIdx_S2NS index.
> From what I've seen, what would really help is having a fast
> way to go from VM mode to non-vm mode. In particular for KVM.
> For example when a guest writes to a virtio console there is alot
> of ping-ponging between NS-S12(Guest) and NS-S1(Linux/KVM).
>
> Similary for XEN, it would really help to have that ASID/VMID indexed TLB I
> think you suggested at some point. In XEN's case the ping-ponging
> goes between two guests, domUs and dom0.

Yes, the lack of ASID/VMID restrictions on the TLB hurts us.
It's tricky to implement in s/w without it being slower than
what we have now, though.

> I'm not try to indicate that you should add any of that now,
> I'm just not sure sure it's worth adding the ARMMMUIdx_S2NS without
> trying if it will actually give any real life improvements in
> QEMU.

Mostly my aim here was to make sure we were actually treating
separately the various virt-to-phys mappings which the architecture
says are separate, so we don't have nasty "hit something stale
in the TLB" bugs to track down.

-- PMM



Re: [Qemu-devel] [RFC PATCH v8 15/21] aio: replace stack of bottom halves with queue

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:52, Pavel Dovgalyuk wrote:
> Bottom halves in AIO context are stored and removes
> in LIFO order. It makes their execution non-deterministic.
> This patch replaces the stack with queue to preserve the
> order of bottom halves processing.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  async.c  |   25 +
>  include/block/aio.h  |4 ++--
>  include/qemu/queue.h |7 +++
>  3 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/async.c b/async.c
> index 2be88cc..bc6e83b 100644
> --- a/async.c
> +++ b/async.c
> @@ -35,7 +35,7 @@ struct QEMUBH {
>  AioContext *ctx;
>  QEMUBHFunc *cb;
>  void *opaque;
> -QEMUBH *next;
> +QSIMPLEQ_ENTRY(QEMUBH) next;
>  bool scheduled;
>  bool idle;
>  bool deleted;
> @@ -51,10 +51,7 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void 
> *opaque)
>  .opaque = opaque,
>  };
>  qemu_mutex_lock(&ctx->bh_lock);
> -bh->next = ctx->first_bh;
> -/* Make sure that the members are ready before putting bh into list */
> -smp_wmb();
> -ctx->first_bh = bh;
> +QSIMPLEQ_INSERT_TAIL_RCU(&ctx->bh_queue, bh, next);
>  qemu_mutex_unlock(&ctx->bh_lock);
>  return bh;
>  }
> @@ -62,16 +59,15 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void 
> *opaque)
>  /* Multiple occurrences of aio_bh_poll cannot be called concurrently */
>  int aio_bh_poll(AioContext *ctx)
>  {
> -QEMUBH *bh, **bhp, *next;
> +QEMUBH *bh, *next;
>  int ret;
>  
>  ctx->walking_bh++;
>  
>  ret = 0;
> -for (bh = ctx->first_bh; bh; bh = next) {
> +QSIMPLEQ_FOREACH_SAFE(bh, &ctx->bh_queue, next, next) {
>  /* Make sure that fetching bh happens before accessing its members */
>  smp_read_barrier_depends();
> -next = bh->next;

Must use QSIMPLEQ_FOREACH.  Otherwise, the access of next is before the
smp_read_barrier_depends().

>  if (!bh->deleted && bh->scheduled) {
>  bh->scheduled = 0;
>  /* Paired with write barrier in bh schedule to ensure reading for
> @@ -90,14 +86,10 @@ int aio_bh_poll(AioContext *ctx)
>  /* remove deleted bhs */
>  if (!ctx->walking_bh) {
>  qemu_mutex_lock(&ctx->bh_lock);
> -bhp = &ctx->first_bh;
> -while (*bhp) {
> -bh = *bhp;
> +QSIMPLEQ_FOREACH_SAFE(bh, &ctx->bh_queue, next, next) {
>  if (bh->deleted) {
> -*bhp = bh->next;
> +QSIMPLEQ_REMOVE(&ctx->bh_queue, bh, QEMUBH, next);
>  g_free(bh);
> -} else {
> -bhp = &bh->next;
>  }
>  }
>  qemu_mutex_unlock(&ctx->bh_lock);
> @@ -161,7 +153,7 @@ aio_compute_timeout(AioContext *ctx)
>  int timeout = -1;
>  QEMUBH *bh;
>  
> -for (bh = ctx->first_bh; bh; bh = bh->next) {
> +QSIMPLEQ_FOREACH(bh, &ctx->bh_queue, next) {
>  if (!bh->deleted && bh->scheduled) {
>  if (bh->idle) {
>  /* idle bottom halves will be polled at least
> @@ -204,7 +196,7 @@ aio_ctx_check(GSource *source)
>  AioContext *ctx = (AioContext *) source;
>  QEMUBH *bh;
>  
> -for (bh = ctx->first_bh; bh; bh = bh->next) {
> +QSIMPLEQ_FOREACH(bh, &ctx->bh_queue, next) {
>  if (!bh->deleted && bh->scheduled) {
>  return true;
>   }
> @@ -311,6 +303,7 @@ AioContext *aio_context_new(Error **errp)
>  qemu_mutex_init(&ctx->bh_lock);
>  rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
>  timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
> +QSIMPLEQ_INIT(&ctx->bh_queue);
>  
>  return ctx;
>  }
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 7d1e26b..82cdf78 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -71,8 +71,8 @@ struct AioContext {
>  /* lock to protect between bh's adders and deleter */
>  QemuMutex bh_lock;
>  
> -/* Anchor of the list of Bottom Halves belonging to the context */
> -struct QEMUBH *first_bh;
> +/* List of Bottom Halves belonging to the context */
> +QSIMPLEQ_HEAD(, QEMUBH) bh_queue;
>  
>  /* A simple lock used to protect the first_bh list, and ensure that
>   * no callbacks are removed while we're walking and dispatching 
> callbacks.
> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
> index a98eb3a..b94c4d4 100644
> --- a/include/qemu/queue.h
> +++ b/include/qemu/queue.h
> @@ -268,6 +268,13 @@ struct { 
>\
>  (head)->sqh_last = &(elm)->field.sqe_next;  \
>  } while (/*CONSTCOND*/0)
>  
> +#define QSIMPLEQ_INSERT_TAIL_RCU(head, elm, field) do { \
> +(elm)->field.sqe_next = NULL;   \
> +*(head)->sqh_last = (elm);  \
> +(head)->sqh_last = &(elm)->field.sqe_next;  

Re: [Qemu-devel] QEMU and Real Time OS

2015-01-30 Thread Jan Kiszka
On 2015-01-30 11:36, Frederic Konrad wrote:
> On 30/01/2015 11:26, Marc Marí wrote:
>> El Fri, 30 Jan 2015 08:37:47 +0100
>> Jan Kiszka  escribió:
>>> On 2015-01-30 00:06, Paolo Bonzini wrote:

 On 29/01/2015 20:37, Marc Marí wrote:
> Is this an expected behaviour? I can't see why.
>
> I'd like to know if there is a certain reason why it doesn't work.
> Or if it should work and the problem is too much I/O overhead. Or
> any other hint to understand it.
 It is due to latencies in the host.  You need at least to use
 preempt-rt kernels in the host as well.
>>> That alone won't help much. You also need to fine-tune the guest to
>>> avoid running into QEMU locks that continuously synchronizes the guest
>>> on things like VGA or disk I/O emulation.
>>>
>>> When using KVM, thus being able to run VCPUs widely independent of
>>> each other and the device models, you need to push cyclictest on an
>>> isolated second virtual CPU of the guest. Luiz and Marcelo can
>>> probably confirm this based on their ongoing experiments.
>>>
>>> With TCG, we would first of all have to make it true SMP and
>>> independent of the I/O device lock. That's what Frederic is working
>>> on [1].
>>>
>>> Jan
>>>
>>> [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406
>>>
>> Thanks for the answers. I think I'm stuck with ARM926, which I think is
>> not prepared for SMP. I'll have to look if I can use Cortex for my
>> experiments.
>>
>> I'll continue interested with the improvements for RT on TCG, but for
>> the moment I'll go to work on real harware, even though is easier to
>> run and debug on an emulator.
>>
>> Thanks
>> Marc
> Hi Marc,
> 
> I think the important point here is "TCG thread independent of the I/O
> device lock".
> I need it for multithread TCG but that doesn't mean you need an SMP guest
> platform for that.

Provided the guest is not stressing any costly device emulation. That
can cause priority inversions when an urgent event should rather be
delivered in the meantime. That can easily happen when running real-time
Linux as guest, therefore the SMP proposal.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Daniel P. Berrange
On Fri, Jan 30, 2015 at 10:29:46AM +, Peter Maydell wrote:
> On 30 January 2015 at 09:54, Daniel P. Berrange  wrote:
> > While it is clear there is no solution that works correctly with all
> > kernels, I hate to think that we're going to stick with an ordering
> > that is clearly wrong for modern kernels, forever going forward. The
> > aarch64 world is only just starting out, so on balance I think we
> > should optimize for the future rather than the past, since that gives
> > right behaviour for orders of magnitude more people in the long term.
> 
> Yeah, I agree it's awkward. But I hate breaking people's
> working setups, and we have no guarantee the kernel won't
> change again in the future.
> 
> You could try asking the kernel folk to revert that patch on
> the basis that it breaks things...

Might be worth a shot - the patch is only a month old. Or at least do a
followup patch to put the ordering back the way it was, rather than plain
revert

Long term though it will be much better of AArch64 would just do PCI
instead of MMIO bus. Then we have proper device addressing which we
can control in a predictable manner that will be stable across hotplug
and unplug and migration.  I hear there's work on PCI for AArch64 but
is there a near term ETA yet ?

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [RFC PATCH v8 16/21] replay: bottom halves

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> This patch introduces bottom half event for replay queue. It saves the events
> into the queue and process them at the checkpoints and instructions execution.

Which bottom halves must _not_ go through aio/qemu_bh_new_replay?

> +QEMUBH *aio_bh_new_replay(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> +  uint64_t id)

id is superfluous, it can always be replay_get_current_step().

It seems to me that the device models can always use qemu_bh_new_replay.
 There are few if any uses of qemu_bh_new outside device models.
Perhaps convert those to aio_bh_new, and make qemu_bh_new always do
special replay treatment?

> +{
> +QEMUBH *bh = aio_bh_new(ctx, cb, opaque);
> +bh->replay = true;
> +bh->id = id;
> +return bh;
> +}

Paolo



Re: [Qemu-devel] [RFC PATCH v8 20/21] replay: command line options

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> This patch introduces command line options for enabling recording or replaying
> virtual machine behavior. "-record" option starts recording of the execution
> and saves it into the log, specified with "fname" parameter. "-replay" option
> is intended for replaying previously saved log.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  cpus.c  |3 ++-
>  qemu-options.hx |8 ++--
>  vl.c|   27 ++-
>  3 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 9c32491..7689cec 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -912,9 +912,10 @@ static void qemu_wait_io_event_common(CPUState *cpu)
>  static void qemu_tcg_wait_io_event(void)
>  {
>  CPUState *cpu;
> +GMainContext *context = g_main_context_default();
>  
>  while (all_cpu_threads_idle()) {
> -   /* Start accounting real time to the virtual clock if the CPUs
> +/* Start accounting real time to the virtual clock if the CPUs
>are idle.  */
>  qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
>  qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);

Extraneous change.

> diff --git a/qemu-options.hx b/qemu-options.hx
> index 10b9568..cb4b577 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3037,11 +3037,11 @@ re-inject them.
>  ETEXI
>  
>  DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
> -"-icount [shift=N|auto][,align=on|off]\n" \
> +"-icount 
> [shift=N|auto][,align=on|off][,rr=record|replay,rrfname=]\n" \

rrfile?

>  "enable virtual instruction counter with 2^N clock ticks 
> per\n" \
>  "instruction and enable aligning the host and virtual 
> clocks\n", QEMU_ARCH_ALL)
>  STEXI
> -@item -icount [shift=@var{N}|auto]
> +@item -icount [shift=@var{N}|auto][,rr=record|replay,rrfname=@var{filename}]
>  @findex -icount
>  Enable virtual instruction counter.  The virtual cpu will execute one
>  instruction every 2^@var{N} ns of virtual time.  If @code{auto} is specified
> @@ -3063,6 +3063,10 @@ Currently this option does not work when 
> @option{shift} is @code{auto}.
>  Note: The sync algorithm will work for those shift values for which
>  the guest clock runs ahead of the host clock. Typically this happens
>  when the shift value is high (how high depends on the host machine).
> +
> +When @option{rr} option is specified deterministic record/replay is enabled.
> +Replay log is written into @var{filename} file in record mode and
> +read from this file in replay mode.
>  ETEXI
>  
>  DEF("watchdog", HAS_ARG, QEMU_OPTION_watchdog, \
> diff --git a/vl.c b/vl.c
> index ae3e97e..39d9024 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -475,6 +475,12 @@ static QemuOptsList qemu_icount_opts = {
>  }, {
>  .name = "align",
>  .type = QEMU_OPT_BOOL,
> +}, {
> +.name = "rr",
> +.type = QEMU_OPT_STRING,
> +}, {
> +.name = "rrfname",
> +.type = QEMU_OPT_STRING,
>  },
>  { /* end of list */ }
>  },
> @@ -2752,6 +2758,8 @@ int main(int argc, char **argv, char **envp)
>  {
>  int i;
>  int snapshot, linux_boot;
> +int not_compatible_replay_param = 0;
> +const char *icount_option = NULL;
>  const char *initrd_filename;
>  const char *kernel_filename, *kernel_cmdline;
>  const char *boot_order;
> @@ -2949,6 +2957,7 @@ int main(int argc, char **argv, char **envp)
>  break;
>  case QEMU_OPTION_pflash:
>  drive_add(IF_PFLASH, -1, optarg, PFLASH_OPTS);
> +not_compatible_replay_param++;

Why not compatible?

>  break;
>  case QEMU_OPTION_snapshot:
>  snapshot = 1;
> @@ -3105,6 +3114,7 @@ int main(int argc, char **argv, char **envp)
>  #endif
>  case QEMU_OPTION_bt:
>  add_device_config(DEV_BT, optarg);
> +not_compatible_replay_param++;

Could it be enough to add a migration blocker?

>  break;
>  case QEMU_OPTION_audio_help:
>  AUD_help ();
> @@ -3244,6 +3254,7 @@ int main(int argc, char **argv, char **envp)
>  if (!opts) {
>  exit(1);
>  }
> +not_compatible_replay_param++;

Why not compatible?

>  break;
>  case QEMU_OPTION_fsdev:
>  olist = qemu_find_opts("fsdev");
> @@ -3372,6 +3383,7 @@ int main(int argc, char **argv, char **envp)
>  if (strncmp(optarg, "mon:", 4) == 0) {
>  default_monitor = 0;
>  }
> +not_compatible_replay_param++;

Add a migration blocker?

>  break;
>  case QEMU_OPTION_debugcon:
>  add_device_config(DEV_DEBUGCON, optarg);
> @@ -3489,6 +3501,7 @@ int main(int argc, char **argv, char **envp)
>

Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 10:48, Daniel P. Berrange  wrote:
> Long term though it will be much better of AArch64 would just do PCI
> instead of MMIO bus. Then we have proper device addressing which we
> can control in a predictable manner that will be stable across hotplug
> and unplug and migration.  I hear there's work on PCI for AArch64 but
> is there a near term ETA yet ?

I need to review Alex's v3 patchset; if it doesn't have any issues
it'll probably go into target-arm.next next week and hit mainline
shortly after that.

-- PMM



Re: [Qemu-devel] [RFC PATCH v8 19/21] replay: initialization and deinitialization

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> This patch introduces the functions for enabling the record/replay and for
> freeing the resources when simulator closes.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  block.c  |2 -
>  exec.c   |1 
>  replay/replay-internal.c |1 
>  replay/replay-internal.h |4 +
>  replay/replay.c  |  140 
> ++
>  replay/replay.h  |   13 
>  stubs/replay.c   |1 
>  vl.c |5 ++
>  8 files changed, 165 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 7f6fa8b..9923a80 100644
> --- a/block.c
> +++ b/block.c
> @@ -2010,7 +2010,7 @@ void bdrv_drain_all(void)
>  aio_context_release(aio_context);
>  }
>  }
> -if (replay_mode == REPLAY_MODE_PLAY) {
> +if (replay_mode == REPLAY_MODE_PLAY && replay_checkpoints) {
>  /* Skip checkpoints from the log */
>  while (replay_checkpoint(CHECKPOINT_BDRV_DRAIN)) {
>  /* Nothing */
> diff --git a/exec.c b/exec.c
> index 410371d..f70f03d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -783,6 +783,7 @@ void cpu_abort(CPUState *cpu, const char *fmt, ...)
>  }
>  va_end(ap2);
>  va_end(ap);
> +replay_finish();
>  #if defined(CONFIG_USER_ONLY)
>  {
>  struct sigaction act;
> diff --git a/replay/replay-internal.c b/replay/replay-internal.c
> index 49b37a6..d608b71 100755
> --- a/replay/replay-internal.c
> +++ b/replay/replay-internal.c
> @@ -33,6 +33,7 @@ void replay_put_byte(uint8_t byte)
>  
>  void replay_put_event(uint8_t event)
>  {
> +assert(event <= EVENT_END);

Please move (way) earlier in the series.

>  replay_put_byte(event);
>  }
>  
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index 4d242aa..1e5d037 100755
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -34,7 +34,9 @@ enum ReplayEvents {
>  EVENT_CLOCK,
>  /* for checkpoint event */
>  /* some of grteater codes are reserved for checkpoints */
> -EVENT_CHECKPOINT = EVENT_CLOCK + REPLAY_CLOCK_COUNT
> +EVENT_CHECKPOINT = EVENT_CLOCK + REPLAY_CLOCK_COUNT,
> +/* end of log event */
> +EVENT_END = EVENT_CHECKPOINT + CHECKPOINT_COUNT

Please add EVENT_END in the beginning of the series.  If you do:

EVENT_CLOCK,
EVENT_CLOCK_LAST = EVENT_CLOCK + REPLAY_CLOCK_COUNT - 1,
EVENT_CHECKPOINT,
EVENT_CHECKPOINT_LAST = EVENT_CHECKPOINT + CHECKPOINT_COUNT - 1,
EVENT_END,

the patches look quite nice.

>  };
>  
>  /* Asynchronous events IDs */
> diff --git a/replay/replay.c b/replay/replay.c
> index 7c4a801..fa738c2 100755
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -15,9 +15,18 @@
>  #include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
>  
> +/* Current version of the replay mechanism.
> +   Increase it when file format changes. */
> +#define REPLAY_VERSION  0xe02002

Any meaning?  Perhaps make it 'QRR\0' or something identifiable?

> +/* Size of replay log header */
> +#define HEADER_SIZE (sizeof(uint32_t) + sizeof(uint64_t))
> +
>  ReplayMode replay_mode = REPLAY_MODE_NONE;
>  
> +/* Name of replay file  */
> +static char *replay_filename;
>  ReplayState replay_state;
> +bool replay_checkpoints;

Why is replay_checkpoints only defined and used in this patch?  Is it a
duplicate of replay_events_enabled?

>  bool skip_async_events(int stop_event)
>  {
> @@ -167,6 +176,10 @@ void replay_shutdown_request(void)
>  bool replay_checkpoint(ReplayCheckpoint checkpoint)
>  {
>  bool res = false;
> +if (!replay_checkpoints) {
> +return true;
> +}
> +
>  replay_save_instructions();
>  
>  if (replay_file) {
> @@ -199,3 +212,130 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint)
>  
>  return true;
>  }
> +
> +static void replay_enable(const char *fname, int mode)
> +{
> +const char *fmode = NULL;
> +if (replay_file) {
> +fprintf(stderr,
> +"Replay: some record/replay operation is already started\n");

Would this have to be an assertion?

> +return;
> +}
> +
> +switch (mode) {
> +case REPLAY_MODE_RECORD:
> +fmode = "wb";
> +break;
> +case REPLAY_MODE_PLAY:
> +fmode = "rb";
> +break;
> +default:
> +fprintf(stderr, "Replay: internal error: invalid replay mode\n");
> +exit(1);
> +}
> +
> +atexit(replay_finish);
> +
> +replay_mutex_init();
> +
> +replay_file = fopen(fname, fmode);
> +if (replay_file == NULL) {
> +fprintf(stderr, "Replay: open %s: %s\n", fname, strerror(errno));
> +exit(1);
> +}
> +
> +replay_filename = g_strdup(fname);
> +
> +replay_mode = mode;
> +replay_has_unread_data = 0;
> +replay_data_kind = -1;
> +replay_state.instructions_count = 0;
> +replay_state.current_step = 0;
> +
> +/* skip file header for RECOR

Re: [Qemu-devel] [RFC PATCH v8 14/21] replay: checkpoints

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:52, Pavel Dovgalyuk wrote:
> This patch introduces checkpoints that synchronize cpu thread and iothread.
> When checkpoint is met in the code all asynchronous events from the queue
> are executed.
> 
> Signed-off-by: Pavel Dovgalyuk 

I cannot understand this patch.

Please document the design in docs/ and explain why you have done the
checkpoints this way.

In particular I cannot understand why you have these six checkpoints:

+CHECKPOINT_CLOCK_VIRTUAL,
+CHECKPOINT_CLOCK_VIRTUAL_ALL,
+CHECKPOINT_CLOCK_HOST,
+CHECKPOINT_CLOCK_HOST_ALL,
+CHECKPOINT_CLOCK_VIRTUAL_RT,
+CHECKPOINT_CLOCK_VIRTUAL_RT_ALL,

1) why do you need separate checkpoints for each clock?

2) why do you need separate checkpoints for _ALL and single-aio-context?

Paolo


> ---
>  block.c  |   11 ++
>  cpus.c   |7 ++-
>  include/qemu/timer.h |6 --
>  main-loop.c  |5 +
>  qemu-timer.c |   49 
> ++
>  replay/replay-internal.h |5 -
>  replay/replay.c  |   36 ++
>  replay/replay.h  |   21 
>  stubs/replay.c   |   11 ++
>  vl.c |4 +++-
>  10 files changed, 142 insertions(+), 13 deletions(-)
> 
> diff --git a/block.c b/block.c
> index cbe4a32..a4f45c3 100644
> --- a/block.c
> +++ b/block.c
> @@ -1994,6 +1994,11 @@ void bdrv_drain_all(void)
>  BlockDriverState *bs;
>  
>  while (busy) {
> +if (!replay_checkpoint(CHECKPOINT_BDRV_DRAIN)) {
> +/* Do not wait anymore, we stopped at some place in
> +   the middle of execution during replay */
> +return;
> +}
>  busy = false;
>  
>  QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> @@ -2004,6 +2009,12 @@ void bdrv_drain_all(void)
>  aio_context_release(aio_context);
>  }
>  }
> +if (replay_mode == REPLAY_MODE_PLAY) {
> +/* Skip checkpoints from the log */
> +while (replay_checkpoint(CHECKPOINT_BDRV_DRAIN)) {
> +/* Nothing */
> +}
> +}
>  }
>  
>  /* make a BlockDriverState anonymous by removing from bdrv_state and
> diff --git a/cpus.c b/cpus.c
> index 01d89aa..9c32491 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -388,7 +388,7 @@ void qtest_clock_warp(int64_t dest)
>  timers_state.qemu_icount_bias += warp;
>  seqlock_write_unlock(&timers_state.vm_clock_seqlock);
>  
> -qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
> +qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL, false);
>  clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
>  }
>  qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> @@ -408,6 +408,11 @@ void qemu_clock_warp(QEMUClockType type)
>  return;
>  }
>  
> +/* warp clock deterministically in record/replay mode */
> +if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP)) {
> +return;
> +}
> +
>  /*
>   * If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
>   * This ensures that the deadline for the timer is computed correctly 
> below.
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 0c2472c..26927b0 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -240,13 +240,14 @@ void qemu_clock_unregister_reset_notifier(QEMUClockType 
> type,
>  /**
>   * qemu_clock_run_timers:
>   * @type: clock on which to operate
> + * @run_all: true, when called from qemu_clock_run_all_timers
>   *
>   * Run all the timers associated with the default timer list
>   * of a clock.
>   *
>   * Returns: true if any timer ran.
>   */
> -bool qemu_clock_run_timers(QEMUClockType type);
> +bool qemu_clock_run_timers(QEMUClockType type, bool run_all);
>  
>  /**
>   * qemu_clock_run_all_timers:
> @@ -337,12 +338,13 @@ QEMUClockType timerlist_get_clock(QEMUTimerList 
> *timer_list);
>  /**
>   * timerlist_run_timers:
>   * @timer_list: the timer list to use
> + * @run_all: true, when called from qemu_clock_run_all_timers
>   *
>   * Call all expired timers associated with the timer list.
>   *
>   * Returns: true if any timer expired
>   */
> -bool timerlist_run_timers(QEMUTimerList *timer_list);
> +bool timerlist_run_timers(QEMUTimerList *timer_list, bool run_all);
>  
>  /**
>   * timerlist_notify:
> diff --git a/main-loop.c b/main-loop.c
> index 981bcb5..d6e93c3 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -497,6 +497,11 @@ int main_loop_wait(int nonblocking)
>  slirp_pollfds_poll(gpollfds, (ret < 0));
>  #endif
>  
> +/* CPU thread can infinitely wait for event after
> +   missing the warp */
> +if (replay_mode == REPLAY_MODE_PLAY) {
> +qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
> +}
>  qemu_clock_run_all_timers();
>  
>  return ret;
> diff --git a/qemu-timer.c b/qemu-timer.c
> index bc981a2..b6eb7b3 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.

Re: [Qemu-devel] [RFC PATCH v8 17/21] replay: replay aio requests

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> This patch adds identifier to aio requests. ID is used for creating bottom
> halves and identifying them while replaying.
> The patch also introduces several functions that make possible replaying
> of the aio requests.
> 
> Signed-off-by: Pavel Dovgalyuk 

Ah, some obscure parts of patch 16 now get clearer. :)

But it's still not clear what is the design.  You have to document this
too, then I (or others) can start reviewing the patches.

Paolo

> ---
>  block.c|   81 
> 
>  block/block-backend.c  |   30 ++-
>  block/qcow2.c  |4 ++
>  dma-helpers.c  |6 ++-
>  hw/block/virtio-blk.c  |   10 ++---
>  hw/ide/atapi.c |   10 +++--
>  hw/ide/core.c  |   14 ---
>  include/block/block.h  |   15 +++
>  include/qemu-common.h  |2 +
>  include/sysemu/block-backend.h |   10 +
>  qemu-io-cmds.c |2 -
>  stubs/replay.c |5 ++
>  trace-events   |2 +
>  util/iov.c |4 ++
>  14 files changed, 167 insertions(+), 28 deletions(-)
> 
> diff --git a/block.c b/block.c
> index a4f45c3..7f6fa8b 100644
> --- a/block.c
> +++ b/block.c
> @@ -83,7 +83,8 @@ static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState 
> *bs,
>   BdrvRequestFlags flags,
>   BlockCompletionFunc *cb,
>   void *opaque,
> - bool is_write);
> + bool is_write,
> + bool aio_replay);
>  static void coroutine_fn bdrv_co_do_rw(void *opaque);
>  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
> @@ -4425,7 +4426,19 @@ BlockAIOCB *bdrv_aio_readv(BlockDriverState *bs, 
> int64_t sector_num,
>  trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>  
>  return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
> - cb, opaque, false);
> + cb, opaque, false, false);
> +}
> +
> +BlockAIOCB *bdrv_aio_readv_replay(BlockDriverState *bs,
> +  int64_t sector_num,
> +  QEMUIOVector *qiov, int nb_sectors,
> +  BlockCompletionFunc *cb,
> +  void *opaque)
> +{
> +trace_bdrv_aio_readv_replay(bs, sector_num, nb_sectors, opaque);
> +
> +return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
> + cb, opaque, false, true);
>  }
>  
>  BlockAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
> @@ -4435,7 +4448,19 @@ BlockAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
> int64_t sector_num,
>  trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>  
>  return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
> - cb, opaque, true);
> + cb, opaque, true, false);
> +}
> +
> +BlockAIOCB *bdrv_aio_writev_replay(BlockDriverState *bs,
> +   int64_t sector_num,
> +   QEMUIOVector *qiov, int nb_sectors,
> +   BlockCompletionFunc *cb,
> +   void *opaque)
> +{
> +trace_bdrv_aio_writev_replay(bs, sector_num, nb_sectors, opaque);
> +
> +return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
> + cb, opaque, true, true);
>  }
>  
>  BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
> @@ -4446,7 +4471,7 @@ BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
>  
>  return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
>   BDRV_REQ_ZERO_WRITE | flags,
> - cb, opaque, true);
> + cb, opaque, true, true);
>  }
>  
>  
> @@ -4593,7 +4618,8 @@ static int multiwrite_merge(BlockDriverState *bs, 
> BlockRequest *reqs,
>   * requests. However, the fields opaque and error are left unmodified as they
>   * are used to signal failure for a single request to the caller.
>   */
> -int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int 
> num_reqs)
> +int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int 
> num_reqs,
> +bool replay)
>  {
>  MultiwriteCB *mcb;
>  int i;
> @@ -4631,7 +4657,7 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
> BlockRequest *reqs, int num_reqs)
>  bdrv_co_aio_rw_vector(bs, reqs[i].sector, reqs[i].qiov,
>

Re: [Qemu-devel] [RFC PATCH v8 18/21] replay: thread pool

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> -return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
> +return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque,
> +  qiov ? qiov->replay : false,
> +  qiov ? qiov->replay_step : 0);

Adding the replay/replay_step to the QEMUIOVector is... weird.  You're
paying for the fact that we do not have a BlockRequest struct that is
used throughout.

I guess we can live with this until other problems are solved, then we
can discuss this with the block folks and go for a more complete solution.

Paolo



Re: [Qemu-devel] [RFC PATCH v8 21/21] replay: recording of the user input

2015-01-30 Thread Paolo Bonzini


On 22/01/2015 09:53, Pavel Dovgalyuk wrote:
> +void replay_input_event(QemuConsole *src, InputEvent *evt)
> +{
> +if (replay_mode == REPLAY_MODE_PLAY) {
> +/* Nothing */
> +} else if (replay_mode == REPLAY_MODE_RECORD) {
> +replay_add_input_event(evt);

Does replay_add_input_event ultimately call qemu_input_event_send_impl?

> +} else {
> +qemu_input_event_send_impl(src, evt);
> +}
> +}
> +

Perhaps make this and replay_input_sync_event return a bool and in the
caller do:

if (replay_input_event(src, evt)) {
qemu_input_event_send_impl(src, evt):
}

> +if (replay_mode != REPLAY_MODE_PLAY) {
> +evt = qemu_input_event_new_key(key, down);
> +if (QTAILQ_EMPTY(&kbd_queue)) {
> +qemu_input_event_send(src, evt);
> +qemu_input_event_sync();
> +if (replay_mode != REPLAY_MODE_RECORD) {
> +qapi_free_InputEvent(evt);
> +}

This is wrong.  You have different lifetimes for different modes. Please
make a copy of the event in the implementation of record mode.

Also, you do not need the "if" for replay mode.  The functions would
just do nothing.

> +} else {
> +if (replay_mode != REPLAY_MODE_NONE) {
> +fprintf(stderr, "Input queue is not supported "
> +"in record/replay mode\n");
> +exit(1);

Why?  For record mode should just work since qemu_input_event_send is
called in qemu_input_queue_process.

Replay mode can just do nothing, by returning early from
qemu_input_queue_event/qemu_input_queue_sync.

Paolo

> +}
> +qemu_input_queue_event(&kbd_queue, src, evt);
> +qemu_input_queue_sync(&kbd_queue);
> +}
>  }




Re: [Qemu-devel] [RFC PATCH v8 00/21] Deterministic replay core

2015-01-30 Thread Paolo Bonzini


On 29/01/2015 11:21, Paolo Bonzini wrote:
> 
> 
> On 28/01/2015 12:45, Pavel Dovgaluk wrote:
>> Ping?
> 
> Reviewed 13 patches out of 21.  Made some comments, but overall I'm
> really pleased.  Thanks for persisting!
> 
> Will continue tomorrow.

The second part has a few more problems:

1) some simple and stylistic (patch 15, 19-21)

2) the invasive changes in patches 16-18 are, well, invasive.  Lacking a
design and example document it is hard to review them.  The block
maintainers, also, probably have better ideas of what to do.

3) A design document is also needed for checkpoints (patch 14)

Paolo



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 10:54, Peter Maydell  wrote:
> On 30 January 2015 at 10:48, Daniel P. Berrange  wrote:
>> Long term though it will be much better of AArch64 would just do PCI
>> instead of MMIO bus. Then we have proper device addressing which we
>> can control in a predictable manner that will be stable across hotplug
>> and unplug and migration.  I hear there's work on PCI for AArch64 but
>> is there a near term ETA yet ?
>
> I need to review Alex's v3 patchset; if it doesn't have any issues
> it'll probably go into target-arm.next next week and hit mainline
> shortly after that.

Incidentally if you have opinions about sensible minimum sizes of
PCI ECAM and MMIO windows in the address space now would be a good
time to air them.

-- PMM



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Laszlo Ersek
On 01/30/15 11:48, Daniel P. Berrange wrote:
> On Fri, Jan 30, 2015 at 10:29:46AM +, Peter Maydell wrote:
>> On 30 January 2015 at 09:54, Daniel P. Berrange  wrote:
>>> While it is clear there is no solution that works correctly with all
>>> kernels, I hate to think that we're going to stick with an ordering
>>> that is clearly wrong for modern kernels, forever going forward. The
>>> aarch64 world is only just starting out, so on balance I think we
>>> should optimize for the future rather than the past, since that gives
>>> right behaviour for orders of magnitude more people in the long term.
>>
>> Yeah, I agree it's awkward. But I hate breaking people's
>> working setups, and we have no guarantee the kernel won't
>> change again in the future.
>>
>> You could try asking the kernel folk to revert that patch on
>> the basis that it breaks things...
> 
> Might be worth a shot - the patch is only a month old. Or at least do a
> followup patch to put the ordering back the way it was, rather than plain
> revert

Please note that (as far as I understand) the patch that I referenced is
indeed very new, it's not part of v3.18, but the reversal can easily be
seen with v3.18. In other words, the kernel patch I referenced
introduces no functional change, it just reorganizes stuff in the kernel
(AIUI), with the benefit of killing a superfluous field.

The reason I referenced it because its *commit message* gives good
background. If we really wanted to find the kernel change that reversed
the traversal, we'd have to talk to Grant and/or bisect the kernel.

Thanks
Laszlo

> Long term though it will be much better of AArch64 would just do PCI
> instead of MMIO bus. Then we have proper device addressing which we
> can control in a predictable manner that will be stable across hotplug
> and unplug and migration.  I hear there's work on PCI for AArch64 but
> is there a near term ETA yet ?
> 
> Regards,
> Daniel
> 




Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Laszlo Ersek
On 01/30/15 11:54, Peter Maydell wrote:
> On 30 January 2015 at 10:48, Daniel P. Berrange  wrote:
>> Long term though it will be much better of AArch64 would just do PCI
>> instead of MMIO bus. Then we have proper device addressing which we
>> can control in a predictable manner that will be stable across hotplug
>> and unplug and migration.  I hear there's work on PCI for AArch64 but
>> is there a near term ETA yet ?
> 
> I need to review Alex's v3 patchset; if it doesn't have any issues
> it'll probably go into target-arm.next next week and hit mainline
> shortly after that.

(And then a root bridge driver for it will be necessary in edk2. (I'm
unsure if the guest kernel driver is already being worked on.))

Thanks
Laszlo




Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 11:38, Laszlo Ersek  wrote:
> Please note that (as far as I understand) the patch that I referenced is
> indeed very new, it's not part of v3.18, but the reversal can easily be
> seen with v3.18. In other words, the kernel patch I referenced
> introduces no functional change, it just reorganizes stuff in the kernel
> (AIUI), with the benefit of killing a superfluous field.
>
> The reason I referenced it because its *commit message* gives good
> background. If we really wanted to find the kernel change that reversed
> the traversal, we'd have to talk to Grant and/or bisect the kernel.

Just as a sanity check, do we actually have an old kernel that
enumerates in the opposite order (as opposed to my two-year-old
"I'm sure this used to be right" memories...) ?

-- PMM



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 11:39, Laszlo Ersek  wrote:
> On 01/30/15 11:54, Peter Maydell wrote:
>> On 30 January 2015 at 10:48, Daniel P. Berrange  wrote:
>>> Long term though it will be much better of AArch64 would just do PCI
>>> instead of MMIO bus. Then we have proper device addressing which we
>>> can control in a predictable manner that will be stable across hotplug
>>> and unplug and migration.  I hear there's work on PCI for AArch64 but
>>> is there a near term ETA yet ?
>>
>> I need to review Alex's v3 patchset; if it doesn't have any issues
>> it'll probably go into target-arm.next next week and hit mainline
>> shortly after that.
>
> (And then a root bridge driver for it will be necessary in edk2. (I'm
> unsure if the guest kernel driver is already being worked on.))

Guest kernel doesn't need any particular support because it uses
the generic pcie-driven-by-device-tree. For 64-bit ARM there's some
minor missing kernel cleanup/patch, but for 32-bit it works as is.
[see Alex's notes in patch 3/4 for detail.]

-- PMM



Re: [Qemu-devel] [PATCH v6 3/5] acpi: add build_append_namestring() helper

2015-01-30 Thread Igor Mammedov
On Wed, 28 Jan 2015 17:16:17 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jan 28, 2015 at 02:34:48PM +, Igor Mammedov wrote:
> > Use build_append_namestring() instead of build_append_nameseg()
> > So user won't have to care whether name is NameSeg, NamePath or
> > NameString.
> > 
> > See for refernce ACPI 5.0: 20.2.2 Name Objects Encoding
> > 
> 
> typo
> 
> > Signed-off-by: Igor Mammedov 
> > Reviewed-by: Claudio Fontana 
> > Acked-by: Marcel Apfelbaum 
> > ---
> > v4:
> >  * fix codding style of ACPI_DualNamePrefix & ACPI_MultiNamePrefix
> >it's imposible to use literals as suggested due to
> >g_array_append_val() requiring reference value
> > 
> > v3:
> >  assert on wrong Segcount earlier and extend condition to
> >   seg_count > 0 && seg_count <= 255
> >  as suggested by Claudio Fontana 
> > ---
> >  hw/acpi/aml-build.c | 92
> > -
> > hw/i386/acpi-build.c| 35 -
> > include/hw/acpi/aml-build.h |  2 +- 3 files changed, 108
> > insertions(+), 21 deletions(-)
> > 
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index b28c1f4..ed4ab56 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -52,7 +52,7 @@ void build_append_array(GArray *array, GArray
> > *val) 
> >  #define ACPI_NAMESEG_LEN 4
> >  
> > -void GCC_FMT_ATTR(2, 3)
> > +static void GCC_FMT_ATTR(2, 3)
> >  build_append_nameseg(GArray *array, const char *format, ...)
> >  {
> >  /* It would be nicer to use g_string_vprintf but it's only
> > there in 2.22 */ @@ -71,6 +71,96 @@ build_append_nameseg(GArray
> > *array, const char *format, ...) g_array_append_vals(array, "",
> > ACPI_NAMESEG_LEN - len); }
> >  
> > +static void
> > +build_append_namestringv(GArray *array, const char *format,
> > va_list ap) +{
> > +/* It would be nicer to use g_string_vprintf but it's only
> > there in 2.22 */
> > +char *s;
> > +int len;
> > +va_list va_len;
> > +char **segs;
> > +char **segs_iter;
> > +int seg_count = 0;
> > +
> > +va_copy(va_len, ap);
> > +len = vsnprintf(NULL, 0, format, va_len);
> > +va_end(va_len);
> > +len += 1;
> > +s = g_new(typeof(*s), len);
> > +
> > +len = vsnprintf(s, len, format, ap);
> > +
> > +segs = g_strsplit(s, ".", 0);
> > +g_free(s);
> > +
> > +/* count segments */
> > +segs_iter = segs;
> > +while (*segs_iter) {
> > +++segs_iter;
> > +++seg_count;
> >
> How about we split out formatting?
> Make +build_append_namestringv acceptconst char **segs, int nsegs,
> put all code above to build_append_namestring.
Can't do this, AML API calls which accept format string will
use build_append_namestringv()



Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Laszlo Ersek
On 01/30/15 12:40, Peter Maydell wrote:
> On 30 January 2015 at 11:38, Laszlo Ersek  wrote:
>> Please note that (as far as I understand) the patch that I referenced is
>> indeed very new, it's not part of v3.18, but the reversal can easily be
>> seen with v3.18. In other words, the kernel patch I referenced
>> introduces no functional change, it just reorganizes stuff in the kernel
>> (AIUI), with the benefit of killing a superfluous field.
>>
>> The reason I referenced it because its *commit message* gives good
>> background. If we really wanted to find the kernel change that reversed
>> the traversal, we'd have to talk to Grant and/or bisect the kernel.
> 
> Just as a sanity check, do we actually have an old kernel that
> enumerates in the opposite order (as opposed to my two-year-old
> "I'm sure this used to be right" memories...) ?

I never saw one. :) That's why I asked in my first message if anyone had
actually tested that loop / comment.

Thanks,
Laszlo




Re: [Qemu-devel] address order of virtio-mmio devices

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 11:48, Laszlo Ersek  wrote:
> On 01/30/15 12:40, Peter Maydell wrote:
>> Just as a sanity check, do we actually have an old kernel that
>> enumerates in the opposite order (as opposed to my two-year-old
>> "I'm sure this used to be right" memories...) ?
>
> I never saw one. :) That's why I asked in my first message if anyone had
> actually tested that loop / comment.

Well, I *think* I did, because I remember the go-round about
it and I wouldn't have written weird code like that without
a reason. But it's two years ago now...

-- PMM



[Qemu-devel] [PATCH] Fix comparisons between implicit booleans and integers.

2015-01-30 Thread Richard W.M. Jones
In GCC 5 there is a new warning (-Wlogical-not-parentheses) which
prevents you from writing:

  if ( == ) ...

A typical error would be:

kvm-all.c: In function ‘kvm_set_migration_log’:
kvm-all.c:383:54: error: logical not is only applied to the left hand side of 
comparison [-Werror=logical-not-parentheses]
 if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {

Fix a few places where the build is now broken in GCC 5.

The warning isn't even consistent, as with the place in
hw/net/virtio-net.c where I had to cast an obviously boolean
expression to bool.

Signed-off-by: Richard W.M. Jones 
---
 hw/net/virtio-net.c | 2 +-
 kvm-all.c   | 2 +-
 qemu-img.c  | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 45da34a..ba6afb8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -121,7 +121,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t 
status)
 }
 
 if (!!n->vhost_started ==
-(virtio_net_started(n, status) && !nc->peer->link_down)) {
+(bool) (virtio_net_started(n, status) && !nc->peer->link_down)) {
 return;
 }
 if (!n->vhost_started) {
diff --git a/kvm-all.c b/kvm-all.c
index 2f21a4e..b6d4387 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -366,7 +366,7 @@ static void kvm_log_stop(MemoryListener *listener,
 }
 }
 
-static int kvm_set_migration_log(int enable)
+static int kvm_set_migration_log(bool enable)
 {
 KVMState *s = kvm_state;
 KVMSlot *mem;
diff --git a/qemu-img.c b/qemu-img.c
index 4e9a7f5..acf70fa 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -977,7 +977,8 @@ static int is_allocated_sectors_min(const uint8_t *buf, int 
n, int *pnum,
 static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
 int *pnum)
 {
-int res, i;
+bool res;
+int i;
 
 if (n <= 0) {
 *pnum = 0;
-- 
2.2.2




Re: [Qemu-devel] [PATCH 0/7] MIPS: IEEE 754-2008 features support

2015-01-30 Thread Peter Maydell
On 9 December 2014 at 01:54, Maciej W. Rozycki  wrote:
>  This patch series comprises changes to QEMU, both the MIPS backend and
> generic SoftFloat support code, to support IEEE 754-2008 features
> introduced to revision 3.50 of the MIPS Architecture as follows.

Just to let you know that:
(1) the softfloat relicensing has hit master, so this patchset isn't
blocked by anything now
(2) I would like to see a definite "we are happy to license this
patchset under the SoftFloat-2a license" for these changes, because
they were submitted before we applied the relicensing, and therefore
the "changes after $DATE will be -2a license unless otherwise stated"
note in the sourcecode can't be assumed to apply to them.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-01-30 Thread Markus Armbruster
Gonglei  writes:

> On 2015/1/30 15:46, Markus Armbruster wrote:
>
>> Gonglei  writes:
>> 
>>> On 2015/1/30 0:03, Alexander Graf wrote:
>>>


 On 29.01.15 14:29, arei.gong...@huawei.com wrote:
> From: Gonglei 
>
> If boot order is invaild or is set failed,
> exit qemu.
>
> Signed-off-by: Gonglei 

 Do we really want to kill the machine only because the boot device
 string doesn't validate?

>>>
>>> Not all of the situation. If people want to change boot order by qmp/hmp
>>> command, it just report an error, please see do_boot_set(). But if the boot
>>> order is set in qemu command line, it will exit qemu if the boot
>>> device string
>>> is invalidate, as this patch's situation, which follow the original
>>> processing
>>> way (commit ef3adf68).
>> 
>> I think Alex isn't concerned about the monitor command, but what happens
>> when boot order "once" is reset to "order" on system reset.
>> 
>> -boot errors should have been detected during command line processing
>> (strongly preferred) or initial startup (acceptable).  Detecting
>
> Yes, and it had done it just like that, please see main() of vl.c. So, 
> actually
> it wouldn't fail in the check of restore_boot_order function's calling.
> The only possible fails will happen to call boot_set_handler(). Take
> x86 pc machine example, set_boot_dev() callback  may return errors.

I don't like unreachable error messages.  If qemu_boot_set() can't fail
in restore_boot_order(), then simply assert it doesn't fail, by passing
&error_abort.

>> configuration errors during operation is nasty.  In cases where we can't
>> avoid it (and I'm not sure this is one), we need to consider very
>> carefully whether the error should be fatal.
>
> Indeed, maybe we only need to set boot order failed  and report
> an error message in this scenario, do you agree?
>
> Regards,
> -Gonglei



Re: [Qemu-devel] [PATCH] Fix comparisons between implicit booleans and integers.

2015-01-30 Thread Paolo Bonzini


On 30/01/2015 12:52, Richard W.M. Jones wrote:
> In GCC 5 there is a new warning (-Wlogical-not-parentheses) which
> prevents you from writing:
> 
>   if ( == ) ...
> 
> A typical error would be:
> 
> kvm-all.c: In function ‘kvm_set_migration_log’:
> kvm-all.c:383:54: error: logical not is only applied to the left hand side of 
> comparison [-Werror=logical-not-parentheses]
>  if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
> 
> Fix a few places where the build is now broken in GCC 5.
> 
> The warning isn't even consistent, as with the place in
> hw/net/virtio-net.c where I had to cast an obviously boolean
> expression to bool.

The other two places are fine cleanups, but really I would just suppress
the warning in configure...

Paolo

> Signed-off-by: Richard W.M. Jones 
> ---
>  hw/net/virtio-net.c | 2 +-
>  kvm-all.c   | 2 +-
>  qemu-img.c  | 3 ++-
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 45da34a..ba6afb8 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -121,7 +121,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t 
> status)
>  }
>  
>  if (!!n->vhost_started ==
> -(virtio_net_started(n, status) && !nc->peer->link_down)) {
> +(bool) (virtio_net_started(n, status) && !nc->peer->link_down)) {
>  return;
>  }
>  if (!n->vhost_started) {
> diff --git a/kvm-all.c b/kvm-all.c
> index 2f21a4e..b6d4387 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -366,7 +366,7 @@ static void kvm_log_stop(MemoryListener *listener,
>  }
>  }
>  
> -static int kvm_set_migration_log(int enable)
> +static int kvm_set_migration_log(bool enable)
>  {
>  KVMState *s = kvm_state;
>  KVMSlot *mem;
> diff --git a/qemu-img.c b/qemu-img.c
> index 4e9a7f5..acf70fa 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -977,7 +977,8 @@ static int is_allocated_sectors_min(const uint8_t *buf, 
> int n, int *pnum,
>  static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
>  int *pnum)
>  {
> -int res, i;
> +bool res;
> +int i;
>  
>  if (n <= 0) {
>  *pnum = 0;
> 



Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-01-30 Thread Gonglei
On 2015/1/30 20:01, Markus Armbruster wrote:

> Gonglei  writes:
> 
>> On 2015/1/30 15:46, Markus Armbruster wrote:
>>
>>> Gonglei  writes:
>>>
 On 2015/1/30 0:03, Alexander Graf wrote:

>
>
> On 29.01.15 14:29, arei.gong...@huawei.com wrote:
>> From: Gonglei 
>>
>> If boot order is invaild or is set failed,
>> exit qemu.
>>
>> Signed-off-by: Gonglei 
>
> Do we really want to kill the machine only because the boot device
> string doesn't validate?
>

 Not all of the situation. If people want to change boot order by qmp/hmp
 command, it just report an error, please see do_boot_set(). But if the boot
 order is set in qemu command line, it will exit qemu if the boot
 device string
 is invalidate, as this patch's situation, which follow the original
 processing
 way (commit ef3adf68).
>>>
>>> I think Alex isn't concerned about the monitor command, but what happens
>>> when boot order "once" is reset to "order" on system reset.
>>>
>>> -boot errors should have been detected during command line processing
>>> (strongly preferred) or initial startup (acceptable).  Detecting
>>
>> Yes, and it had done it just like that, please see main() of vl.c. So, 
>> actually
>> it wouldn't fail in the check of restore_boot_order function's calling.
>> The only possible fails will happen to call boot_set_handler(). Take
>> x86 pc machine example, set_boot_dev() callback  may return errors.
> 
> I don't like unreachable error messages.  If qemu_boot_set() can't fail
> in restore_boot_order(), then simply assert it doesn't fail, by passing
> &error_abort.
> 

Sorry, I meant the validate_bootdevices() can't fail in restore_boot_order(),
but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as
set_boot_dev(). For example:
x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot 
menu=on,order=nbcdep,once=c -monitor stdio -vnc :0
QEMU 2.2.50 monitor - type 'help' for more information
(qemu) system_reset
(qemu) qemu-system-x86_64: Too many boot devices for PC

Regards,
-Gonglei

>>> configuration errors during operation is nasty.  In cases where we can't
>>> avoid it (and I'm not sure this is one), we need to consider very
>>> carefully whether the error should be fatal.
>>
>> Indeed, maybe we only need to set boot order failed  and report
>> an error message in this scenario, do you agree?
>>
>> Regards,
>> -Gonglei






Re: [Qemu-devel] [PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt

2015-01-30 Thread Gonglei
On 2015/1/26 21:13, Gonglei (Arei) wrote:

> From: Gonglei 
> 
> In order to support to assign a boot order for
> vhost-scsi device, we should get the tpgt for
> user level (such as Qemu). and at present, we
> only support the minimum tpgt can boot.
> 

Ping...

> Signed-off-by: Gonglei 
> Signed-off-by: Bo Su 
> ---
>  drivers/vhost/scsi.c   | 41 +
>  include/uapi/linux/vhost.h |  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index d695b16..12e79b9 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1522,6 +1522,38 @@ err_dev:
>   return ret;
>  }
>  
> +static int vhost_scsi_get_first_tpgt(
> + struct vhost_scsi *vs,
> + struct vhost_scsi_target *t)
> +{
> + struct tcm_vhost_tpg *tv_tpg;
> + struct tcm_vhost_tport *tv_tport;
> + int tpgt = -1;
> +
> + mutex_lock(&tcm_vhost_mutex);
> + mutex_lock(&vs->dev.mutex);
> +
> + list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
> + tv_tport = tv_tpg->tport;
> +
> + if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
> + if (tpgt < 0)
> + tpgt = tv_tpg->tport_tpgt;
> + else if (tpgt > tv_tpg->tport_tpgt)
> + tpgt = tv_tpg->tport_tpgt;
> + }
> + }
> +
> + mutex_unlock(&vs->dev.mutex);
> + mutex_unlock(&tcm_vhost_mutex);
> +
> + if (tpgt < 0)
> + return -ENXIO;
> +
> + t->vhost_tpgt = tpgt;
> + return 0;
> +}
> +
>  static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
>  {
>   struct vhost_virtqueue *vq;
> @@ -1657,6 +1689,15 @@ vhost_scsi_ioctl(struct file *f,
>   if (put_user(events_missed, eventsp))
>   return -EFAULT;
>   return 0;
> + case VHOST_SCSI_GET_TPGT:
> + if (copy_from_user(&backend, argp, sizeof(backend)))
> + return -EFAULT;
> + r = vhost_scsi_get_first_tpgt(vs, &backend);
> + if (r < 0)
> + return r;
> + if (copy_to_user(argp, &backend, sizeof(backend)))
> + return -EFAULT;
> + return 0;
>   case VHOST_GET_FEATURES:
>   features = VHOST_SCSI_FEATURES;
>   if (copy_to_user(featurep, &features, sizeof features))
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index bb6a5b4..5d350f7 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -155,4 +155,6 @@ struct vhost_scsi_target {
>  #define VHOST_SCSI_SET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x43, __u32)
>  #define VHOST_SCSI_GET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x44, __u32)
>  
> +#define VHOST_SCSI_GET_TPGT _IOW(VHOST_VIRTIO, 0x45, struct 
> vhost_scsi_target)
> +
>  #endif






Re: [Qemu-devel] [RFC][PATCH 1/1] libxl: add one machine property to support IGD GFX passthrough

2015-01-30 Thread Wei Liu
On Fri, Jan 30, 2015 at 08:56:48AM +0800, Chen, Tiejun wrote:
[...]
> >>>
> >>>Just remember to handle old option in libxl if your old option is already
> >>>released by some older version of QEMUs.
> >>
> >>I just drop that old option, -gfx_passthru, if we're under qemu upstream
> >>circumstance, like this,
> >>
> >
> >The question is, is there any version of qemu upstream that has
> >been released that has the old option (-gfx-passthru)?
> 
> No. Just now we're starting to support IGD passthrough in qemu upstream.
> 

Right, as of QEMU 2.2.0 there's no support of IGD passthrough in QMEU
upstream.

> >
> >This gives us a situation that we need to support both the old
> >(-gfx-passthru) and new (-igd-passthru) options. Presumably we (libxl)
> >would need to fork a qemu process to determine which option it has and
> >pass the right one.
> >
> >Or you can try to keep both old and new option at the same time but
> 
> Yeah, actually I also have considered to keep both two options at the same
> time. Its really friendly to any qemu version.
> 
> >deprecate the old one. Then in a few qemu release cycles later (or
> 
> This should be like 'accel=kvm' versus 'enable-kvm' in qemu upstream.
> They're coexisted now but just the former is a modern option.
> 
> >probably one year or two?) you can finally remove the old one. The point
> >is that to give downstream (in this case, Xen) time to cope with the
> >change.
> 
> Here I'm fine to this way.
> 
> So Gerd,
> 

So you don't actually need to ask Gerd this question because there is no
old option to keep in qemu upstream.

Libxl (or any sensible toolstack) will just do the right thing to either
pass -igd-passthru (or whatever you guys agree upon) to qemu upstream or
pass -gfx-passthru to qemu traditional. :-)

Wei.

> What about this?
> 





Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-01-30 Thread Markus Armbruster
Gonglei  writes:

> On 2015/1/30 20:01, Markus Armbruster wrote:
>
>> Gonglei  writes:
>> 
>>> On 2015/1/30 15:46, Markus Armbruster wrote:
>>>
 Gonglei  writes:

> On 2015/1/30 0:03, Alexander Graf wrote:
>
>>
>>
>> On 29.01.15 14:29, arei.gong...@huawei.com wrote:
>>> From: Gonglei 
>>>
>>> If boot order is invaild or is set failed,
>>> exit qemu.
>>>
>>> Signed-off-by: Gonglei 
>>
>> Do we really want to kill the machine only because the boot device
>> string doesn't validate?
>>
>
> Not all of the situation. If people want to change boot order by qmp/hmp
> command, it just report an error, please see do_boot_set(). But if the 
> boot
> order is set in qemu command line, it will exit qemu if the boot
> device string
> is invalidate, as this patch's situation, which follow the original
> processing
> way (commit ef3adf68).

 I think Alex isn't concerned about the monitor command, but what happens
 when boot order "once" is reset to "order" on system reset.

 -boot errors should have been detected during command line processing
 (strongly preferred) or initial startup (acceptable).  Detecting
>>>
>>> Yes, and it had done it just like that, please see main() of
>>> vl.c. So, actually
>>> it wouldn't fail in the check of restore_boot_order function's calling.
>>> The only possible fails will happen to call boot_set_handler(). Take
>>> x86 pc machine example, set_boot_dev() callback  may return errors.
>> 
>> I don't like unreachable error messages.  If qemu_boot_set() can't fail
>> in restore_boot_order(), then simply assert it doesn't fail, by passing
>> &error_abort.
>> 
>
> Sorry, I meant the validate_bootdevices() can't fail in restore_boot_order(),
> but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as
> set_boot_dev(). For example:
> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot
> menu=on,order=nbcdep,once=c -monitor stdio -vnc :0
> QEMU 2.2.50 monitor - type 'help' for more information
> (qemu) system_reset
> (qemu) qemu-system-x86_64: Too many boot devices for PC

The value of parameter order should be checked "during command line
processing (strongly preferred) or initial startup (acceptable)" if at
all possible.  Is it possible?



[Qemu-devel] [PATCH] target-arm: check that LSB <= MSB in BFI instruction

2015-01-30 Thread Kirill Batuzov
The documentation states that if LSB > MSB in BFI instruction behaviour
is unpredictable. Currently QEMU crashes because of assertion failure in
this case:

tcg/tcg-op.h:2061: tcg_gen_deposit_i32: Assertion `len <= 32' failed.

While assertion failure may meet the "unpredictable" definition this
behaviour is undesirable because it allows an unprivileged guest program
to crash the emulator with the OS and other programs.

This patch addresses the issue by throwing illegal instruction exception
if LSB > MSB. Only ARM decoder is affected because Thumb decoder already
has this check in place.

To reproduce issue run the following program

int main(void) {
asm volatile (".long 0x07c00c12" :: );
return 0;
}

compiled with
  gcc -marm -static badop_arm.c -o badop_arm

Signed-off-by: Kirill Batuzov 
---
 target-arm/translate.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bdfcdf1..2821289 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8739,6 +8739,8 @@ static void disas_arm_insn(DisasContext *s, unsigned int 
insn)
 ARCH(6T2);
 shift = (insn >> 7) & 0x1f;
 i = (insn >> 16) & 0x1f;
+if (i < shift)
+goto illegal_op;
 i = i + 1 - shift;
 if (rm == 15) {
 tmp = tcg_temp_new_i32();
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-01-30 Thread Gonglei
On 2015/1/30 20:32, Markus Armbruster wrote:

> Gonglei  writes:
> 
>> On 2015/1/30 20:01, Markus Armbruster wrote:
>>
>>> Gonglei  writes:
>>>
 On 2015/1/30 15:46, Markus Armbruster wrote:

> Gonglei  writes:
>
>> On 2015/1/30 0:03, Alexander Graf wrote:
>>
>>>
>>>
>>> On 29.01.15 14:29, arei.gong...@huawei.com wrote:
 From: Gonglei 

 If boot order is invaild or is set failed,
 exit qemu.

 Signed-off-by: Gonglei 
>>>
>>> Do we really want to kill the machine only because the boot device
>>> string doesn't validate?
>>>
>>
>> Not all of the situation. If people want to change boot order by qmp/hmp
>> command, it just report an error, please see do_boot_set(). But if the 
>> boot
>> order is set in qemu command line, it will exit qemu if the boot
>> device string
>> is invalidate, as this patch's situation, which follow the original
>> processing
>> way (commit ef3adf68).
>
> I think Alex isn't concerned about the monitor command, but what happens
> when boot order "once" is reset to "order" on system reset.
>
> -boot errors should have been detected during command line processing
> (strongly preferred) or initial startup (acceptable).  Detecting

 Yes, and it had done it just like that, please see main() of
 vl.c. So, actually
 it wouldn't fail in the check of restore_boot_order function's calling.
 The only possible fails will happen to call boot_set_handler(). Take
 x86 pc machine example, set_boot_dev() callback  may return errors.
>>>
>>> I don't like unreachable error messages.  If qemu_boot_set() can't fail
>>> in restore_boot_order(), then simply assert it doesn't fail, by passing
>>> &error_abort.
>>>
>>
>> Sorry, I meant the validate_bootdevices() can't fail in restore_boot_order(),
>> but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as
>> set_boot_dev(). For example:
>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot
>> menu=on,order=nbcdep,once=c -monitor stdio -vnc :0
>> QEMU 2.2.50 monitor - type 'help' for more information
>> (qemu) system_reset
>> (qemu) qemu-system-x86_64: Too many boot devices for PC
> 
> The value of parameter order should be checked "during command line
> processing (strongly preferred) or initial startup (acceptable)" if at
> all possible.  Is it possible?

Either 'once' option or 'order' option can take effect for -boot at the same 
time,
that is say initial startup processing can check only one. Besides, the check 
is just for
corresponding machine type, so command line processing also can't do it.

Regards,
-Gonglei




Re: [Qemu-devel] [PATCH] target-arm: check that LSB <= MSB in BFI instruction

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 12:36, Kirill Batuzov  wrote:
> The documentation states that if LSB > MSB in BFI instruction behaviour
> is unpredictable. Currently QEMU crashes because of assertion failure in
> this case:
>
> tcg/tcg-op.h:2061: tcg_gen_deposit_i32: Assertion `len <= 32' failed.
>
> While assertion failure may meet the "unpredictable" definition this
> behaviour is undesirable because it allows an unprivileged guest program
> to crash the emulator with the OS and other programs.

Thanks for this bug fix. Some minor nits:

> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index bdfcdf1..2821289 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -8739,6 +8739,8 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  ARCH(6T2);
>  shift = (insn >> 7) & 0x1f;
>  i = (insn >> 16) & 0x1f;
> +if (i < shift)
> +goto illegal_op;

This needs braces to comply with coding style. checkpatch.pl
should warn you about this.

I also like to comment this kind of check with
 /* UNPREDICTABLE; we choose to UNDEF */

>  i = i + 1 - shift;
>  if (rm == 15) {
>  tmp = tcg_temp_new_i32();

I checked the Thumb decoder, and that code seems to have
this test already, so it's just the ARM decoder that
needs fixing.

thanks
-- PMM



Re: [Qemu-devel] -device xen-platform crashes

2015-01-30 Thread Stefano Stabellini
On Fri, 30 Jan 2015, Markus Armbruster wrote:
> Stefano Stabellini  writes:
> 
> > On Thu, 29 Jan 2015, Markus Armbruster wrote:
> >> Stefano Stabellini  writes:
> >> 
> >> > On Thu, 29 Jan 2015, Markus Armbruster wrote:
> >> >> Reproducer: qemu -nodefaults -S -display none -device xen-platform
> >> >> 
> >> >> Yes, xen-platform makes no sense without Xen, but it shouldn't crash.
> >> >
> >> > Is it just a matter of doing the following?
> >> >
> >> > diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> >> > index 28b324a..40ae1f3 100644
> >> > --- a/hw/i386/xen/xen_platform.c
> >> > +++ b/hw/i386/xen/xen_platform.c
> >> > @@ -177,6 +177,10 @@ static void platform_fixed_ioport_writeb(void 
> >> > *opaque, uint32_t addr, uint32_t v
> >> >  {
> >> >  PCIXenPlatformState *s = opaque;
> >> >  
> >> > +if (!xen_enabled()) {
> >> > +return;
> >> > +}
> >> > +
> >> >  switch (addr) {
> >> >  case 0: /* Platform flags */ {
> >> >  hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
> >> 
> >> Fixes the crash for me.
> >> 
> >> Should Xen-only devices fail to realize when !xen_enabled()?
> >  
> > Accelerators are configured after device registration unfortunately.
> 
> Realization happens even later, doesn't it?

Yes, you are right.


> With this patch:
> 
> @@ -389,6 +393,11 @@ static int xen_platform_initfn(PCIDevice *dev)
>  PCIXenPlatformState *d = XEN_PLATFORM(dev);
>  uint8_t *pci_conf;
> 
> +if (!xen_enabled()) {
> +error_report("Nope");
> +return -1;
> +}
> +
>  pci_conf = dev->config;
> 
>  pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_IO | 
> PCI_COMMAND_MEMORY);
> 
> I get:
> 
> $ upstream-qemu -nodefaults -S -display none -device xen-platform
> upstream-qemu: -device xen-platform: Nope
> upstream-qemu: -device xen-platform: Device initialization failed.
> upstream-qemu: -device xen-platform: Device 'xen-platform' could not be 
> initialized
> [Exit 1 ]
> 
> Note that error_report() is a bit problematic in init methods.  The best
> solution is to convert the device to realize (requires my "pci: Partial
> conversion to realize" series) and use error_setg().

I agree that this is much better than breaking at runtime for no clear
reason. Are you going to submit a proper patch?



Re: [Qemu-devel] [RFC PATCH v8 04/21] replay: internal functions for replay log

2015-01-30 Thread Pavel Dovgaluk
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 22/01/2015 09:51, Pavel Dovgalyuk wrote:
> > This patch adds functions to perform read and write operations
> > with replay log.
> >
> > Signed-off-by: Pavel Dovgalyuk 
> > ---
> > +void replay_check_error(void)
> 
> Could this be static? (I haven't checked).

No, because it is used from several replay files.

> > +{
> > +if (replay_file) {
> > +if (feof(replay_file)) {
> > +fprintf(stderr, "replay file is over\n");
> > +exit(1);
> 
> Perhaps qemu_system_vmstop_request_prepare +
> qemu_system_vmstop_request(RUN_STATE_PAUSED) instead of exit?  Those two
> functions are thread-safe.

There is no need to stop when replay file is over (because we cannot replay 
more).
Should we send shutdown request instead?


Pavel Dovgalyuk




[Qemu-devel] [PATCH v2] target-arm: check that LSB <= MSB in BFI instruction

2015-01-30 Thread Kirill Batuzov
The documentation states that if LSB > MSB in BFI instruction behaviour
is unpredictable. Currently QEMU crashes because of assertion failure in
this case:

tcg/tcg-op.h:2061: tcg_gen_deposit_i32: Assertion `len <= 32' failed.

While assertion failure may meet the "unpredictable" definition this
behaviour is undesirable because it allows an unprivileged guest program
to crash the emulator with the OS and other programs.

This patch addresses the issue by throwing illegal instruction exception
if LSB > MSB. Only ARM decoder is affected because Thumb decoder already
has this check in place.

To reproduce issue run the following program

int main(void) {
asm volatile (".long 0x07c00c12" :: );
return 0;
}

compiled with
  gcc -marm -static badop_arm.c -o badop_arm

Signed-off-by: Kirill Batuzov 
---
 target-arm/translate.c |4 
 1 file changed, 4 insertions(+)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bdfcdf1..2c1c2a7 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8739,6 +8739,10 @@ static void disas_arm_insn(DisasContext *s, unsigned int 
insn)
 ARCH(6T2);
 shift = (insn >> 7) & 0x1f;
 i = (insn >> 16) & 0x1f;
+if (i < shift) {
+/* UNPREDICTABLE; we choose to UNDEF */
+goto illegal_op;
+}
 i = i + 1 - shift;
 if (rm == 15) {
 tmp = tcg_temp_new_i32();
-- 
1.7.10.4




Re: [Qemu-devel] [RFC PATCH v8 04/21] replay: internal functions for replay log

2015-01-30 Thread Paolo Bonzini


On 30/01/2015 13:56, Pavel Dovgaluk wrote:
> > Could this be static? (I haven't checked).
> 
> No, because it is used from several replay files.

I wonder if that's a layering violation.

> > Perhaps qemu_system_vmstop_request_prepare +
> > qemu_system_vmstop_request(RUN_STATE_PAUSED) instead of exit?  Those two
> > functions are thread-safe.
>
> There is no need to stop when replay file is over (because we cannot replay 
> more).
> Should we send shutdown request instead?

I thought about it.  I think no, because shutdown is irreversible (see
runstate_needs_reset).  Just pausing seemed to be the right compromise,
and then the next "cont" can run the VM out of replay mode.

Paolo



Re: [Qemu-devel] [RFC PATCH v8 04/21] replay: internal functions for replay log

2015-01-30 Thread Mark Burton
I believe thats what we concluded too
Cheers
Mark.

> On 30 Jan 2015, at 14:06, Paolo Bonzini  wrote:
> 
> 
> 
> On 30/01/2015 13:56, Pavel Dovgaluk wrote:
>>> Could this be static? (I haven't checked).
>> 
>> No, because it is used from several replay files.
> 
> I wonder if that's a layering violation.
> 
>>> Perhaps qemu_system_vmstop_request_prepare +
>>> qemu_system_vmstop_request(RUN_STATE_PAUSED) instead of exit?  Those two
>>> functions are thread-safe.
>> 
>> There is no need to stop when replay file is over (because we cannot replay 
>> more).
>> Should we send shutdown request instead?
> 
> I thought about it.  I think no, because shutdown is irreversible (see
> runstate_needs_reset).  Just pausing seemed to be the right compromise,
> and then the next "cont" can run the VM out of replay mode.
> 
> Paolo


 +44 (0)20 7100 3485 x 210
 +33 (0)5 33 52 01 77x 210

+33 (0)603762104
mark.burton




Re: [Qemu-devel] [PATCH v2 1/3] vnc: fix qemu crash when not configure vnc option

2015-01-30 Thread Don Slutz
On 01/29/15 21:14, arei.gong...@huawei.com wrote:
> From: Gonglei 
> 
> Add missing vnc options: to, ipv4, ipv6 and fix
> qemu crash.
> 
> Reproducer:
> $ x86_64-softmmu/qemu-system-x86_64
> qemu-system-x86_64: Invalid parameter 'to'
> Segmentation fault (core dumped)
> 
> BTW the patch fix the below bug:
> https://bugs.launchpad.net/qemu/+bug/1414222
> 
> Signed-off-by: Gonglei 
> ---

Looks good,

Reviewed-by: Don Slutz 

   -Don Slutz

>  ui/vnc.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/ui/vnc.c b/ui/vnc.c
> index a742c90..08b8b24 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -3276,6 +3276,15 @@ static QemuOptsList qemu_vnc_opts = {
>  .name = "connections",
>  .type = QEMU_OPT_NUMBER,
>  },{
> +.name = "to",
> +.type = QEMU_OPT_NUMBER,
> +},{
> +.name = "ipv4",
> +.type = QEMU_OPT_BOOL,
> +},{
> +.name = "ipv6",
> +.type = QEMU_OPT_BOOL,
> +},{
>  .name = "password",
>  .type = QEMU_OPT_BOOL,
>  },{
> 



[Qemu-devel] [PATCH v7 3/5] acpi: drop min-bytes in build_package()

2015-01-30 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
Reviewed-by: Claudio Fontana 
Reviewed-by: Marcel Apfelbaum 
---
 hw/acpi/aml-build.c | 14 --
 hw/i386/acpi-build.c| 13 ++---
 include/hw/acpi/aml-build.h |  4 ++--
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6a02474..bcb288e 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -157,7 +157,7 @@ enum {
 PACKAGE_LENGTH_4BYTE_SHIFT = 20,
 };
 
-void build_prepend_package_length(GArray *package, unsigned min_bytes)
+void build_prepend_package_length(GArray *package)
 {
 uint8_t byte;
 unsigned length = package->len;
@@ -173,11 +173,6 @@ void build_prepend_package_length(GArray *package, 
unsigned min_bytes)
 length_bytes = 4;
 }
 
-/* Force length to at least min_bytes.
- * This wastes memory but that's how bios did it.
- */
-length_bytes = MAX(length_bytes, min_bytes);
-
 /* PkgLength is the length of the inclusive length of the data. */
 length += length_bytes;
 
@@ -210,15 +205,15 @@ void build_prepend_package_length(GArray *package, 
unsigned min_bytes)
 build_prepend_byte(package, byte);
 }
 
-void build_package(GArray *package, uint8_t op, unsigned min_bytes)
+void build_package(GArray *package, uint8_t op)
 {
-build_prepend_package_length(package, min_bytes);
+build_prepend_package_length(package);
 build_prepend_byte(package, op);
 }
 
 void build_extop_package(GArray *package, uint8_t op)
 {
-build_package(package, op, 1);
+build_package(package, op);
 build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
 }
 
@@ -262,4 +257,3 @@ void build_append_int(GArray *table, uint32_t value)
 build_append_value(table, value, 4);
 }
 }
-
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4784756..6a1d9ab 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -296,7 +296,7 @@ static void build_append_and_cleanup_method(GArray *device, 
GArray *method)
 {
 uint8_t op = 0x14; /* MethodOp */
 
-build_package(method, op, 0);
+build_package(method, op);
 
 build_append_array(device, method);
 build_free_array(method);
@@ -317,7 +317,7 @@ static void build_append_notify_target_ifequal(GArray 
*method,
 build_append_byte(notify, 0x69); /* Arg1Op */
 
 /* Pack it up */
-build_package(notify, op, 1);
+build_package(notify, op);
 
 build_append_array(method, notify);
 
@@ -830,7 +830,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 build_append_byte(notify, 0x69); /* Arg1Op */
 
 /* Pack it up */
-build_package(notify, op, 0);
+build_package(notify, op);
 
 build_append_array(method, notify);
 
@@ -871,7 +871,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 if (bus->parent_dev) {
 build_extop_package(bus_table, op);
 } else {
-build_package(bus_table, op, 0);
+build_package(bus_table, op);
 }
 
 /* Append our bus description to parent table */
@@ -994,7 +994,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 build_append_byte(package, b);
 }
 
-build_package(package, op, 2);
+build_package(package, op);
 build_append_array(sb_scope, package);
 build_free_array(package);
 }
@@ -1042,8 +1042,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 build_append_array(sb_scope, hotplug_state.device_table);
 build_pci_bus_state_cleanup(&hotplug_state);
 }
-
-build_package(sb_scope, op, 3);
+build_package(sb_scope, op);
 build_append_array(table_data, sb_scope);
 build_free_array(sb_scope);
 }
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index fd50625..199f003 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -14,8 +14,8 @@ void build_append_array(GArray *array, GArray *val);
 void GCC_FMT_ATTR(2, 3)
 build_append_namestring(GArray *array, const char *format, ...);
 
-void build_prepend_package_length(GArray *package, unsigned min_bytes);
-void build_package(GArray *package, uint8_t op, unsigned min_bytes);
+void build_prepend_package_length(GArray *package);
+void build_package(GArray *package, uint8_t op);
 void build_append_value(GArray *table, uint32_t value, int size);
 void build_append_int(GArray *table, uint32_t value);
 void build_extop_package(GArray *package, uint8_t op);
-- 
1.8.3.1




[Qemu-devel] [PATCH v7 0/5] pc: acpi: various fixes and cleanups

2015-01-30 Thread Igor Mammedov
NOTE to maintainer: please update test data (ACPI blobs) in test cases

changes from v6:
 * drop "[PATCH v6 1/5] pc: acpi-build: cleanup AcpiPmInfo initialization"
 * fixup and cleanup build_append_namestring patch as Michael requested
 * add extra patch based on "acpi-build: skip hotplugged bridges"
   http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg04085.html
   which applies on top of this series

changes from v5:
 * fix codding style issue with var names
 * fix Copyright from 2014 to 2015
 * rename acpi-build-utils.[ch] to aml-build.[ch]

changes from v4:
 * rebased on top of PCI tree, dropping 2 patches
   that are already there

changes from v3:
 * rename acpi_gen_utils.[ch] to acpi-build-utils.[ch]
 * copy GLP license block from acpi-build.c
 * assert on wrong Segcount earlier and extend condition to
   seg_count > 0 && seg_count <= 255
 * drop "pc: acpi: decribe bridge device as not hotpluggable"
 * keep original logic of creating bridge devices as it was done
   in 133a2da48 "pc: acpi: generate AML only for PCI0 ..."
 * if bus is non hotpluggable, add child slots to bus as non
   hotpluggable as it was done in original code.

changes from v2:
 * codding style fixups
 * check for SegCount earlier
 * use hotpluggable device object instead of not hotpluggable
for non present devices, and add it only when bus itself is
hotpluggable

changes from v1:
 * drop: [PATCH 7/9] acpi: replace opencoded notify codes with named values
 * use Michael's suggestion to improve build_append_nameseg()
 * drop long scope names and go back to recursion,
   but still significantly simplify building of PCI tree

this series is an attempt to shave off a bunch of
not directly related patches from already big dynamic
AML series (although it's dependency for it)

Tested: on XPsp3 to WS2012R2 and REHL6/7 guests.

Git tree for testing:
 https://github.com/imammedo/qemu/commits/acpi_pci_gen_simplification_v7



Igor Mammedov (5):
  acpi: move generic aml building helpers into dedictated file
  acpi: add build_append_namestring() helper
  acpi: drop min-bytes in build_package()
  pc: acpi-build: simplify PCI bus tree generation
  acpi-build: skip hotplugged bridges

 hw/acpi/Makefile.objs   |   1 +
 hw/acpi/aml-build.c | 259 +
 hw/i386/acpi-build.c| 456 ++--
 include/hw/acpi/aml-build.h |  23 +++
 4 files changed, 380 insertions(+), 359 deletions(-)
 create mode 100644 hw/acpi/aml-build.c
 create mode 100644 include/hw/acpi/aml-build.h

-- 
1.8.3.1




[Qemu-devel] [PATCH v7 1/5] acpi: move generic aml building helpers into dedictated file

2015-01-30 Thread Igor Mammedov
the will be later used for composing AML primitives
and all that could be reused later for ARM machines
as well.

Signed-off-by: Igor Mammedov 
Acked-by: Marcel Apfelbaum 
---
v4:
  * rename acpi_gen_utils.[ch] to aml-build.[ch]
  * fix Copyright from 2014 to 2015
v3:
  * rename acpi_gen_utils.[ch] to acpi-build-utils.[ch]
  * copy GLP license block from acpi-build.c
v2:
  * fix wrong ident in moved code

fixup

rename to hw/acpi/aml-build.c
---
 hw/acpi/Makefile.objs   |   1 +
 hw/acpi/aml-build.c | 187 
 hw/i386/acpi-build.c| 162 +-
 include/hw/acpi/aml-build.h |  23 ++
 4 files changed, 213 insertions(+), 160 deletions(-)
 create mode 100644 hw/acpi/aml-build.c
 create mode 100644 include/hw/acpi/aml-build.h

diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index ee82073..b9fefa7 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,3 +2,4 @@ common-obj-$(CONFIG_ACPI) += core.o piix4.o ich9.o pcihp.o 
cpu_hotplug.o
 common-obj-$(CONFIG_ACPI) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
+common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
new file mode 100644
index 000..b28c1f4
--- /dev/null
+++ b/hw/acpi/aml-build.c
@@ -0,0 +1,187 @@
+/* Support for generating ACPI tables and passing them to Guests
+ *
+ * Copyright (C) 2015 Red Hat Inc
+ *
+ * Author: Michael S. Tsirkin 
+ * Author: Igor Mammedov 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "hw/acpi/aml-build.h"
+
+GArray *build_alloc_array(void)
+{
+return g_array_new(false, true /* clear */, 1);
+}
+
+void build_free_array(GArray *array)
+{
+g_array_free(array, true);
+}
+
+void build_prepend_byte(GArray *array, uint8_t val)
+{
+g_array_prepend_val(array, val);
+}
+
+void build_append_byte(GArray *array, uint8_t val)
+{
+g_array_append_val(array, val);
+}
+
+void build_append_array(GArray *array, GArray *val)
+{
+g_array_append_vals(array, val->data, val->len);
+}
+
+#define ACPI_NAMESEG_LEN 4
+
+void GCC_FMT_ATTR(2, 3)
+build_append_nameseg(GArray *array, const char *format, ...)
+{
+/* It would be nicer to use g_string_vprintf but it's only there in 2.22 */
+char s[] = "";
+int len;
+va_list args;
+
+va_start(args, format);
+len = vsnprintf(s, sizeof s, format, args);
+va_end(args);
+
+assert(len <= ACPI_NAMESEG_LEN);
+
+g_array_append_vals(array, s, len);
+/* Pad up to ACPI_NAMESEG_LEN characters if necessary. */
+g_array_append_vals(array, "", ACPI_NAMESEG_LEN - len);
+}
+
+/* 5.4 Definition Block Encoding */
+enum {
+PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
+PACKAGE_LENGTH_2BYTE_SHIFT = 4,
+PACKAGE_LENGTH_3BYTE_SHIFT = 12,
+PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+
+void build_prepend_package_length(GArray *package, unsigned min_bytes)
+{
+uint8_t byte;
+unsigned length = package->len;
+unsigned length_bytes;
+
+if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) {
+length_bytes = 1;
+} else if (length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) {
+length_bytes = 2;
+} else if (length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) {
+length_bytes = 3;
+} else {
+length_bytes = 4;
+}
+
+/* Force length to at least min_bytes.
+ * This wastes memory but that's how bios did it.
+ */
+length_bytes = MAX(length_bytes, min_bytes);
+
+/* PkgLength is the length of the inclusive length of the data. */
+length += length_bytes;
+
+switch (length_bytes) {
+case 1:
+byte = length;
+build_prepend_byte(package, byte);
+return;
+case 4:
+byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
+build_prepend_byte(package, byte);
+length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
+/* fall through */
+case 3:
+byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
+build_prepend_byte(package, byte);
+length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
+/* fall through */
+case 2:
+byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
+build_prepend_byte(package, byte);
+lengt

[Qemu-devel] [PATCH v7 5/5] acpi-build: skip hotplugged bridges

2015-01-30 Thread Igor Mammedov
Hotplugged bridges don't get bsel allocated so acpi hotplug doesn't
work for them anyway, also it causes ACPI tables size change across
reboot when bridge was hotplugged before reboot, which doesn't work
with immutable RSDP.
This patch works around static RSDP issue, where if ACPI tables blob
changes it size across reboots RSDT shifts up or down and RSDP no
longer ponts to it, as result guest can't find/initialize ACPI tables
correctly. Which causes BSOD for Windows guests.

With this patch slot where bridge was hotplugged will keep the same
description, i.e. as hotpluggable slot and also hotplugged bridge
subtree won't be build, keeping size of ACPI tables blob the same.

Subtree for bridge is build only for cold-plugged bridges.

based on "Michael S. Tsirkin"  patch
http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg04085.html
but a bit simpler

Signed-off-by: Igor Mammedov 
---
 hw/i386/acpi-build.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f91b7cf..27adfb9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -723,7 +723,7 @@ static void build_append_pci_bus_devices(GArray 
*parent_scope, PCIBus *bus,
 if (pc->class_id == PCI_CLASS_BRIDGE_ISA) {
 continue;
 }
-bridge_in_acpi = pc->is_bridge && pcihp_bridge_en;
+bridge_in_acpi = pc->is_bridge && pcihp_bridge_en && !dc->hotpluggable;
 
 if (pc->class_id == PCI_CLASS_DISPLAY_VGA) {
 
@@ -751,9 +751,7 @@ static void build_append_pci_bus_devices(GArray 
*parent_scope, PCIBus *bus,
 memcpy(pcihp, ACPI_PCINOHP_AML, ACPI_PCINOHP_SIZEOF);
 patch_pcinohp(slot, pcihp);
 
-/* When hotplug for bridges is enabled, bridges that are
- * described in ACPI separately aren't themselves hot-pluggable.
- */
+/* Describe coldplugged bridges in ACPI */
 if (bridge_in_acpi) {
 PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v7 4/5] pc: acpi-build: simplify PCI bus tree generation

2015-01-30 Thread Igor Mammedov
it basicaly does the same as original approach,
* just without bus/notify tables tracking (less obscure)
  which is easier to follow.
* drops unnecessary loops and bitmaps,
  creating devices and notification method in the same loop.
* saves us ~100LOC

Signed-off-by: Igor Mammedov 
Reviewed-by: Marcel Apfelbaum 
---
v5:
  * drop wrong comment
v4:
  * keep original logic of creating bridge devices as it was done
in 133a2da48 "pc: acpi: generate AML only for PCI0 ..."
  * if bus is non hotpluggable, add child slots to bus as non
hotpluggable as it was done in original code.
v3:
  * use hotpluggable device object instead of not hotpluggable
for non present devices, and add it only when bus itself is
hotpluggable
---
 hw/i386/acpi-build.c | 272 ---
 1 file changed, 87 insertions(+), 185 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6a1d9ab..f91b7cf 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -106,7 +106,6 @@ typedef struct AcpiPmInfo {
 typedef struct AcpiMiscInfo {
 bool has_hpet;
 bool has_tpm;
-DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX);
 const unsigned char *dsdt_code;
 unsigned dsdt_size;
 uint16_t pvpanic_port;
@@ -654,69 +653,37 @@ static void acpi_set_pci_info(void)
 }
 }
 
-static void build_pci_bus_state_init(AcpiBuildPciBusHotplugState *state,
- AcpiBuildPciBusHotplugState *parent,
- bool pcihp_bridge_en)
+static void build_append_pcihp_notify_entry(GArray *method, int slot)
 {
-state->parent = parent;
-state->device_table = build_alloc_array();
-state->notify_table = build_alloc_array();
-state->pcihp_bridge_en = pcihp_bridge_en;
-}
-
-static void build_pci_bus_state_cleanup(AcpiBuildPciBusHotplugState *state)
-{
-build_free_array(state->device_table);
-build_free_array(state->notify_table);
-}
+GArray *ifctx;
 
-static void *build_pci_bus_begin(PCIBus *bus, void *parent_state)
-{
-AcpiBuildPciBusHotplugState *parent = parent_state;
-AcpiBuildPciBusHotplugState *child = g_malloc(sizeof *child);
-
-build_pci_bus_state_init(child, parent, parent->pcihp_bridge_en);
+ifctx = build_alloc_array();
+build_append_byte(ifctx, 0x7B); /* AndOp */
+build_append_byte(ifctx, 0x68); /* Arg0Op */
+build_append_int(ifctx, 0x1U << slot);
+build_append_byte(ifctx, 0x00); /* NullName */
+build_append_byte(ifctx, 0x86); /* NotifyOp */
+build_append_namestring(ifctx, "S%.02X", PCI_DEVFN(slot, 0));
+build_append_byte(ifctx, 0x69); /* Arg1Op */
 
-return child;
+/* Pack it up */
+build_package(ifctx, 0xA0 /* IfOp */);
+build_append_array(method, ifctx);
+build_free_array(ifctx);
 }
 
-static void build_pci_bus_end(PCIBus *bus, void *bus_state)
+static void build_append_pci_bus_devices(GArray *parent_scope, PCIBus *bus,
+ bool pcihp_bridge_en)
 {
-AcpiBuildPciBusHotplugState *child = bus_state;
-AcpiBuildPciBusHotplugState *parent = child->parent;
 GArray *bus_table = build_alloc_array();
-DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX);
-DECLARE_BITMAP(slot_device_present, PCI_SLOT_MAX);
-DECLARE_BITMAP(slot_device_system, PCI_SLOT_MAX);
-DECLARE_BITMAP(slot_device_vga, PCI_SLOT_MAX);
-DECLARE_BITMAP(slot_device_qxl, PCI_SLOT_MAX);
-uint8_t op;
-int i;
+GArray *method = NULL;
 QObject *bsel;
-GArray *method;
-bool bus_hotplug_support = false;
-
-/*
- * Skip bridge subtree creation if bridge hotplug is disabled
- * to make acpi tables compatible with legacy machine types.
- */
-if (!child->pcihp_bridge_en && bus->parent_dev) {
-return;
-}
+PCIBus *sec;
+int i;
 
 if (bus->parent_dev) {
-op = 0x82; /* DeviceOp */
-build_append_namestring(bus_table, "S%.02X",
- bus->parent_dev->devfn);
-build_append_byte(bus_table, 0x08); /* NameOp */
-build_append_namestring(bus_table, "_SUN");
-build_append_value(bus_table, PCI_SLOT(bus->parent_dev->devfn), 1);
-build_append_byte(bus_table, 0x08); /* NameOp */
-build_append_namestring(bus_table, "_ADR");
-build_append_value(bus_table, (PCI_SLOT(bus->parent_dev->devfn) << 16) 
|
-   PCI_FUNC(bus->parent_dev->devfn), 4);
+build_append_namestring(bus_table, "S%.02X_", bus->parent_dev->devfn);
 } else {
-op = 0x10; /* ScopeOp */;
 build_append_namestring(bus_table, "PCI0");
 }
 
@@ -725,17 +692,9 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 build_append_byte(bus_table, 0x08); /* NameOp */
 build_append_namestring(bus_table, "BSEL");
 build_append_int(bus_table, qint_get_int(qobject_to_qint(bsel)));
-memset(slot_hotplug_enable, 0xff, sizeof slot_hotpl

[Qemu-devel] [PATCH v7 2/5] acpi: add build_append_namestring() helper

2015-01-30 Thread Igor Mammedov
Use build_append_namestring() instead of build_append_nameseg()
So user won't have to care whether name is NameSeg, NamePath or
NameString.

See for reference ACPI 5.0: 20.2.2 Name Objects Encoding

Signed-off-by: Igor Mammedov 
---
v5:
 * replace g_array_append_val() with build_append_byte() and
   use literals instead of temporary variables
 * fix indent in acpi-build.c
 * build_append_nameseg() no longer used with format string
   replace variable args prototype with fixed one that takes
   only "char *seg" as argument
v4:
 * fix codding style of ACPI_DualNamePrefix & ACPI_MultiNamePrefix
   it's imposible to use literals as suggested due to
   g_array_append_val() requiring reference value

v3:
 assert on wrong Segcount earlier and extend condition to
  seg_count > 0 && seg_count <= 255
 as suggested by Claudio Fontana 
---
 hw/acpi/aml-build.c | 96 -
 hw/i386/acpi-build.c| 37 -
 include/hw/acpi/aml-build.h |  2 +-
 3 files changed, 105 insertions(+), 30 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b28c1f4..6a02474 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "hw/acpi/aml-build.h"
 
 GArray *build_alloc_array(void)
@@ -52,25 +53,102 @@ void build_append_array(GArray *array, GArray *val)
 
 #define ACPI_NAMESEG_LEN 4
 
-void GCC_FMT_ATTR(2, 3)
-build_append_nameseg(GArray *array, const char *format, ...)
+static void
+build_append_nameseg(GArray *array, const char *seg)
 {
 /* It would be nicer to use g_string_vprintf but it's only there in 2.22 */
-char s[] = "";
 int len;
-va_list args;
-
-va_start(args, format);
-len = vsnprintf(s, sizeof s, format, args);
-va_end(args);
 
+len = strlen(seg);
 assert(len <= ACPI_NAMESEG_LEN);
 
-g_array_append_vals(array, s, len);
+g_array_append_vals(array, seg, len);
 /* Pad up to ACPI_NAMESEG_LEN characters if necessary. */
 g_array_append_vals(array, "", ACPI_NAMESEG_LEN - len);
 }
 
+static void
+build_append_namestringv(GArray *array, const char *format, va_list ap)
+{
+/* It would be nicer to use g_string_vprintf but it's only there in 2.22 */
+char *s;
+int len;
+va_list va_len;
+char **segs;
+char **segs_iter;
+int seg_count = 0;
+
+va_copy(va_len, ap);
+len = vsnprintf(NULL, 0, format, va_len);
+va_end(va_len);
+len += 1;
+s = g_new(typeof(*s), len);
+
+len = vsnprintf(s, len, format, ap);
+
+segs = g_strsplit(s, ".", 0);
+g_free(s);
+
+/* count segments */
+segs_iter = segs;
+while (*segs_iter) {
+++segs_iter;
+++seg_count;
+}
+/*
+ * ACPI 5.0 spec: 20.2.2 Name Objects Encoding:
+ * "SegCount can be from 1 to 255"
+ */
+assert(seg_count > 0 && seg_count <= 255);
+
+/* handle RootPath || PrefixPath */
+s = *segs;
+while (*s == '\\' || *s == '^') {
+build_append_byte(array, *s);
+++s;
+}
+
+switch (seg_count) {
+case 1:
+if (!*s) {
+build_append_byte(array, 0x0); /* NullName */
+} else {
+build_append_nameseg(array, s);
+}
+break;
+
+case 2:
+build_append_byte(array, 0x2E); /* DualNamePrefix */
+build_append_nameseg(array, s);
+build_append_nameseg(array, segs[1]);
+break;
+default:
+build_append_byte(array, 0x2F); /* MultiNamePrefix */
+build_append_byte(array, seg_count);
+
+/* handle the 1st segment manually due to prefix/root path */
+build_append_nameseg(array, s);
+
+/* add the rest of segments */
+segs_iter = segs + 1;
+while (*segs_iter) {
+build_append_nameseg(array, *segs_iter);
+++segs_iter;
+}
+break;
+}
+g_strfreev(segs);
+}
+
+void build_append_namestring(GArray *array, const char *format, ...)
+{
+va_list ap;
+
+va_start(ap, format);
+build_append_namestringv(array, format, ap);
+va_end(ap);
+}
+
 /* 5.4 Definition Block Encoding */
 enum {
 PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b3bd269..4784756 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -286,7 +286,7 @@ static GArray *build_alloc_method(const char *name, uint8_t 
arg_count)
 {
 GArray *method = build_alloc_array();
 
-build_append_nameseg(method, "%s", name);
+build_append_namestring(method, "%s", name);
 build_append_byte(method, arg_count); /* MethodFlags: ArgCount */
 
 return method;
@@ -578,7 +578,7 @@ build_append_notify_method(GArray *device, const char *name,
 
 for (i = 0; i < count; i++) {
 GArray *target = build_alloc_array();
-build_append_nameseg(target, format, i);
+build_append_namestring(target, format, i);

[Qemu-devel] [Bug 1397157] Re: cpu high with ps2 keyboard on multi-core win7 guest os

2015-01-30 Thread Dr. David Alan Gilbert
Thanks; I see your matching RH bugzilla entry;
https://bugzilla.redhat.com/show_bug.cgi?id=1169267

** Bug watch added: Red Hat Bugzilla #1169267
   https://bugzilla.redhat.com/show_bug.cgi?id=1169267

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1397157

Title:
  cpu high with ps2 keyboard on multi-core win7 guest os

Status in QEMU:
  New

Bug description:
  
  qemu ver: 1.5.3-Latest 

  guest os: window 7 64bit with 2 cpu and ps2 keybord.

  problem: Use virt-viwer as client to connect, When input quickly, the
  guest and host cpu will high and the input-char will display later.
  but when only 1 cpu on the vm, the problem will not display or When
  qemu ver is 0.12.1, the problem will not display.

  qemu cmd:
  /usr/libexec/qemu-kvm -name xx_win7 -machine 
pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu 
qemu64,+sse4.2,+sse4.1,+ssse3,-svm,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 
4096 -realtime mlock=off -smp 2,sockets=1,cores=2,threads=1 -uuid 
0860a434-6560-591b-f92f-c13c5caaf52d -rtc base=localtime -no-shutdown -boot 
strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/lfs/xx_win7/xx_win7.vda,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device 
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -device usb-tablet,id=input0 -spice 
port=5900,addr=::,disable-ticketing,plaintext-channel=main,plaintext-channel=display,plaintext-channel=inputs,plaintext-channel=cursor,plaintext-channel=playback,plaintext-channel=record,plaintext-channel=usbredir,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,playback-compression=on,disable-copy-paste,seamless-migration=on
 -vga qxl -global qxl-vga.ram_size=268435456 -global qxl-vga.vram_size=67108864 
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1 -chardev 
spicevmc,id=charredir2,name=usbredir -device 
usb-redir,chardev=charredir2,id=redir2 -chardev 
spicevmc,id=charredir3,name=usbredir -device 
usb-redir,chardev=charredir3,id=redir3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1397157/+subscriptions



Re: [Qemu-devel] [PATCH 1/2] acpi-build: fix memory leak with bridge hp off

2015-01-30 Thread Igor Mammedov
On Wed, 28 Jan 2015 18:30:34 +0200
"Michael S. Tsirkin"  wrote:

> When bridge hotplug is disabled for old machine types,
> we never free memory allocated for temporary tables.
> Fix this up.
patch "pc: acpi-build: simplify PCI bus tree generation"
http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg04030.html
obsoletes this patch since it replaces old code with leak with
a more simple one where this bug do not exists.

> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/i386/acpi-build.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 4944249..74586f3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -859,6 +859,9 @@ static void build_pci_bus_end(PCIBus *bus, void
> *bus_state)
>   * to make acpi tables compatible with legacy machine types.
>   */
>  if (!child->pcihp_bridge_en && bus->parent_dev) {
> +build_free_array(bus_table);
> +build_pci_bus_state_cleanup(child);
> +g_free(child);
>  return;
>  }
>  




Re: [Qemu-devel] [PATCH] quorum: don't share qiov

2015-01-30 Thread Kevin Wolf
Am 30.01.2015 um 09:07 hat Wen Congyang geschrieben:
> If the child touches qiov->iov, it will cause unexpected results.
> 
> Signed-off-by: Wen Congyang 

Any specific child you're thinking of?

I think children are not supposed to modify their qiov (which would also
fail for init_external qiovs). Perhaps we should have made it const.

Kevin



Re: [Qemu-devel] [PATCH 0/7] MIPS: IEEE 754-2008 features support

2015-01-30 Thread Maciej W. Rozycki
On Fri, 30 Jan 2015, Peter Maydell wrote:

> >  This patch series comprises changes to QEMU, both the MIPS backend and
> > generic SoftFloat support code, to support IEEE 754-2008 features
> > introduced to revision 3.50 of the MIPS Architecture as follows.
> 
> Just to let you know that:
> (1) the softfloat relicensing has hit master, so this patchset isn't
> blocked by anything now
> (2) I would like to see a definite "we are happy to license this
> patchset under the SoftFloat-2a license" for these changes, because
> they were submitted before we applied the relicensing, and therefore
> the "changes after $DATE will be -2a license unless otherwise stated"
> note in the sourcecode can't be assumed to apply to them.

 Thanks for the heads-up!  At this stage however someone at Mentor will 
have to make such a statement on behalf of the company as I am no longer 
there and as far as this patch set is concerned I am merely a member of 
the public who can just make technical comments as anyone can, including 
you.

 I think Thomas, being the writer of the majority of code comprising these 
patches, is now in the best position to make such a statement happen.  
Thomas -- will you be able to take it from here?  Thanks!

  Maciej



Re: [Qemu-devel] [PATCH RFC v6 18/20] virtio: support revision-specific features

2015-01-30 Thread Cornelia Huck
On Wed, 7 Jan 2015 21:10:07 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jan 07, 2015 at 05:22:32PM +0100, Cornelia Huck wrote:
> > On Sun, 28 Dec 2014 10:32:06 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Thu, Dec 11, 2014 at 02:25:20PM +0100, Cornelia Huck wrote:
> > > > Devices may support different sets of feature bits depending on which
> > > > revision they're operating at. Let's give the transport a way to
> > > > re-query the device about its features when the revision has been
> > > > changed.
> > > > 
> > > > Signed-off-by: Cornelia Huck 
> > > 
> > > So now we have both get_features and get_features_rev, and
> > > it's never clear which revision does host_features refer to.
> > > IMHO that's just too messy.
> > > Let's add get_legacy_features and host_legacy_features instead?
> > 
> > I wanted to avoid touching anything that does not support version 1.
> > And this interface might still work for later revisions, no?
> 
> We can add _modern_ then, or rename host_features to host_legacy_features
> everywhere as preparation.
> 

OK, I've ditched the "don't modify old stuff" goal and introduced
->get_features_legacy(). For now, devices will add VERSION_1 in their
->get_features() callback when they support it. (For many devices, this
will be the only difference between the two callbacks.)

Two sets of host_features don't make much sense to me.

I've hacked up something and will play with it a bit; I might post a
new patch series next week.




Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions

2015-01-30 Thread Leon Alrae
On 12/12/2014 19:34, Maciej W. Rozycki wrote:
> Mechanically replace `*_default_nan' variables with inline functions and 
> convert references accordingly.  Use `__inline__' rather than `inline' 
> so that the latter does not cause the definitions to become static as a 
> result of macro expansion, the functions are best inlined when referred 
> to from softfloat.c, but external copies must be also produced for 
> external callers.
> 
> Signed-off-by: Thomas Schwinge 
> Signed-off-by: Maciej W. Rozycki 

> Index: qemu-git-trunk/include/fpu/softfloat.h
> ===
> --- qemu-git-trunk.orig/include/fpu/softfloat.h   2014-12-11 
> 18:15:02.0 +
> +++ qemu-git-trunk/include/fpu/softfloat.h2014-12-11 18:18:00.918095644 
> +
> @@ -370,7 +370,7 @@ static inline int float16_is_any_nan(flo
>  
> /*
>  | The pattern for a default generated half-precision NaN.
>  
> **/
> -extern const float16 float16_default_nan;
> +__inline__ float16 float16_default_nan(void);
>  
>  
> /*
>  | Software IEC/IEEE single-precision conversion routines.
> @@ -482,7 +482,7 @@ static inline float32 float32_set_sign(f
>  
> /*
>  | The pattern for a default generated single-precision NaN.
>  
> **/
> -extern const float32 float32_default_nan;
> +__inline__ float32 float32_default_nan(void);
>  
>  
> /*
>  | Software IEC/IEEE double-precision conversion routines.
> @@ -594,7 +594,7 @@ static inline float64 float64_set_sign(f
>  
> /*
>  | The pattern for a default generated double-precision NaN.
>  
> **/
> -extern const float64 float64_default_nan;
> +__inline__ float64 float64_default_nan(void);
>  
>  
> /*
>  | Software IEC/IEEE extended double-precision conversion routines.
> @@ -679,7 +679,7 @@ static inline int floatx80_is_any_nan(fl
>  
> /*
>  | The pattern for a default generated extended double-precision NaN.
>  
> **/
> -extern const floatx80 floatx80_default_nan;
> +__inline__ floatx80 floatx80_default_nan(void);
>  
>  
> /*
>  | Software IEC/IEEE quadruple-precision conversion routines.
> @@ -760,6 +760,6 @@ static inline int float128_is_any_nan(fl
>  
> /*
>  | The pattern for a default generated quadruple-precision NaN.
>  
> **/
> -extern const float128 float128_default_nan;
> +__inline__ float128 float128_default_nan(void);
>  

Unfortunately clang complains about it and eventually it won't link:

  CCaarch64-softmmu/target-arm/helper-a64.o
In file included from /qemu/target-arm/helper-a64.c:20:
In file included from /qemu/target-arm/cpu.h:37:
In file included from /qemu/include/qemu-common.h:120:
In file included from /qemu/include/qemu/bswap.h:8:
/qemu/include/fpu/softfloat.h:485:20: warning: inline function
'float32_default_nan' is not defined [-Wundefined-inline]
__inline__ float32 float32_default_nan(void);
   ^
/qemu/target-arm/helper-a64.c:345:19: note: used here
nan = float32_default_nan();
  ^
In file included from /qemu/target-arm/helper-a64.c:20:
In file included from /qemu/target-arm/cpu.h:37:
In file included from /qemu/include/qemu-common.h:120:
In file included from /qemu/include/qemu/bswap.h:8:
/qemu/include/fpu/softfloat.h:597:20: warning: inline function
'float64_default_nan' is not defined [-Wundefined-inline]
__inline__ float64 float64_default_nan(void);
   ^
/qemu/target-arm/helper-a64.c:374:19: note: used here
nan = float64_default_nan();
  ^
2 warnings generated.
  CCaarch64-softmmu/target-arm/gdbstub64.o
  CCaarch64-softmmu/target-arm/crypto_helper.o
  GEN   trace/generated-helpers.c
  CCaarch64-softmmu/trace/generated-helpers.o
  LINK  aarch64-softmmu/qemu-system-aarch64
fpu/softfloat.o: In function `commonNaNToFloat64':
/qemu/fpu/softfloat-specialize.h:796: undefined reference to
`float64_default_nan'
/qemu/fpu/softfloat-specialize.h:805: undefined reference to
`float64

Re: [Qemu-devel] [PATCH RFC v6 19/20] virtio-blk: revision specific feature bits

2015-01-30 Thread Cornelia Huck
On Wed, 7 Jan 2015 21:11:44 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jan 07, 2015 at 05:29:49PM +0100, Cornelia Huck wrote:
> > On Sun, 28 Dec 2014 12:24:46 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Thu, Dec 11, 2014 at 02:25:21PM +0100, Cornelia Huck wrote:
> > > > Wire up virtio-blk to provide different feature bit sets depending
> > > > on whether legacy or v1.0 has been requested.
> > > > 
> > > > Note that VERSION_1 is still disabled due to missing ANY_LAYOUT support.
> > > > 
> > > > Signed-off-by: Cornelia Huck 
> > > 
> > > So we need some way for devices to tell transports
> > > not to negotiate rev 1.
> > > Does clearing VERSION_1 have this effect?
> > > 
> > I just noticed that my patch is running in circles here.
> > 
> > What we need is probably the transport-dependent maximum revision
> > checker (which at least for ccw is acting on a device) pass in the
> > requested revision and check if the feature bits for the revision
> > include VERSION_1. Does that make sense?
> 
> Just make devices set 'rev 1 supported' flag?

I'm now using the ->get_features() callback to check for VERSION_1
(assuming every device that supports it adds the bit in its callback)
and only allow rev 1 if it is present. Will play with this a bit as
well.




Re: [Qemu-devel] [PATCH] target-arm: Squash input denormals in FRECPS and FRSQRTS

2015-01-30 Thread Greg Bellows
On Fri, Jan 30, 2015 at 3:41 AM, Laurent Desnogues <
laurent.desnog...@gmail.com> wrote:

> On Thu, Jan 29, 2015 at 8:31 PM, Peter Maydell 
> wrote:
> > The helper functions for FRECPS and FRSQRTS have special case
> > handling that includes checks for zero inputs, so squash input
> > denormals if necessary before those checks. This fixes incorrect
> > output when the FPCR DZ bit is set to enable squashing of input
> > denormals.
> >
> > Signed-off-by: Peter Maydell 
>
> Tested-by: Laurent Desnogues 
>
> Thanks,
>
> Laurent
>
> > ---
> > A quick eyeball of helper-a64.c suggests that these are the only
> > other insns we needed to fix, and a risu test of these insns
> > confirms that (a) they're buggy and (b) this patch fixes them.
> > I haven't done an exhaustive coverage test of the whole instruction
> > set with the DZ bit set, though...
> >
> >  target-arm/helper-a64.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> > index ebd9247..8aa40e9 100644
> > --- a/target-arm/helper-a64.c
> > +++ b/target-arm/helper-a64.c
> > @@ -229,6 +229,9 @@ float32 HELPER(recpsf_f32)(float32 a, float32 b,
> void *fpstp)
> >  {
> >  float_status *fpst = fpstp;
> >
> > +a = float32_squash_input_denormal(a, fpst);
> > +b = float32_squash_input_denormal(b, fpst);
> > +
> >  a = float32_chs(a);
> >  if ((float32_is_infinity(a) && float32_is_zero(b)) ||
> >  (float32_is_infinity(b) && float32_is_zero(a))) {
> > @@ -241,6 +244,9 @@ float64 HELPER(recpsf_f64)(float64 a, float64 b,
> void *fpstp)
> >  {
> >  float_status *fpst = fpstp;
> >
> > +a = float64_squash_input_denormal(a, fpst);
> > +b = float64_squash_input_denormal(b, fpst);
> > +
> >  a = float64_chs(a);
> >  if ((float64_is_infinity(a) && float64_is_zero(b)) ||
> >  (float64_is_infinity(b) && float64_is_zero(a))) {
> > @@ -253,6 +259,9 @@ float32 HELPER(rsqrtsf_f32)(float32 a, float32 b,
> void *fpstp)
> >  {
> >  float_status *fpst = fpstp;
> >
> > +a = float32_squash_input_denormal(a, fpst);
> > +b = float32_squash_input_denormal(b, fpst);
> > +
> >  a = float32_chs(a);
> >  if ((float32_is_infinity(a) && float32_is_zero(b)) ||
> >  (float32_is_infinity(b) && float32_is_zero(a))) {
> > @@ -265,6 +274,9 @@ float64 HELPER(rsqrtsf_f64)(float64 a, float64 b,
> void *fpstp)
> >  {
> >  float_status *fpst = fpstp;
> >
> > +a = float64_squash_input_denormal(a, fpst);
> > +b = float64_squash_input_denormal(b, fpst);
> > +
> >  a = float64_chs(a);
> >  if ((float64_is_infinity(a) && float64_is_zero(b)) ||
> >  (float64_is_infinity(b) && float64_is_zero(a))) {
> > --
> > 1.9.1
> >
> >
>
>
​Reviewed-by: Greg Bellows ​


  1   2   3   >