[Qemu-devel] [PATCH] build-sys: introduce install-prog macro to install&strip binaries and use it

2014-06-22 Thread Michael Tokarev
Use common rule (macro) to install and strip binaries, and use
it in all places where we install binaries, instead of fixing
bugs like 1319493 in every place.
(This fixes https://bugs.launchpad.net/bugs/1319493)

Signed-off-by: Michael Tokarev 
---
 Makefile|   12 ++--
 Makefile.target |5 +
 rules.mak   |7 +++
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/Makefile b/Makefile
index 3c8f19c..4cbc26c 100644
--- a/Makefile
+++ b/Makefile
@@ -376,12 +376,8 @@ install-sysconfig: install-datadir install-confdir
 
 install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig \
 install-datadir install-localstatedir
-   $(INSTALL_DIR) "$(DESTDIR)$(bindir)"
 ifneq ($(TOOLS),)
-   $(INSTALL_PROG) $(TOOLS) "$(DESTDIR)$(bindir)"
-ifneq ($(STRIP),)
-   $(STRIP) $(TOOLS:%="$(DESTDIR)$(bindir)/%")
-endif
+   $(call install-prog,$(TOOLS),$(DESTDIR)$(bindir))
 endif
 ifneq ($(CONFIG_MODULES),)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_moddir)"
@@ -392,11 +388,7 @@ ifneq ($(CONFIG_MODULES),)
done
 endif
 ifneq ($(HELPERS-y),)
-   $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
-   $(INSTALL_PROG) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
-ifneq ($(STRIP),)
-   $(STRIP) $(HELPERS-y:%="$(DESTDIR)$(libexecdir)/%")
-endif
+   $(call install-prog,$(HELPERS-y),$(DESTDIR)$(libexecdir))
 endif
 ifneq ($(BLOBS),)
set -e; for x in $(BLOBS); do \
diff --git a/Makefile.target b/Makefile.target
index fc5827c..6089d29 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -190,10 +190,7 @@ endif
 
 install: all
 ifneq ($(PROGS),)
-   $(INSTALL_PROG) $(PROGS) "$(DESTDIR)$(bindir)"
-ifneq ($(STRIP),)
-   $(STRIP) $(PROGS:%="$(DESTDIR)$(bindir)/%")
-endif
+   $(call install-prog,$(PROGS),$(DESTDIR)$(bindir))
 endif
 ifdef CONFIG_TRACE_SYSTEMTAP
$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset"
diff --git a/rules.mak b/rules.mak
index 945484e..ba2f4c1 100644
--- a/rules.mak
+++ b/rules.mak
@@ -101,6 +101,13 @@ cc-option = $(if $(shell $(CC) $1 $2 -S -o /dev/null -xc 
/dev/null \
 VPATH_SUFFIXES = %.c %.h %.S %.cc %.cpp %.m %.mak %.texi %.sh %.rc
 set-vpath = $(if $1,$(foreach PATTERN,$(VPATH_SUFFIXES),$(eval vpath 
$(PATTERN) $1)))
 
+# install-prog list, dir
+define install-prog
+   $(INSTALL_DIR) "$2"
+   $(INSTALL_PROG) $1 "$2"
+   $(if $(STRIP),$(STRIP) $(foreach T,$1,"$2/$(notdir $T)"),)
+endef
+
 # find-in-path
 # Usage: $(call find-in-path, prog)
 # Looks in the PATH if the argument contains no slash, else only considers one
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH 0/3] another round of pci fixes

2014-06-22 Thread Michael S. Tsirkin
On Fri, Jun 20, 2014 at 01:55:40PM +0800, Hu Tao wrote:
> Hi Michael,
> 
> This series is on top of your pci tree.
> 
> This series includes two fixups and one patch for adding test of human
> format of string output visitor, please review. thanks!

Please note we can't generally send fixups to patches
for which a pull request was sent.
We had a case recently where pull was rejected and
rebase requested, so that made fixups possible again.



> 
> Hu Tao (3):
>   fixup! libqemustub: add more stubs for qemu-char
>   fixup! qapi/string-output-visitor: fix bugs
>   tests: add human format test for string output visitor
> 
>  qapi/string-output-visitor.c   |   2 +-
>  stubs/Makefile.objs|   2 +-
>  tests/test-string-output-visitor.c | 109 
> ++---
>  3 files changed, 92 insertions(+), 21 deletions(-)
> 
> -- 
> 1.9.3



Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type

2014-06-22 Thread Chen, Tiejun

On 2014/6/20 20:32, Daniel Vetter wrote:

Well I have no clue about forwarding the intel gpu to virtualized
hosts and also no idea who could review this really. There's been a
bit a discussion around the iommu mapping forwarding and similar


No, this doesn't affect IOMMU mapping.


topics though. So I really wonder how well our driver works in this
use case ...


In native case the truth is intel_detect_pch() needs to probe the 
00:15.0 to determine what type current pch is, then the i915 driver can 
be initialized correctly. Actually the 00:15.0 is just a ISA bridge.


In virtualized case we thought this ISA bridge mayn't be represented 
with 00:15.0 originally by qemu-xen-traditional. So instead of checking 
devfn, the i915 driver check the class type to get this ISA bridge.


But with my investigation,qemu-xen-traditional always set 00:15.0 
explicitly to represent this ISA bridge. And especially, we wouldn't 
provide that ISA bridge with an explicit class type in qemu-upstream, so 
we need to the i915 driver to probe pch by checking devfn.


This should work both on the native case and the virtualized case.

Thanks
Tiejun


-Daniel

On Fri, Jun 20, 2014 at 11:40 AM, Chen, Tiejun  wrote:

Just ping, any comments?

Thanks
Tiejun


On 2014/6/19 17:53, Tiejun Chen wrote:


Originally the reason to probe ISA bridge instead of Dev31:Fun0
is to make graphics device passthrough work easy for VMM, that
only need to expose ISA bridge to let driver know the real
hardware underneath. This is a requirement from virtualization
team. Especially in that virtualized environments, XEN, there
is irrelevant ISA bridge in the system with that legacy qemu
version specific to xen, qemu-xen-traditional. So to work
reliably, we should scan through all the ISA bridge devices
and check for the first match, instead of only checking the
first one.

But actually, qemu-xen-traditional, is always enumerated with
Dev31:Fun0, 00:1f.0 as follows:

hw/pt-graphics.c:

intel_pch_init()
  |
  + pci_isa_bridge_init(bus, PCI_DEVFN(0x1f, 0), ...);

so this mean that isa bridge is still represented with Dev31:Func0
like the native OS. Furthermore, currently we're pushing VGA
passthrough support into qemu upstream, and with some discussion,
we wouldn't set the bridge class type and just expose this devfn.

So we just go back to check devfn to make life normal.

Signed-off-by: Tiejun Chen 
---
   drivers/gpu/drm/i915/i915_drv.c | 19 +++
   1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c
b/drivers/gpu/drm/i915/i915_drv.c
index 651e65e..cb2526e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -417,18 +417,8 @@ void intel_detect_pch(struct drm_device *dev)
 return;
 }

-   /*
-* The reason to probe ISA bridge instead of Dev31:Fun0 is to
-* make graphics device passthrough work easy for VMM, that only
-* need to expose ISA bridge to let driver know the real hardware
-* underneath. This is a requirement from virtualization team.
-*
-* In some virtualized environments (e.g. XEN), there is
irrelevant
-* ISA bridge in the system. To work reliably, we should scan
trhough
-* all the ISA bridge devices and check for the first match,
instead
-* of only checking the first one.
-*/
-   while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
+   pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
+   if (pch) {
 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
 unsigned short id = pch->device &
INTEL_PCH_DEVICE_ID_MASK;
 dev_priv->pch_id = id;
@@ -462,10 +452,7 @@ void intel_detect_pch(struct drm_device *dev)
 DRM_DEBUG_KMS("Found LynxPoint LP PCH\n");
 WARN_ON(!IS_HASWELL(dev));
 WARN_ON(!IS_ULT(dev));
-   } else
-   continue;
-
-   break;
+   }
 }
 }
 if (!pch)











Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type

2014-06-22 Thread Chen, Tiejun

On 2014/6/20 20:48, Paolo Bonzini wrote:

Il 19/06/2014 11:53, Tiejun Chen ha scritto:

so this mean that isa bridge is still represented with Dev31:Func0
like the native OS. Furthermore, currently we're pushing VGA
passthrough support into qemu upstream, and with some discussion,
we wouldn't set the bridge class type and just expose this devfn.


Even this is not really optimal.  It just happens to work just because
QEMU's machine is currently a PCI machine with the ISA bridge on 00:01.0.

As soon as you'll try doing integrated graphics passthrough on a PCIe
machine type (such as QEMU's "-M q35") things will break down just as
badly.



Sorry, I can't understand why this is related to the ISA bridge, 00:01.0 
or even other PCIe machine type.


In virtualized case we always need to create this ISA bridge as a devfn, 
00:15.0, work for the i915 driver to support IGD passthrough.


In qemu-xen-traditional, this ISA bridge is already created as follows:

1> We set this ISA type explicitly;
2> We register that as 00:15.0.

In qemu-upstream, as you commented we can't create this as a ISA class 
type explicitly. So we compromise by faking this ISA bridge without ISA 
class type setting (as I recall you already said this way is slightly 
better). Maybe we will figure better way in the future. But anyway, this 
is always registered as 00:15.0, right? So I think the i915 driver can 
go back to probe the devfn like the native environment.


If I'm wrong please correct me.

Thanks
Tiejun



Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] mac99: Change memory layout to better match PowerMac3, 1

2014-06-22 Thread BALATON Zoltan


Ping!

On Tue, 17 Jun 2014, BALATON Zoltan wrote:

On Tue, 17 Jun 2014, Alexander Graf wrote:

On 17.06.14 11:36, BALATON Zoltan wrote:

On Tue, 17 Jun 2014, Alexander Graf wrote:

On 12.04.14 11:20, BALATON Zoltan wrote:

Bring the memory map closer to a PowerMac3,1 model by removing unused
areas and adding the VGA and network cards after the macio to let the
latter be mapped from 0x8000 like on real hardware. (On real
hardware the graphics and network cards are on separate buses but we
don't model that yet.)

Signed-off-by: BALATON Zoltan 
---

This patch is intended to bring memory layout closer to what's seen in
these dumps:

http://nandra.segv.jp/NetBSD/G4.dump-device-tree.txt
http://raveland.org/ports/eeprom.txt
http://mail-index.netbsd.org/port-macppc/2007/10/24/.html
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604134

v2: Added back unin2_memory region that Darwin seems to like better

---
  hw/pci-host/uninorth.c |  2 +-
  hw/ppc/mac_newworld.c  | 14 +++---
  2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/pci-host/uninorth.c b/hw/pci-host/uninorth.c
index e72fe2a..21f805f 100644
--- a/hw/pci-host/uninorth.c
+++ b/hw/pci-host/uninorth.c
@@ -230,7 +230,7 @@ PCIBus *pci_pmac_init(qemu_irq *pic,
  d = UNI_NORTH_PCI_HOST_BRIDGE(dev);
  memory_region_init(&d->pci_mmio, OBJECT(d), "pci-mmio", 
0x1ULL);
  memory_region_init_alias(&d->pci_hole, OBJECT(d), "pci-hole", 
&d->pci_mmio,

- 0x8000ULL, 0x7000ULL);
+ 0x8000ULL, 0x1000ULL);


Doesn't OpenBIOS need to know about this change so it can update its 
device tree?


No.


We don't expose the pci-hole size in device tree?


We do but already as 0x1000. See:
http://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch/ppc/qemu/init.c


memory_region_add_subregion(address_space_mem, 0x8000ULL,
  &d->pci_hole);
  diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 4bdaa8d..a4e5044 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -371,18 +371,11 @@ static void ppc_core99_init(MachineState *machine)
  machine_arch = ARCH_MAC99;
  }
  /* init basic PC hardware */
-pci_vga_init(pci_bus);
-
  escc_mem = escc_init(0, pic[0x25], pic[0x24],
   serial_hds[0], serial_hds[1], ESCC_CLOCK, 4);
  memory_region_init_alias(escc_bar, NULL, "escc-bar",
   escc_mem, 0, 
memory_region_size(escc_mem));

  -for(i = 0; i < nb_nics; i++)
-pci_nic_init_nofail(&nd_table[i], pci_bus, "ne2k_pci", NULL);
-
-ide_drive_get(hd, MAX_IDE_BUS);
-
  macio = pci_create(pci_bus, -1, TYPE_NEWWORLD_MACIO);
  dev = DEVICE(macio);
  qdev_connect_gpio_out(dev, 0, pic[0x19]); /* CUDA */
@@ -393,6 +386,8 @@ static void ppc_core99_init(MachineState *machine)
  macio_init(macio, pic_mem, escc_bar);
/* We only emulate 2 out of 3 IDE controllers for now */
+ide_drive_get(hd, MAX_IDE_BUS);
+
  macio_ide = MACIO_IDE(object_resolve_path_component(OBJECT(macio),
"ide[0]"));
  macio_ide_init_drives(macio_ide, hd);
@@ -418,9 +413,14 @@ static void ppc_core99_init(MachineState *machine)
  }
  }
  +pci_vga_init(pci_bus);
+
  if (graphic_depth != 15 && graphic_depth != 32 && graphic_depth != 
8)

  graphic_depth = 15;
  +for(i = 0; i < nb_nics; i++)
+pci_nic_init_nofail(&nd_table[i], pci_bus, "ne2k_pci", NULL);
+
  /* The NewWorld NVRAM is not located in the MacIO device */
  dev = qdev_create(NULL, TYPE_MACIO_NVRAM);
  qdev_prop_set_uint32(dev, "size", 0x2000);


I presume all the changes above only change the devfn ordering?


It changes the addresses assigned to devices which is needed for MorphOS 
to work as it hardcodes the mmio locations of some (actually most) 
devices.


I don't see how the ordering change here could possibly change MMIO 
locations for anything?


It's how OpenBIOS assigns MMIO addresses to pci devices. It does it by going 
through them in order and map them starting from the base address (with some 
allingment). I guess you could look at drivers/pci.c I think it's in there 
somewhere.


Regards,
BALATON Zoltan






Re: [Qemu-devel] [PATCH v2 15/16] linux-user: support the KDSIGACCEPT ioctl

2014-06-22 Thread Paul Burton
On Sun, Jun 22, 2014 at 12:13:27AM +0100, Peter Maydell wrote:
> > diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
> > index 309fb21..cd21e64 100644
> > --- a/linux-user/ioctls.h
> > +++ b/linux-user/ioctls.h
> > @@ -64,6 +64,7 @@
> >   IOCTL(KDSKBLED, 0, TYPE_INT)
> >   IOCTL(KDGETLED, 0, TYPE_INT)
> >   IOCTL(KDSETLED, 0, TYPE_INT)
> > + IOCTL(KDSIGACCEPT, 0, TYPE_INT)
...snip...
> 
> The argument to this ioctl is a signal number, right?
> Surely we need to translate it to a host signal number,
> not just pass it through as TYPE_INT...

True, not sure how I missed that - the clue is in the name!

Thanks,
Paul


signature.asc
Description: Digital signature


Re: [Qemu-devel] [PATCH v2 08/16] linux-user: respect timezone for settimeofday

2014-06-22 Thread Paul Burton
On Sun, Jun 22, 2014 at 12:18:02AM +0100, Peter Maydell wrote:
> > +if (!lock_user_struct(VERIFY_READ, target_tz, target_tz_addr, 1))
> > +return -TARGET_EFAULT;
> 
> Coding style mandates braces even on single-line if()s; checkpatch.pl
> will catch this usually.

I copied that style from the equivalent timeval functions, but yeah I'll
make sure to get into the habit of running checkpatch for QEMU patches.

> Code looks OK otherwise.

Thanks for looking,
Paul


signature.asc
Description: Digital signature


Re: [Qemu-devel] [PATCH 3/3] tests: add human format test for string output visitor

2014-06-22 Thread Michael S. Tsirkin
On Fri, Jun 20, 2014 at 09:40:31AM -0600, Eric Blake wrote:
> On 06/19/2014 11:55 PM, Hu Tao wrote:
> > Signed-off-by: Hu Tao 
> > ---
> >  tests/test-string-output-visitor.c | 109 
> > ++---
> >  1 file changed, 90 insertions(+), 19 deletions(-)
> > 
> 
> >  
> > +len = strlen(EnumOne_lookup[i]) + 2;
> > +str_human = g_malloc0(len);
> > +str_human[0] = '"';
> > +strncpy(str_human + 1, EnumOne_lookup[i], 
> > strlen(EnumOne_lookup[i]));
> > +str_human[len - 1] = '"';
> 
> Eww.  Just use g_strdup_printf("\"%s\"", EnumOne_lookup[i]), instead of
> futzing around with manual length calculations.
> 
> 
> >  
> > -static void output_visitor_test_add(const char *testpath,
> > -TestOutputVisitorData *data,
> > -void 
> > (*test_func)(TestOutputVisitorData *data, const void *user_data))
> > +static void
> > +output_visitor_test_add(const char *testpath,
> 
> Why the line split? You moved away from the usual qemu style.

We have different styles really, coding style does not say.
And that line's was way too long, so it did need a split.

> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 





[Qemu-devel] [PATCH v3 04/16] linux-user: support SO_PASSSEC setsockopt option

2014-06-22 Thread Paul Burton
Translate the SO_PASSSEC option to setsockopt to the host value &
perform the syscall as expected, allowing use of the option by target
programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - Fix the value of TARGET_SO_PASSSEC for the sparc target.
---
 linux-user/socket.h  | 5 +
 linux-user/syscall.c | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/linux-user/socket.h b/linux-user/socket.h
index ae17959..4dacae6 100644
--- a/linux-user/socket.h
+++ b/linux-user/socket.h
@@ -63,6 +63,7 @@
 #define TARGET_SO_PEERSEC  30
 #define TARGET_SO_SNDBUFFORCE  31
 #define TARGET_SO_RCVBUFFORCE  33
+#define TARGET_SO_PASSSEC  34
 
 /** sock_type - Socket types
  *
@@ -242,6 +243,10 @@
 
 #define TARGET_SOCK_MAX (TARGET_SOCK_PACKET + 1)
 #define TARGET_SOCK_TYPE_MASK0xf  /* Covers up to TARGET_SOCK_MAX-1. */
+
+#define TARGET_SO_PASSSEC31
+#else
+#define TARGET_SO_PASSSEC34
 #endif
 
 /* For setsockopt(2) */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6a8fdd6..0d873af 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1531,6 +1531,9 @@ set_timeout:
 case TARGET_SO_PASSCRED:
optname = SO_PASSCRED;
break;
+case TARGET_SO_PASSSEC:
+optname = SO_PASSSEC;
+break;
 case TARGET_SO_TIMESTAMP:
optname = SO_TIMESTAMP;
break;
-- 
2.0.0




[Qemu-devel] [PATCH v3 02/16] linux-user: support SO_ACCEPTCONN getsockopt option

2014-06-22 Thread Paul Burton
Translate the SO_ACCEPTCONN option to the host value & execute the
syscall as expected.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - None.
---
 linux-user/syscall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a157204..cabc9da 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1652,6 +1652,9 @@ static abi_long do_getsockopt(int sockfd, int level, int 
optname,
 case TARGET_SO_RCVLOWAT:
 optname = SO_RCVLOWAT;
 goto int_case;
+case TARGET_SO_ACCEPTCONN:
+optname = SO_ACCEPTCONN;
+goto int_case;
 default:
 goto int_case;
 }
-- 
2.0.0




[Qemu-devel] [PATCH v3 01/16] linux-user: translate the result of getsockopt SO_TYPE

2014-06-22 Thread Paul Burton
QEMU previously passed the result of the host syscall directly to the
target program. This is a problem if the host & target have different
representations of socket types, as is the case when running a MIPS
target program on an x86 host. Introduce a host_to_target_sock_type
helper function mirroring the existing target_to_host_sock_type, and
call it to translate the value provided by getsockopt when called for
the SO_TYPE option.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - Remove indirection via a function pointer, and just call
host_to_target_sock_type directly.
---
 linux-user/syscall.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7d74079..a157204 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -592,6 +592,37 @@ char *target_strerror(int err)
 return strerror(target_to_host_errno(err));
 }
 
+static inline int host_to_target_sock_type(int host_type)
+{
+int target_type;
+
+switch (host_type & 0xf /* SOCK_TYPE_MASK */) {
+case SOCK_DGRAM:
+target_type = TARGET_SOCK_DGRAM;
+break;
+case SOCK_STREAM:
+target_type = TARGET_SOCK_STREAM;
+break;
+default:
+target_type = host_type & 0xf /* SOCK_TYPE_MASK */;
+break;
+}
+
+#if defined(SOCK_CLOEXEC)
+if (host_type & SOCK_CLOEXEC) {
+target_type |= TARGET_SOCK_CLOEXEC;
+}
+#endif
+
+#if defined(SOCK_NONBLOCK)
+if (host_type & SOCK_NONBLOCK) {
+target_type |= TARGET_SOCK_NONBLOCK;
+}
+#endif
+
+return target_type;
+}
+
 static abi_ulong target_brk;
 static abi_ulong target_original_brk;
 static abi_ulong brk_page;
@@ -1636,6 +1667,9 @@ static abi_long do_getsockopt(int sockfd, int level, int 
optname,
 ret = get_errno(getsockopt(sockfd, level, optname, &val, &lv));
 if (ret < 0)
 return ret;
+if (optname == SO_TYPE) {
+val = host_to_target_sock_type(val);
+}
 if (len > lv)
 len = lv;
 if (len == 4) {
-- 
2.0.0




[Qemu-devel] [PATCH v3 09/16] linux-user: allow NULL tv argument for settimeofday

2014-06-22 Thread Paul Burton
The tv argument to the settimeofday syscall is allowed to be NULL, if
the program only wishes to provide the timezone. QEMU previously
returned -EFAULT when tv was NULL. Instead, execute the syscall &
provide NULL to the kernel as the target program expected.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/syscall.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4f01bb7..700cfec 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6353,11 +6353,15 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 case TARGET_NR_settimeofday:
 {
-struct timeval tv;
+struct timeval tv, *ptv = NULL;
 struct timezone tz, *ptz = NULL;
 
-if (copy_from_user_timeval(&tv, arg1))
-goto efault;
+if (arg1) {
+if (copy_from_user_timeval(&tv, arg1)) {
+goto efault;
+}
+ptv = &tv;
+}
 
 if (arg2) {
 if (copy_from_user_timezone(&tz, arg2)) {
@@ -6366,7 +6370,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 ptz = &tz;
 }
 
-ret = get_errno(settimeofday(&tv, ptz));
+ret = get_errno(settimeofday(ptv, ptz));
 }
 break;
 #if defined(TARGET_NR_select)
-- 
2.0.0




[Qemu-devel] [PATCH v3 03/16] linux-user: support SO_{SND, RCV}BUFFORCE setsockopt options

2014-06-22 Thread Paul Burton
Translate the SO_SNDBUFFORCE & SO_RCVBUFFORCE options to setsockopt to
the host values & perform the syscall as expected, allowing use of those
options by target programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/syscall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cabc9da..6a8fdd6 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1502,9 +1502,15 @@ set_timeout:
 case TARGET_SO_SNDBUF:
optname = SO_SNDBUF;
break;
+case TARGET_SO_SNDBUFFORCE:
+optname = SO_SNDBUFFORCE;
+break;
 case TARGET_SO_RCVBUF:
optname = SO_RCVBUF;
break;
+case TARGET_SO_RCVBUFFORCE:
+optname = SO_RCVBUFFORCE;
+break;
 case TARGET_SO_KEEPALIVE:
optname = SO_KEEPALIVE;
break;
-- 
2.0.0




[Qemu-devel] [PATCH v3 00/16] linux-user fixes & improvements

2014-06-22 Thread Paul Burton
This series fixes a number of bugs in QEMUs linux-user support, some
specific to targetting the MIPS architecture but mostly generic. It also
adds support for some previously unsupported syscalls & {g,s}etsockopt
options.

Paul Burton (16):
  linux-user: translate the result of getsockopt SO_TYPE
  linux-user: support SO_ACCEPTCONN getsockopt option
  linux-user: support SO_{SND,RCV}BUFFORCE setsockopt options
  linux-user: support SO_PASSSEC setsockopt option
  linux-user: allow NULL arguments to mount
  linux-user: support strace of epoll_create1
  linux-user: fix struct target_epoll_event layout for MIPS
  linux-user: respect timezone for settimeofday
  linux-user: allow NULL tv argument for settimeofday
  linux-user: support timerfd_{create,gettime,settime} syscalls
  linux-user: support ioprio_{get,set} syscalls
  linux-user: support {name_to,open_by}_handle_at syscalls
  linux-user: support the setns syscall
  linux-user: support the unshare syscall
  linux-user: support the KDSIGACCEPT ioctl
  linux-user: support the SIOCGIFINDEX ioctl

 linux-user/ioctls.h   |   2 +
 linux-user/socket.h   |   5 +
 linux-user/strace.c   |  30 +
 linux-user/strace.list|  21 
 linux-user/syscall.c  | 299 ++
 linux-user/syscall_defs.h |   9 +-
 6 files changed, 339 insertions(+), 27 deletions(-)

-- 
2.0.0




[Qemu-devel] [PATCH v3 11/16] linux-user: support ioprio_{get, set} syscalls

2014-06-22 Thread Paul Burton
Add support for the ioprio_get & ioprio_set syscalls, allowing their
use by target programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - Declare the host syscalls conditionally upon whether both host &
target define their numbers, to avoid compile time warnings/errors
about the host syscall functions being unused when built for targets
which don't support them.
---
 linux-user/syscall.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 0d8cd0b..f90dd99 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -252,6 +252,12 @@ _syscall2(int, capget, struct __user_cap_header_struct *, 
header,
   struct __user_cap_data_struct *, data);
 _syscall2(int, capset, struct __user_cap_header_struct *, header,
   struct __user_cap_data_struct *, data);
+#if defined(TARGET_NR_ioprio_get) && defined(__NR_ioprio_get)
+_syscall2(int, ioprio_get, int, which, int, who)
+#endif
+#if defined(TARGET_NR_ioprio_set) && defined(__NR_ioprio_set)
+_syscall3(int, ioprio_set, int, which, int, who, int, ioprio)
+#endif
 
 static bitmask_transtbl fcntl_flags_tbl[] = {
   { TARGET_O_ACCMODE,   TARGET_O_WRONLY,O_ACCMODE,   O_WRONLY,},
@@ -9488,6 +9494,18 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 #endif
 
+#if defined(TARGET_NR_ioprio_get) && defined(__NR_ioprio_get)
+case TARGET_NR_ioprio_get:
+ret = get_errno(ioprio_get(arg1, arg2));
+break;
+#endif
+
+#if defined(TARGET_NR_ioprio_set) && defined(__NR_ioprio_set)
+case TARGET_NR_ioprio_set:
+ret = get_errno(ioprio_set(arg1, arg2, arg3));
+break;
+#endif
+
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-- 
2.0.0




[Qemu-devel] [PATCH v3 16/16] linux-user: support the SIOCGIFINDEX ioctl

2014-06-22 Thread Paul Burton
Add a definition of the SIOCGIFINDEX ioctl, allowing its use by target
programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - None.
---
 linux-user/ioctls.h   | 1 +
 linux-user/syscall_defs.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index f278d3e..07a00da 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -118,6 +118,7 @@
   IOCTL(SIOCSIFMEM, IOC_W, MK_PTR(MK_STRUCT(STRUCT_ptr_ifreq)))
   IOCTL(SIOCADDMULTI, IOC_W, MK_PTR(MK_STRUCT(STRUCT_sockaddr_ifreq)))
   IOCTL(SIOCDELMULTI, IOC_W, MK_PTR(MK_STRUCT(STRUCT_sockaddr_ifreq)))
+  IOCTL(SIOCGIFINDEX, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_sockaddr_ifreq)))
   IOCTL(SIOCSIFLINK, 0, TYPE_NULL)
   IOCTL_SPECIAL(SIOCGIFCONF, IOC_W | IOC_R, do_ioctl_ifconf,
 MK_PTR(MK_STRUCT(STRUCT_ifconf)))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 4adfd3a..8563027 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -865,6 +865,7 @@ struct target_pollfd {
 #define TARGET_SIOCSIFSLAVE0x8930
 #define TARGET_SIOCADDMULTI0x8931  /* Multicast address lists  
*/
 #define TARGET_SIOCDELMULTI0x8932
+#define TARGET_SIOCGIFINDEX0x8933
 
 /* Bridging control calls */
 #define TARGET_SIOCGIFBR   0x8940  /* Bridging support 
*/
-- 
2.0.0




[Qemu-devel] [PATCH v3 05/16] linux-user: allow NULL arguments to mount

2014-06-22 Thread Paul Burton
Calls to the mount syscall can legitimately provide NULL as the value
for the source of filesystemtype arguments, which QEMU would previously
reject & return -EFAULT to the target program. An example of this is
remounting an already mounted filesystem with different properties.

Instead of rejecting such syscalls with -EFAULT, pass NULL along to the
kernel as the target program expects.

Additionally this patch fixes a potential memory leak when DEBUG_REMAP
is enabled and lock_user_string fails on the target or filesystemtype
arguments but a prior argument was non-NULL and already locked.

Since the patch already touched most lines of the TARGET_NR_mount case,
it fixes the indentation & coding style for good measure.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/syscall.c | 75 +---
 1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 0d873af..19c73c9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5566,29 +5566,60 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 #endif
 case TARGET_NR_mount:
-   {
-   /* need to look at the data field */
-   void *p2, *p3;
-   p = lock_user_string(arg1);
-   p2 = lock_user_string(arg2);
-   p3 = lock_user_string(arg3);
-if (!p || !p2 || !p3)
-ret = -TARGET_EFAULT;
-else {
-/* FIXME - arg5 should be locked, but it isn't 
clear how to
- * do that since it's not guaranteed to be a 
NULL-terminated
- * string.
- */
-if ( ! arg5 )
-ret = get_errno(mount(p, p2, p3, (unsigned 
long)arg4, NULL));
-else
-ret = get_errno(mount(p, p2, p3, (unsigned 
long)arg4, g2h(arg5)));
-}
+{
+/* need to look at the data field */
+void *p2, *p3;
+
+if (arg1) {
+p = lock_user_string(arg1);
+if (!p) {
+goto efault;
+}
+} else {
+p = NULL;
+}
+
+p2 = lock_user_string(arg2);
+if (!p2) {
+if (arg1) {
+unlock_user(p, arg1, 0);
+}
+goto efault;
+}
+
+if (arg3) {
+p3 = lock_user_string(arg3);
+if (!p3) {
+if (arg1) {
 unlock_user(p, arg1, 0);
-unlock_user(p2, arg2, 0);
-unlock_user(p3, arg3, 0);
-   break;
-   }
+}
+unlock_user(p2, arg2, 0);
+goto efault;
+}
+} else {
+p3 = NULL;
+}
+
+/* FIXME - arg5 should be locked, but it isn't clear how to
+ * do that since it's not guaranteed to be a NULL-terminated
+ * string.
+ */
+if (!arg5) {
+ret = mount(p, p2, p3, (unsigned long)arg4, NULL);
+} else {
+ret = mount(p, p2, p3, (unsigned long)arg4, g2h(arg5));
+}
+ret = get_errno(ret);
+
+if (arg1) {
+unlock_user(p, arg1, 0);
+}
+unlock_user(p2, arg2, 0);
+if (arg3) {
+unlock_user(p3, arg3, 0);
+}
+}
+break;
 #ifdef TARGET_NR_umount
 case TARGET_NR_umount:
 if (!(p = lock_user_string(arg1)))
-- 
2.0.0




[Qemu-devel] [PATCH v3 13/16] linux-user: support the setns syscall

2014-06-22 Thread Paul Burton
Add support for the setns syscall, trivially passed through to the host.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - None.
---
 linux-user/strace.list | 3 +++
 linux-user/syscall.c   | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index 147f579..d5b8033 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1191,6 +1191,9 @@
 #ifdef TARGET_NR_set_mempolicy
 { TARGET_NR_set_mempolicy, "set_mempolicy" , NULL, NULL, NULL },
 #endif
+#ifdef TARGET_NR_setns
+{ TARGET_NR_setns, "setns" , NULL, NULL, NULL },
+#endif
 #ifdef TARGET_NR_setpgid
 { TARGET_NR_setpgid, "setpgid" , NULL, NULL, NULL },
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d8bd773..b630ef3 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9563,6 +9563,12 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 #endif
 
+#ifdef TARGET_NR_setns
+case TARGET_NR_setns:
+ret = get_errno(setns(arg1, arg2));
+break;
+#endif
+
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-- 
2.0.0




[Qemu-devel] [PATCH v3 06/16] linux-user: support strace of epoll_create1

2014-06-22 Thread Paul Burton
Add the epoll_create1 syscall to strace.list in order to display that
syscall when it occurs, rather than a message about the syscall being
unknown despite QEMU already implementing support for it.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - None.
---
 linux-user/strace.list | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index cf5841a..fcb258d 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -114,6 +114,9 @@
 #ifdef TARGET_NR_epoll_create
 { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },
 #endif
+#ifdef TARGET_NR_epoll_create1
+{ TARGET_NR_epoll_create1, "epoll_create1" , NULL, NULL, NULL },
+#endif
 #ifdef TARGET_NR_epoll_ctl
 { TARGET_NR_epoll_ctl, "epoll_ctl" , NULL, NULL, NULL },
 #endif
-- 
2.0.0




[Qemu-devel] [PATCH v3 08/16] linux-user: respect timezone for settimeofday

2014-06-22 Thread Paul Burton
The settimeofday syscall accepts a tz argument indicating the desired
timezone to the kernel. QEMU previously ignored any argument provided
by the target program & always passed NULL to the kernel. Instead,
translate the argument & pass along the data userland provided.

Although this argument is described by the settimeofday man page as
obsolete, it is used by systemd as of version 213.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/syscall.c  | 29 -
 linux-user/syscall_defs.h |  5 +
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 19c73c9..4f01bb7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -935,6 +935,23 @@ static inline abi_long copy_to_user_timeval(abi_ulong 
target_tv_addr,
 return 0;
 }
 
+static inline abi_long copy_from_user_timezone(struct timezone *tz,
+   abi_ulong target_tz_addr)
+{
+struct target_timezone *target_tz;
+
+if (!lock_user_struct(VERIFY_READ, target_tz, target_tz_addr, 1)) {
+return -TARGET_EFAULT;
+}
+
+__get_user(tz->tz_minuteswest, &target_tz->tz_minuteswest);
+__get_user(tz->tz_dsttime, &target_tz->tz_dsttime);
+
+unlock_user_struct(target_tz, target_tz_addr, 0);
+
+return 0;
+}
+
 #if defined(TARGET_NR_mq_open) && defined(__NR_mq_open)
 #include 
 
@@ -6337,9 +6354,19 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 case TARGET_NR_settimeofday:
 {
 struct timeval tv;
+struct timezone tz, *ptz = NULL;
+
 if (copy_from_user_timeval(&tv, arg1))
 goto efault;
-ret = get_errno(settimeofday(&tv, NULL));
+
+if (arg2) {
+if (copy_from_user_timezone(&tz, arg2)) {
+goto efault;
+}
+ptz = &tz;
+}
+
+ret = get_errno(settimeofday(&tv, ptz));
 }
 break;
 #if defined(TARGET_NR_select)
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e379b45..a1f1fce 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -165,6 +165,11 @@ struct target_timespec {
 abi_long tv_nsec;
 };
 
+struct target_timezone {
+abi_int tz_minuteswest;
+abi_int tz_dsttime;
+};
+
 struct target_itimerval {
 struct target_timeval it_interval;
 struct target_timeval it_value;
-- 
2.0.0




[Qemu-devel] [PATCH v3 12/16] linux-user: support {name_to, open_by}_handle_at syscalls

2014-06-22 Thread Paul Burton
Implement support for the name_to_handle_at and open_by_handle_at
syscalls, allowing their use by the target program.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/strace.c| 30 ++
 linux-user/strace.list |  6 ++
 linux-user/syscall.c   | 57 ++
 3 files changed, 93 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index ea6c1d2..c20ddf1 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1552,6 +1552,36 @@ print_kill(const struct syscallname *name,
 }
 #endif
 
+#ifdef TARGET_NR_name_to_handle_at
+static void
+print_name_to_handle_at(const struct syscallname *name,
+abi_long arg0, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5)
+{
+print_syscall_prologue(name);
+print_at_dirfd(arg0, 0);
+print_string(arg1, 0);
+print_pointer(arg2, 0);
+print_pointer(arg3, 0);
+print_raw_param("0x%x", arg4, 1);
+print_syscall_epilogue(name);
+}
+#endif
+
+#ifdef TARGET_NR_open_by_handle_at
+static void
+print_open_by_handle_at(const struct syscallname *name,
+abi_long arg0, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5)
+{
+print_syscall_prologue(name);
+print_raw_param("%d", arg0, 0);
+print_pointer(arg2, 0);
+print_open_flags(arg3, 1);
+print_syscall_epilogue(name);
+}
+#endif
+
 /*
  * An array of all of the syscalls we know about
  */
diff --git a/linux-user/strace.list b/linux-user/strace.list
index 8de972a..147f579 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -582,6 +582,9 @@
 #ifdef TARGET_NR_munmap
 { TARGET_NR_munmap, "munmap" , NULL, print_munmap, NULL },
 #endif
+#ifdef TARGET_NR_name_to_handle_at
+{ TARGET_NR_name_to_handle_at, "name_to_handle_at" , NULL, 
print_name_to_handle_at, NULL },
+#endif
 #ifdef TARGET_NR_nanosleep
 { TARGET_NR_nanosleep, "nanosleep" , NULL, NULL, NULL },
 #endif
@@ -624,6 +627,9 @@
 #ifdef TARGET_NR_openat
 { TARGET_NR_openat, "openat" , NULL, print_openat, NULL },
 #endif
+#ifdef TARGET_NR_open_by_handle_at
+{ TARGET_NR_open_by_handle_at, "open_by_handle_at" , NULL, 
print_open_by_handle_at, NULL },
+#endif
 #ifdef TARGET_NR_osf_adjtime
 { TARGET_NR_osf_adjtime, "osf_adjtime" , NULL, NULL, NULL },
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f90dd99..d8bd773 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5351,6 +5351,63 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 unlock_user(p, arg2, 0);
 break;
 #endif
+#ifdef TARGET_NR_name_to_handle_at
+case TARGET_NR_name_to_handle_at:
+{
+struct file_handle *fh;
+uint32_t sz;
+int mount_id;
+
+p = lock_user_string(arg2);
+if (!p) {
+goto efault;
+}
+
+if (get_user_u32(sz, arg3)) {
+unlock_user(p, arg2, 0);
+goto efault;
+}
+
+fh = lock_user(VERIFY_WRITE, arg3, sizeof(*fh) + sz, 1);
+if (!fh) {
+unlock_user(p, arg2, 0);
+goto efault;
+}
+
+ret = get_errno(name_to_handle_at(arg1, path(p), fh,
+  &mount_id, arg5));
+
+unlock_user(p, arg2, 0);
+unlock_user(p, arg3, sizeof(*fh) + sz);
+
+if (put_user_s32(mount_id, arg4)) {
+goto efault;
+}
+}
+break;
+#endif
+#ifdef TARGET_NR_open_by_handle_at
+case TARGET_NR_open_by_handle_at:
+{
+struct file_handle *fh;
+uint32_t sz;
+
+if (get_user_u32(sz, arg2)) {
+goto efault;
+}
+
+fh = lock_user(VERIFY_WRITE, arg2, sizeof(*fh) + sz, 1);
+if (!fh) {
+goto efault;
+}
+
+ret = get_errno(open_by_handle_at(arg1, fh,
+target_to_host_bitmask(arg3, fcntl_flags_tbl)));
+
+unlock_user(p, arg2, sizeof(*fh) + sz);
+}
+break;
+#endif
 case TARGET_NR_close:
 ret = get_errno(close(arg1));
 break;
-- 
2.0.0




[Qemu-devel] [PATCH v3 10/16] linux-user: support timerfd_{create, gettime, settime} syscalls

2014-06-22 Thread Paul Burton
Adds support for the timerfd_create, timerfd_gettime & timerfd_settime
syscalls, allowing use of timerfds by target programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Fix coding style, checkpatch clean.

Changes in v2:
  - None.
---
 linux-user/strace.list |  9 +
 linux-user/syscall.c   | 45 +
 2 files changed, 54 insertions(+)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index fcb258d..8de972a 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1404,6 +1404,15 @@
 #ifdef TARGET_NR_timer_settime
 { TARGET_NR_timer_settime, "timer_settime" , NULL, NULL, NULL },
 #endif
+#ifdef TARGET_NR_timerfd_create
+{ TARGET_NR_timerfd_create, "timerfd_create" , NULL, NULL, NULL },
+#endif
+#ifdef TARGET_NR_timerfd_gettime
+{ TARGET_NR_timerfd_gettime, "timerfd_gettime" , NULL, NULL, NULL },
+#endif
+#ifdef TARGET_NR_timerfd_settime
+{ TARGET_NR_timerfd_settime, "timerfd_settime" , NULL, NULL, NULL },
+#endif
 #ifdef TARGET_NR_times
 { TARGET_NR_times, "times" , NULL, NULL, NULL },
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 700cfec..0d8cd0b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -58,6 +58,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 //#include 
@@ -9443,6 +9444,50 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 }
 #endif
 
+#ifdef TARGET_NR_timerfd_create
+case TARGET_NR_timerfd_create:
+ret = get_errno(timerfd_create(arg1,
+target_to_host_bitmask(arg2, fcntl_flags_tbl)));
+break;
+#endif
+
+#ifdef TARGET_NR_timerfd_gettime
+case TARGET_NR_timerfd_gettime:
+{
+struct itimerspec its_curr;
+
+ret = get_errno(timerfd_gettime(arg1, &its_curr));
+
+if (arg2 && host_to_target_itimerspec(arg2, &its_curr)) {
+goto efault;
+}
+}
+break;
+#endif
+
+#ifdef TARGET_NR_timerfd_settime
+case TARGET_NR_timerfd_settime:
+{
+struct itimerspec its_new, its_old, *p_new;
+
+if (arg3) {
+if (target_to_host_itimerspec(&its_new, arg3)) {
+goto efault;
+}
+p_new = &its_new;
+} else {
+p_new = NULL;
+}
+
+ret = get_errno(timerfd_settime(arg1, arg2, p_new, &its_old));
+
+if (arg4 && host_to_target_itimerspec(arg4, &its_old)) {
+goto efault;
+}
+}
+break;
+#endif
+
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-- 
2.0.0




[Qemu-devel] [PATCH v3 14/16] linux-user: support the unshare syscall

2014-06-22 Thread Paul Burton
Add support for the unshare syscall, trivially passed through to the
host.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - None.
---
 linux-user/syscall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b630ef3..876e557 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9569,6 +9569,12 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 #endif
 
+#ifdef TARGET_NR_unshare
+case TARGET_NR_unshare:
+ret = get_errno(unshare(arg1));
+break;
+#endif
+
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-- 
2.0.0




[Qemu-devel] [PATCH v3 15/16] linux-user: support the KDSIGACCEPT ioctl

2014-06-22 Thread Paul Burton
Add a definition of the KDSIGACCEPT ioctl & allow its use by target
programs.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - Translate signal number to host value.

Changes in v2:
  - None.
---
 linux-user/ioctls.h   | 1 +
 linux-user/syscall.c  | 7 +++
 linux-user/syscall_defs.h | 1 +
 3 files changed, 9 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 309fb21..f278d3e 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -64,6 +64,7 @@
  IOCTL(KDSKBLED, 0, TYPE_INT)
  IOCTL(KDGETLED, 0, TYPE_INT)
  IOCTL(KDSETLED, 0, TYPE_INT)
+ IOCTL_SPECIAL(KDSIGACCEPT, 0, do_ioctl_kdsigaccept, TYPE_INT)
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 876e557..7765658 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3696,6 +3696,13 @@ static abi_long do_ioctl_rt(const IOCTLEntry *ie, 
uint8_t *buf_temp,
 return ret;
 }
 
+static abi_long do_ioctl_kdsigaccept(const IOCTLEntry *ie, uint8_t *buf_temp,
+ int fd, abi_long cmd, abi_long arg)
+{
+int sig = target_to_host_signal(arg);
+return get_errno(ioctl(fd, ie->host_cmd, sig));
+}
+
 static IOCTLEntry ioctl_entries[] = {
 #define IOCTL(cmd, access, ...) \
 { TARGET_ ## cmd, cmd, #cmd, access, 0, {  __VA_ARGS__ } },
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index a1f1fce..4adfd3a 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -831,6 +831,7 @@ struct target_pollfd {
 #define TARGET_KDSKBLED0x4B65  /* set led flags (not lights) */
 #define TARGET_KDGETLED0x4B31  /* return current led state */
 #define TARGET_KDSETLED0x4B32  /* set led state [lights, not flags] */
+#define TARGET_KDSIGACCEPT 0x4B4E
 
 #define TARGET_SIOCATMARK  0x8905
 
-- 
2.0.0




Re: [Qemu-devel] [PATCH 2/3] hw/pcie: implement power controller functionality

2014-06-22 Thread Marcel Apfelbaum
On Thu, 2014-06-19 at 17:39 +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 19, 2014 at 04:52:20PM +0300, Marcel Apfelbaum wrote:
> > It is needed by hot-unplug in order to get an indication
> > from the OS when the device can be physically detached.
> > 
> > Signed-off-by: Marcel Apfelbaum 
> > ---
> >  hw/pci/pcie.c  | 15 ++-
> >  include/hw/pci/pcie_regs.h |  2 ++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index ae92f00..f8bf515 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -292,16 +292,19 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
> > PCI_EXP_SLTCAP_HPC |
> > PCI_EXP_SLTCAP_PIP |
> > PCI_EXP_SLTCAP_AIP |
> > +   PCI_EXP_SLTCAP_PCP |
> > PCI_EXP_SLTCAP_ABP);
> >  
> >  pci_word_test_and_clear_mask(dev->config + pos + PCI_EXP_SLTCTL,
> >   PCI_EXP_SLTCTL_PIC |
> > + PCI_EXP_SLTCTL_PCC |
> >   PCI_EXP_SLTCTL_AIC);
> >  pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > PCI_EXP_SLTCTL_PIC_OFF |
> > PCI_EXP_SLTCTL_AIC_OFF);
> >  pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
> > PCI_EXP_SLTCTL_PIC |
> > +   PCI_EXP_SLTCTL_PCC |
> > PCI_EXP_SLTCTL_AIC |
> > PCI_EXP_SLTCTL_HPIE |
> > PCI_EXP_SLTCTL_CCIE |
> 
> Need to disable for compat types?
Does Paolo's explanation answer your question?

> 
> 
> > @@ -327,21 +330,31 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
> >  void pcie_cap_slot_reset(PCIDevice *dev)
> >  {
> >  uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
> > +PCIDevice *slot_dev = 
> > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> 
> What does this mean?
> Downstream port?
Yes

> A switch can have multiple downstream ports at any slot #.
It doesn't matter how many devices are under this slot, we need only
one to power up the slot. The question here was "Do we have at least
one device attahc to slot? If yes, power up the slot."
  
> 
> > +int pic;
> 
> uint16_t please.
Sure,

> 
> > +
> > +pic = slot_dev ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
> >  
> >  PCIE_DEV_PRINTF(dev, "reset\n");
> >  
> >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL,
> >   PCI_EXP_SLTCTL_EIC |
> >   PCI_EXP_SLTCTL_PIC |
> > + PCI_EXP_SLTCTL_PCC |
> >   PCI_EXP_SLTCTL_AIC |
> >   PCI_EXP_SLTCTL_HPIE |
> >   PCI_EXP_SLTCTL_CCIE |
> >   PCI_EXP_SLTCTL_PDCE |
> >   PCI_EXP_SLTCTL_ABPE);
> >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > -   PCI_EXP_SLTCTL_PIC_OFF |
> > +   pic |
> > PCI_EXP_SLTCTL_AIC_OFF);
> >  
> > +if (!slot_dev) {
> > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > +   PCI_EXP_SLTCTL_PCC);
> 
> I dislike it when we clear bits then set them back.
> Please just add else here.
Sure,

> 
> > +}
> > +
> 
> Need to disable for compat types?
As above,

Thanks for the review,
Marcel

> 
> >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> >   PCI_EXP_SLTSTA_EIS |/* on reset,
> >  the lock is 
> > released */
> > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > index 4d123d9..652d9fc 100644
> > --- a/include/hw/pci/pcie_regs.h
> > +++ b/include/hw/pci/pcie_regs.h
> > @@ -57,6 +57,8 @@
> >  #define PCI_EXP_SLTCTL_PIC_SHIFT(ffs(PCI_EXP_SLTCTL_PIC) - 1)
> >  #define PCI_EXP_SLTCTL_PIC_OFF  \
> >  (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
> > +#define PCI_EXP_SLTCTL_PIC_ON  \
> > +(PCI_EXP_SLTCTL_IND_ON << PCI_EXP_SLTCTL_PIC_SHIFT)
> >  
> >  #define PCI_EXP_SLTCTL_SUPPORTED\
> >  (PCI_EXP_SLTCTL_ABPE |  \
> > -- 
> > 1.8.3.1






[Qemu-devel] [PATCH v3 07/16] linux-user: fix struct target_epoll_event layout for MIPS

2014-06-22 Thread Paul Burton
MIPS requires the pad field to 64b-align the data field just as ARM
does.

Signed-off-by: Paul Burton 
---
Changes in v3:
  - None.

Changes in v2:
  - Apply the same padding for the mips64 target.
---
 linux-user/syscall_defs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 69c3982..e379b45 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -2528,7 +2528,7 @@ typedef union target_epoll_data {
 
 struct target_epoll_event {
 uint32_t events;
-#ifdef TARGET_ARM
+#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_MIPS64)
 uint32_t __pad;
 #endif
 target_epoll_data_t data;
-- 
2.0.0




Re: [Qemu-devel] [PATCH 2/3] hw/pcie: implement power controller functionality

2014-06-22 Thread Michael S. Tsirkin
On Sun, Jun 22, 2014 at 01:47:24PM +0300, Marcel Apfelbaum wrote:
> On Thu, 2014-06-19 at 17:39 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 19, 2014 at 04:52:20PM +0300, Marcel Apfelbaum wrote:
> > > It is needed by hot-unplug in order to get an indication
> > > from the OS when the device can be physically detached.
> > > 
> > > Signed-off-by: Marcel Apfelbaum 
> > > ---
> > >  hw/pci/pcie.c  | 15 ++-
> > >  include/hw/pci/pcie_regs.h |  2 ++
> > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index ae92f00..f8bf515 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -292,16 +292,19 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t 
> > > slot)
> > > PCI_EXP_SLTCAP_HPC |
> > > PCI_EXP_SLTCAP_PIP |
> > > PCI_EXP_SLTCAP_AIP |
> > > +   PCI_EXP_SLTCAP_PCP |
> > > PCI_EXP_SLTCAP_ABP);
> > >  
> > >  pci_word_test_and_clear_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > >   PCI_EXP_SLTCTL_PIC |
> > > + PCI_EXP_SLTCTL_PCC |
> > >   PCI_EXP_SLTCTL_AIC);
> > >  pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > > PCI_EXP_SLTCTL_PIC_OFF |
> > > PCI_EXP_SLTCTL_AIC_OFF);
> > >  pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
> > > PCI_EXP_SLTCTL_PIC |
> > > +   PCI_EXP_SLTCTL_PCC |
> > > PCI_EXP_SLTCTL_AIC |
> > > PCI_EXP_SLTCTL_HPIE |
> > > PCI_EXP_SLTCTL_CCIE |
> > 
> > Need to disable for compat types?
> Does Paolo's explanation answer your question?

Kind of hacky - we do have compat work for
q35. So I'd prefer consistency.

> > 
> > 
> > > @@ -327,21 +330,31 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t 
> > > slot)
> > >  void pcie_cap_slot_reset(PCIDevice *dev)
> > >  {
> > >  uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
> > > +PCIDevice *slot_dev = 
> > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > 
> > What does this mean?
> > Downstream port?
> Yes

well it's not really clear, needs a comment.

> > A switch can have multiple downstream ports at any slot #.
> It doesn't matter how many devices are under this slot, we need only
> one to power up the slot. The question here was "Do we have at least
> one device attahc to slot? If yes, power up the slot."

You need a loop for that I think. There's no guarantee the
device is at devfn=0.

> > 
> > > +int pic;
> > 
> > uint16_t please.
> Sure,
> 
> > 
> > > +
> > > +pic = slot_dev ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
> > >  
> > >  PCIE_DEV_PRINTF(dev, "reset\n");
> > >  
> > >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL,
> > >   PCI_EXP_SLTCTL_EIC |
> > >   PCI_EXP_SLTCTL_PIC |
> > > + PCI_EXP_SLTCTL_PCC |
> > >   PCI_EXP_SLTCTL_AIC |
> > >   PCI_EXP_SLTCTL_HPIE |
> > >   PCI_EXP_SLTCTL_CCIE |
> > >   PCI_EXP_SLTCTL_PDCE |
> > >   PCI_EXP_SLTCTL_ABPE);
> > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > -   PCI_EXP_SLTCTL_PIC_OFF |
> > > +   pic |
> > > PCI_EXP_SLTCTL_AIC_OFF);
> > >  
> > > +if (!slot_dev) {
> > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > +   PCI_EXP_SLTCTL_PCC);
> > 
> > I dislike it when we clear bits then set them back.
> > Please just add else here.
> Sure,
> 
> > 
> > > +}
> > > +
> > 
> > Need to disable for compat types?
> As above,
> 
> Thanks for the review,
> Marcel
> 
> > 
> > >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > >   PCI_EXP_SLTSTA_EIS |/* on reset,
> > >  the lock is 
> > > released */
> > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > index 4d123d9..652d9fc 100644
> > > --- a/include/hw/pci/pcie_regs.h
> > > +++ b/include/hw/pci/pcie_regs.h
> > > @@ -57,6 +57,8 @@
> > >  #define PCI_EXP_SLTCTL_PIC_SHIFT(ffs(PCI_EXP_SLTCTL_PIC) - 1)
> > >  #define PCI_EXP_SLTCTL_PIC_OFF  \
> > >  (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
> > > +#define PCI_EXP_SLTCTL_PIC_ON  \
> > > +(PCI_EXP_SLTCTL_IND_ON << PCI_EXP_SLTCTL_PIC

Re: [Qemu-devel] [PATCH for-2.1 0/2] qemu-char: fixes to the new get_msgfds API

2014-06-22 Thread Michael S. Tsirkin
On Sun, Jun 22, 2014 at 10:38:35AM +0800, Stefan Hajnoczi wrote:
> The recent changes to extend qemu-char get_msgfds to support multiple file
> descriptors caused a qemu-iotests regression and also introduced a file
> descriptor leak.  This patch series fixes the bugs.

Applied, thanks!

> Stefan Hajnoczi (2):
>   qemu-char: fix qemu_chr_fe_get_msgfd()
>   qemu-char: avoid leaking unused fds in tcp_get_msgfds()
> 
>  qemu-char.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> -- 
> 1.9.3



Re: [Qemu-devel] [PATCH 3/3] hw/pcie: better hotplug/hotunplug support

2014-06-22 Thread Marcel Apfelbaum
On Thu, 2014-06-19 at 17:43 +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 19, 2014 at 04:52:21PM +0300, Marcel Apfelbaum wrote:
> > Hotplug triggers both 'present detect change' and
> > 'attention button pressed'.
> > 
> > Hotunplug starts by triggering 'attention button pressed',
> > then waits for the OS to power off the device and only
> > then detaches it.
> > 
> 
> Pls not here that current code is broken: it does surprise removal which
> crashes guests.
I'll add a note, sure.

> 
> > Signed-off-by: Marcel Apfelbaum 
> > ---
> >  hw/pci/pcie.c | 20 +++-
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index f8bf515..9cfd93d 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -258,7 +258,8 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  
> >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > PCI_EXP_SLTSTA_PDS);
> > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > +pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> > +PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >  }
> >  
> >  void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState 
> > *dev,
> > @@ -268,10 +269,7 @@ void pcie_cap_slot_hot_unplug_cb(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  
> >  pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, 
> > errp);
> >  
> > -object_unparent(OBJECT(dev));
> > -pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > - PCI_EXP_SLTSTA_PDS);
> > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > +pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> >  }
> >  
> >  /* pci express slot for pci express root/downstream port
> > @@ -389,6 +387,18 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> >  sltsta);
> >  }
> >  
> 
> pls add code comments explaining the logic here.
I thought is clear :(
Basically: If the device is present, power indicator off and power
controller off, it is safe to detach the device.
> 
> > +if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
> > +((val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF)) {
> > +PCIDevice *slot_dev = 
> > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > +if (slot_dev) {
> 
> Here you want to remove all devices behind the bridge?
Yes, but since it is PCIe we only have one device,
but we may have a multi-function device...

> You need to do this for all functions, not just function 0.
So bus devices are actually functions... OK, I'll run a loop here.

> 
> > +object_unparent(OBJECT(slot_dev));
> > +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > + PCI_EXP_SLTSTA_PDS);
> > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > +   PCI_EXP_SLTSTA_PDC);
> 
> These bits need to be cleared in any case?
No, "PDS on" means the devices is present, so we clear it only after the
OS  powers it off.

> 
> > +}
> > +}
> > +
> >  hotplug_event_notify(dev);
> >  
> >  /* 
> > -- 
> > 1.8.3.1






Re: [Qemu-devel] [PATCH v2] mc146818rtc: add rtc-reset-reinjection QMP command

2014-06-22 Thread Michael S. Tsirkin
On Tue, Jun 17, 2014 at 07:48:36PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 17, 2014 at 12:47:51PM -0300, Marcelo Tosatti wrote:
> > 
> > It is necessary to reset RTC interrupt reinjection backlog if
> > guest time is synchronized via a different mechanism, such as
> > QGA's guest-set-time command.
> > 
> > Failing to do so causes both corrections to be applied (summed),
> > resulting in an incorrect guest time.
> > 
> > Signed-off-by: Marcelo Tosatti 
> 
> Fails build for non x86 target:
> 
>   LINK  arm-softmmu/qemu-system-arm
> ../qmp-marshal.o: In function `qmp_marshal_input_rtc_reset_reinjection':
> /scm/qemu/qmp-marshal.c:5059: undefined reference to
> `qmp_rtc_reset_reinjection'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-system-arm] Error 1
> make: *** [subdir-arm-softmmu] Error 2

Ping.
Do you plan to fix and re-post the patch?

> 
> > diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
> > index df54546..1782c4e 100644
> > --- a/hw/timer/mc146818rtc.c
> > +++ b/hw/timer/mc146818rtc.c
> > @@ -26,6 +26,7 @@
> >  #include "sysemu/sysemu.h"
> >  #include "hw/timer/mc146818rtc.h"
> >  #include "qapi/visitor.h"
> > +#include "qmp-commands.h"
> >  
> >  #ifdef TARGET_I386
> >  #include "hw/i386/apic.h"
> > @@ -84,6 +85,7 @@ typedef struct RTCState {
> >  Notifier clock_reset_notifier;
> >  LostTickPolicy lost_tick_policy;
> >  Notifier suspend_notifier;
> > +QLIST_ENTRY(RTCState) link;
> >  } RTCState;
> >  
> >  static void rtc_set_time(RTCState *s);
> > @@ -522,6 +524,20 @@ static void rtc_get_time(RTCState *s, struct tm *tm)
> >  rtc_from_bcd(s, s->cmos_data[RTC_CENTURY]) * 100 - 1900;
> >  }
> >  
> > +static QLIST_HEAD(, RTCState) rtc_devices =
> > +QLIST_HEAD_INITIALIZER(rtc_devices);
> > +
> > +#ifdef TARGET_I386
> > +void qmp_rtc_reset_reinjection(Error **errp)
> > +{
> > +RTCState *s;
> > +
> > +QLIST_FOREACH(s, &rtc_devices, link) {
> > +s->irq_coalesced = 0;
> > +}
> > +}
> > +#endif
> > +
> >  static void rtc_set_time(RTCState *s)
> >  {
> >  struct tm tm;
> > @@ -911,6 +927,8 @@ ISADevice *rtc_init(ISABus *bus, int base_year, 
> > qemu_irq intercept_irq)
> >  } else {
> >  isa_init_irq(isadev, &s->irq, RTC_ISA_IRQ);
> >  }
> > +QLIST_INSERT_HEAD(&rtc_devices, s, link);
> > +
> >  return isadev;
> >  }
> >  
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 7bc33ea..a4e2c21 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -4722,3 +4722,15 @@
> >'btn' : 'InputBtnEvent',
> >'rel' : 'InputMoveEvent',
> >'abs' : 'InputMoveEvent' } }
> > +
> > +##
> > +# @rtc-reset-reinjection
> > +#
> > +# This command will reset the RTC interrupt reinjection backlog.
> > +# Can be used if another mechanism to synchronize guest time
> > +# is in effect, for example QEMU guest agent's guest-set-time
> > +# command.
> > +#
> > +# Since: 2.1
> > +##
> > +{ 'command': 'rtc-reset-reinjection' }
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index d8aa4ed..dc43789 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -3572,3 +3572,26 @@ Example:
> > } } ] }
> >  
> >  EQMP
> > +
> > +#if defined (TARGET_I386)
> > +{
> > +.name   = "rtc-reset-reinjection",
> > +.args_type  = "",
> > +.mhandler.cmd_new = qmp_marshal_input_rtc_reset_reinjection,
> > +},
> > +#endif
> > +
> > +SQMP
> > +rtc-reset-reinjection
> > +-
> > +
> > +Reset the RTC interrupt reinjection backlog.
> > +
> > +Arguments: None.
> > +
> > +Example:
> > +
> > +-> { "execute": "rtc-reset-reinjection" }
> > +<- { "return": {} }
> > +
> > +EQMP



Re: [Qemu-devel] [PATCH 3/3] hw/pcie: better hotplug/hotunplug support

2014-06-22 Thread Michael S. Tsirkin
On Sun, Jun 22, 2014 at 01:54:06PM +0300, Marcel Apfelbaum wrote:
> On Thu, 2014-06-19 at 17:43 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 19, 2014 at 04:52:21PM +0300, Marcel Apfelbaum wrote:
> > > Hotplug triggers both 'present detect change' and
> > > 'attention button pressed'.
> > > 
> > > Hotunplug starts by triggering 'attention button pressed',
> > > then waits for the OS to power off the device and only
> > > then detaches it.
> > > 
> > 
> > Pls not here that current code is broken: it does surprise removal which
> > crashes guests.
> I'll add a note, sure.
> 
> > 
> > > Signed-off-by: Marcel Apfelbaum 
> > > ---
> > >  hw/pci/pcie.c | 20 +++-
> > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index f8bf515..9cfd93d 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -258,7 +258,8 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler 
> > > *hotplug_dev, DeviceState *dev,
> > >  
> > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > PCI_EXP_SLTSTA_PDS);
> > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > +pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> > > +PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> > >  }
> > >  
> > >  void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, 
> > > DeviceState *dev,
> > > @@ -268,10 +269,7 @@ void pcie_cap_slot_hot_unplug_cb(HotplugHandler 
> > > *hotplug_dev, DeviceState *dev,
> > >  
> > >  pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, 
> > > errp);
> > >  
> > > -object_unparent(OBJECT(dev));
> > > -pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > - PCI_EXP_SLTSTA_PDS);
> > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > +pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> > >  }
> > >  
> > >  /* pci express slot for pci express root/downstream port
> > > @@ -389,6 +387,18 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> > >  sltsta);
> > >  }
> > >  
> > 
> > pls add code comments explaining the logic here.
> I thought is clear :(
> Basically: If the device is present, power indicator off and power
> controller off, it is safe to detach the device.

Sure, put this in the comment.

> > 
> > > +if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
> > > +((val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF)) {
> > > +PCIDevice *slot_dev = 
> > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > +if (slot_dev) {
> > 
> > Here you want to remove all devices behind the bridge?
> Yes, but since it is PCIe we only have one device,
> but we may have a multi-function device...

Exactly. Up to 256 functions.

> > You need to do this for all functions, not just function 0.
> So bus devices are actually functions... OK, I'll run a loop here.
> 
> > 
> > > +object_unparent(OBJECT(slot_dev));
> > > +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > + PCI_EXP_SLTSTA_PDS);
> > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > +   PCI_EXP_SLTSTA_PDC);
> > 
> > These bits need to be cleared in any case?
> No, "PDS on" means the devices is present, so we clear it only after the
> OS  powers it off.

So why not clear PDS unconditionally?

> > 
> > > +}
> > > +}
> > > +
> > >  hotplug_event_notify(dev);
> > >  
> > >  /* 
> > > -- 
> > > 1.8.3.1
> 
> 



Re: [Qemu-devel] [PATCH 2/3] hw/pcie: implement power controller functionality

2014-06-22 Thread Michael S. Tsirkin
On Sun, Jun 22, 2014 at 01:52:46PM +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 22, 2014 at 01:47:24PM +0300, Marcel Apfelbaum wrote:
> > On Thu, 2014-06-19 at 17:39 +0300, Michael S. Tsirkin wrote:
> > > On Thu, Jun 19, 2014 at 04:52:20PM +0300, Marcel Apfelbaum wrote:
> > > > It is needed by hot-unplug in order to get an indication
> > > > from the OS when the device can be physically detached.
> > > > 
> > > > Signed-off-by: Marcel Apfelbaum 
> > > > ---
> > > >  hw/pci/pcie.c  | 15 ++-
> > > >  include/hw/pci/pcie_regs.h |  2 ++
> > > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index ae92f00..f8bf515 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -292,16 +292,19 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t 
> > > > slot)
> > > > PCI_EXP_SLTCAP_HPC |
> > > > PCI_EXP_SLTCAP_PIP |
> > > > PCI_EXP_SLTCAP_AIP |
> > > > +   PCI_EXP_SLTCAP_PCP |
> > > > PCI_EXP_SLTCAP_ABP);
> > > >  
> > > >  pci_word_test_and_clear_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > > >   PCI_EXP_SLTCTL_PIC |
> > > > + PCI_EXP_SLTCTL_PCC |
> > > >   PCI_EXP_SLTCTL_AIC);
> > > >  pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > > > PCI_EXP_SLTCTL_PIC_OFF |
> > > > PCI_EXP_SLTCTL_AIC_OFF);
> > > >  pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
> > > > PCI_EXP_SLTCTL_PIC |
> > > > +   PCI_EXP_SLTCTL_PCC |
> > > > PCI_EXP_SLTCTL_AIC |
> > > > PCI_EXP_SLTCTL_HPIE |
> > > > PCI_EXP_SLTCTL_CCIE |
> > > 
> > > Need to disable for compat types?
> > Does Paolo's explanation answer your question?
> 
> Kind of hacky - we do have compat work for
> q35. So I'd prefer consistency.
> 
> > > 
> > > 
> > > > @@ -327,21 +330,31 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t 
> > > > slot)
> > > >  void pcie_cap_slot_reset(PCIDevice *dev)
> > > >  {
> > > >  uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
> > > > +PCIDevice *slot_dev = 
> > > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > 
> > > What does this mean?
> > > Downstream port?
> > Yes
> 
> well it's not really clear, needs a comment.
> 
> > > A switch can have multiple downstream ports at any slot #.
> > It doesn't matter how many devices are under this slot, we need only
> > one to power up the slot. The question here was "Do we have at least
> > one device attahc to slot? If yes, power up the slot."
> 
> You need a loop for that I think. There's no guarantee the
> device is at devfn=0.

Unless something guarantees this is a downstream port,
if yes maybe add an assert?
slot_dev is also not a very good variable name.
How about
bool populated = pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
?
This is all you care about ...
And maybe add a comment
/* Downstream ports enforce device number 0. */

> > > 
> > > > +int pic;
> > > 
> > > uint16_t please.
> > Sure,
> > 
> > > 
> > > > +
> > > > +pic = slot_dev ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
> > > >  
> > > >  PCIE_DEV_PRINTF(dev, "reset\n");
> > > >  
> > > >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL,
> > > >   PCI_EXP_SLTCTL_EIC |
> > > >   PCI_EXP_SLTCTL_PIC |
> > > > + PCI_EXP_SLTCTL_PCC |
> > > >   PCI_EXP_SLTCTL_AIC |
> > > >   PCI_EXP_SLTCTL_HPIE |
> > > >   PCI_EXP_SLTCTL_CCIE |
> > > >   PCI_EXP_SLTCTL_PDCE |
> > > >   PCI_EXP_SLTCTL_ABPE);
> > > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > > -   PCI_EXP_SLTCTL_PIC_OFF |
> > > > +   pic |
> > > > PCI_EXP_SLTCTL_AIC_OFF);
> > > >  
> > > > +if (!slot_dev) {
> > > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > > +   PCI_EXP_SLTCTL_PCC);
> > > 
> > > I dislike it when we clear bits then set them back.
> > > Please just add else here.
> > Sure,
> > 
> > > 
> > > > +}
> > > > +
> > > 
> > > Need to disable for compat types?
> > As above,
> > 
> > Thanks for the review,
> > Marcel
> > 
> > > 
> > > >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > >   PCI_EXP_SLTSTA_EIS |/* on reset,
> > > >

Re: [Qemu-devel] [PATCH 3/3] hw/pcie: better hotplug/hotunplug support

2014-06-22 Thread Marcel Apfelbaum
On Sun, 2014-06-22 at 14:03 +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 22, 2014 at 01:54:06PM +0300, Marcel Apfelbaum wrote:
> > On Thu, 2014-06-19 at 17:43 +0300, Michael S. Tsirkin wrote:
> > > On Thu, Jun 19, 2014 at 04:52:21PM +0300, Marcel Apfelbaum wrote:
> > > > Hotplug triggers both 'present detect change' and
> > > > 'attention button pressed'.
> > > > 
> > > > Hotunplug starts by triggering 'attention button pressed',
> > > > then waits for the OS to power off the device and only
> > > > then detaches it.
> > > > 
> > > 
> > > Pls not here that current code is broken: it does surprise removal which
> > > crashes guests.
> > I'll add a note, sure.
> > 
> > > 
> > > > Signed-off-by: Marcel Apfelbaum 
> > > > ---
> > > >  hw/pci/pcie.c | 20 +++-
> > > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index f8bf515..9cfd93d 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -258,7 +258,8 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler 
> > > > *hotplug_dev, DeviceState *dev,
> > > >  
> > > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > PCI_EXP_SLTSTA_PDS);
> > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > > +pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> > > > +PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> > > >  }
> > > >  
> > > >  void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, 
> > > > DeviceState *dev,
> > > > @@ -268,10 +269,7 @@ void pcie_cap_slot_hot_unplug_cb(HotplugHandler 
> > > > *hotplug_dev, DeviceState *dev,
> > > >  
> > > >  pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, 
> > > > &exp_cap, errp);
> > > >  
> > > > -object_unparent(OBJECT(dev));
> > > > -pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > - PCI_EXP_SLTSTA_PDS);
> > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > > +pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> > > >  }
> > > >  
> > > >  /* pci express slot for pci express root/downstream port
> > > > @@ -389,6 +387,18 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> > > >  sltsta);
> > > >  }
> > > >  
> > > 
> > > pls add code comments explaining the logic here.
> > I thought is clear :(
> > Basically: If the device is present, power indicator off and power
> > controller off, it is safe to detach the device.
> 
> Sure, put this in the comment.
> 
> > > 
> > > > +if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
> > > > +((val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF)) {
> > > > +PCIDevice *slot_dev = 
> > > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > > +if (slot_dev) {
> > > 
> > > Here you want to remove all devices behind the bridge?
> > Yes, but since it is PCIe we only have one device,
> > but we may have a multi-function device...
> 
> Exactly. Up to 256 functions.
> 
> > > You need to do this for all functions, not just function 0.
> > So bus devices are actually functions... OK, I'll run a loop here.
> > 
> > > 
> > > > +object_unparent(OBJECT(slot_dev));
> > > > +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > + PCI_EXP_SLTSTA_PDS);
> > > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > +   PCI_EXP_SLTSTA_PDC);
> > > 
> > > These bits need to be cleared in any case?
> > No, "PDS on" means the devices is present, so we clear it only after the
> > OS  powers it off.
> 
> So why not clear PDS unconditionally?
Because we are 'allowed' to remove the device only when the OS tells us
the power is off. Before that, it can result in data loss,
since the device driver is not yet unloaded.


> > > 
> > > > +}
> > > > +}
> > > > +
> > > >  hotplug_event_notify(dev);
> > > >  
> > > >  /* 
> > > > -- 
> > > > 1.8.3.1
> > 
> > 






Re: [Qemu-devel] [PATCH 3/3] hw/pcie: better hotplug/hotunplug support

2014-06-22 Thread Michael S. Tsirkin
On Sun, Jun 22, 2014 at 02:11:05PM +0300, Marcel Apfelbaum wrote:
> On Sun, 2014-06-22 at 14:03 +0300, Michael S. Tsirkin wrote:
> > On Sun, Jun 22, 2014 at 01:54:06PM +0300, Marcel Apfelbaum wrote:
> > > On Thu, 2014-06-19 at 17:43 +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 19, 2014 at 04:52:21PM +0300, Marcel Apfelbaum wrote:
> > > > > Hotplug triggers both 'present detect change' and
> > > > > 'attention button pressed'.
> > > > > 
> > > > > Hotunplug starts by triggering 'attention button pressed',
> > > > > then waits for the OS to power off the device and only
> > > > > then detaches it.
> > > > > 
> > > > 
> > > > Pls not here that current code is broken: it does surprise removal which
> > > > crashes guests.
> > > I'll add a note, sure.
> > > 
> > > > 
> > > > > Signed-off-by: Marcel Apfelbaum 
> > > > > ---
> > > > >  hw/pci/pcie.c | 20 +++-
> > > > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > index f8bf515..9cfd93d 100644
> > > > > --- a/hw/pci/pcie.c
> > > > > +++ b/hw/pci/pcie.c
> > > > > @@ -258,7 +258,8 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler 
> > > > > *hotplug_dev, DeviceState *dev,
> > > > >  
> > > > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > PCI_EXP_SLTSTA_PDS);
> > > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > > > +pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> > > > > +PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> > > > >  }
> > > > >  
> > > > >  void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, 
> > > > > DeviceState *dev,
> > > > > @@ -268,10 +269,7 @@ void pcie_cap_slot_hot_unplug_cb(HotplugHandler 
> > > > > *hotplug_dev, DeviceState *dev,
> > > > >  
> > > > >  pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, 
> > > > > &exp_cap, errp);
> > > > >  
> > > > > -object_unparent(OBJECT(dev));
> > > > > -pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > - PCI_EXP_SLTSTA_PDS);
> > > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
> > > > > +pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> > > > >  }
> > > > >  
> > > > >  /* pci express slot for pci express root/downstream port
> > > > > @@ -389,6 +387,18 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> > > > >  sltsta);
> > > > >  }
> > > > >  
> > > > 
> > > > pls add code comments explaining the logic here.
> > > I thought is clear :(
> > > Basically: If the device is present, power indicator off and power
> > > controller off, it is safe to detach the device.
> > 
> > Sure, put this in the comment.
> > 
> > > > 
> > > > > +if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) 
> > > > > &&
> > > > > +((val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF)) {
> > > > > +PCIDevice *slot_dev = 
> > > > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > > > +if (slot_dev) {
> > > > 
> > > > Here you want to remove all devices behind the bridge?
> > > Yes, but since it is PCIe we only have one device,
> > > but we may have a multi-function device...
> > 
> > Exactly. Up to 256 functions.
> > 
> > > > You need to do this for all functions, not just function 0.
> > > So bus devices are actually functions... OK, I'll run a loop here.
> > > 
> > > > 
> > > > > +object_unparent(OBJECT(slot_dev));
> > > > > +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > + PCI_EXP_SLTSTA_PDS);
> > > > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > +   PCI_EXP_SLTSTA_PDC);
> > > > 
> > > > These bits need to be cleared in any case?
> > > No, "PDS on" means the devices is present, so we clear it only after the
> > > OS  powers it off.
> > 
> > So why not clear PDS unconditionally?
> Because we are 'allowed' to remove the device only when the OS tells us
> the power is off. Before that, it can result in data loss,
> since the device driver is not yet unloaded.

Yes. But if (slot_dev) seems not to be needed:
if slot is empty we can clear PDC.



> 
> > > > 
> > > > > +}
> > > > > +}
> > > > > +
> > > > >  hotplug_event_notify(dev);
> > > > >  
> > > > >  /* 
> > > > > -- 
> > > > > 1.8.3.1
> > > 
> > > 
> 
> 



Re: [Qemu-devel] [PATCH 2/3] hw/pcie: implement power controller functionality

2014-06-22 Thread Marcel Apfelbaum
On Sun, 2014-06-22 at 14:11 +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 22, 2014 at 01:52:46PM +0300, Michael S. Tsirkin wrote:
> > On Sun, Jun 22, 2014 at 01:47:24PM +0300, Marcel Apfelbaum wrote:
> > > On Thu, 2014-06-19 at 17:39 +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 19, 2014 at 04:52:20PM +0300, Marcel Apfelbaum wrote:
> > > > > It is needed by hot-unplug in order to get an indication
> > > > > from the OS when the device can be physically detached.
> > > > > 
> > > > > Signed-off-by: Marcel Apfelbaum 
> > > > > ---
> > > > >  hw/pci/pcie.c  | 15 ++-
> > > > >  include/hw/pci/pcie_regs.h |  2 ++
> > > > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > index ae92f00..f8bf515 100644
> > > > > --- a/hw/pci/pcie.c
> > > > > +++ b/hw/pci/pcie.c
> > > > > @@ -292,16 +292,19 @@ void pcie_cap_slot_init(PCIDevice *dev, 
> > > > > uint16_t slot)
> > > > > PCI_EXP_SLTCAP_HPC |
> > > > > PCI_EXP_SLTCAP_PIP |
> > > > > PCI_EXP_SLTCAP_AIP |
> > > > > +   PCI_EXP_SLTCAP_PCP |
> > > > > PCI_EXP_SLTCAP_ABP);
> > > > >  
> > > > >  pci_word_test_and_clear_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > > > >   PCI_EXP_SLTCTL_PIC |
> > > > > + PCI_EXP_SLTCTL_PCC |
> > > > >   PCI_EXP_SLTCTL_AIC);
> > > > >  pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
> > > > > PCI_EXP_SLTCTL_PIC_OFF |
> > > > > PCI_EXP_SLTCTL_AIC_OFF);
> > > > >  pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
> > > > > PCI_EXP_SLTCTL_PIC |
> > > > > +   PCI_EXP_SLTCTL_PCC |
> > > > > PCI_EXP_SLTCTL_AIC |
> > > > > PCI_EXP_SLTCTL_HPIE |
> > > > > PCI_EXP_SLTCTL_CCIE |
> > > > 
> > > > Need to disable for compat types?
> > > Does Paolo's explanation answer your question?
> > 
> > Kind of hacky - we do have compat work for
> > q35. So I'd prefer consistency.
> > 
> > > > 
> > > > 
> > > > > @@ -327,21 +330,31 @@ void pcie_cap_slot_init(PCIDevice *dev, 
> > > > > uint16_t slot)
> > > > >  void pcie_cap_slot_reset(PCIDevice *dev)
> > > > >  {
> > > > >  uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
> > > > > +PCIDevice *slot_dev = 
> > > > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > > 
> > > > What does this mean?
> > > > Downstream port?
> > > Yes
> > 
> > well it's not really clear, needs a comment.
> > 
> > > > A switch can have multiple downstream ports at any slot #.
> > > It doesn't matter how many devices are under this slot, we need only
> > > one to power up the slot. The question here was "Do we have at least
> > > one device attahc to slot? If yes, power up the slot."
> > 
> > You need a loop for that I think. There's no guarantee the
> > device is at devfn=0.
> 
> Unless something guarantees this is a downstream port,
> if yes maybe add an assert?
> slot_dev is also not a very good variable name.
> How about
>   bool populated = pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> ?
> This is all you care about ...
> And maybe add a comment
> /* Downstream ports enforce device number 0. */
I'll use that, thanks!
Marcel

> 
> > > > 
> > > > > +int pic;
> > > > 
> > > > uint16_t please.
> > > Sure,
> > > 
> > > > 
> > > > > +
> > > > > +pic = slot_dev ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
> > > > >  
> > > > >  PCIE_DEV_PRINTF(dev, "reset\n");
> > > > >  
> > > > >  pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL,
> > > > >   PCI_EXP_SLTCTL_EIC |
> > > > >   PCI_EXP_SLTCTL_PIC |
> > > > > + PCI_EXP_SLTCTL_PCC |
> > > > >   PCI_EXP_SLTCTL_AIC |
> > > > >   PCI_EXP_SLTCTL_HPIE |
> > > > >   PCI_EXP_SLTCTL_CCIE |
> > > > >   PCI_EXP_SLTCTL_PDCE |
> > > > >   PCI_EXP_SLTCTL_ABPE);
> > > > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > > > -   PCI_EXP_SLTCTL_PIC_OFF |
> > > > > +   pic |
> > > > > PCI_EXP_SLTCTL_AIC_OFF);
> > > > >  
> > > > > +if (!slot_dev) {
> > > > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
> > > > > +   PCI_EXP_SLTCTL_PCC);
> > > > 
> > > > I dislike it when we clear bits then set them back.
> > > > Please just add else here.
> > >

Re: [Qemu-devel] [PATCH 3/3] hw/pcie: better hotplug/hotunplug support

2014-06-22 Thread Marcel Apfelbaum
On Sun, 2014-06-22 at 14:12 +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 22, 2014 at 02:11:05PM +0300, Marcel Apfelbaum wrote:
> > On Sun, 2014-06-22 at 14:03 +0300, Michael S. Tsirkin wrote:
> > > On Sun, Jun 22, 2014 at 01:54:06PM +0300, Marcel Apfelbaum wrote:
> > > > On Thu, 2014-06-19 at 17:43 +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Jun 19, 2014 at 04:52:21PM +0300, Marcel Apfelbaum wrote:
> > > > > > Hotplug triggers both 'present detect change' and
> > > > > > 'attention button pressed'.
> > > > > > 
> > > > > > Hotunplug starts by triggering 'attention button pressed',
> > > > > > then waits for the OS to power off the device and only
> > > > > > then detaches it.
> > > > > > 
> > > > > 
> > > > > Pls not here that current code is broken: it does surprise removal 
> > > > > which
> > > > > crashes guests.
> > > > I'll add a note, sure.
> > > > 
> > > > > 
> > > > > > Signed-off-by: Marcel Apfelbaum 
> > > > > > ---
> > > > > >  hw/pci/pcie.c | 20 +++-
> > > > > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > index f8bf515..9cfd93d 100644
> > > > > > --- a/hw/pci/pcie.c
> > > > > > +++ b/hw/pci/pcie.c
> > > > > > @@ -258,7 +258,8 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler 
> > > > > > *hotplug_dev, DeviceState *dev,
> > > > > >  
> > > > > >  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > > PCI_EXP_SLTSTA_PDS);
> > > > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), 
> > > > > > PCI_EXP_HP_EV_PDC);
> > > > > > +pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> > > > > > +PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> > > > > >  }
> > > > > >  
> > > > > >  void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, 
> > > > > > DeviceState *dev,
> > > > > > @@ -268,10 +269,7 @@ void 
> > > > > > pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, 
> > > > > > DeviceState *dev,
> > > > > >  
> > > > > >  pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, 
> > > > > > &exp_cap, errp);
> > > > > >  
> > > > > > -object_unparent(OBJECT(dev));
> > > > > > -pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > > - PCI_EXP_SLTSTA_PDS);
> > > > > > -pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), 
> > > > > > PCI_EXP_HP_EV_PDC);
> > > > > > +pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> > > > > >  }
> > > > > >  
> > > > > >  /* pci express slot for pci express root/downstream port
> > > > > > @@ -389,6 +387,18 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> > > > > >  sltsta);
> > > > > >  }
> > > > > >  
> > > > > 
> > > > > pls add code comments explaining the logic here.
> > > > I thought is clear :(
> > > > Basically: If the device is present, power indicator off and power
> > > > controller off, it is safe to detach the device.
> > > 
> > > Sure, put this in the comment.
> > > 
> > > > > 
> > > > > > +if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & 
> > > > > > PCI_EXP_SLTCTL_PCC) &&
> > > > > > +((val & PCI_EXP_SLTCTL_PIC_OFF) == 
> > > > > > PCI_EXP_SLTCTL_PIC_OFF)) {
> > > > > > +PCIDevice *slot_dev = 
> > > > > > pci_bridge_get_sec_bus(PCI_BRIDGE(dev))->devices[0];
> > > > > > +if (slot_dev) {
> > > > > 
> > > > > Here you want to remove all devices behind the bridge?
> > > > Yes, but since it is PCIe we only have one device,
> > > > but we may have a multi-function device...
> > > 
> > > Exactly. Up to 256 functions.
> > > 
> > > > > You need to do this for all functions, not just function 0.
> > > > So bus devices are actually functions... OK, I'll run a loop here.
> > > > 
> > > > > 
> > > > > > +object_unparent(OBJECT(slot_dev));
> > > > > > +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > > + PCI_EXP_SLTSTA_PDS);
> > > > > > +pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > > > > > +   PCI_EXP_SLTSTA_PDC);
> > > > > 
> > > > > These bits need to be cleared in any case?
> > > > No, "PDS on" means the devices is present, so we clear it only after the
> > > > OS  powers it off.
> > > 
> > > So why not clear PDS unconditionally?
> > Because we are 'allowed' to remove the device only when the OS tells us
> > the power is off. Before that, it can result in data loss,
> > since the device driver is not yet unloaded.
> 
> Yes. But if (slot_dev) seems not to be needed:
> if slot is empty we can clear PDC.
You mean PDS, PDC is the event we send to the OS.
The main reason I added this "if clause" is to protect against multiple OS
events of 'power controller off' and 'power indicator off', but now I see
that indeed I don't need that because I check "sltsta & PCI_EXP_SLTSTA_PDS".
I will remove that if.

Thanks,
Marcel


> 
> 
> 
> 

[Qemu-devel] [PATCH v2 0/4] trace: add simpletrace-stap format to generate binary trace

2014-06-22 Thread Stefan Hajnoczi
v2:
 * I realized that v1 was not complete enough after feedback from Lluís and
   Frank Ch. Eigler, so here is a v2 after all.
 * Add Makefile target for simpletrace .stp file [Lluís]
 * Generate SystemTap probe aliases so users can select a subset of probes to
   enable [Frank Ch. Eigler]
 * Delete --no-header from sys.argv[] so further command-line parsing works

SystemTap is a popular tracing solution on Fedora and RHEL.  It does not have
its own trace file format, instead stap scripts have the freedom to output data
in whatever format is most appropriate.

QEMU supports the simpletrace binary format for built-in tracing that comes
with QEMU.  Since we need a format for persisting SystemTap traces and a binary
format has performance advantages, let's output simpletrace-compatible data.

This patch adds a new stap file that is autogenerated from trace-events.  The
following example shows how to use SystemTap and analyze the resulting trace
file with simpletrace.py:

  $ ./configure --enable-trace-backend=dtrace ... && make
  $ sudo stap x86_64-softmmu/qemu-system-x86_64-simpletrace.stp >trace
  ...run QEMU in another shell...
  ^C
  $ scripts/simpletrace.py --no-header trace-events trace

One new concept here is the simpletrace.py --no-header option.  SystemTap
supports flight-recorder mode where trace records are written into a ring
buffer.  The ring buffer can be dumped at any time so it is useful to skip the
simpletrace header check.

Stefan Hajnoczi (4):
  trace: extract stap_escape() function for reuse
  trace: add tracetool simpletrace_stap format
  simpletrace: add simpletrace.py --no-header option
  trace: install simpletrace SystemTap tapset

 Makefile.target  | 10 +++-
 scripts/simpletrace.py   | 24 +++---
 scripts/tracetool/format/simpletrace_stap.py | 71 
 scripts/tracetool/format/stap.py | 11 +++--
 4 files changed, 105 insertions(+), 11 deletions(-)
 create mode 100644 scripts/tracetool/format/simpletrace_stap.py

-- 
1.9.3




[Qemu-devel] [PATCH v2 2/4] trace: add tracetool simpletrace_stap format

2014-06-22 Thread Stefan Hajnoczi
This new tracetool "format" generates a SystemTap .stp file that outputs
simpletrace binary trace data.

In contrast to simpletrace or ftrace, SystemTap does not define its own
trace format.  All output from SystemTap is generated by .stp files.
This patch lets us generate a .stp file that outputs in the simpletrace
binary format.

This makes it possible to reuse simpletrace.py to analyze traces
recorded using SystemTap.  The simpletrace binary format is especially
useful for long-running traces like flight-recorder mode where string
formatting can be expensive.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/format/simpletrace_stap.py | 71 
 1 file changed, 71 insertions(+)
 create mode 100644 scripts/tracetool/format/simpletrace_stap.py

diff --git a/scripts/tracetool/format/simpletrace_stap.py 
b/scripts/tracetool/format/simpletrace_stap.py
new file mode 100644
index 000..7e44bc1
--- /dev/null
+++ b/scripts/tracetool/format/simpletrace_stap.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Generate .stp file that outputs simpletrace binary traces (DTrace with 
SystemTAP only).
+"""
+
+__author__ = "Stefan Hajnoczi "
+__copyright__  = "Copyright (C) 2014, Red Hat, Inc."
+__license__= "GPL version 2 or (at your option) any later version"
+
+__maintainer__ = "Stefan Hajnoczi"
+__email__  = "stefa...@redhat.com"
+
+
+from tracetool import out
+from tracetool.backend.dtrace import binary, probeprefix
+from tracetool.backend.simple import is_string
+from tracetool.format.stap import stap_escape
+
+
+def generate(events, backend):
+out('/* This file is autogenerated by tracetool, do not edit. */',
+'')
+
+for event_id, e in enumerate(events):
+if 'disable' in e.properties:
+continue
+
+out('probe %(probeprefix)s.simpletrace.%(name)s = 
%(probeprefix)s.%(name)s ?',
+'{',
+probeprefix=probeprefix(),
+name=e.name)
+
+# Calculate record size
+sizes = ['24'] # sizeof(TraceRecord)
+for type_, name in e.args:
+name = stap_escape(name)
+if is_string(type_):
+out('try {',
+'arg%(name)s_str = %(name)s ? 
user_string_n(%(name)s, 512) : ""',
+'} catch {}',
+'arg%(name)s_len = strlen(arg%(name)s_str)',
+name=name)
+sizes.append('4 + arg%s_len' % name)
+else:
+sizes.append('8')
+sizestr = ' + '.join(sizes)
+
+# Generate format string and value pairs for record header and 
arguments
+fields = [('8b', str(event_id)),
+  ('8b', 'gettimeofday_ns()'),
+  ('4b', sizestr),
+  ('4b', 'pid()')]
+for type_, name in e.args:
+name = stap_escape(name)
+if is_string(type_):
+fields.extend([('4b', 'arg%s_len' % name),
+   ('.*s', 'arg%s_len, arg%s_str' % (name, name))])
+else:
+fields.append(('8b', name))
+
+# Emit the entire record in a single SystemTap printf()
+fmt_str = '%'.join(fmt for fmt, _ in fields)
+arg_str = ', '.join(arg for _, arg in fields)
+out('printf("%%%(fmt_str)s", %(arg_str)s)',
+fmt_str=fmt_str, arg_str=arg_str)
+
+out('}')
+
+out()
-- 
1.9.3




[Qemu-devel] [PATCH v2 4/4] trace: install simpletrace SystemTap tapset

2014-06-22 Thread Stefan Hajnoczi
The simpletrace SystemTap tapset outputs simpletrace binary traces for
SystemTap probes.  This is useful because SystemTap has no default way
to format or store traces.  The simpletrace SystemTap tapset provides an
easy way to store traces.

The simpletrace.py tool or custom Python scripts using the
simpletrace.py API can analyze SystemTap these traces:

  $ ./configure --enable-trace-backends=dtrace ...
  $ make && make install
  $ stap -e 'probe qemu.system.x86_64.simpletrace.* {}' \
 -c qemu-system-x86_64 >/tmp/trace.out
  $ scripts/simpletrace.py --no-header trace-events /tmp/trace.out
  g_malloc 4.531 pid=15519 size=0xb ptr=0x7f8639c10470
  g_malloc 3.264 pid=15519 size=0x300 ptr=0x7f8639c10490
  g_free 5.155 pid=15519 ptr=0x7f8639c0f7b0

Note that, unlike qemu-system-x86_64.stp and
qemu-system-x86_64.stp-installed, only one file is needed since the
simpletrace SystemTap tapset does not reference the QEMU binary by path.
Therefore it doesn't matter whether the QEMU binary is installed or not.

Signed-off-by: Stefan Hajnoczi 
---
 Makefile.target | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Makefile.target b/Makefile.target
index fc5827c..ec7f1b8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -38,7 +38,7 @@ config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
 ifdef CONFIG_TRACE_SYSTEMTAP
-stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp
+stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp $(QEMU_PROG)-simpletrace.stp
 
 ifdef CONFIG_USER_ONLY
 TARGET_TYPE=user
@@ -64,6 +64,13 @@ $(QEMU_PROG).stp: $(SRC_PATH)/trace-events
--target-type=$(TARGET_TYPE) \
< $< > $@,"  GEN   $(TARGET_DIR)$(QEMU_PROG).stp")
 
+$(QEMU_PROG)-simpletrace.stp: $(SRC_PATH)/trace-events
+   $(call quiet-command,$(TRACETOOL) \
+   --format=simpletrace-stap \
+   --backends=$(TRACE_BACKENDS) \
+   --probe-prefix=qemu.$(TARGET_TYPE).$(TARGET_NAME) \
+   < $< > $@,"  GEN   $(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp")
+
 else
 stap:
 endif
@@ -198,6 +205,7 @@ endif
 ifdef CONFIG_TRACE_SYSTEMTAP
$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset"
$(INSTALL_DATA) $(QEMU_PROG).stp-installed 
"$(DESTDIR)$(qemu_datadir)/../systemtap/tapset/$(QEMU_PROG).stp"
+   $(INSTALL_DATA) $(QEMU_PROG)-simpletrace.stp 
"$(DESTDIR)$(qemu_datadir)/../systemtap/tapset/$(QEMU_PROG)-simpletrace.stp"
 endif
 
 GENERATED_HEADERS += config-target.h
-- 
1.9.3




[Qemu-devel] [PATCH v2 1/4] trace: extract stap_escape() function for reuse

2014-06-22 Thread Stefan Hajnoczi
SystemTap reserved words sometimes conflict with QEMU variable names.
We escape them to prevent conflicts.

Move escaping into its own function so the next patch can reuse it.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/format/stap.py | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/scripts/tracetool/format/stap.py b/scripts/tracetool/format/stap.py
index e24abf7..9e780f1 100644
--- a/scripts/tracetool/format/stap.py
+++ b/scripts/tracetool/format/stap.py
@@ -27,6 +27,13 @@ RESERVED_WORDS = (
 )
 
 
+def stap_escape(identifier):
+# Append underscore to reserved keywords
+if identifier in RESERVED_WORDS:
+return identifier + '_'
+return identifier
+
+
 def generate(events, backend):
 events = [e for e in events
   if "disable" not in e.properties]
@@ -45,9 +52,7 @@ def generate(events, backend):
 i = 1
 if len(e.args) > 0:
 for name in e.args.names():
-# Append underscore to reserved keywords
-if name in RESERVED_WORDS:
-name += '_'
+name = stap_escape(name)
 out('  %s = $arg%d;' % (name, i))
 i += 1
 
-- 
1.9.3




[Qemu-devel] [PATCH v2 3/4] simpletrace: add simpletrace.py --no-header option

2014-06-22 Thread Stefan Hajnoczi
It can be useful to read simpletrace files that have no header.  For
example, a ring buffer may not have a header record but can still be
processed if the user is sure the file format version is compatible.

  $ scripts/simpletrace.py --no-header trace-events trace-file

Signed-off-by: Stefan Hajnoczi 
---
 scripts/simpletrace.py | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py
index 1aa9460..3916c6d 100755
--- a/scripts/simpletrace.py
+++ b/scripts/simpletrace.py
@@ -58,8 +58,8 @@ def read_record(edict, fobj):
 rechdr = read_header(fobj, rec_header_fmt)
 return get_record(edict, rechdr, fobj) # return tuple of record elements
 
-def read_trace_file(edict, fobj):
-"""Deserialize trace records from a file, yielding record tuples 
(event_num, timestamp, pid, arg1, ..., arg6)."""
+def read_trace_header(fobj):
+"""Read and verify trace file header"""
 header = read_header(fobj, log_header_fmt)
 if header is None or \
header[0] != header_event_id or \
@@ -73,6 +73,8 @@ def read_trace_file(edict, fobj):
 raise ValueError('Log format %d not supported with this QEMU release!'
  % log_version)
 
+def read_trace_records(edict, fobj):
+"""Deserialize trace records from a file, yielding record tuples 
(event_num, timestamp, pid, arg1, ..., arg6)."""
 while True:
 rec = read_record(edict, fobj)
 if rec is None:
@@ -102,13 +104,16 @@ class Analyzer(object):
 """Called at the end of the trace."""
 pass
 
-def process(events, log, analyzer):
+def process(events, log, analyzer, read_header=True):
 """Invoke an analyzer on each event in a log."""
 if isinstance(events, str):
 events = _read_events(open(events, 'r'))
 if isinstance(log, str):
 log = open(log, 'rb')
 
+if read_header:
+read_trace_header(log)
+
 dropped_event = Event.build("Dropped_Event(uint64_t num_events_dropped)")
 edict = {dropped_event_id: dropped_event}
 
@@ -137,7 +142,7 @@ def process(events, log, analyzer):
 
 analyzer.begin()
 fn_cache = {}
-for rec in read_trace_file(edict, log):
+for rec in read_trace_records(edict, log):
 event_num = rec[0]
 event = edict[event_num]
 if event_num not in fn_cache:
@@ -152,12 +157,17 @@ def run(analyzer):
 advanced scripts will want to call process() instead."""
 import sys
 
-if len(sys.argv) != 3:
-sys.stderr.write('usage: %s  \n' % 
sys.argv[0])
+read_header = True
+if len(sys.argv) == 4 and sys.argv[1] == '--no-header':
+read_header = False
+del sys.argv[1]
+elif len(sys.argv) != 3:
+sys.stderr.write('usage: %s [--no-header]  ' \
+ '\n' % sys.argv[0])
 sys.exit(1)
 
 events = _read_events(open(sys.argv[1], 'r'))
-process(events, sys.argv[2], analyzer)
+process(events, sys.argv[2], analyzer, read_header=read_header)
 
 if __name__ == '__main__':
 class Formatter(Analyzer):
-- 
1.9.3




Re: [Qemu-devel] [PATCH] libqtest: escape strings in QMP commands, fix leak

2014-06-22 Thread Stefan Hajnoczi
On Fri, Jun 13, 2014 at 10:15:00AM +0200, Paolo Bonzini wrote:
> libqtest is using g_strdup_printf to format QMP commands, but
> this does not work if the argument strings need to be escaped.
> Instead, use the fancy %-formatting functionality of QObject.
> The only change required in tests is that strings have to be
> formatted as %s, not '%s' or \"%s\".  Luckily this usage of
> parameterized QMP commands is not that frequent.
> 
> The leak is in socket_sendf.  Since we are extracting the send
> loop to a new function, fix it now.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/fdc-test.c|  2 +-
>  tests/libqtest.c| 47 +--
>  tests/qom-test.c|  6 +++---
>  tests/tmp105-test.c |  4 ++--
>  4 files changed, 43 insertions(+), 16 deletions(-)

Reviewed-by: Stefan Hajnoczi 


pgprMm1bfFc5U.pgp
Description: PGP signature


Re: [Qemu-devel] PATCH for bugs 661696 and 1248376: target-i386: x87 exception pointers using TCG.

2014-06-22 Thread Richard Henderson
On 06/22/2014 07:55 AM, Jaume Martí wrote:
> -cpu_x86_fsave(env, fpstate_addr, 1);
> -fpstate->status = fpstate->sw;
> -magic = 0x;
> +cpu_x86_fsave(env, fpstate_addr);
> +fpstate->status = fpstate->sw;
> +magic = 0x;

This patch needs to be split into format fixes and the actual change to be
reviewed.

> -/* KVM-only so far */
> -uint16_t fpop;
> +union {
> +uint32_t tcg;
> +uint16_t kvm;
> +} fpop;

This is highly questionable.

>  .fields = (VMStateField[]) {
> -VMSTATE_UINT16(env.fpop, X86CPU),
> +VMSTATE_UINT16(env.fpop.kvm, X86CPU),

You're breaking save/restore in tcg.  KVM is not required for migration.

> +if (non_control_x87_instr(modrm, b)) {
> +tcg_gen_movi_i32(cpu_fpop, ((b & 0x7) << 8) | (modrm & 0xff));
> +tcg_gen_movi_tl(cpu_fpip, pc_start - s->cs_base);
> +tcg_gen_movi_i32(cpu_fpcs, env->segs[R_CS].selector);
> +}

I strongly suspect you can implement this feature without having to add 3
(largely redundant) register writes to every x87 instruction executed.

See how restore_state_to_opc works to compute the value of CC_OP during
translation.  You can do the same thing to recover these three values.

You do have to sync these values before normal exits from the TB, but you only
have to do that once, not once for every insn executed.  See gen_update_cc_op.


r~



Re: [Qemu-devel] PATCH for bugs 661696 and 1248376: target-i386: x87 exception pointers using TCG.

2014-06-22 Thread Jaume Martí
Thanks Richard for your feedback. I am going to correct the patch and
resubmit it.

Best regards,
Jaume

On Sun, Jun 22, 2014 at 8:55 PM, Richard Henderson  wrote:
> On 06/22/2014 07:55 AM, Jaume Martí wrote:
>> -cpu_x86_fsave(env, fpstate_addr, 1);
>> -fpstate->status = fpstate->sw;
>> -magic = 0x;
>> +cpu_x86_fsave(env, fpstate_addr);
>> +fpstate->status = fpstate->sw;
>> +magic = 0x;
>
> This patch needs to be split into format fixes and the actual change to be
> reviewed.
>
>> -/* KVM-only so far */
>> -uint16_t fpop;
>> +union {
>> +uint32_t tcg;
>> +uint16_t kvm;
>> +} fpop;
>
> This is highly questionable.
>
>>  .fields = (VMStateField[]) {
>> -VMSTATE_UINT16(env.fpop, X86CPU),
>> +VMSTATE_UINT16(env.fpop.kvm, X86CPU),
>
> You're breaking save/restore in tcg.  KVM is not required for migration.
>
>> +if (non_control_x87_instr(modrm, b)) {
>> +tcg_gen_movi_i32(cpu_fpop, ((b & 0x7) << 8) | (modrm & 0xff));
>> +tcg_gen_movi_tl(cpu_fpip, pc_start - s->cs_base);
>> +tcg_gen_movi_i32(cpu_fpcs, env->segs[R_CS].selector);
>> +}
>
> I strongly suspect you can implement this feature without having to add 3
> (largely redundant) register writes to every x87 instruction executed.
>
> See how restore_state_to_opc works to compute the value of CC_OP during
> translation.  You can do the same thing to recover these three values.
>
> You do have to sync these values before normal exits from the TB, but you only
> have to do that once, not once for every insn executed.  See gen_update_cc_op.
>
>
> r~



Re: [Qemu-devel] Endian control register

2014-06-22 Thread Benjamin Herrenschmidt
On Mon, 2014-06-16 at 20:30 +1000, Benjamin Herrenschmidt wrote:
> Hi !
> 
> So while trying to solve an issue we have with qemu/kvm and powerpc who
> can be both LE and BE nowadays, we thought the ideal solution would be
> to have a register in the emulated VGA to control the endian.

No reply ... anybody out there ? :-) (Responding with a different
email address as I've had issues with gmail over eager spam filtering
lately).

So basically, I want to define VBE register 0xb as "endian control"
which may or may not be supported by a given platform. The problem
is how to expose to the OS that it exists (or doesn't) ?

Because if we just add it to qemu and make Linux start poking at
it (including on x86), then things will break when running Linux
under Bochs as it will hit a BX_PANIC when we access a VBE register
it doesn't know about.

My proposal is thus to add a backward compatible way to check for
that support before accessing it (and of course define the register
number in bochs so that this same number doesn't get reused later
for other purposes).

One simple option that will probably work is to add stuff to the
existing "GET_CAPABILITIES" mode. When that is selected, we could
for example make the "BANK" register return additional capability
flags, one of them being "endian support".

This gives us a way to add more extensions later on if needed as well.

Another option would be to use an unused bit if the ENABLE register
and simply force it to 1 on reads when the endian control register
is present... though in this case it might be a better idea to use
that bit to indicate the presence of "extended flags", use a new
register 0xb as "EXTENDED_FLAGS", and use a flag in there as indicative
of the presence of endian control which becomes register 0xc.

That way you also have room for more extensions if you wish to do
so in the future.

Any opinion ?

I'm happy to make & submit the patches, I just want an agreement
on the approach first.

Cheers,
Ben.

> Since qemu wishes to remain as much as possible in sync / compatible
> with Bochs, I'm posting to this list so that the definition can be
> made common even if Bochs doesn't have to actually implement it.
> 
> Now the basic idea is to have a bit indicating that endian control is
> supported and a bit to control the guest endianness, set by the guest
> appropriately (which affects all the pixel formats).
> 
> My original idea was to add a VBE register. However BOCHS seems to
> be unhappy (after a quick glance at the source) if the guest tries to
> read a non-existing register, so I suppose either the "support endian
> switch" information needs to be encoded in an existing register in a way
> that doesn't break existing code ... or the presence of the new register
> advertised in such a way, for example via the PCI revision ID.
> 
> Note that I am not trying to support something like endian apertures and
> that whole trainwreck that we had back in the day on BE machines. Simply
> carry the guest endianness assuming that this affects all the pixel
> formats which continue to be the classic ARGB ones, simply encoded with
> the right endian (so 8bpp is unaffected for example).
> 
> Any suggestion ? Comment ? Flame ? :-)
> 
> Cheers,
> Ben.
> 





Re: [Qemu-devel] Endian control register

2014-06-22 Thread Benjamin Herrenschmidt

On Mon, 2014-06-16 at 20:30 +1000, Benjamin Herrenschmidt wrote:
> Hi !
> 
> So while trying to solve an issue we have with qemu/kvm and powerpc who
> can be both LE and BE nowadays, we thought the ideal solution would be
> to have a register in the emulated VGA to control the endian.

No reply ... anybody out there ? :-)

So basically, I want to define VBE register 0xb as "endian control"
which may or may not be supported by a given platform. The problem
is how to expose to the OS that it exists (or doesn't) ?

Because if we just add it to qemu and make Linux start poking at
it (including on x86), then things will break when running Linux
under Bochs as it will hit a BX_PANIC when we access a VBE register
it doesn't know about.

My proposal is thus to add a backward compatible way to check for
that support before accessing it (and of course define the register
number in bochs so that this same number doesn't get reused later
for other purposes).

One simple option that will probably work is to add stuff to the
existing "GET_CAPABILITIES" mode. When that is selected, we could
for example make the "BANK" register return additional capability
flags, one of them being "endian support".

This gives us a way to add more extensions later on if needed as well.

Another option would be to use an unused bit if the ENABLE register
and simply force it to 1 on reads when the endian control register
is present... though in this case it might be a better idea to use
that bit to indicate the presence of "extended flags", use a new
register 0xb as "EXTENDED_FLAGS", and use a flag in there as indicative
of the presence of endian control which becomes register 0xc.

That way you also have room for more extensions if you wish to do
so in the future.

Any opinion ?

I'm happy to make & submit the patches, I just want an agreement
on the approach first.

Cheers,
Ben.

> Since qemu wishes to remain as much as possible in sync / compatible
> with Bochs, I'm posting to this list so that the definition can be
> made common even if Bochs doesn't have to actually implement it.
> 
> Now the basic idea is to have a bit indicating that endian control is
> supported and a bit to control the guest endianness, set by the guest
> appropriately (which affects all the pixel formats).
> 
> My original idea was to add a VBE register. However BOCHS seems to
> be unhappy (after a quick glance at the source) if the guest tries to
> read a non-existing register, so I suppose either the "support endian
> switch" information needs to be encoded in an existing register in a way
> that doesn't break existing code ... or the presence of the new register
> advertised in such a way, for example via the PCI revision ID.
> 
> Note that I am not trying to support something like endian apertures and
> that whole trainwreck that we had back in the day on BE machines. Simply
> carry the guest endianness assuming that this affects all the pixel
> formats which continue to be the classic ARGB ones, simply encoded with
> the right endian (so 8bpp is unaffected for example).
> 
> Any suggestion ? Comment ? Flame ? :-)
> 
> Cheers,
> Ben.
> 






Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-06-22 Thread Benjamin Herrenschmidt
On Sun, 2014-06-22 at 12:10 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2014-06-21 at 15:37 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2014-06-19 at 11:36 +0200, Gerd Hoffmann wrote:
> > > If not -- then I prefer to play save and not use a vbe register to avoid
> > > ending up with two incompatible bochs interface revisions.  If you don't
> > > like the pci config space option -- the mmio bar has plenty of free
> > > space left ;)
> > 
> > I'll find something :-)
> > 
> > Another advantage I realize in going down that path is that's yet another
> > use of target endian removed from hw/ which gets us a little step closer
> > to having to make the whole hw/ stuff target neutral.
> 
> Ok, I hit a (small) problem ...

BTW, here's my current work:

https://github.com/ozbenh/qemu/commits/vga-work

Cheers,
Ben.





Re: [Qemu-devel] [PATCH v1] trace: add qemu_system_powerdown_request and qemu_system_shutdown_request trace events

2014-06-22 Thread yangzy.f...@cn.fujitsu.com
Ping...

> -Original Message-
> From: Yang, Zhiyong/杨 志勇
> Sent: Wednesday, June 11, 2014 10:43 PM
> To: qemu-devel@nongnu.org
> Cc: Yang, Zhiyong/杨 志勇; peter.mayd...@linaro.org;
> mdr...@linux.vnet.ibm.com; jfor...@redhat.com
> Subject: [Qemu-devel] [PATCH v1] trace: add
> qemu_system_powerdown_request and
> qemu_system_shutdown_request trace events
> 
> We have the experience that the guest doesn't stop successfully though it
> was instructed to shut down.
> 
> The root cause may be not in QEMU mostly.  However, QEMU is often
> suspected at the beginning just because the issue occurred in virtualization
> environment.
> 
> Therefore, we need to affirm that QEMU received the shutdown request
> and raised ACPI irq from "virsh shutdown" command, virt-manger or
> stopping QEMU process to the VM .
> So that we can affirm the problems was belonged to the Guset OS rather
> than the QEMU itself.
> 
> When we stop guests by "virsh shutdown" command or virt-manger, or
> stopping QEMU process, qemu_system_powerdown_request() or
> qemu_system_shutdown_request() is called. Then the below functions in
> main_loop_should_exit() of Vl.c are called roughly in the following order.
> 
>   if (qemu_powerdown_requested())
>   qemu_system_powerdown()
>   monitor_protocol_event(QEVENT_POWERDOWN, NULL)
> 
>   OR
> 
>   if(qemu_shutdown_requested()}
>   monitor_protocol_event(QEVENT_SHUTDOWN, NULL);
> 
> The tracepoint of monitor_protocol_event() already exists, but no
> tracepoints are defined for qemu_system_powerdown_request() and
> qemu_system_shutdown_request(). So this patch adds two tracepoints
> for the two functions. We believe that it will become much easier to isolate
> the problem mentioned above by these tracepoints.
> 
> 
> Signed-off-by: Yang Zhiyong 
> 
> ---
>  trace-events |2 ++
>  vl.c |2 ++
>  2 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/trace-events b/trace-events index 2c5b307..d642cc4 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -511,6 +511,8 @@ g_malloc(size_t size, void *ptr) "size %zu ptr %p"
>  g_realloc(void *ptr, size_t size, void *newptr) "ptr %p size %zu
> newptr %p"
>  g_free(void *ptr) "ptr %p"
>  system_wakeup_request(int reason) "reason=%d"
> +qemu_system_shutdown_request(void) ""
> +qemu_system_powerdown_request(void) ""
> 
>  # block/qcow2.c
>  qcow2_writev_start_req(void *co, int64_t sector, int nb_sectors) "co %p
> sector %" PRIx64 " nb_sectors %d"
> diff --git a/vl.c b/vl.c
> index 709d8cd..aed0868 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1982,6 +1982,7 @@ void qemu_system_killed(int signal, pid_t pid)
> 
>  void qemu_system_shutdown_request(void)
>  {
> +trace_qemu_system_shutdown_request();
>  shutdown_requested = 1;
>  qemu_notify_event();
>  }
> @@ -1994,6 +1995,7 @@ static void qemu_system_powerdown(void)
> 
>  void qemu_system_powerdown_request(void)
>  {
> +trace_qemu_system_powerdown_request();
>  powerdown_requested = 1;
>  qemu_notify_event();
>  }
> --
> 1.7.1



[Qemu-devel] help

2014-06-22 Thread lc
hi,everyone
I want to copy a 3M picture from the guest to the host,which method i can use,i 
think channel is too slow,because i want sent 30 pieces per second.

thanks.

Re: [Qemu-devel] [RFC v2 1/2] arm: Add the cortex-a9 CPU to the a9mpcore device

2014-06-22 Thread Alistair Francis
Ping

On Tue, Jun 17, 2014 at 3:57 PM, Alistair Francis
 wrote:
> Add Stefan Hajnoczi as it relies on his work
> (http://lists.gnu.org/archive/html/qemu-devel/2014-05/msg06489.html)
>
> On Tue, Jun 17, 2014 at 2:18 PM, Alistair Francis
>  wrote:
>> This patch adds the Cortex-A9 ARM CPU to the A9MPCore.
>>
>> The CPU is only created if the num-cpu property is set.
>>
>> This patch relies on Stefan Hajnoczi's v3 'virtio-blk:
>> use alias properties in transport devices' patch. This is
>> used to pass the CPU properties through to the mpcore.
>>
>> This patch allows the midr and reset-cbar properties to be set
>>
>> Signed-off-by: Alistair Francis 
>> ---
>> V2: - Small fixes thanks to Peter Crosthwaite
>> - Properties are now passed through
>> - CPUs are no longer inited in the realize
>>   function, instead it uses a property listening function
>> - Thanks to Maydell, Crosthwaite and Andreas
>> V1: First Release
>>
>>  hw/cpu/a9mpcore.c |   62 
>> -
>>  include/hw/cpu/a9mpcore.h |3 ++
>>  2 files changed, 64 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/cpu/a9mpcore.c b/hw/cpu/a9mpcore.c
>> index c09358c..3d22b86 100644
>> --- a/hw/cpu/a9mpcore.c
>> +++ b/hw/cpu/a9mpcore.c
>> @@ -17,10 +17,54 @@ static void a9mp_priv_set_irq(void *opaque, int irq, int 
>> level)
>>  qemu_set_irq(qdev_get_gpio_in(DEVICE(&s->gic), irq), level);
>>  }
>>
>> +static void a9mpcore_init_cpus(Object *obj, Visitor *v,
>> +   void *opaque, const char *name,
>> +   Error **errp)
>> +{
>> +A9MPPrivState *s = A9MPCORE_PRIV(obj);
>> +ObjectClass *cpu_oc;
>> +Error *err = NULL;
>> +int i;
>> +int64_t value;
>> +
>> +visit_type_int(v, &value, name, &err);
>> +if (err) {
>> +error_propagate(errp, err);
>> +return;
>> +}
>> +s->num_cpu = value;
>> +
>> +s->cpu = g_new0(ARMCPU, s->num_cpu);
>> +cpu_oc = cpu_class_by_name(TYPE_ARM_CPU, "cortex-a9");
>> +
>> +for (i = 0; i < s->num_cpu; i++) {
>> +object_initialize(&s->cpu[i], sizeof(*s->cpu),
>> +  object_class_get_name(cpu_oc));
>> +
>> +object_property_add_alias(obj, "midr", OBJECT(&s->cpu[i]),
>> +  "midr", NULL);
>> +object_property_add_alias(obj, "reset-cbar", OBJECT(&s->cpu[i]),
>> +  "reset-cbar", NULL);
>> +/* This fails to pass through thte reset-cbar property
>> + * The qdev_alias_all_properties() also fails with multiple CPUs
>> + * Hence the code above is used instead
>> + * qdev_alias_all_properties(DEVICE(&s->cpu[i]), obj);
>> + */
>> +}
>> +}
>> +
>>  static void a9mp_priv_initfn(Object *obj)
>>  {
>>  A9MPPrivState *s = A9MPCORE_PRIV(obj);
>>
>> +/* Set up the CPU to be initiated */
>> +object_property_add(obj, "num-cpu", "int",
>> +NULL, a9mpcore_init_cpus,
>> +NULL, NULL, NULL);
>> +/* Use this as the default */
>> +s->cpu = NULL;
>> +s->num_cpu = 1;
>> +
>>  memory_region_init(&s->container, obj, "a9mp-priv-container", 0x2000);
>>  sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->container);
>>
>> @@ -50,6 +94,17 @@ static void a9mp_priv_realize(DeviceState *dev, Error 
>> **errp)
>>  Error *err = NULL;
>>  int i;
>>
>> +if (s->cpu) {
>> +for (i = 0; i < s->num_cpu; i++) {
>> +object_property_set_bool(OBJECT(&s->cpu[i]), true,
>> + "realized", &err);
>> +if (err) {
>> +error_propagate(errp, err);
>> +return;
>> +}
>> +}
>> +}
>> +
>>  scudev = DEVICE(&s->scu);
>>  qdev_prop_set_uint32(scudev, "num-cpu", s->num_cpu);
>>  object_property_set_bool(OBJECT(&s->scu), true, "realized", &err);
>> @@ -75,6 +130,12 @@ static void a9mp_priv_realize(DeviceState *dev, Error 
>> **errp)
>>  /* Pass through inbound GPIO lines to the GIC */
>>  qdev_init_gpio_in(dev, a9mp_priv_set_irq, s->num_irq - 32);
>>
>> +/* Connect the GIC to the first CPU */
>> +if (s->cpu) {
>> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
>> +   qdev_get_gpio_in(DEVICE(s->cpu), ARM_CPU_IRQ));
>> +}
>> +
>>  gtimerdev = DEVICE(&s->gtimer);
>>  qdev_prop_set_uint32(gtimerdev, "num-cpu", s->num_cpu);
>>  object_property_set_bool(OBJECT(&s->gtimer), true, "realized", &err);
>> @@ -144,7 +205,6 @@ static void a9mp_priv_realize(DeviceState *dev, Error 
>> **errp)
>>  }
>>
>>  static Property a9mp_priv_properties[] = {
>> -DEFINE_PROP_UINT32("num-cpu", A9MPPrivState, num_cpu, 1),
>>  /* The Cortex-A9MP may have anything from 0 to 224 external interrupt
>>   * IRQ lines (with another 32 internal). We default to 64+32, which
>>   * is the number provided by the Cortex-A9MP t

[Qemu-devel] [PATCH v1 3/3] sPAPR: Export RTAS property

2014-06-22 Thread Gavin Shan
The patch exports RTAS property "ibm,errinjct-tokens", which is
defined in PAPR spec and used to indicate various error types
we can inject.

Signed-off-by: Gavin Shan 
---
 hw/ppc/spapr.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a61af85..1d52229 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -443,6 +443,23 @@ static void spapr_create_drc_dt_entries(void *fdt)
 } while (0)
 
 
+static void add_errinjct_token(GString *s, uint32_t token, const gchar *desc)
+{
+g_string_append_len(s, desc, strlen(desc) + 1);
+g_string_append_len(s, (gchar *)&token, sizeof(token));
+}
+
+static void add_errinjct_token_prop(void *fdt)
+{
+GString *s = g_string_sized_new(256);
+
+add_errinjct_token(s, RTAS_ERRINJCT_TYPE_IOA_BUS_ERROR, "ioa-bus-error");
+add_errinjct_token(s, RTAS_ERRINJCT_TYPE_IOA_BUS_ERROR64, 
"ioa-bus-error-64");
+
+_FDT((fdt_property(fdt, "ibm,errinjct-tokens", s->str, s->len)));
+g_string_free(s, true);
+}
+
 static void *spapr_create_fdt_skel(hwaddr initrd_base,
hwaddr initrd_size,
hwaddr kernel_size,
@@ -664,6 +681,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
sizeof(qemu_hypertas_prop;
 
+add_errinjct_token_prop(fdt);
+
 _FDT((fdt_property(fdt, "ibm,associativity-reference-points",
 refpoints, sizeof(refpoints;
 
-- 
1.8.3.2




[Qemu-devel] [PATCH v1 0/3] Support PCI Error Injection

2014-06-22 Thread Gavin Shan
There is one guest userland utility "errinjct" used to inject various types
of errors for testing purpose. The utility works with 3 RTAS calls, which
are defined in PAPR spec as follows:

- "ibm,open-errinjct": Apply for token to do error injection
- "ibm,close-errinjct": Free the token assigned previously
- "ibm,errinjct": Do error injection

The series of patches support above RTAS calls in order to support error
injection. For now, we only support PCI related error injection (32-bits
and 64-bits PCI error types) and have to figure out the error injection
for other types in future.

It requires corresponding kernel changes as follows. Please comments, thanks!

http://patchwork.ozlabs.org/patch/362637/
http://patchwork.ozlabs.org/patch/362638/
http://patchwork.ozlabs.org/patch/362639/

Gavin Shan (3):
  sPAPR: Implement PCI error injection RTAS calls
  sPAPR: Implement sPAPRPHBClass::format_errinjct_cmd
  sPAPR: Export RTAS property 

 hw/ppc/spapr.c  |  19 +
 hw/ppc/spapr_pci_vfio.c |  19 +
 hw/ppc/spapr_rtas.c | 198 
 include/hw/pci-host/spapr.h |  11 +++
 include/hw/ppc/spapr.h  |  35 
 5 files changed, 282 insertions(+)

-- 
1.8.3.2




[Qemu-devel] [PATCH v1 2/3] sPAPR: Implement sPAPRPHBClass::format_errinjct_cmd

2014-06-22 Thread Gavin Shan
The patch implements sPAPRPHBClass::format_errinjct_cmd to do the
address translation (BUID+PE number to IOMMU group ID) and then
come up with the formatted string for PCI error injection.

Signed-off-by: Gavin Shan 
---
 hw/ppc/spapr_pci_vfio.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 0a1e902..1f9f0bf 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -227,6 +227,24 @@ static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb, 
int req, int opt)
 VFIO_EEH_PE_OP, &op);
 }
 
+static char *format_errinjct_cmd(sPAPRPHBState *sphb, sPAPRErrInjctIOA *ioa)
+{
+sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+char *pstr = NULL;
+
+/* Invalid IOMMU group ID ? */
+if (svphb->iommugroupid == -1) {
+goto out;
+}
+
+/* Formatted string */
+pstr = g_strdup_printf("%x:%lx:%lx:%x:%x",
+   ioa->ei_token, ioa->addr, ioa->mask,
+   svphb->iommugroupid, ioa->func);
+out:
+return pstr;
+}
+
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
 /* Do nothing */
@@ -241,6 +259,7 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, 
void *data)
 dc->reset = spapr_phb_vfio_reset;
 spc->finish_realize = spapr_phb_vfio_finish_realize;
 spc->eeh_handler = spapr_phb_vfio_eeh_handler;
+spc->format_errinjct_cmd = format_errinjct_cmd;
 }
 
 static const TypeInfo spapr_phb_vfio_info = {
-- 
1.8.3.2




[Qemu-devel] [PATCH v1 1/3] sPAPR: Implement PCI error injection RTAS calls

2014-06-22 Thread Gavin Shan
The patch implements PCI error injection RTAS calls for sPAPR
platform, which are defined PAPR spec as follows. Those RTAS
calls are expected to be invoked in strict sequence of
"ibm,open-errinjct", "ibm,errinjct", "ibm,close-errinjct".

- "ibm,open-errinjct": Apply for token to do error injection
- "ibm,close-errinjct": Free the token assigned previously
- "ibm,errinjct": Do error injection

The patch defines constants used by error injection RTAS calls
and adds callback sPAPRPHBClass::format_errinjct_cmd, which is
going to be used like this way:

- Error injection RTAS calls are received in spapr_rtas.c. If that's
  "ibm,open-errinjct" or "ibm,close-errinjct", we handle it there
  with the help of static counter "sPAPREnvironment::errinjct_token_cnt".
  If the  RTAS call is "ibm,errinjct", sanity check is done there and
  just returns failure if the error type isn't PCI related. Otherwise,
  the input parameters are figured out and route the request to
  sPAPRPHBClass::format_errinjct_cmd.
- sPAPRPHBClass::format_errinjct_cmd translates the address (BUID, PE
  number) to IOMMU group ID and then return the formated string back to
  spapr_rtas.c where the string is writen to firmware sysfs file
  "/sys/firmware/opal/errinjct" to do error injection.

Signed-off-by: Gavin Shan 
---
 hw/ppc/spapr_rtas.c | 198 
 include/hw/pci-host/spapr.h |  11 +++
 include/hw/ppc/spapr.h  |  35 
 3 files changed, 244 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index a7233e4..94fb866 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -32,6 +32,7 @@
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
+#include "hw/pci-host/spapr.h"
 
 #include 
 
@@ -268,6 +269,196 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
 rtas_st(rets, 0, ret);
 }
 
+static sPAPRErrInjct *rtas_find_errinjct_token(sPAPREnvironment *spapr,
+   uint32_t token_num)
+{
+sPAPRErrInjct *open_token;
+
+QLIST_FOREACH(open_token, &spapr->errinjct_open_tokens, list) {
+if (open_token->token == token_num) {
+return open_token;
+}
+}
+
+return NULL;
+}
+
+static void rtas_ibm_open_errinjct(PowerPCCPU *cpu,
+   sPAPREnvironment *spapr,
+   uint32_t token, uint32_t nargs,
+   target_ulong args, uint32_t nret,
+   target_ulong rets)
+{
+sPAPRErrInjct *open_token;
+int32_t ret = RTAS_OUT_HW_ERROR;
+
+/* Sanity check on number of arguments */
+if ((nargs != 0) || (nret != 2)) {
+goto out;
+}
+
+/* Allocate token */
+open_token = g_malloc(sizeof(*open_token));
+if (!open_token) {
+goto out;
+}
+open_token->token = spapr->errinjct_open_token_cnt++;
+QLIST_INSERT_HEAD(&spapr->errinjct_open_tokens, open_token, list);
+
+/* Success */
+rtas_st(rets, 0, open_token->token);
+ret = RTAS_OUT_SUCCESS;
+out:
+rtas_st(rets, 1, ret);
+}
+
+static char *rtas_do_errinjct_ioa(sPAPREnvironment *spapr,
+sPAPRErrInjctIOA *ioa)
+{
+sPAPRPHBState *sphb;
+sPAPRPHBClass *info;
+
+/* Sanity check on argument */
+if (!spapr || !ioa) {
+return NULL;
+}
+
+/* Invalid option ? */
+if (ioa->func > RTAS_ERRINJCT_IOA_DMA_WR_MEM_TARGET) {
+return NULL;
+}
+
+/* Find PHB */
+sphb = spapr_find_phb(spapr, ioa->buid);
+if (!sphb) {
+return NULL;
+}
+
+/* PHB doesn't support error injection ? */
+info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+if (!info->format_errinjct_cmd) {
+return NULL;
+}
+
+return info->format_errinjct_cmd(sphb, ioa);
+}
+
+static void rtas_ibm_errinjct(PowerPCCPU *cpu,
+  sPAPREnvironment *spapr,
+  uint32_t token, uint32_t nargs,
+  target_ulong args, uint32_t nret,
+  target_ulong rets)
+{
+sPAPRErrInjctIOA ioa;
+sPAPRErrInjct *open_token;
+uint32_t token_num, ei_token;
+target_ulong param_buf;
+char *pbuf;
+int fd;
+int32_t ret = RTAS_OUT_PARAM_ERROR;
+
+/* Sanity check on number of arguments */
+if ((nargs != 3) || (nret != 1)) {
+goto out;
+}
+
+/* Check if we have opened token */
+token_num = rtas_ld(args, 1);
+open_token = rtas_find_errinjct_token(spapr, token_num);
+if (!open_token) {
+goto out;
+}
+
+/* Argument buffer should be aligned with 1KB */
+param_buf = rtas_ld(args, 2);
+if (param_buf & 0x3ff) {
+goto out;
+}
+
+/* We only support IOA error for now */
+ei_token = rtas_ld(args, 0);
+switch (ei_token) {
+case RTAS_ERRINJCT_TYPE_IOA_BUS_ERROR:
+ioa.ei_token = RTAS_ERRINJCT_TYPE_IOA_BUS_ERROR;
+   

[Qemu-devel] extremely low IOPS performance of QCOW2 image format on an SSD RAID1

2014-06-22 Thread lihuiba
Hi, all


I'm using a qcow2 image stored on a SSD RAID1 (2 x intel S3500), and I'm 
benchmarking the
system using fio. Although the throughput in VM (with KVM and virtio enabled) 
is acceptable (67%
of thoughtput in host), the IOPS performance seems is extremely low  only 
2% of IOPS in host.


I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for comparison. 
There was no significant
difference.


In contrast, raw image and LVM perform very well. They usually achieve 90%+ of 
throughput and
60%+ of IOPS. So the problem must lie in the QCOW2 image format.


And I observed that, when I perform 4KB IOPS benchmark in VM with a QCOW2 
image, fio in VM reports
it is reading 9.x MB/s, while iostat in host reports the SSD is being read 150+ 
MB/s. So QEMU or QCOW2
must have amplified the amount of read by nearly 16 times.


So, how can I fix or tune the performance issue of qcow2?


Thanks!




PS:
1. qemu parameters:
-enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb -device 
usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection -global 
kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std -k en-us -smp 8 
-m 4096 -boot order=cdn -vnc :1 -drive 
file=$1,if=none,id=drive_0,cache=none,aio=native -device 
virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
file=$2,if=none,id=drive_2,cache=none,aio=native -device 
virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7


2. fio parameters for IOPS:
fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 --thread 
--numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s --group_reporting 
--name=test


3. fio parameters for throughput:
fio --filename=/dev/vdb--direct=1 --ioengine=psync --thread --numjobs=3 
--rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
--group_reporting -name=test





Re: [Qemu-devel] extremely low IOPS performance of QCOW2 image format on an SSD RAID1

2014-06-22 Thread Fam Zheng
On Mon, 06/23 10:06, lihuiba wrote:
> Hi, all
> 
> 
> I'm using a qcow2 image stored on a SSD RAID1 (2 x intel S3500), and I'm 
> benchmarking the
> system using fio. Although the throughput in VM (with KVM and virtio enabled) 
> is acceptable (67%
> of thoughtput in host), the IOPS performance seems is extremely low  only 
> 2% of IOPS in host.
> 
> 
> I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for comparison. 
> There was no significant
> difference.
> 
> 
> In contrast, raw image and LVM perform very well. They usually achieve 90%+ 
> of throughput and
> 60%+ of IOPS. So the problem must lie in the QCOW2 image format.
> 
> 
> And I observed that, when I perform 4KB IOPS benchmark in VM with a QCOW2 
> image, fio in VM reports
> it is reading 9.x MB/s, while iostat in host reports the SSD is being read 
> 150+ MB/s. So QEMU or QCOW2
> must have amplified the amount of read by nearly 16 times.
> 
> 
> So, how can I fix or tune the performance issue of qcow2?

Did you prefill the image? Amplification could come from cluster allocation.

Fam

> 
> 
> Thanks!
> 
> 
> 
> 
> PS:
> 1. qemu parameters:
> -enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb -device 
> usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection -global 
> kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std -k en-us -smp 
> 8 -m 4096 -boot order=cdn -vnc :1 -drive 
> file=$1,if=none,id=drive_0,cache=none,aio=native -device 
> virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
> file=$2,if=none,id=drive_2,cache=none,aio=native -device 
> virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7
> 
> 
> 2. fio parameters for IOPS:
> fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 --thread 
> --numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s --group_reporting 
> --name=test
> 
> 
> 3. fio parameters for throughput:
> fio --filename=/dev/vdb--direct=1 --ioengine=psync --thread --numjobs=3 
> --rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
> --group_reporting -name=test
> 
> 
> 



Re: [Qemu-devel] extremely low IOPS performance of QCOW2 image format on an SSD RAID1

2014-06-22 Thread lihuiba
>Did you prefill the image? Amplification could come from cluster allocation.
Yes! 
I forgot to mention that I created the qcow2 image with 
'preallocation=metadata', and I have allocated
the data blocks with dd in VM.


Creating image in host:
qemu-img create -f qcow2 -preallocation=metadata test.qcow2 100G


Allocating the blocks in VM:
dd if=/dev/zero of=/dev/vdb bs=1M
where vdb is the target image.







At 2014-06-23 11:01:20, "Fam Zheng"  wrote:
>On Mon, 06/23 10:06, lihuiba wrote:
>> Hi, all
>> 
>> 
>> I'm using a qcow2 image stored on a SSD RAID1 (2 x intel S3500), and I'm 
>> benchmarking the
>> system using fio. Although the throughput in VM (with KVM and virtio 
>> enabled) is acceptable (67%
>> of thoughtput in host), the IOPS performance seems is extremely low  
>> only 2% of IOPS in host.
>> 
>> 
>> I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for 
>> comparison. There was no significant
>> difference.
>> 
>> 
>> In contrast, raw image and LVM perform very well. They usually achieve 90%+ 
>> of throughput and
>> 60%+ of IOPS. So the problem must lie in the QCOW2 image format.
>> 
>> 
>> And I observed that, when I perform 4KB IOPS benchmark in VM with a QCOW2 
>> image, fio in VM reports
>> it is reading 9.x MB/s, while iostat in host reports the SSD is being read 
>> 150+ MB/s. So QEMU or QCOW2
>> must have amplified the amount of read by nearly 16 times.
>> 
>> 
>> So, how can I fix or tune the performance issue of qcow2?
>
>Did you prefill the image? Amplification could come from cluster allocation.
>
>Fam
>
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> PS:
>> 1. qemu parameters:
>> -enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb -device 
>> usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection -global 
>> kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std -k en-us 
>> -smp 8 -m 4096 -boot order=cdn -vnc :1 -drive 
>> file=$1,if=none,id=drive_0,cache=none,aio=native -device 
>> virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
>> file=$2,if=none,id=drive_2,cache=none,aio=native -device 
>> virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7
>> 
>> 
>> 2. fio parameters for IOPS:
>> fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 --thread 
>> --numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s 
>> --group_reporting --name=test
>> 
>> 
>> 3. fio parameters for throughput:
>> fio --filename=/dev/vdb--direct=1 --ioengine=psync --thread --numjobs=3 
>> --rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
>> --group_reporting -name=test
>> 
>> 
>> 
>


Re: [Qemu-devel] help

2014-06-22 Thread Gonglei (Arei)
Hi,

Why not configure Nic  for VM to transfer those pictures ?




Best regards,
-Gonglei

From: qemu-devel-bounces+arei.gonglei=huawei@nongnu.org 
[mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On Behalf Of lc
Sent: Monday, June 23, 2014 9:32 AM
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] help

hi,everyone
I want to copy a 3M picture from the guest to the host,which method i can use,i 
think channel is too slow,because i want sent 30 pieces per second.
thanks.



Re: [Qemu-devel] extremely low IOPS performance of QCOW2 image format on an SSD RAID1

2014-06-22 Thread Fam Zheng
Cc'ing more qcow2 experts.

On Mon, 06/23 11:14, lihuiba wrote:
> >Did you prefill the image? Amplification could come from cluster allocation.
> Yes! 
> I forgot to mention that I created the qcow2 image with 
> 'preallocation=metadata', and I have allocated
> the data blocks with dd in VM.
> 
> 
> Creating image in host:
> qemu-img create -f qcow2 -preallocation=metadata test.qcow2 100G
> 
> 
> Allocating the blocks in VM:
> dd if=/dev/zero of=/dev/vdb bs=1M
> where vdb is the target image.
> 
> 
> 
> 
> 
> 
> 
> At 2014-06-23 11:01:20, "Fam Zheng"  wrote:
> >On Mon, 06/23 10:06, lihuiba wrote:
> >> Hi, all
> >> 
> >> 
> >> I'm using a qcow2 image stored on a SSD RAID1 (2 x intel S3500), and I'm 
> >> benchmarking the
> >> system using fio. Although the throughput in VM (with KVM and virtio 
> >> enabled) is acceptable (67%
> >> of thoughtput in host), the IOPS performance seems is extremely low  
> >> only 2% of IOPS in host.
> >> 
> >> 
> >> I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for 
> >> comparison. There was no significant
> >> difference.
> >> 
> >> 
> >> In contrast, raw image and LVM perform very well. They usually achieve 
> >> 90%+ of throughput and
> >> 60%+ of IOPS. So the problem must lie in the QCOW2 image format.
> >> 
> >> 
> >> And I observed that, when I perform 4KB IOPS benchmark in VM with a QCOW2 
> >> image, fio in VM reports
> >> it is reading 9.x MB/s, while iostat in host reports the SSD is being read 
> >> 150+ MB/s. So QEMU or QCOW2
> >> must have amplified the amount of read by nearly 16 times.
> >> 
> >> 
> >> So, how can I fix or tune the performance issue of qcow2?
> >
> >Did you prefill the image? Amplification could come from cluster allocation.
> >
> >Fam
> >
> >> 
> >> 
> >> Thanks!
> >> 
> >> 
> >> 
> >> 
> >> PS:
> >> 1. qemu parameters:
> >> -enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb 
> >> -device usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection 
> >> -global kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std -k 
> >> en-us -smp 8 -m 4096 -boot order=cdn -vnc :1 -drive 
> >> file=$1,if=none,id=drive_0,cache=none,aio=native -device 
> >> virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
> >> file=$2,if=none,id=drive_2,cache=none,aio=native -device 
> >> virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7
> >> 
> >> 
> >> 2. fio parameters for IOPS:
> >> fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 --thread 
> >> --numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s 
> >> --group_reporting --name=test
> >> 
> >> 
> >> 3. fio parameters for throughput:
> >> fio --filename=/dev/vdb--direct=1 --ioengine=psync --thread --numjobs=3 
> >> --rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
> >> --group_reporting -name=test
> >> 
> >> 
> >> 
> >



Re: [Qemu-devel] Endian control register

2014-06-22 Thread Benjamin Herrenschmidt
On Mon, 2014-06-23 at 10:45 +1000, Benjamin Herrenschmidt wrote:
> 
> Another option would be to use an unused bit if the ENABLE register
> and simply force it to 1 on reads when the endian control register
> is present... though in this case it might be a better idea to use
> that bit to indicate the presence of "extended flags", use a new
> register 0xb as "EXTENDED_FLAGS", and use a flag in there as indicative
> of the presence of endian control which becomes register 0xc.
> 
> That way you also have room for more extensions if you wish to do
> so in the future.

I just remembered... we use 16-bit wide ports, we could just use the
top 8 bits of the enable register as "capabilities" ... 

Cheers,
Ben.





Re: [Qemu-devel] [Xen-devel] [RFC PATCH V3 2/2] qemu: support xen hvm direct kernel boot

2014-06-22 Thread Chun Yan Liu


>>> On 6/20/2014 at 08:08 PM, in message
, Stefano
Stabellini  wrote: 
> On Fri, 20 Jun 2014, Chunyan Liu wrote: 
> > qemu side patch to support xen HVM direct kernel boot: 
> > if -kernel exists, calls xen_load_linux(), which will read kernel/initrd 
> > and add a linuxboot.bin or multiboot.bin option rom. The 
> > linuxboot.bin/multiboot.bin will load kernel/initrd and jump to execute 
> > kernel directly. It's working when xen uses seabios. 
> >  
> > Signed-off-by: Chunyan Liu  
> > --- 
> >  hw/i386/pc.c   | 22 ++ 
> >  hw/i386/pc_piix.c  |  7 +++ 
> >  hw/i386/xen/xen_apic.c |  1 + 
> >  include/hw/i386/pc.h   |  5 + 
> >  4 files changed, 35 insertions(+) 
> >  
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c 
> > index 3e0ecf1..e005136 100644 
> > --- a/hw/i386/pc.c 
> > +++ b/hw/i386/pc.c 
> > @@ -1183,6 +1183,28 @@ void pc_acpi_init(const char *default_dsdt) 
> >  } 
> >  } 
> >   
> > +FWCfgState *xen_load_linux(const char *kernel_filename, 
> > +   const char *kernel_cmdline, 
> > +   const char *initrd_filename, 
> > +   ram_addr_t below_4g_mem_size, 
> > +   PcGuestInfo *guest_info) 
> > +{ 
> > +int i; 
> > +FWCfgState *fw_cfg; 
> > + 
> > +assert(kernel_filename != NULL); 
> > + 
> > +fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); 
>  
> Is it actually OK to initialize just BIOS_CFG_IOPORT and avoid 
> everything else currently done in bochs_bios_init? 
> Would it make sense to simply call bochs_bios_init? 
>  
>  
> > +rom_set_fw(fw_cfg); 
> > + 
> > +load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline,  
> below_4g_mem_size); 
> > +for (i = 0; i < nb_option_roms; i++) { 
> > +rom_add_option(option_rom[i].name, option_rom[i].bootindex); 
> > +} 
>  
> Wouldn't this have the unintended consequence of possibly loading other 
> option_roms into the guest memory? 

For xen, no.

Before this patch, indeed there is another option_rom "kvmvapic" in the list,
but since the option_rom list is never loaded in xen case, that won't harm. But
as expected, I think in xen case, it should not be in option_rom list at all, 
since
we won't expect to load it. In v1, I tried to check the option_rom name to
bypass it and load multiboot.bin/linuxboot.bin only.

In comments, Paolo Bonzini suggests a better way, that is to add following line
in xen_apic_realize():
+ s->vapic_control = 0;
In this way, "kvmvapic" won't be added to the option_rom list. I think that
should be the normal way for xen.

So, since V2, I updated in this way.
http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00704.html

>  
>  
> > +guest_info->fw_cfg = fw_cfg; 
> > +return fw_cfg; 
> > +} 
> > + 
> >  FWCfgState *pc_memory_init(MemoryRegion *system_memory, 
> > const char *kernel_filename, 
> > const char *kernel_cmdline, 
> > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c 
> > index a48e263..b737868 100644 
> > --- a/hw/i386/pc_piix.c 
> > +++ b/hw/i386/pc_piix.c 
> > @@ -158,6 +158,13 @@ static void pc_init1(MachineState *machine, 
> > machine->initrd_filename, 
> > below_4g_mem_size, above_4g_mem_size, 
> > rom_memory, &ram_memory, guest_info); 
> > +} else if (machine->kernel_filename != NULL) { 
> > +/* For xen HVM direct kernel boot, load linux here */ 
> > +fw_cfg = xen_load_linux(machine->kernel_filename, 
> > +machine->kernel_cmdline, 
> > +machine->initrd_filename, 
> > +below_4g_mem_size, 
> > +guest_info); 
> >  } 
> >   
> >  gsi_state = g_malloc0(sizeof(*gsi_state)); 
> > diff --git a/hw/i386/xen/xen_apic.c b/hw/i386/xen/xen_apic.c 
> > index 63bb7f7..f5acd6a 100644 
> > --- a/hw/i386/xen/xen_apic.c 
> > +++ b/hw/i386/xen/xen_apic.c 
> > @@ -40,6 +40,7 @@ static void xen_apic_realize(DeviceState *dev, Error  
> **errp) 
> >  { 
> >  APICCommonState *s = APIC_COMMON(dev); 
> >   
> > +s->vapic_control = 0; 
> >  memory_region_init_io(&s->io_memory, OBJECT(s), &xen_apic_io_ops, s, 
> >"xen-apic-msi", APIC_SPACE_SIZE); 
>  
> Why this change? It is not mentioned in the commit message. 
>  
>  
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h 
> > index ca7a0bd..171a597 100644 
> > --- a/include/hw/i386/pc.h 
> > +++ b/include/hw/i386/pc.h 
> > @@ -134,6 +134,11 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t  
> below_4g_mem_size, 
> >  void pc_pci_as_mapping_init(Object *owner, MemoryRegion *system_memory, 
> >  MemoryRegion *pci_address_space); 
> >   
> > +FWCfgState *xen_load_linux(const char *kernel_filename, 
> > + 

Re: [Qemu-devel] mem_pci devices

2014-06-22 Thread lc
I want to use the memory mapping methods, so I made a memory pci devices, but I 
know very little about Windows driver programming, someone has done such a 
thing。


static TypeInfo mem_pci_info = {
.name = "mem_pci",
.parent = TYPE_PCI_DEVICE,
.instance_size = sizeof(PCIMEMPCIState),
.class_init = mem_pci_class_init,
};






thinks  Gonglei.







At 2014-06-23 11:18:59, "Gonglei (Arei)"  wrote:


Hi,

 

Why not configure Nic  for VM to transfer those pictures ?

 

 

 

 

Best regards,

-Gonglei

 

From: qemu-devel-bounces+arei.gonglei=huawei@nongnu.org 
[mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On Behalf Of lc
Sent: Monday, June 23, 2014 9:32 AM
To:qemu-devel@nongnu.org
Subject: [Qemu-devel] help

 

hi,everyone
I want to copy a 3M picture from the guest to the host,which method i can use,i 
think channel is too slow,because i want sent 30 pieces per second.

thanks.

 

Re: [Qemu-devel] mem_pci devices

2014-06-22 Thread Gonglei (Arei)
Hi,

Ok, your used method  as the same as ivshmem,
so actually your problem is  “how to write driver for ivshmem in windows guests 
?”

but IMHO no one has realized this.




Best regards,
-Gonglei

From: lc [mailto:nighto...@163.com]
Sent: Monday, June 23, 2014 11:45 AM
To: Gonglei (Arei)
Cc: qemu-devel@nongnu.org
Subject: Re:RE: [Qemu-devel] mem_pci devices

I want to use the memory mapping methods, so I made a memory pci devices, but I 
know very little about Windows driver programming, someone has done such a 
thing。

static TypeInfo mem_pci_info = {
.name = "mem_pci",
.parent = TYPE_PCI_DEVICE,
.instance_size = sizeof(PCIMEMPCIState),
.class_init = mem_pci_class_init,
};



thinks  Gonglei.





At 2014-06-23 11:18:59, "Gonglei (Arei)"  wrote:

Hi,

Why not configure Nic  for VM to transfer those pictures ?




Best regards,
-Gonglei

From: 
qemu-devel-bounces+arei.gonglei=huawei@nongnu.org
 
[mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org]
 On Behalf Of lc
Sent: Monday, June 23, 2014 9:32 AM
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] help

hi,everyone
I want to copy a 3M picture from the guest to the host,which method i can use,i 
think channel is too slow,because i want sent 30 pieces per second.
thanks.




Re: [Qemu-devel] [PATCH] block: Make op blocker recursive

2014-06-22 Thread Fam Zheng
On Sat, 06/21 17:40, Benoît Canet wrote:
> The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> > We still have the issue of unlocking the bottom BDS when a subtree is 
> > detached
> > from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> > replacement series).
> > 
> > From my understanding the unlocking of the root BDS is done by 
> > drive_mirror_complete
> > while the mirror code tries to unref the orphaned subtree _before_ 
> > drive_mirror_complete
> > is called.
> 
> One fixe to my sentence:
> s/drive_mirror_complete/block_job_complete/
> 
> 
> > 
> > So the bottom BDS would be unrefed before being unlocked.

I don't see a problem with that, we can do the unlock before unref the node if
we want.

Fam



Re: [Qemu-devel] [PATCH] input: fix jumpy mouse cursor with USB mouse emulation

2014-06-22 Thread Benjamin Herrenschmidt
On Sat, 2014-06-14 at 20:19 +0100, Christian Burger wrote:
> Guest mouse pointer was jumpy, when moving host mouse in the vertical 
> direction (see bug #1327800).

Ah, I've just done a deep dive into qemu input code to debug that
one as well :-)

It's not just "jumpy", it goes the wrong way around too...

> Signed-off-by: Christian Burger 

Tested-by: Benjamin Herrenschmidt 

> ---
>  hw/input/hid.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/input/hid.c b/hw/input/hid.c
> index 295bdab..c58847e 100644
> --- a/hw/input/hid.c
> +++ b/hw/input/hid.c
> @@ -124,7 +124,7 @@ static void hid_pointer_event(DeviceState *dev, 
> QemuConsole *src,
>  if (evt->rel->axis == INPUT_AXIS_X) {
>  e->xdx += evt->rel->value;
>  } else if (evt->rel->axis == INPUT_AXIS_Y) {
> -e->ydy -= evt->rel->value;
> +e->ydy += evt->rel->value;
>  }
>  break;
>  
> @@ -191,7 +191,7 @@ static void hid_pointer_sync(DeviceState *dev)
>  if (hs->kind == HID_MOUSE) {
>  prev->xdx += curr->xdx;
>  curr->xdx = 0;
> -prev->ydy -= curr->ydy;
> +prev->ydy += curr->ydy;
>  curr->ydy = 0;
>  } else {
>  prev->xdx = curr->xdx;





Re: [Qemu-devel] [PATCH] block: Make op blocker recursive

2014-06-22 Thread Benoît Canet
The Monday 23 Jun 2014 à 12:32:30 (+0800), Fam Zheng wrote :
> On Sat, 06/21 17:40, Benoît Canet wrote:
> > The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> > > We still have the issue of unlocking the bottom BDS when a subtree is 
> > > detached
> > > from the graphs by a swap. (It does happen in my drive-mirror arbitrary 
> > > node
> > > replacement series).
> > > 
> > > From my understanding the unlocking of the root BDS is done by 
> > > drive_mirror_complete
> > > while the mirror code tries to unref the orphaned subtree _before_ 
> > > drive_mirror_complete
> > > is called.
> > 
> > One fixe to my sentence:
> > s/drive_mirror_complete/block_job_complete/
> > 
> > 
> > > 
> > > So the bottom BDS would be unrefed before being unlocked.
> 
> I don't see a problem with that, we can do the unlock before unref the node if
> we want.

My concern is that mirror.c does the unref and don't own the Blocker by itself.
The blocker is owned by the blockjob so it's difficult for mirror.c to do the 
unblock.

> 
> Fam



Re: [Qemu-devel] [PULL v2 059/106] libqemustub: add stubs to be able to use qemu-char.c

2014-06-22 Thread Riku Voipio
Hi,

On Wed, Jun 18, 2014 at 07:19:17PM +0300, Michael S. Tsirkin wrote:
> From: Nikolay Nikolaev 
> 
> chardev depends on lots of external symbols that are not necessarily
> needed to be able to use, for example, 'socket chardev'. So add stubs
> for these functions:
> 
>  - bdrv_commit_all
>  - qemu_chr_open_msmouse
>  - is_daemonized
>  - qemu_add_machine_init_done_notifier
>  - monitor_init
>  - qemu_notify_event
>  - vc_init

This broke user build of user tools only. With:

./configure --disable-tools --disable-docs --target-list=arm-linux-user
...
pixmannone
...
make
...
In file included from /data/home/nchip/linaro/qemu/include/ui/console.h:4:0,
 from /data/home/nchip/linaro/qemu/stubs/vc-init.c:2:
/data/home/nchip/linaro/qemu/include/ui/qemu-pixman.h:14:20: fatal error: 
pixman.h: No such file or directory
 #include 
^
compilation terminated.

I'm not familiar with this area of Qemu - perhaps the following would be a good 
fix:

-stub-obj-y += vc-init.o
+stub-obj-$(CONFIG_SOFTMMU) += vc-init.o

Riku

> 
> and this array:
> 
>  - serial_hds
> 
> Signed-off-by: Antonios Motakis 
> Signed-off-by: Nikolay Nikolaev 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  stubs/bdrv-commit-all.c   | 7 +++
>  stubs/chr-msmouse.c   | 7 +++
>  stubs/get-next-serial.c   | 3 +++
>  stubs/is-daemonized.c | 7 +++
>  stubs/machine-init-done.c | 6 ++
>  stubs/monitor-init.c  | 6 ++
>  stubs/notify-event.c  | 6 ++
>  stubs/vc-init.c   | 7 +++
>  stubs/Makefile.objs   | 8 
>  9 files changed, 57 insertions(+)
>  create mode 100644 stubs/bdrv-commit-all.c
>  create mode 100644 stubs/chr-msmouse.c
>  create mode 100644 stubs/get-next-serial.c
>  create mode 100644 stubs/is-daemonized.c
>  create mode 100644 stubs/machine-init-done.c
>  create mode 100644 stubs/monitor-init.c
>  create mode 100644 stubs/notify-event.c
>  create mode 100644 stubs/vc-init.c
> 
> diff --git a/stubs/bdrv-commit-all.c b/stubs/bdrv-commit-all.c
> new file mode 100644
> index 000..a8e0a95
> --- /dev/null
> +++ b/stubs/bdrv-commit-all.c
> @@ -0,0 +1,7 @@
> +#include "qemu-common.h"
> +#include "block/block.h"
> +
> +int bdrv_commit_all(void)
> +{
> +return 0;
> +}
> diff --git a/stubs/chr-msmouse.c b/stubs/chr-msmouse.c
> new file mode 100644
> index 000..812f8b0
> --- /dev/null
> +++ b/stubs/chr-msmouse.c
> @@ -0,0 +1,7 @@
> +#include "qemu-common.h"
> +#include "sysemu/char.h"
> +
> +CharDriverState *qemu_chr_open_msmouse(void)
> +{
> +return 0;
> +}
> diff --git a/stubs/get-next-serial.c b/stubs/get-next-serial.c
> new file mode 100644
> index 000..40c56d1
> --- /dev/null
> +++ b/stubs/get-next-serial.c
> @@ -0,0 +1,3 @@
> +#include "qemu-common.h"
> +
> +CharDriverState *serial_hds[0];
> diff --git a/stubs/is-daemonized.c b/stubs/is-daemonized.c
> new file mode 100644
> index 000..16ce7c7
> --- /dev/null
> +++ b/stubs/is-daemonized.c
> @@ -0,0 +1,7 @@
> +#include "qemu-common.h"
> +#include "sysemu/os-posix.h"
> +
> +bool is_daemonized(void)
> +{
> +return true;
> +}
> diff --git a/stubs/machine-init-done.c b/stubs/machine-init-done.c
> new file mode 100644
> index 000..28a9255
> --- /dev/null
> +++ b/stubs/machine-init-done.c
> @@ -0,0 +1,6 @@
> +#include "qemu-common.h"
> +#include "sysemu/sysemu.h"
> +
> +void qemu_add_machine_init_done_notifier(Notifier *notify)
> +{
> +}
> diff --git a/stubs/monitor-init.c b/stubs/monitor-init.c
> new file mode 100644
> index 000..563902b
> --- /dev/null
> +++ b/stubs/monitor-init.c
> @@ -0,0 +1,6 @@
> +#include "qemu-common.h"
> +#include "monitor/monitor.h"
> +
> +void monitor_init(CharDriverState *chr, int flags)
> +{
> +}
> diff --git a/stubs/notify-event.c b/stubs/notify-event.c
> new file mode 100644
> index 000..32f7289
> --- /dev/null
> +++ b/stubs/notify-event.c
> @@ -0,0 +1,6 @@
> +#include "qemu-common.h"
> +#include "qemu/main-loop.h"
> +
> +void qemu_notify_event(void)
> +{
> +}
> diff --git a/stubs/vc-init.c b/stubs/vc-init.c
> new file mode 100644
> index 000..2af054f
> --- /dev/null
> +++ b/stubs/vc-init.c
> @@ -0,0 +1,7 @@
> +#include "qemu-common.h"
> +#include "ui/console.h"
> +
> +CharDriverState *vc_init(ChardevVC *vc)
> +{
> +return 0;
> +}
> diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
> index d99e2b9..5a0b917 100644
> --- a/stubs/Makefile.objs
> +++ b/stubs/Makefile.objs
> @@ -1,4 +1,6 @@
>  stub-obj-y += arch-query-cpu-def.o
> +stub-obj-y += bdrv-commit-all.o
> +stub-obj-y += chr-msmouse.o
>  stub-obj-y += clock-warp.o
>  stub-obj-y += cpu-get-clock.o
>  stub-obj-y += cpu-get-icount.o
> @@ -9,13 +11,18 @@ stub-obj-y += fdset-get-fd.o
>  stub-obj-y += fdset-remove-fd.o
>  stub-obj-y += gdbstub.o
>  stub-obj-y += get-fd.o
> +stub-obj-y += get-next-serial.o
>  stub-obj-y += get-vm-name.o
>  stub-obj-y += iothread-lock.o
> +stub-obj-y += is-daemonized.o
> +stub-obj-y += machine-init-done.o

Re: [Qemu-devel] extremely low IOPS performance of QCOW2 image format on an SSD RAID1

2014-06-22 Thread lihuiba
I think I have found the reason:
There's a cache in qemu that accelerates the transform of virtual LBA to 
cluster offset of qcow2 image.
The cache has a fixed size of 16x8192=128k in my configuration, which 
corresponds to a 8GB (128K*64KB)
mapping size. So when the "working set" of fio exceeds 8GB, the transform wil 
be degraded to reading
external table, and the performances goes extremely low.








At 2014-06-23 11:22:37, "Fam Zheng"  wrote:
>Cc'ing more qcow2 experts.
>
>On Mon, 06/23 11:14, lihuiba wrote:
>> >Did you prefill the image? Amplification could come from cluster allocation.
>> Yes! 
>> I forgot to mention that I created the qcow2 image with 
>> 'preallocation=metadata', and I have allocated
>> the data blocks with dd in VM.
>> 
>> 
>> Creating image in host:
>> qemu-img create -f qcow2 -preallocation=metadata test.qcow2 100G
>> 
>> 
>> Allocating the blocks in VM:
>> dd if=/dev/zero of=/dev/vdb bs=1M
>> where vdb is the target image.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> At 2014-06-23 11:01:20, "Fam Zheng"  wrote:
>> >On Mon, 06/23 10:06, lihuiba wrote:
>> >> Hi, all
>> >> 
>> >> 
>> >> I'm using a qcow2 image stored on a SSD RAID1 (2 x intel S3500), and I'm 
>> >> benchmarking the
>> >> system using fio. Although the throughput in VM (with KVM and virtio 
>> >> enabled) is acceptable (67%
>> >> of thoughtput in host), the IOPS performance seems is extremely low  
>> >> only 2% of IOPS in host.
>> >> 
>> >> 
>> >> I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for 
>> >> comparison. There was no significant
>> >> difference.
>> >> 
>> >> 
>> >> In contrast, raw image and LVM perform very well. They usually achieve 
>> >> 90%+ of throughput and
>> >> 60%+ of IOPS. So the problem must lie in the QCOW2 image format.
>> >> 
>> >> 
>> >> And I observed that, when I perform 4KB IOPS benchmark in VM with a QCOW2 
>> >> image, fio in VM reports
>> >> it is reading 9.x MB/s, while iostat in host reports the SSD is being 
>> >> read 150+ MB/s. So QEMU or QCOW2
>> >> must have amplified the amount of read by nearly 16 times.
>> >> 
>> >> 
>> >> So, how can I fix or tune the performance issue of qcow2?
>> >
>> >Did you prefill the image? Amplification could come from cluster allocation.
>> >
>> >Fam
>> >
>> >> 
>> >> 
>> >> Thanks!
>> >> 
>> >> 
>> >> 
>> >> 
>> >> PS:
>> >> 1. qemu parameters:
>> >> -enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb 
>> >> -device usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection 
>> >> -global kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std 
>> >> -k en-us -smp 8 -m 4096 -boot order=cdn -vnc :1 -drive 
>> >> file=$1,if=none,id=drive_0,cache=none,aio=native -device 
>> >> virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
>> >> file=$2,if=none,id=drive_2,cache=none,aio=native -device 
>> >> virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7
>> >> 
>> >> 
>> >> 2. fio parameters for IOPS:
>> >> fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 
>> >> --thread --numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s 
>> >> --group_reporting --name=test
>> >> 
>> >> 
>> >> 3. fio parameters for throughput:
>> >> fio --filename=/dev/vdb--direct=1 --ioengine=psync --thread --numjobs=3 
>> >> --rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
>> >> --group_reporting -name=test
>> >> 
>> >> 
>> >> 
>> >
>