[Qemu-devel] [PATCH] qemu log: wrap long text

2016-07-03 Thread Mike Frysinger
The existing help output is hard to read because the descriptions are
wrapped by hand with embedded newlines. Compute the wrapping in code
instead (each description is wrapped at most once, which should be good
enough for now).

Signed-off-by: Mike Frysinger 
---
 util/log.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/util/log.c b/util/log.c
index 32e416051cdf..03d00768dbc2 100644
--- a/util/log.c
+++ b/util/log.c
@@ -239,7 +239,7 @@ const QEMULogItem qemu_log_items[] = {
 { CPU_LOG_TB_OP, "op",
   "show micro ops for each compiled TB" },
 { CPU_LOG_TB_OP_OPT, "op_opt",
-  "show micro ops (x86 only: before eflags optimization) and\n"
+  "show micro ops (x86 only: before eflags optimization) and "
   "after liveness analysis" },
 { CPU_LOG_INT, "int",
   "show interrupts/exceptions in short format" },
@@ -256,12 +256,12 @@ const QEMULogItem qemu_log_items[] = {
 { LOG_UNIMP, "unimp",
   "log unimplemented functionality" },
 { LOG_GUEST_ERROR, "guest_errors",
-  "log when the guest OS does something invalid (eg accessing a\n"
+  "log when the guest OS does something invalid (eg accessing a "
   "non-existent register)" },
 { CPU_LOG_PAGE, "page",
   "dump pages at beginning of user mode emulation" },
 { CPU_LOG_TB_NOCHAIN, "nochain",
-  "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
+  "do not chain compiled TBs so that \"exec\" and \"cpu\" show "
   "complete traces" },
 { 0, NULL, NULL },
 };
@@ -318,12 +318,36 @@ int qemu_str_to_log_mask(const char *str)
 void qemu_print_log_usage(FILE *f)
 {
 const QEMULogItem *item;
+int name_len, help_len, disp_len, wrap_len = 80;
+char help[wrap_len + 1];
+
 fprintf(f, "Log items (comma separated):\n");
+
+name_len = 0;
+for (item = qemu_log_items; item->mask != 0; item++) {
+name_len = MAX(strlen(item->name), name_len);
+}
+#ifdef CONFIG_TRACE_LOG
+name_len = MAX(strlen("trace:PATTERN"), name_len);
+#endif
+help_len = wrap_len - name_len - 1;
+
 for (item = qemu_log_items; item->mask != 0; item++) {
-fprintf(f, "%-15s %s\n", item->name, item->help);
+disp_len = snprintf(help, help_len, "%s", item->help);
+if (disp_len >= help_len) {
+char *space = strrchr(help, ' ');
+*space = '\0';
+disp_len = space - help + 1;
+} else {
+disp_len = 0;
+}
+fprintf(f, "%-*s %s\n", name_len, item->name, help);
+if (disp_len) {
+fprintf(f, "%-*s  %s\n", name_len, "", item->help + disp_len - 1);
+}
 }
 #ifdef CONFIG_TRACE_LOG
-fprintf(f, "trace:PATTERN   enable trace events\n");
-fprintf(f, "\nUse \"-d trace:help\" to get a list of trace events.\n\n");
+fprintf(f, "%-*s %s\n", name_len, "trace:PATTERN", "enable trace events");
+fprintf(f, "\nUse \"-d trace:help\" to get a list of trace events.\n");
 #endif
 }
-- 
2.8.2
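
For readers following the wrapping logic in qemu_print_log_usage(), here is a
minimal standalone sketch of the same "wrap once at the last space" idea. It
is not the code from the patch: the helper name wrap_once() and its signature
are hypothetical. Unlike the posted hunk, which writes through the result of
strrchr() without a NULL check, the sketch falls back to a hard break when
the first line contains no space.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split `text` into a first line of at most `width`
 * characters plus a remainder. Returns a pointer to the remainder inside
 * `text`, or NULL when everything fits on the first line.
 * `first` must have room for width + 1 bytes.
 */
static const char *wrap_once(const char *text, size_t width, char *first)
{
    size_t len = strlen(text);
    size_t brk;

    if (len <= width) {
        memcpy(first, text, len + 1);
        return NULL;
    }

    /* Search backwards for the last space at or before the wrap column. */
    brk = width;
    while (brk > 0 && text[brk] != ' ') {
        brk--;
    }

    if (brk == 0) {
        /* No usable space: fall back to a hard break at the wrap column. */
        memcpy(first, text, width);
        first[width] = '\0';
        return text + width;
    }

    memcpy(first, text, brk);
    first[brk] = '\0';
    return text + brk + 1; /* skip the space that was broken at */
}
```

A caller would print `first` next to the item name and, if the return value
is non-NULL, print the remainder on a second line indented by the name width.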




Re: [Qemu-devel] [PATCH v2 0/2]vhost-user: Extend protocol to seek response for any command.

2016-07-03 Thread Marc-André Lureau
Hi

On Fri, Jul 1, 2016 at 11:46 AM, Prerna Saxena  wrote:
> From: Prerna Saxena 
>
> The current vhost-user protocol requires the client to send responses to
> only a few commands. For the remaining commands, it is impossible for QEMU
> to know the status of the requested operation -- ie, did it succeed? If so,
> by what time?
>
> This is inconvenient, and can also lead to races. As an example:
>
> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>     application). Note that SET_MEM_TABLE does not require a reply
>     according to the spec.
> (2) Qemu commits the memory to the guest.
> (3) Guest issues an I/O operation over a new memory region which was
>     configured on (1).
> (4) The application hasn't yet remapped the memory, but it sees the I/O
>     request.
> (5) The application cannot satisfy the request because it does not know
>     about those GPAs.
>
> Note that the kernel implementation does not suffer from this limitation 
> since messages are sent via an ioctl(). The ioctl() blocks until the backend 
> (eg. vhost-net) completes the command and returns (with an error code).
>
> Changing the behaviour of current vhost-user commands would break existing 
> applications.
> To work around this race, Patch 1 adds a get_features command to be sent 
> before returning from set_mem_table. While this is not a complete fix, it 
> will help client applications that strictly process messages in order.
>
> The second patch introduces a protocol extension, 
> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to 
> request a response to any message by setting the newly introduced 
> "need_response" flag. The application must then respond to qemu by providing 
> a status about the requested operation.
>
> Changelog:
> -
> Changes since v1:
> Patch 1 : Ask for get_features before returning from set_mem_table(new).
> Patch 2 : * Improve documentation.
>           * Abstract out commonly used operations in the form of a
>             function, process_message_response(). Also implement this
>             only for SET_MEM_TABLE.
>

Overall, that looks good to me.

Why do we have both "response" and "reply"? They mean basically the
same thing, so I would rather stick with "reply".

I am not convinced the first patch is needed. IMHO it is a
workaround/hack; the actual solution is given by patch 2 alone.
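
To make the proposed extension concrete, here is a rough sketch of how a
sender could mark a message as needing an acknowledgement and then check the
reply. Everything in it is an assumption made for illustration: the struct,
the helper names, the flag bit positions and the "zero payload means
success" convention are not taken from the patch, and the authoritative
layout is whatever docs/specs/vhost-user.txt defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag layout for this sketch only. */
#define MSG_FLAG_VERSION_MASK 0x3u
#define MSG_FLAG_REPLY        (1u << 2)  /* message is a reply */
#define MSG_FLAG_NEED_REPLY   (1u << 3)  /* sender wants an acknowledgement */

/* Hypothetical message header: request code, flags, payload size. */
struct msg_header {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
};

/* Ask for an acknowledgement, but only if the feature was negotiated. */
static void msg_request_ack(struct msg_header *hdr, bool reply_ack_negotiated)
{
    if (reply_ack_negotiated) {
        hdr->flags |= MSG_FLAG_NEED_REPLY;
    }
}

/* True if the sender has to block until the backend answers. */
static bool msg_wants_ack(const struct msg_header *hdr)
{
    return (hdr->flags & MSG_FLAG_NEED_REPLY) != 0;
}

/*
 * Validate an acknowledgement. Returns 0 when the reply matches the request
 * and reports success, -1 otherwise. A zero payload meaning "success" is an
 * assumption of this sketch.
 */
static int msg_check_ack(const struct msg_header *reply,
                         uint32_t expected_request, uint64_t payload)
{
    if (reply->request != expected_request) {
        return -1;
    }
    if (!(reply->flags & MSG_FLAG_REPLY)) {
        return -1;
    }
    return payload == 0 ? 0 : -1;
}
```

With something along these lines, the SET_MEM_TABLE path would only return
once msg_check_ack() has succeeded, which closes the window described in
steps (1) to (5) of the cover letter.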

> Prerna Saxena (2):
>   vhost-user: Attempt to prevent a race on set_mem_table.
>   vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK.
>
>  docs/specs/vhost-user.txt |  40
>  hw/virtio/vhost-user.c    | 157
>  2 files changed, 150 insertions(+), 47 deletions(-)
>
> --
> 1.8.1.2
>



-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH] target-i386: Use struct X86XSaveArea in fpu_helper.c

2016-07-03 Thread Eduardo Habkost
On Sat, Jul 02, 2016 at 04:45:11PM -0700, Richard Henderson wrote:
> On 07/02/2016 01:02 PM, Eduardo Habkost wrote:
> > On Sat, Jul 02, 2016 at 09:44:31AM -0700, Richard Henderson wrote:
> > [...]
> > > @@ -1402,9 +1409,8 @@ void helper_xrstor(CPUX86State *env, target_ulong 
> > > ptr, uint64_t rfbm)
> > >  }
> > > 
> > >  /* The XCOMP field must be zero.  */
> > > -xcomp_bv0 = cpu_ldq_data_ra(env, ptr + 520, ra);
> > > -xcomp_bv1 = cpu_ldq_data_ra(env, ptr + 528, ra);
> > > -if (xcomp_bv0 || xcomp_bv1) {
> > > +xcomp_bv = cpu_ldq_data_ra(env, ptr + XO(header.xcomp_bv), ra);
> > > +if (xcomp_bv) {
> > >  raise_exception_ra(env, EXCP0D_GPF, ra);
> > 
> > You are changing the code to not check bytes 528-535 (bytes 16:23
> > of the XSAVE header) anymore, but Intel SDM says XRSTOR raises
> > #GP "If the standard form is executed and bytes 23:8 of the XSAVE
> > header are not all zero."
> 
> Hmm.  I must have an out-of-date version here, since mine just mentions the
> first 8 bytes, and I thought the current definition of X86XSaveHeader backed
> that up.
> 
> I can certainly modify the structure...

I was looking at a September 2015 version (Order Number
325462-056US). It is a bit confusing, because the header layout
documentation (Section 13.4.2) just says bytes 63:16 are
reserved, but the Instruction Set Reference for XRSTOR has the
following:

  Protected Mode Exceptions
  #GP(0)  [...]
  If the standard form is executed and bytes 23:8 of the
  XSAVE header are not all zero.
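
To illustrate the check the SDM text asks for, here is a small sketch that
tests bytes 23:8 of a raw 64-byte XSAVE header. It is not the fpu_helper.c
code: the function name and the flat byte-array view are hypothetical, and a
real implementation would load the two quadwords through the guest memory
accessors. The only layout facts it relies on are that the header follows the
512-byte legacy region, so header bytes 8..23 sit at area offsets 520..535,
which matches the two loads in the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XSAVE_HEADER_OFFSET 512  /* header follows the legacy region */
#define XSAVE_HEADER_SIZE   64

/*
 * Hypothetical check for the standard form of XRSTOR: bytes 23:8 of the
 * XSAVE header must all be zero. Bytes 7:0 (XSTATE_BV) are not examined
 * here. Returns true when the header passes the check.
 */
static bool xsave_header_standard_form_ok(const uint8_t header[XSAVE_HEADER_SIZE])
{
    for (size_t i = 8; i <= 23; i++) {
        if (header[i] != 0) {
            return false;
        }
    }
    return true;
}
```

In terms of the patch under review, this corresponds to checking both
quadwords at offsets 8 and 16 of the header rather than only the first one.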

-- 
Eduardo



[Qemu-devel] [Bug 1598612] [NEW] Windows for Workgroups 3.11 installer crashes with a general protection fault

2016-07-03 Thread Julius Schwartzenberg
Public bug reported:

I used only disk images from here:
http://ia801606.us.archive.org/zipview.php?zip=/22/items/IBM_PC_Compatibles_TOSEC_2012_04_23/IBM_PC_Compatibles_TOSEC_2012_04_23.zip

When I try to install Windows for Workgroups 3.11 on either PC DOS 2000
or MS-DOS 6.22, the installer crashes after entering the graphical part
with two dialogs containing:

Application Error
WINSETUP caused a General Protection Fault in module 0EDF:7011
WINSETUP will close.

Application Error
WINSETUP caused a General Protection Fault in module USER.EXE at 0001:40B6.

And then:
Standard Mode: Bad Fault in MS-DOS Extender.
Fault: 000D Stack Dump:   0070
Raw fault frame: EC= IP=5EF7 CS=037F FL=3087 SP=FFEE SS=02DF

This happens both with and without KVM. I tested with QEMU from Ubuntu
14.04 and 16.04 and recent GIT
(ef8757f1fe8095a256ee617e4dbac69d3b33ae94).

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1598612

Title:
  Windows for Workgroups 3.11 installer crashes with a general
  protection fault

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1598612/+subscriptions



[Qemu-devel] [Bug 1598612] Re: Windows for Workgroups 3.11 installer crashes with a general protection fault

2016-07-03 Thread Julius Schwartzenberg
Windows 3.1 has the same problem, but the errors are slightly different:

Application Error
WINSETUP caused a General Protection Fault in module 0C77:7011
WINSETUP will close.

Application Error
WINSETUP caused a General Protection Fault in module KRNL386.EXE at 0001:9F03.




[Qemu-devel] [PULL 1/6] slirp: Split get_dns_addr

2016-07-03 Thread Samuel Thibault
Separate get_dns_addr into get_dns_addr_cached and get_dns_addr_resolv_conf
to make conversion to IPv6 easier.

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/slirp.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 9f4bea3..e63c5e8 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -109,7 +109,28 @@ static void winsock_cleanup(void)
 
 static struct stat dns_addr_stat;
 
-int get_dns_addr(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(struct in_addr *pdns_addr)
+{
+struct stat old_stat;
+if (curtime - dns_addr_time < TIMEOUT_DEFAULT) {
+*pdns_addr = dns_addr;
+return 0;
+}
+old_stat = dns_addr_stat;
+if (stat("/etc/resolv.conf", &dns_addr_stat) != 0) {
+return -1;
+}
+if (dns_addr_stat.st_dev == old_stat.st_dev
+&& dns_addr_stat.st_ino == old_stat.st_ino
+&& dns_addr_stat.st_size == old_stat.st_size
+&& dns_addr_stat.st_mtime == old_stat.st_mtime) {
+*pdns_addr = dns_addr;
+return 0;
+}
+return 1;
+}
+
+static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
 {
 char buff[512];
 char buff2[257];
@@ -117,24 +138,6 @@ int get_dns_addr(struct in_addr *pdns_addr)
 int found = 0;
 struct in_addr tmp_addr;
 
-if (dns_addr.s_addr != 0) {
-struct stat old_stat;
-if ((curtime - dns_addr_time) < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
-return 0;
-}
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", &dns_addr_stat) != 0)
-return -1;
-if ((dns_addr_stat.st_dev == old_stat.st_dev)
-&& (dns_addr_stat.st_ino == old_stat.st_ino)
-&& (dns_addr_stat.st_size == old_stat.st_size)
-&& (dns_addr_stat.st_mtime == old_stat.st_mtime)) {
-*pdns_addr = dns_addr;
-return 0;
-}
-}
-
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
 return -1;
@@ -174,6 +177,18 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns_addr(struct in_addr *pdns_addr)
+{
+if (dns_addr.s_addr != 0) {
+int ret;
+ret = get_dns_addr_cached(pdns_addr);
+if (ret <= 0) {
+return ret;
+}
+}
+return get_dns_addr_resolv_conf(pdns_addr);
+}
+
 #endif
 
 static void slirp_init_once(void)
-- 
2.8.1
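
The split relies on a three-way return convention for the cache check: 0 for
"cached value still valid", 1 for "file changed, re-read it", and a negative
value for "stat failed". A minimal sketch of that convention, detached from
slirp, could look like this. The names stat_unchanged() and
file_check_cached() are hypothetical, and the timeout shortcut from the patch
is left out to keep the example short.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Compare the same fields the patch compares to detect a changed file. */
static bool stat_unchanged(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev
        && a->st_ino == b->st_ino
        && a->st_size == b->st_size
        && a->st_mtime == b->st_mtime;
}

/*
 * Hypothetical cache check. Refreshes *cached from `path` and returns:
 *    0  file looks unchanged, the cached result can be reused
 *    1  file changed, the caller should parse it again
 *   -1  stat() failed
 */
static int file_check_cached(const char *path, struct stat *cached)
{
    struct stat old = *cached;

    if (stat(path, cached) != 0) {
        return -1;
    }
    return stat_unchanged(cached, &old) ? 0 : 1;
}
```

A caller can then follow the shape of the new get_dns_addr(): return early
when the result is less than or equal to zero, and fall through to the parser
only when it is 1.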




[Qemu-devel] [PULL 4/6] slirp: Add RDNSS advertisement

2016-07-03 Thread Samuel Thibault
This adds the RDNSS option to IPv6 router advertisements, so that the guest
can autoconfigure the DNS server address.

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 

---
Changes since last submission:
- Disable on windows, until we have support for it
---
 slirp/ip6_icmp.c | 27 ---
 slirp/ip6_icmp.h | 12 ++--
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index 48016a9..6d18e28 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -148,7 +148,11 @@ void ndp_send_ra(Slirp *slirp)
 rip->ip_nh = IPPROTO_ICMPV6;
 rip->ip_pl = htons(ICMP6_NDP_RA_MINLEN
 + NDPOPT_LINKLAYER_LEN
-+ NDPOPT_PREFIXINFO_LEN);
++ NDPOPT_PREFIXINFO_LEN
+#ifndef _WIN32
++ NDPOPT_RDNSS_LEN
+#endif
+);
 t->m_len = sizeof(struct ip6) + ntohs(rip->ip_pl);
 
 /* Build ICMPv6 packet */
@@ -166,16 +170,16 @@ void ndp_send_ra(Slirp *slirp)
 ricmp->icmp6_nra.lifetime = htons(NDP_AdvDefaultLifetime);
 ricmp->icmp6_nra.reach_time = htonl(NDP_AdvReachableTime);
 ricmp->icmp6_nra.retrans_time = htonl(NDP_AdvRetransTime);
+t->m_data += ICMP6_NDP_RA_MINLEN;
 
 /* Source link-layer address (NDP option) */
-t->m_data += ICMP6_NDP_RA_MINLEN;
 struct ndpopt *opt = mtod(t, struct ndpopt *);
 opt->ndpopt_type = NDPOPT_LINKLAYER_SOURCE;
 opt->ndpopt_len = NDPOPT_LINKLAYER_LEN / 8;
 in6_compute_ethaddr(rip->ip_src, opt->ndpopt_linklayer);
+t->m_data += NDPOPT_LINKLAYER_LEN;
 
 /* Prefix information (NDP option) */
-t->m_data += NDPOPT_LINKLAYER_LEN;
 struct ndpopt *opt2 = mtod(t, struct ndpopt *);
 opt2->ndpopt_type = NDPOPT_PREFIX_INFO;
 opt2->ndpopt_len = NDPOPT_PREFIXINFO_LEN / 8;
@@ -187,8 +191,25 @@ void ndp_send_ra(Slirp *slirp)
 opt2->ndpopt_prefixinfo.pref_lt = htonl(NDP_AdvPrefLifetime);
 opt2->ndpopt_prefixinfo.reserved2 = 0;
 opt2->ndpopt_prefixinfo.prefix = slirp->vprefix_addr6;
+t->m_data += NDPOPT_PREFIXINFO_LEN;
+
+#ifndef _WIN32
+/* Recursive DNS Server (NDP option) */
+/* disabled for windows for now, until get_dns6_addr is implemented */
+struct ndpopt *opt3 = mtod(t, struct ndpopt *);
+opt3->ndpopt_type = NDPOPT_RDNSS;
+opt3->ndpopt_len = NDPOPT_RDNSS_LEN / 8;
+opt3->ndpopt_rdnss.reserved = 0;
+opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
+opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
+t->m_data += NDPOPT_RDNSS_LEN;
+#endif
 
 /* ICMPv6 Checksum */
+#ifndef _WIN32
+t->m_data -= NDPOPT_RDNSS_LEN;
+#endif
+t->m_data -= NDPOPT_PREFIXINFO_LEN;
 t->m_data -= NDPOPT_LINKLAYER_LEN;
 t->m_data -= ICMP6_NDP_RA_MINLEN;
 t->m_data -= sizeof(struct ip6);
diff --git a/slirp/ip6_icmp.h b/slirp/ip6_icmp.h
index 9460bf8..2282d29 100644
--- a/slirp/ip6_icmp.h
+++ b/slirp/ip6_icmp.h
@@ -122,6 +122,7 @@ struct ndpopt {
 uint8_t ndpopt_len; /* /!\ In units of 8 octets */
 union {
 unsigned char   linklayer_addr[6];  /* Source/Target Link-layer */
+#define ndpopt_linklayer ndpopt_body.linklayer_addr
 struct prefixinfo { /* Prefix Information */
 uint8_t prefix_length;
 #ifdef HOST_WORDS_BIGENDIAN
@@ -134,19 +135,26 @@ struct ndpopt {
 uint32_t reserved2;
 struct in6_addr prefix;
 } QEMU_PACKED prefixinfo;
-} ndpopt_body;
-#define ndpopt_linklayer ndpopt_body.linklayer_addr
 #define ndpopt_prefixinfo ndpopt_body.prefixinfo
+struct rdnss {
+uint16_t reserved;
+uint32_t lifetime;
+struct in6_addr addr;
+} QEMU_PACKED rdnss;
+#define ndpopt_rdnss ndpopt_body.rdnss
+} ndpopt_body;
 } QEMU_PACKED;
 
 /* NDP options type */
 #define NDPOPT_LINKLAYER_SOURCE 1   /* Source Link-Layer Address */
 #define NDPOPT_LINKLAYER_TARGET 2   /* Target Link-Layer Address */
 #define NDPOPT_PREFIX_INFO  3   /* Prefix Information */
+#define NDPOPT_RDNSS 25  /* Recursive DNS Server Address */
 
 /* NDP options size, in octets. */
 #define NDPOPT_LINKLAYER_LEN    8
 #define NDPOPT_PREFIXINFO_LEN   32
+#define NDPOPT_RDNSS_LEN        24
 
 /*
  * Definition of type and code field values.
-- 
2.8.1
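
As a cross-check of the sizes used above, here is a standalone sketch that
serializes a single-address RDNSS option into a byte buffer: 1 byte type,
1 byte length in units of 8 octets, 2 reserved bytes, a 4-byte lifetime in
network byte order and one 16-byte address, 24 bytes in total. The function
name and the macro names are hypothetical, and the lifetime value in the
usage example is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_RDNSS      25  /* option type, same value as NDPOPT_RDNSS */
#define OPT_RDNSS_LEN  24  /* bytes, for exactly one DNS server address */

/*
 * Hypothetical serializer: write one RDNSS option into `buf`, which must
 * have room for OPT_RDNSS_LEN bytes. Returns the number of bytes written.
 */
static size_t build_rdnss_option(uint8_t *buf, uint32_t lifetime,
                                 const uint8_t addr[16])
{
    buf[0] = OPT_RDNSS;
    buf[1] = OPT_RDNSS_LEN / 8;            /* length in units of 8 octets */
    buf[2] = 0;                            /* reserved */
    buf[3] = 0;
    buf[4] = (uint8_t)(lifetime >> 24);    /* lifetime, big endian */
    buf[5] = (uint8_t)(lifetime >> 16);
    buf[6] = (uint8_t)(lifetime >> 8);
    buf[7] = (uint8_t)lifetime;
    memcpy(buf + 8, addr, 16);             /* the DNS server address */
    return OPT_RDNSS_LEN;
}
```

For example, calling it with a lifetime of 1200 produces a length byte of 3
and lifetime bytes 00 00 04 b0, followed by the address.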




[Qemu-devel] [PULL 6/6] slirp: Add support for stateless DHCPv6

2016-07-03 Thread Samuel Thibault
From: Thomas Huth 

Provide basic support for stateless DHCPv6 (see RFC 3736) so
that guests can also automatically boot via IPv6 with SLIRP
(for IPv6 network booting, see RFC 5970 for details).

Tested with:

qemu-system-ppc64 -nographic -vga none -boot n -net nic \
-net user,ipv6=yes,ipv4=no,tftp=/path/to/tftp,bootfile=ppc64.img

Signed-off-by: Thomas Huth 
Signed-off-by: Samuel Thibault 
---
 slirp/Makefile.objs |   2 +-
 slirp/dhcpv6.c  | 209 
 slirp/dhcpv6.h  |  22 ++
 slirp/udp6.c|  13 +++-
 4 files changed, 244 insertions(+), 2 deletions(-)
 create mode 100644 slirp/dhcpv6.c
 create mode 100644 slirp/dhcpv6.h

diff --git a/slirp/Makefile.objs b/slirp/Makefile.objs
index 6748e4f..1baa1f1 100644
--- a/slirp/Makefile.objs
+++ b/slirp/Makefile.objs
@@ -1,5 +1,5 @@
 common-obj-y = cksum.o if.o ip_icmp.o ip6_icmp.o ip6_input.o ip6_output.o \
-   ip_input.o ip_output.o dnssearch.o
+   ip_input.o ip_output.o dnssearch.o dhcpv6.o
 common-obj-y += slirp.o mbuf.o misc.o sbuf.o socket.o tcp_input.o tcp_output.o
 common-obj-y += tcp_subr.o tcp_timer.o udp.o udp6.o bootp.o tftp.o arp_table.o \
                 ndp_table.o
diff --git a/slirp/dhcpv6.c b/slirp/dhcpv6.c
new file mode 100644
index 000..02c51c7
--- /dev/null
+++ b/slirp/dhcpv6.c
@@ -0,0 +1,209 @@
+/*
+ * SLIRP stateless DHCPv6
+ *
+ * We only support stateless DHCPv6, e.g. for network booting.
+ * See RFC 3315, RFC 3736, RFC 3646 and RFC 5970 for details.
+ *
+ * Copyright 2016 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "slirp.h"
+#include "dhcpv6.h"
+
+/* DHCPv6 message types */
+#define MSGTYPE_REPLY        7
+#define MSGTYPE_INFO_REQUEST 11
+
+/* DHCPv6 option types */
+#define OPTION_CLIENTID      1
+#define OPTION_IAADDR        5
+#define OPTION_ORO           6
+#define OPTION_DNS_SERVERS   23
+#define OPTION_BOOTFILE_URL  59
+
+struct requested_infos {
+uint8_t *client_id;
+int client_id_len;
+bool want_dns;
+bool want_boot_url;
+};
+
+/**
+ * Analyze the info request message sent by the client to see what data it
+ * provided and what it wants to have. The information is gathered in the
+ * "requested_infos" struct. Note that client_id (if provided) points into
+ * the odata region, thus the caller must keep odata valid as long as it
+ * needs to access the requested_infos struct.
+ */
+static int dhcpv6_parse_info_request(uint8_t *odata, int olen,
+ struct requested_infos *ri)
+{
+int i, req_opt;
+
+while (olen > 4) {
+/* Parse one option */
+int option = odata[0] << 8 | odata[1];
+int len = odata[2] << 8 | odata[3];
+
+if (len + 4 > olen) {
+qemu_log_mask(LOG_GUEST_ERROR, "Guest sent bad DHCPv6 packet!\n");
+return -E2BIG;
+}
+
+switch (option) {
+case OPTION_IAADDR:
+/* According to RFC3315, we must discard requests with IA option */
+return -EINVAL;
+case OPTION_CLIENTID:
+if (len > 256) {
+/* Avoid very long IDs which could cause problems later */
+return -E2BIG;
+}
+ri->client_id = odata + 4;
+ri->client_id_len = len;
+break;
+case OPTION_ORO:/* Option request option */
+if (len & 1) {
+return -EINVAL;
+}
+/* Check which options the client wants to have */
+for (i = 0; i < len; i += 2) {
+req_opt = odata[4 + i] << 8 | odata[4 + i + 1];
+switch (req_opt) {
+case OPTION_DNS_SERVERS:
+ri->want_dns = true;
+break;
+case OPTION_BOOTFILE_URL:
+ri->want_boot_url = true;
+break;
+default:
+DEBUG_MISC((dfd, "dhcpv6: Unsupported option request %d\n",
+req_opt));
+}
+}
+break;
+default:
+DEBUG_MISC((dfd, "dhcpv6 info req: Unsupported option %d, len=%d\n",
+option, len));
+}
+
+odata += len + 4;
+
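
The option loop in dhcpv6_parse_info_request() walks a sequence of
type-length-value records: a 16-bit option code, a 16-bit length, then the
payload. As a standalone illustration of the same bounds-checked walk, here
is a sketch that looks up one option in a buffer. The function name
find_option() and its signature are hypothetical and are not part of the
patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical lookup: scan a DHCPv6 option area for the option `wanted`.
 * On success, returns a pointer to the payload inside `buf` and stores the
 * payload length in *out_len. Returns NULL if the option is absent or if a
 * record claims more bytes than the buffer holds.
 */
static const uint8_t *find_option(const uint8_t *buf, size_t buflen,
                                  uint16_t wanted, uint16_t *out_len)
{
    while (buflen >= 4) {
        uint16_t code = (uint16_t)((buf[0] << 8) | buf[1]);
        uint16_t len = (uint16_t)((buf[2] << 8) | buf[3]);

        /* Reject records that run past the end of the buffer. */
        if ((size_t)len + 4 > buflen) {
            return NULL;
        }
        if (code == wanted) {
            *out_len = len;
            return buf + 4;
        }
        buf += (size_t)len + 4;
        buflen -= (size_t)len + 4;
    }
    return NULL;
}
```

As in the patch, the returned pointer aliases the input buffer, so the caller
has to keep the packet alive for as long as it uses the result.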

[Qemu-devel] [PULL 3/6] slirp: Support link-local DNS addresses

2016-07-03 Thread Samuel Thibault
They look like fe80::%eth0

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 

---
Changes since last submission:
- fix windows build
---
 slirp/libslirp.h |  2 +-
 slirp/slirp.c| 32 +++-
 slirp/socket.c   |  5 -
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index b0cfbc5..b6fc584 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,7 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
-int get_dns6_addr(struct in6_addr *pdns6_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr, uint32_t *scope_id);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 197d9f2..7eb183d 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -30,6 +30,10 @@
 #include "hw/hw.h"
 #include "qemu/cutils.h"
 
+#ifndef _WIN32
+#include 
+#endif
+
 /* host loopback address */
 struct in_addr loopback_addr;
 /* host loopback network mask */
@@ -46,9 +50,13 @@ static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
 QTAILQ_HEAD_INITIALIZER(slirp_instances);
 
 static struct in_addr dns_addr;
+#ifndef _WIN32
 static struct in6_addr dns6_addr;
+#endif
 static u_int dns_addr_time;
+#ifndef _WIN32
 static u_int dns6_addr_time;
+#endif
 
 #define TIMEOUT_FAST 2  /* milliseconds */
 #define TIMEOUT_SLOW 499  /* milliseconds */
@@ -102,7 +110,7 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
-int get_dns6_addr(struct in6_addr *pdns_addr6)
+int get_dns6_addr(struct in6_addr *pdns6_addr, uint32_t *scope_id)
 {
 return -1;
 }
@@ -138,13 +146,15 @@ static int get_dns_addr_cached(void *pdns_addr, void *cached_addr,
 }
 
 static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
-socklen_t addrlen, u_int *cached_time)
+socklen_t addrlen, uint32_t *scope_id,
+u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
 void *tmp_addr = alloca(addrlen);
+unsigned if_index;
 
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
@@ -155,6 +165,14 @@ static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
+char *c = strchr(buff2, '%');
+if (c) {
+if_index = if_nametoindex(c + 1);
+*c = '\0';
+} else {
+if_index = 0;
+}
+
 if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
 }
@@ -162,6 +180,9 @@ static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
 if (!found) {
 memcpy(pdns_addr, tmp_addr, addrlen);
 memcpy(cached_addr, tmp_addr, addrlen);
+if (scope_id) {
+*scope_id = if_index;
+}
 *cached_time = curtime;
 }
 #ifdef DEBUG
@@ -205,10 +226,10 @@ int get_dns_addr(struct in_addr *pdns_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET, pdns_addr, &dns_addr,
-sizeof(dns_addr), &dns_addr_time);
+sizeof(dns_addr), NULL, &dns_addr_time);
 }
 
-int get_dns6_addr(struct in6_addr *pdns6_addr)
+int get_dns6_addr(struct in6_addr *pdns6_addr, uint32_t *scope_id)
 {
 static struct stat dns6_addr_stat;
 
@@ -221,7 +242,8 @@ int get_dns6_addr(struct in6_addr *pdns6_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET6, pdns6_addr, &dns6_addr,
-sizeof(dns6_addr), &dns6_addr_time);
+sizeof(dns6_addr),
+scope_id, &dns6_addr_time);
 }
 
 #endif
diff --git a/slirp/socket.c b/slirp/socket.c
index 8e8de88..02e89ce 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -816,7 +816,10 @@ void sotranslate_out(struct socket *so, struct sockaddr_storage *addr)
 if (in6_equal_net(&so->so_faddr6, &slirp->vprefix_addr6,
 slirp->vprefix_len)) {
 if (in6_equal(&so->so_faddr6, &slirp->vnameserver_addr6)) {
-if (get_dns6_addr(&sin6->sin6_addr) < 0) {
+uint32_t scope_id;
+if (get_dns6_addr(&sin6->sin6_addr, &scope_id) >= 0) {
+sin6->sin6_scope_id = scope_id;
+} else {
 sin6->sin6_addr = in6addr_loopback;
 }
 } else {
-- 
2.8.1
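
The core of this change is splitting a scoped nameserver entry at the '%'
sign and turning the interface name into a scope id. Here is a minimal
POSIX-only sketch of that step in isolation. The function name
parse_scoped_addr6() is hypothetical, and unlike the patch it works on a
private copy of the string instead of modifying the caller's buffer.

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical parser for "ADDR" or "ADDR%IFNAME". Stores the address in
 * *addr and the interface index in *scope_id (0 when there is no suffix or
 * the interface is unknown). Returns 0 on success, -1 on a malformed entry.
 */
static int parse_scoped_addr6(const char *text, struct in6_addr *addr,
                              uint32_t *scope_id)
{
    char buf[INET6_ADDRSTRLEN + IF_NAMESIZE + 1];
    size_t n = strlen(text);
    char *pct;

    if (n >= sizeof(buf)) {
        return -1;
    }
    memcpy(buf, text, n + 1);

    *scope_id = 0;
    pct = strchr(buf, '%');
    if (pct) {
        *pct = '\0';                          /* cut off the scope suffix */
        *scope_id = if_nametoindex(pct + 1);  /* 0 if no such interface */
    }
    return inet_pton(AF_INET6, buf, addr) == 1 ? 0 : -1;
}
```

The resulting scope id is what ends up in sin6_scope_id in
sotranslate_out(), so that the host kernel knows which link the fe80::
address belongs to.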




[Qemu-devel] [PULL 0/6] slirp: ipv6 dns & dhcp support

2016-07-03 Thread Samuel Thibault
The following changes since commit 9a48e3670030148a8d00c8d4d4cd7f051c0d9f39:

  Added Bulgarian translation (2016-07-01 16:06:57 +0100)

are available in the git repository at:

  http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault

for you to fetch changes up to 7b143999f2fbbd576d60a180add163966634fca6:

  slirp: Add support for stateless DHCPv6 (2016-07-03 23:59:42 +0200)


slirp updates


Samuel Thibault (4):
  slirp: Split get_dns_addr
  slirp: Add dns6 resolution
  slirp: Support link-local DNS addresses
  slirp: Add RDNSS advertisement

Thomas Huth (2):
  slirp: Remove superfluous memset() calls from the TFTP code
  slirp: Add support for stateless DHCPv6

 slirp/Makefile.objs |   2 +-
 slirp/dhcpv6.c  | 209 
 slirp/dhcpv6.h  |  22 ++
 slirp/ip6.h |   9 +++
 slirp/ip6_icmp.c|  27 ++-
 slirp/ip6_icmp.h|  12 ++-
 slirp/libslirp.h|   1 +
 slirp/slirp.c   | 126 ---
 slirp/socket.c  |   7 +-
 slirp/tftp.c|   4 -
 slirp/udp6.c|  13 +++-
 11 files changed, 392 insertions(+), 40 deletions(-)
 create mode 100644 slirp/dhcpv6.c
 create mode 100644 slirp/dhcpv6.h



[Qemu-devel] [PULL 2/6] slirp: Add dns6 resolution

2016-07-03 Thread Samuel Thibault
This makes get_dns_addr address family-agnostic, so that the IPv6 case
can be added.

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/ip6.h  |  9 +++
 slirp/libslirp.h |  1 +
 slirp/slirp.c| 79 
 slirp/socket.c   |  4 +--
 4 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/slirp/ip6.h b/slirp/ip6.h
index 8ddfa24..da23de6 100644
--- a/slirp/ip6.h
+++ b/slirp/ip6.h
@@ -26,6 +26,12 @@
 0x00, 0x00, 0x00, 0x00,\
 0x00, 0x00, 0x00, 0x02 } }
 
+#define ZERO_ADDR  { .s6_addr = \
+{ 0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00 } }
+
 static inline bool in6_equal(const struct in6_addr *a, const struct in6_addr *b)
 {
 return memcmp(a, b, sizeof(*a)) == 0;
@@ -84,6 +90,9 @@ static inline bool in6_equal_mach(const struct in6_addr *a,
 #define in6_solicitednode_multicast(a)\
 (in6_equal_net(a, &(struct in6_addr)SOLICITED_NODE_PREFIX, 104))
 
+#define in6_zero(a)\
+(in6_equal(a, &(struct in6_addr)ZERO_ADDR))
+
 /* Compute emulated host MAC address from its ipv6 address */
 static inline void in6_compute_ethaddr(struct in6_addr ip,
uint8_t eth[ETH_ALEN])
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 127aa41..b0cfbc5 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,6 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index e63c5e8..197d9f2 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -46,7 +46,9 @@ static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
 QTAILQ_HEAD_INITIALIZER(slirp_instances);
 
 static struct in_addr dns_addr;
+static struct in6_addr dns6_addr;
 static u_int dns_addr_time;
+static u_int dns6_addr_time;
 
 #define TIMEOUT_FAST 2  /* milliseconds */
 #define TIMEOUT_SLOW 499  /* milliseconds */
@@ -100,6 +102,11 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns6_addr(struct in6_addr *pdns_addr6)
+{
+return -1;
+}
+
 static void winsock_cleanup(void)
 {
 WSACleanup();
@@ -107,36 +114,37 @@ static void winsock_cleanup(void)
 
 #else
 
-static struct stat dns_addr_stat;
-
-static int get_dns_addr_cached(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(void *pdns_addr, void *cached_addr,
+   socklen_t addrlen,
+   struct stat *cached_stat, u_int *cached_time)
 {
 struct stat old_stat;
-if (curtime - dns_addr_time < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
+if (curtime - *cached_time < TIMEOUT_DEFAULT) {
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", &dns_addr_stat) != 0) {
+old_stat = *cached_stat;
+if (stat("/etc/resolv.conf", cached_stat) != 0) {
 return -1;
 }
-if (dns_addr_stat.st_dev == old_stat.st_dev
-&& dns_addr_stat.st_ino == old_stat.st_ino
-&& dns_addr_stat.st_size == old_stat.st_size
-&& dns_addr_stat.st_mtime == old_stat.st_mtime) {
-*pdns_addr = dns_addr;
+if (cached_stat->st_dev == old_stat.st_dev
+&& cached_stat->st_ino == old_stat.st_ino
+&& cached_stat->st_size == old_stat.st_size
+&& cached_stat->st_mtime == old_stat.st_mtime) {
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
 return 1;
 }
 
-static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
+static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
+socklen_t addrlen, u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
-struct in_addr tmp_addr;
+void *tmp_addr = alloca(addrlen);
 
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
@@ -147,13 +155,14 @@ static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
-if (!inet_aton(buff2, &tmp_addr))
+if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
+}
 /* If it's the first one, set it to dns_addr */
 if (!found) {
-*pdns_addr = tmp_addr;
-dns_addr = tmp_addr;
-dns_addr_time = curtime;
+memcpy(pdns_addr, tmp_addr, addrlen);
+memcpy(cached_addr, tmp_addr, addrlen);
+*cached_time = curtim
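
The key idea of this patch is that inet_pton() takes the address family as a
parameter, so one parser can serve both IPv4 and IPv6 as long as the caller
supplies a large enough destination. A minimal sketch of parsing a single
resolv.conf line that way follows. The function name is hypothetical; the
sscanf() format string is the one used in the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: if `line` is a "nameserver" entry whose address
 * belongs to family `af` (AF_INET or AF_INET6), store the parsed address in
 * `out` and return 1. Return 0 for any other line. `out` must point to a
 * struct in_addr or struct in6_addr matching `af`.
 */
static int parse_nameserver_line(int af, const char *line, void *out)
{
    char host[257];

    if (sscanf(line, "nameserver%*[ \t]%256s", host) != 1) {
        return 0;
    }
    /* inet_pton() returns 1 only for a valid address of family `af`. */
    return inet_pton(af, host, out) == 1;
}
```

Entries of the other family are simply skipped, which is how one resolv.conf
can feed both the IPv4 lookup and the IPv6 lookup.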

[Qemu-devel] [PULL 5/6] slirp: Remove superfluous memset() calls from the TFTP code

2016-07-03 Thread Samuel Thibault
From: Thomas Huth 

Commit fad7fb9ccd8013ea03 ("Add IPv6 support to the TFTP code")
refactored some common code for preparing the mbuf into a new
function called tftp_prep_mbuf_data(). One part of this common
code is to do a "memset(m->m_data, 0, m->m_size);" for the related
buffer first. However, at two spots, the memset() was not removed
from the calling function, so it is currently done twice in these
code paths. Thus let's delete these superfluous memsets in the
calling functions now.

Signed-off-by: Thomas Huth 
Signed-off-by: Samuel Thibault 
---
 slirp/tftp.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/slirp/tftp.c b/slirp/tftp.c
index 12b5ff6..3673402 100644
--- a/slirp/tftp.c
+++ b/slirp/tftp.c
@@ -208,8 +208,6 @@ static void tftp_send_error(struct tftp_session *spt,
 goto out;
   }
 
-  memset(m->m_data, 0, m->m_size);
-
   tp = tftp_prep_mbuf_data(spt, m);
 
   tp->tp_op = htons(TFTP_ERROR);
@@ -237,8 +235,6 @@ static void tftp_send_next_block(struct tftp_session *spt,
 return;
   }
 
-  memset(m->m_data, 0, m->m_size);
-
   tp = tftp_prep_mbuf_data(spt, m);
 
   tp->tp_op = htons(TFTP_DATA);
-- 
2.8.1




Re: [Qemu-devel] [PATCH v20 Resend 09/10] tests: add unit test case for replication

2016-07-03 Thread Changlong Xie

On 06/14/2016 03:53 PM, Changlong Xie wrote:

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
  tests/.gitignore |   1 +
  tests/Makefile   |   4 +
  tests/test-replication.c | 555 +++
  3 files changed, 560 insertions(+)
  create mode 100644 tests/test-replication.c

diff --git a/tests/.gitignore b/tests/.gitignore
index a06a8ba..d22ab06 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -58,6 +58,7 @@ test-qmp-introspect.[ch]
  test-qmp-marshal.c
  test-qmp-output-visitor
  test-rcu-list
+test-replication
  test-rfifolock
  test-string-input-visitor
  test-string-output-visitor
diff --git a/tests/Makefile b/tests/Makefile
index a3e20e3..901b8e4 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -103,6 +103,7 @@ check-unit-y += tests/test-crypto-xts$(EXESUF)
  check-unit-y += tests/test-crypto-block$(EXESUF)
  gcov-files-test-logging-y = tests/test-logging.c
  check-unit-y += tests/test-logging$(EXESUF)
+check-unit-y += tests/test-replication$(EXESUF)

  check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh

@@ -451,6 +452,9 @@ tests/test-base64$(EXESUF): tests/test-base64.o \

  tests/test-logging$(EXESUF): tests/test-logging.o $(test-util-obj-y)

+tests/test-replication$(EXESUF): tests/test-replication.o $(test-util-obj-y) \
+   $(test-block-obj-y)
+
  tests/test-qapi-types.c tests/test-qapi-types.h :\
  $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
diff --git a/tests/test-replication.c b/tests/test-replication.c
new file mode 100644
index 000..b5bb2eb
--- /dev/null
+++ b/tests/test-replication.c
@@ -0,0 +1,555 @@
+/*
+ * Block replication tests
+ *
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Author: Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "replication.h"
+#include "block/block_int.h"
+#include "sysemu/block-backend.h"
+
+#define IMG_SIZE (64 * 1024 * 1024)
+
+/* primary */
+static char p_local_disk[] = "/tmp/p_local_disk.XX";
+
+/* secondary */
+#define S_ID "secondary-id"
+#define S_LOCAL_DISK_ID "secondary-local-disk-id"
+static char s_local_disk[] = "/tmp/s_local_disk.XX";
+static char s_active_disk[] = "/tmp/s_active_disk.XX";
+static char s_hidden_disk[] = "/tmp/s_hidden_disk.XX";
+
+/* FIXME: steal from blockdev.c */
+QemuOptsList qemu_drive_opts = {
+.name = "drive",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_drive_opts.head),
+.desc = {
+{ /* end of list */ }
+},
+};
+
+static void io_read(BlockDriverState *bs, long pattern, int64_t pattern_offset,
+int64_t pattern_count, int64_t offset, int64_t count,
+bool expect_failed)
+{
+char *buf;
+void *cmp_buf = NULL;
+int ret;
+
+/* alloc pattern buffer */
+if (pattern) {
+cmp_buf = g_malloc(pattern_count);
+memset(cmp_buf, pattern, pattern_count);
+}
+
+/* alloc read buffer */
+buf = qemu_blockalign(bs, count);
+memset(buf, 0xab, count);
+
+/* do read */
+ret = bdrv_read(bs, offset >> 9, (uint8_t *)buf, count >> 9);
+
+/* assert and compare buf */
+if (expect_failed) {
+g_assert(ret < 0);
+} else {
+g_assert(ret >= 0);
+if (pattern) {
+g_assert(memcmp(buf + pattern_offset, cmp_buf, pattern_count) <= 0);
+}
+}
+
+g_free(cmp_buf);
+qemu_vfree(buf);
+}
+
+static void io_write(BlockDriverState *bs, long pattern, int64_t offset,
+ int64_t count, bool expect_failed)
+{
+void *pattern_buf = NULL;
+int ret;
+
+/* alloc pattern buffer */
+if (pattern) {
+pattern_buf = qemu_blockalign(bs, count);
+memset(pattern_buf, pattern, count);
+}
+
+/* do write */
+if (pattern) {
+ret = bdrv_write(bs, offset >> 9, (uint8_t *)pattern_buf, count >> 9);
+} else {
+ret = bdrv_write_zeroes(bs, offset >> 9, count >> 9, 0);


Commit 74021bc ("block: Switch bdrv_write_zeroes() to byte interface") has
changed this interface, so I'll use bdrv_pwrite_zeroes() in the next version.
I will also s/9/BDRV_SECTOR_BITS/.
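For illustration only, the s/9/BDRV_SECTOR_BITS/ substitution amounts to giving the shift a name. A minimal sketch, assuming 512-byte sectors; SECTOR_BITS and bytes_to_sectors() are hypothetical stand-ins, not QEMU definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRV_SECTOR_BITS: 2^9 = 512-byte sectors. */
#define SECTOR_BITS 9

/* Convert a non-negative byte offset or byte count to whole sectors
 * (rounds down). */
static int64_t bytes_to_sectors(int64_t bytes)
{
    assert(bytes >= 0);
    return bytes >> SECTOR_BITS;
}
```

With this, a call such as bdrv_read(bs, offset >> 9, ...) would read as bytes_to_sectors(offset) in the sketch's terms.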



+}
+
+/* assert */
+if (expect_failed) {
+g_assert(ret < 0);
+} else {
+g_assert(ret >= 0);
+}
+
+qemu_vfree(pattern_buf);
+}
+
+/*
+ * Create a uniquely-named empty temporary file.
+ */
+static void make_temp(char *template)
+{
+int fd;
+
+fd = mkstemp(template);
+g_assert(fd >= 0);
+close(fd);
+}
+
+
+static void prepare_imgs(void)
+{
+Error *local_err = NULL;
+
+make_temp(p_local_disk);
+make_temp(s_local_disk);
+make_temp(s_active_disk);
+make_temp(s_hidden_disk);
+
+/* Primary */
+bdrv_img_create(p_local_disk, "qcow2", NULL, NULL, NULL, I

Re: [Qemu-devel] [PATCH 1/3] Mediated device Core driver

2016-07-03 Thread Jike Song
On 06/21/2016 12:31 AM, Kirti Wankhede wrote:
> +/*
> + * mdev_register_device : Register a device
> + * @dev: device structure representing parent device.
> + * @ops: Parent device operation structure to be registered.
> + *
> + * Add device to list of registered parent devices.
> + * Returns a negative value on error, otherwise 0.
> + */
> +int mdev_register_device(struct device *dev, const struct parent_ops *ops)
> +{
> + int ret = 0;
> + struct parent_device *parent;
> +
> + if (!dev || !ops)
> + return -EINVAL;
> +
> + mutex_lock(&parent_devices.list_lock);
> +
> + /* Check for duplicate */
> + parent = find_parent_device(dev);
> + if (parent) {
> + ret = -EEXIST;
> + goto add_dev_err;
> + }
> +
> + parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> + if (!parent) {
> + ret = -ENOMEM;
> + goto add_dev_err;
> + }
> +
> + kref_init(&parent->ref);
> + list_add(&parent->next, &parent_devices.dev_list);
> + mutex_unlock(&parent_devices.list_lock);
> +
> + parent->dev = dev;
> + parent->ops = ops;
> + mutex_init(&parent->ops_lock);
> + mutex_init(&parent->mdev_list_lock);
> + INIT_LIST_HEAD(&parent->mdev_list);
> + init_waitqueue_head(&parent->release_done);
> +
> + ret = mdev_create_sysfs_files(dev);
> + if (ret)
> + goto add_sysfs_error;
> +
> + ret = mdev_add_attribute_group(dev, ops->dev_attr_groups);
> + if (ret)
> + goto add_group_error;
> +
> + dev_info(dev, "MDEV: Registered\n");
> + return 0;
> +
> +add_group_error:
> + mdev_remove_sysfs_files(dev);
> +add_sysfs_error:
> + mutex_lock(&parent_devices.list_lock);
> + list_del(&parent->next);
> + mutex_unlock(&parent_devices.list_lock);
> + mdev_put_parent(parent);
> + return ret;
> +
> +add_dev_err:
> + mutex_unlock(&parent_devices.list_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL(mdev_register_device);
> +
...
> +static int __init mdev_init(void)
> +{
> + int ret;
> +
> + mutex_init(&parent_devices.list_lock);
> + INIT_LIST_HEAD(&parent_devices.dev_list);
> +
> + ret = class_register(&mdev_class);
> + if (ret) {
> + pr_err("Failed to register mdev class\n");
> + return ret;
> + }
> +
> + ret = mdev_bus_register();
> + if (ret) {
> + pr_err("Failed to register mdev bus\n");
> + class_unregister(&mdev_class);
> + return ret;
> + }
> +
> + return ret;
> +}
> +
> +static void __exit mdev_exit(void)
> +{
> + mdev_bus_unregister();
> + class_unregister(&mdev_class);
> +}
> +
> +module_init(mdev_init)
> +module_exit(mdev_exit)

Hi Kirti,

I have a question about the order of initialization:

phy_driver calls mdev_register_device() in its __init function;
mdev_register_device() accesses parent_devices.list_lock;
parent_devices.list_lock is initialized in the __init function of mdev.

The __init functions of both the phy driver and mdev are registered with
module_init. If both are set to 'Y' in .config, it is possible that the mutex
is still uninitialized when mdev_register_device() runs.

The problem here, I think, is that both mdev and the phy driver are actually
*drivers*, so once they are built in, their initialization order cannot be
relied on.

Do you have any idea how to avoid this? Thanks!

 
--
Thanks,
Jike
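
One general way to make a lock independent of init ordering is to initialize it statically rather than in an init function (the kernel offers DEFINE_MUTEX for this). The following is only a userspace analogy using pthreads, not the mdev code; registry_lock, registered_count and register_device() are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Statically initialized: valid before any init function has run. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered_count;

/* Hypothetical registration entry point. It may be called at any time
 * because the lock needs no runtime initialization. */
static int register_device(const void *dev)
{
    int rc;

    if (dev == NULL) {
        return -1;
    }
    rc = pthread_mutex_lock(&registry_lock);
    assert(rc == 0);
    registered_count++;
    pthread_mutex_unlock(&registry_lock);
    return 0;
}
```

Whether this fits the mdev case depends on what else mdev_init() sets up before registration can work; the sketch only covers the lock.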




Re: [Qemu-devel] [PATCH] ppc: Fix xsrdpi, xvrdpi and xvrspi rounding

2016-07-03 Thread David Gibson
On Mon, Jul 04, 2016 at 09:20:12AM +1000, Anton Blanchard wrote:
> From: Anton Blanchard 
> 
> xsrdpi, xvrdpi and xvrspi use the round ties away method, not round
> nearest even.
> 
> Signed-off-by: Anton Blanchard 

Applied to ppc-for-2.7.

I take it float_round_ties_away is the same thing the architecture
refers to as "round to Nearest Away"?
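
For readers comparing the two modes named in the patch, here is a minimal sketch of how "round to nearest, ties away from zero" differs from "round to nearest, ties to even". It is not softfloat code; round_ties_away() and round_ties_even() are hypothetical helpers that only handle moderate magnitudes:

```c
#include <assert.h>

/* Round to nearest, ties away from zero: 2.5 -> 3, -2.5 -> -3. */
static double round_ties_away(double x)
{
    assert(x > -1e15 && x < 1e15);      /* keep the cast below in range */
    double t = (double)(long long)x;    /* truncate toward zero */
    double frac = x - t;

    if (frac >= 0.5) {
        return t + 1.0;
    }
    if (frac <= -0.5) {
        return t - 1.0;
    }
    return t;
}

/* Round to nearest, ties to even: 2.5 -> 2, 3.5 -> 4, -2.5 -> -2. */
static double round_ties_even(double x)
{
    assert(x > -1e15 && x < 1e15);      /* keep the cast below in range */
    long long i = (long long)x;         /* truncate toward zero */
    double frac = x - (double)i;
    int odd = (i % 2) != 0;

    if (frac > 0.5 || (frac == 0.5 && odd)) {
        return (double)i + 1.0;
    }
    if (frac < -0.5 || (frac == -0.5 && odd)) {
        return (double)i - 1.0;
    }
    return (double)i;
}
```

The two functions agree everywhere except on exact halves, which is the case the patch changes.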


> ---
>  target-ppc/fpu_helper.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> index 4ef893b..d9795d0 100644
> --- a/target-ppc/fpu_helper.c
> +++ b/target-ppc/fpu_helper.c
> @@ -2689,19 +2689,19 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)  \
>  helper_float_check_status(env);\
>  }
>  
> -VSX_ROUND(xsrdpi, 1, float64, VsrD(0), float_round_nearest_even, 1)
> +VSX_ROUND(xsrdpi, 1, float64, VsrD(0), float_round_ties_away, 1)
>  VSX_ROUND(xsrdpic, 1, float64, VsrD(0), FLOAT_ROUND_CURRENT, 1)
>  VSX_ROUND(xsrdpim, 1, float64, VsrD(0), float_round_down, 1)
>  VSX_ROUND(xsrdpip, 1, float64, VsrD(0), float_round_up, 1)
>  VSX_ROUND(xsrdpiz, 1, float64, VsrD(0), float_round_to_zero, 1)
>  
> -VSX_ROUND(xvrdpi, 2, float64, VsrD(i), float_round_nearest_even, 0)
> +VSX_ROUND(xvrdpi, 2, float64, VsrD(i), float_round_ties_away, 0)
>  VSX_ROUND(xvrdpic, 2, float64, VsrD(i), FLOAT_ROUND_CURRENT, 0)
>  VSX_ROUND(xvrdpim, 2, float64, VsrD(i), float_round_down, 0)
>  VSX_ROUND(xvrdpip, 2, float64, VsrD(i), float_round_up, 0)
>  VSX_ROUND(xvrdpiz, 2, float64, VsrD(i), float_round_to_zero, 0)
>  
> -VSX_ROUND(xvrspi, 4, float32, VsrW(i), float_round_nearest_even, 0)
> +VSX_ROUND(xvrspi, 4, float32, VsrW(i), float_round_ties_away, 0)
>  VSX_ROUND(xvrspic, 4, float32, VsrW(i), FLOAT_ROUND_CURRENT, 0)
>  VSX_ROUND(xvrspim, 4, float32, VsrW(i), float_round_down, 0)
>  VSX_ROUND(xvrspip, 4, float32, VsrW(i), float_round_up, 0)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v0] spapr: Ensure thread0 of CPU core is always realized first

2016-07-03 Thread David Gibson
On Fri, Jul 01, 2016 at 10:44:39AM +0530, Bharata B Rao wrote:
> During CPU core realization, we create all the thread objects and parent
> them to the core object in a loop. However, the realization of thread
> objects is done separately by walking the threads of a core using
> object_child_foreach(). With this, there is no guarantee on the order
> in which the child thread objects get realized. Since CPU device tree
> properties are currently derived from the CPU thread object, we assume
> thread0 of the core to be the representative thread of the core when
> creating device tree properties for the core. If thread0 is not the
> first thread that gets realized, then we would end up having an
> incorrect dt_id for the core and this causes hotplug failures from
> the guest.
> 
> Fix this by realizing each thread object by walking the core's thread
> object list thereby ensuring that thread0 and other threads are always
> realized in the correct order.
> 
> Future TODO: CPU DT nodes are per-core properties and we should
> ideally base the creation of CPU DT nodes on core objects rather than
> the thread objects.
> 
> Signed-off-by: Bharata B Rao 

Applied to ppc-for-2.7, thanks.
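
The ordering guarantee described in the commit message comes from walking the packed thread array by index instead of relying on child enumeration order. A minimal sketch of that idea, assuming objects of a fixed size laid out back to back; ThreadSketch and both functions are hypothetical, not the spapr code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU thread object. */
typedef struct ThreadSketch {
    int id;
    int realized_at;    /* position in the realization sequence */
} ThreadSketch;

/* Realize nr_threads objects of `size` bytes each, stored back to back
 * starting at `threads`, strictly in index order. Returns the count. */
static int realize_all_in_order(void *threads, size_t size, int nr_threads)
{
    int next = 0;

    for (int j = 0; j < nr_threads; j++) {
        ThreadSketch *t = (ThreadSketch *)((char *)threads + (size_t)j * size);
        t->realized_at = next++;
    }
    return next;
}

/* Usage check: thread 0 is always the first one realized. */
static int thread0_realized_first(void)
{
    ThreadSketch threads[4] = { {0, -1}, {1, -1}, {2, -1}, {3, -1} };
    int n = realize_all_in_order(threads, sizeof(threads[0]), 4);

    assert(n == 4);
    return threads[0].realized_at == 0 && threads[3].realized_at == 3;
}
```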

> ---
>  hw/ppc/spapr_cpu_core.c | 29 -
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index a384db5..70b6b0b 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -259,9 +259,9 @@ out:
>  error_propagate(errp, local_err);
>  }
>  
> -static int spapr_cpu_core_realize_child(Object *child, void *opaque)
> +static void spapr_cpu_core_realize_child(Object *child, Error **errp)
>  {
> -Error **errp = opaque, *local_err = NULL;
> +Error *local_err = NULL;
>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  CPUState *cs = CPU(child);
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
> @@ -269,15 +269,14 @@ static int spapr_cpu_core_realize_child(Object *child, void *opaque)
>  object_property_set_bool(child, true, "realized", &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> -return 1;
> +return;
>  }
>  
>  spapr_cpu_init(spapr, cpu, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> -return 1;
> +return;
>  }
> -return 0;
>  }
>  
>  static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
> @@ -287,13 +286,13 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>  const char *typename = object_class_get_name(sc->cpu_class);
>  size_t size = object_type_get_instance_size(typename);
>  Error *local_err = NULL;
> -Object *obj;
> -int i;
> +void *obj;
> +int i, j;
>  
>  sc->threads = g_malloc0(size * cc->nr_threads);
>  for (i = 0; i < cc->nr_threads; i++) {
>  char id[32];
> -void *obj = sc->threads + i * size;
> +obj = sc->threads + i * size;
>  
>  object_initialize(obj, size, typename);
>  snprintf(id, sizeof(id), "thread[%d]", i);
> @@ -303,12 +302,16 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>  }
>  object_unref(obj);
>  }
> -object_child_foreach(OBJECT(dev), spapr_cpu_core_realize_child, &local_err);
> -if (local_err) {
> -goto err;
> -} else {
> -return;
> +
> +for (j = 0; j < cc->nr_threads; j++) {
> +obj = sc->threads + j * size;
> +
> +spapr_cpu_core_realize_child(obj, &local_err);
> +if (local_err) {
> +goto err;
> +}
>  }
> +return;
>  
>  err:
>  while (--i >= 0) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC PATCH V2 1/3] filter-rewriter: introduce filter-rewriter initialization

2016-07-03 Thread Jason Wang



On 2016-07-02 14:22, Zhang Chen wrote:

Filter-rewriter is a part of the COLO project.
It rewrites some of the secondary's packets so that the
secondary guest's connections are established successfully.


This probably needs to be more verbose, e.g. mention that we only care about
TCP and only rewrite the ack for now.




usage:

colo secondary:
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
-object filter-rewriter,id=rew0,netdev=hn0,queue=all

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/Makefile.objs |   1 +
  net/filter-rewriter.c | 112 ++
  qemu-options.hx   |  10 +
  vl.c  |   3 +-
  4 files changed, 125 insertions(+), 1 deletion(-)
  create mode 100644 net/filter-rewriter.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 119589f..645bd10 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -18,3 +18,4 @@ common-obj-y += filter-buffer.o
  common-obj-y += filter-mirror.o
  common-obj-y += colo-compare.o
  common-obj-y += colo-base.o
+common-obj-y += filter-rewriter.o
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
new file mode 100644
index 000..08b015d
--- /dev/null
+++ b/net/filter-rewriter.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "net/colo-base.h"
+#include "net/filter.h"
+#include "net/net.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/main-loop.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+
+#define FILTER_COLO_REWRITER(obj) \
+OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
+
+#define TYPE_FILTER_REWRITER "filter-rewriter"
+
+enum {
+PRIMARY = 0,
+SECONDARY,
+};
+
+typedef struct RewriterState {
+NetFilterState parent_obj;
+/* connection list: the connections belonged to this NIC could be found
+ * in this list.
+ * element type: Connection
+ */
+GQueue conn_list;
+NetQueue *incoming_queue;
+/* to protect conn_list */
+QemuMutex conn_list_lock;
+/* hashtable to save connection */
+GHashTable *connection_track_table;
+/* to save unprocessed_connections */
+GQueue unprocessed_connections;
+/* current hash size */
+uint32_t hashtable_size;
+} RewriterState;
+
+static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
+ NetClientState *sender,
+ unsigned flags,
+ const struct iovec *iov,
+ int iovcnt,
+ NetPacketSent *sent_cb)
+{
+/*
+ * if we get tcp packet
+ * we will rewrite it to make secondary guest's
+ * connection established successfully
+ */
+return 0;
+}
+
+static void colo_rewriter_cleanup(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+qemu_mutex_destroy(&s->conn_list_lock);
+g_queue_free(&s->conn_list);
+}
+
+static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+g_queue_init(&s->conn_list);
+qemu_mutex_init(&s->conn_list_lock);
+s->hashtable_size = 0;
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
+s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+}
+
+static void colo_rewriter_class_init(ObjectClass *oc, void *data)
+{
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+nfc->setup = colo_rewriter_setup;
+nfc->cleanup = colo_rewriter_cleanup;
+nfc->receive_iov = colo_rewriter_receive_iov;
+}
+
+static const TypeInfo colo_rewriter_info = {
+.name = TYPE_FILTER_REWRITER,
+.parent = TYPE_NETFILTER,
+.class_init = colo_rewriter_class_init,
+.instance_size = sizeof(RewriterState),
+};
+
+static void register_types(void)
+{
+type_register_static(&colo_rewriter_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 14bade5..d7ab165 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3859,6 +3859,16 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
  be the same. we can just use indev or outdev, but at least one of indev or outdev
  need to be specified.
  
+@item -object filter-rewr

Re: [Qemu-devel] [PATCH v2 0/2]vhost-user: Extend protocol to seek response for any command.

2016-07-03 Thread Prerna Saxena
Hi Marc-Andre,
Thank you for taking a look.





On 03/07/16 5:17 pm, "Marc-André Lureau"  wrote:

>Hi
>
>On Fri, Jul 1, 2016 at 11:46 AM, Prerna Saxena  wrote:
>> From: Prerna Saxena 
>>
>> The current vhost-user protocol requires the client to send responses to only
>> a few commands. For the remaining commands, it is impossible for QEMU to know
>> the status of the requested operation -- ie, did it succeed? If so, by what
>> time?
>>
>> This is inconvenient, and can also lead to races. As an example:
>>
>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> application). Note that SET_MEM_TABLE does not require a reply according to
>> the spec.
>> (2) Qemu commits the memory to the guest.
>> (3) Guest issues an I/O operation over a new memory region which was
>> configured on (1).
>> (4) The application hasn't yet remapped the memory, but it sees the I/O
>> request.
>> (5) The application cannot satisfy the request because it does not know about
>> those GPAs.
>>
>> Note that the kernel implementation does not suffer from this limitation
>> since messages are sent via an ioctl(). The ioctl() blocks until the backend
>> (eg. vhost-net) completes the command and returns (with an error code).
>>
>> Changing the behaviour of current vhost-user commands would break existing
>> applications.
>> To work around this race, Patch 1 adds a get_features command to be sent
>> before returning from set_mem_table. While this is not a complete fix, it
>> will help client applications that strictly process messages in order.
>>
>> The second patch introduces a protocol extension,
>> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to
>> request a response to any message by setting the newly introduced
>> "need_response" flag. The application must then respond to qemu by providing
>> a status about the requested operation.
>>
>> Changelog:
>> -
>> Changes since v1:
>> Patch 1 : Ask for get_features before returning from set_mem_table(new).
>> Patch 2 : * Improve documentation.
>>   * Abstract out commonly used operations in the form of a function, 
>> process_message_response(). Also implement this only for SET_MEM_TABLE.
>>
>
>Overall, that looks good to me.
>
>Why do we have both "response" and "reply" which basically means the
>same thing, right? I would rather stick with "reply".

All right, I will rename this function to process_message_reply().

>
>I am not convinced the first patch is needed, imho it is a
>workaround/hack, the solution is given with the patch 2 only.

Great, I’ll post a v3 with just Patch2.

Regards,
Prerna

>
>> Prerna Saxena (2):
>>   vhost-user: Attempt to prevent a race on set_mem_table.
>>   vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK.
>>
>>  docs/specs/vhost-user.txt |  40 
>>  hw/virtio/vhost-user.c| 157 
>> --
>>  2 files changed, 150 insertions(+), 47 deletions(-)
>>
>> --
>> 1.8.1.2
>>
>
>
>
>-- 
>Marc-André Lureau
>


Re: [Qemu-devel] [RFC PATCH V2 2/3] filter-rewriter: track connection and parse packet

2016-07-03 Thread Jason Wang



On 2016-07-02 14:22, Zhang Chen wrote:

We use colo-base.h to track connection and parse packet

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/filter-rewriter.c | 52 +++
  1 file changed, 52 insertions(+)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 08b015d..c38ab24 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -50,6 +50,20 @@ typedef struct RewriterState {
  uint32_t hashtable_size;
  } RewriterState;
  
+/*

+ * Return 1 on success, if return 0 means the pkt
+ * is not TCP packet
+ */
+static int is_tcp_packet(Packet *pkt)
+{
+if (!parse_packet_early(pkt) &&
+pkt->ip->ip_p == IPPROTO_TCP) {
+return 1;
+} else {
+return 0;
+}
+}
+
  static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
   NetClientState *sender,
   unsigned flags,
@@ -57,11 +71,49 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
   int iovcnt,
   NetPacketSent *sent_cb)
  {
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+Connection *conn;
+ConnectionKey key = {{ 0 } };
+Packet *pkt;
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = g_malloc0(size);
+
+iov_to_buf(iov, iovcnt, 0, buf, size);
+pkt = packet_new(buf, size);
+
  /*
   * if we get tcp packet
   * we will rewrite it to make secondary guest's
   * connection established successfully
   */
+if (is_tcp_packet(pkt)) {
+if (sender == nf->netdev) {
+fill_connection_key(pkt, &key, SECONDARY);
+} else {
+fill_connection_key(pkt, &key, PRIMARY);
+}
+
+conn = connection_get(s->connection_track_table,
+  &key,
+  &s->hashtable_size);
+if (!conn->processing) {
+qemu_mutex_lock(&s->conn_list_lock);
+g_queue_push_tail(&s->conn_list, conn);
+qemu_mutex_unlock(&s->conn_list_lock);


conn_list is never used in this series, and I fail to understand why it is
needed.



+conn->processing = true;
+}
+
+if (sender == nf->netdev) {
+/* NET_FILTER_DIRECTION_TX */
+/* handle_primary_tcp_pkt */
+} else {
+/* NET_FILTER_DIRECTION_RX */
+/* handle_secondary_tcp_pkt */
+}
+}
+
+packet_destroy(pkt, NULL);
+pkt = NULL;
  return 0;
  }
  





Re: [Qemu-devel] [RFC PATCH V2 3/3] filter-rewriter: rewrite tcp packet to keep secondary connection

2016-07-03 Thread Jason Wang



On 2016-07-02 14:22, Zhang Chen wrote:

We rewrite the TCP packets that the secondary receives and sends
when the COLO guest is a TCP server.

First, the client starts a TCP handshake; the packet has seq=client_seq,
ack=0, flag=SYN. The COLO primary guest gets this packet and mirrors it
(filter-mirror) to the secondary guest, which receives it through
filter-redirector.
Then the primary guest responds with a packet
(seq=primary_seq,ack=client_seq+1,flag=ACK|SYN)
and the secondary guest responds with a packet
(seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN).
At this point filter-rewriter saves secondary_seq in its TCP connection.
To finish the handshake, the client sends a packet
(seq=client_seq+1,ack=primary_seq+1,flag=ACK).
Here filter-rewriter can get primary_seq, rewrite the ack from primary_seq+1
to secondary_seq+1, and recalculate the checksum, so the secondary TCP
connection stays valid.

When sending/receiving data:
the client sends a packet
(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH);
filter-rewriter rewrites the ack and sends it to the secondary guest.

The primary guest responds with a packet
(seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK)
and the secondary guest responds with a packet
(seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK).
We rewrite the secondary guest's seq from secondary_seq+1 to primary_seq+1,
so the TCP connection stays valid.

In the code we use offset (= secondary_seq - primary_seq)
to rewrite seq or ack.
handle_primary_tcp_pkt: tcp_pkt->th_ack += offset;
handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset;
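
The offset arithmetic above can be sketched in isolation. This is a minimal illustration in host byte order, not the patch's code; the function names are hypothetical, and a real implementation would also convert with ntohl()/htonl() and recompute the checksum:

```c
#include <assert.h>
#include <stdint.h>

/* offset = secondary_seq - primary_seq, modulo 2^32 like TCP sequence space. */
static uint32_t seq_offset(uint32_t secondary_seq, uint32_t primary_seq)
{
    return secondary_seq - primary_seq;
}

/* Packet going to the secondary: shift the ack into its sequence space. */
static uint32_t rewrite_ack_for_secondary(uint32_t ack, uint32_t offset)
{
    return ack + offset;
}

/* Packet coming from the secondary: shift the seq back to the primary's. */
static uint32_t rewrite_seq_from_secondary(uint32_t seq, uint32_t offset)
{
    return seq - offset;
}

/* Usage check, including a case where the sequence numbers wrap around. */
static void offset_example(void)
{
    uint32_t off = seq_offset(5000, 1000);

    assert(rewrite_ack_for_secondary(1000 + 1, off) == 5000 + 1);
    assert(rewrite_seq_from_secondary(5000 + 1, off) == 1000 + 1);

    off = seq_offset(0x10u, 0xFFFFFFF0u);
    assert(off == 0x20u);
    assert(rewrite_ack_for_secondary(0xFFFFFFF1u, off) == 0x11u);
}
```

Unsigned 32-bit arithmetic is used so that wraparound behaves the same way as TCP sequence numbers do.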

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/colo-base.h   |   2 +
  net/filter-rewriter.c | 110 +-
  trace-events  |   5 +++
  3 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/net/colo-base.h b/net/colo-base.h
index 62460c5..7b32648 100644
--- a/net/colo-base.h
+++ b/net/colo-base.h
@@ -71,6 +71,8 @@ typedef struct Connection {
  uint8_t ip_proto;
  /* be used by filter-rewriter */
  colo_conn_state state;
+/* offset = secondary_seq - primary_seq */
+tcp_seq  offset;


I fail to find the definition of 'tcp_seq'.


  } Connection;
  
  uint32_t connection_key_hash(const void *opaque);

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index c38ab24..9f63c75 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -21,6 +21,7 @@
  #include "qemu/main-loop.h"
  #include "qemu/iov.h"
  #include "net/checksum.h"
+#include "trace.h"
  
  #define FILTER_COLO_REWRITER(obj) \

  OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
@@ -64,6 +65,91 @@ static int is_tcp_packet(Packet *pkt)
  }
  }
  
+/* handle tcp packet from primary guest */

+static int handle_primary_tcp_pkt(NetFilterState *nf,
+  Connection *conn,
+  Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+static int syn_flag;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_layer;
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char *sdebug, *ddebug;
+sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
+trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
+tcp_pkt->th_flags);
+trace_colo_filter_rewriter_conn_offset(conn->offset);
+g_free(sdebug);
+g_free(ddebug);
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+/*
+ * this flag update offset func run oncs


typo?


+ * in independent tcp connection
+ */
+syn_flag = 1;


Does this really work if you have more than one TCP connection? You
probably need a conn->syn_flag.



+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
+if (syn_flag) {
+/* offset = secondary_seq - primary seq */
+conn->offset -= (ntohl(tcp_pkt->th_ack));
+syn_flag = 0;
+
+}
+/* handle packets to the secondary from the primary */
+tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset + 1);


Maybe I'm missing something, but why +1 here?




[Qemu-devel] [PATCH qemu v19 4/5] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)

2016-07-03 Thread Alexey Kardashevskiy
The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
This adds the ability for VFIO common code to dynamically allocate/remove
DMA windows in the host kernel when a new VFIO container is added/removed.

This adds a helper to vfio_listener_region_add which issues the
VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds the newly created IOMMU to
the host IOMMU list; the opposite action is taken in
vfio_listener_region_del.

When creating a new window, this uses a heuristic to decide on the number
of TCE table levels.

This should cause no guest-visible change in behavior.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
Changes:
v19:
* fixed compiler warning

v18:
* moved trace definitions under hw/vfio/spapr.c section
* moved trace_vfio_spapr_remove_window to vfio_spapr_remove_window()
* vfio_host_win_del() now checks for exact window size
* one ctz() less in vfio_spapr_create_window()

v17:
* moved spapr window create/remove helpers to separate file
* added hw_error() if vfio_host_win_del() failed

v16:
* used memory_region_iommu_get_page_sizes() in vfio_listener_region_add()
* enforced no intersections between windows

v14:
* new to the series
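
The "no intersections between windows" rule listed under v16 comes down to an overlap test on address ranges. A minimal sketch on inclusive [min, max] ranges; windows_overlap() is a hypothetical helper, not the ranges_overlap() call used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the inclusive ranges [min1, max1] and [min2, max2]
 * share at least one address. */
static bool windows_overlap(uint64_t min1, uint64_t max1,
                            uint64_t min2, uint64_t max2)
{
    assert(min1 <= max1 && min2 <= max2);
    return !(max2 < min1 || max1 < min2);
}
```

Using inclusive bounds avoids overflow when a window ends at the top of the 64-bit address space.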
---
 hw/vfio/common.c  | 80 +--
 hw/vfio/spapr.c   | 71 ++
 hw/vfio/trace-events  |  2 ++
 include/hw/vfio/vfio-common.h |  6 
 4 files changed, 149 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c8a531..f3c0522 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -265,6 +265,21 @@ static void vfio_host_win_add(VFIOContainer *container,
 QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
 }
 
+static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
+ hwaddr max_iova)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
+QLIST_REMOVE(hostwin, hostwin_next);
+return 0;
+}
+}
+
+return -1;
+}
+
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
@@ -380,6 +395,31 @@ static void vfio_listener_region_add(MemoryListener *listener,
 }
 end = int128_get64(int128_sub(llend, int128_one()));
 
+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+VFIOHostDMAWindow *hostwin;
+hwaddr pgsize = 0;
+
+/* For now intersections are not allowed, we may relax this later */
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (ranges_overlap(hostwin->min_iova,
+   hostwin->max_iova - hostwin->min_iova + 1,
+   section->offset_within_address_space,
+   int128_get64(section->size))) {
+ret = -1;
+goto fail;
+}
+}
+
+ret = vfio_spapr_create_window(container, section, &pgsize);
+if (ret) {
+goto fail;
+}
+
+vfio_host_win_add(container, section->offset_within_address_space,
+  section->offset_within_address_space +
+  int128_get64(section->size) - 1, pgsize);
+}
+
 hostwin_found = false;
 QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
 if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
@@ -523,6 +563,18 @@ static void vfio_listener_region_del(MemoryListener *listener,
  "0x%"HWADDR_PRIx") = %d (%m)",
  container, iova, int128_get64(llsize), ret);
 }
+
+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+vfio_spapr_remove_window(container,
+ section->offset_within_address_space);
+if (vfio_host_win_del(container,
+  section->offset_within_address_space,
+  section->offset_within_address_space +
+  int128_get64(section->size) - 1) < 0) {
+hw_error("%s: Cannot delete missing window at %"HWADDR_PRIx,
+ __func__, section->offset_within_address_space);
+}
+}
 }
 
 static const MemoryListener vfio_memory_listener = {
@@ -961,11 +1013,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 }
 }
 
-/*
- * This only considers the host IOMMU's 32-bit window.  At
- * some point we need to add support for the optional 64-bit
- * window and dynamic windows
- */
 info.argsz = sizeof(info);
 ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
 if (ret) {
@@ -977,11 +1024,24 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  

[Qemu-devel] [PATCH qemu v19 1/5] spapr_iommu: Realloc guest visible TCE table when starting/stopping listening

2016-07-03 Thread Alexey Kardashevskiy
The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU -
a guest view of the table and a hardware TCE table. If there is no VFIO
presence in the address space, then just the guest view is used; in
this case, it is allocated in KVM. However, since there is no
support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
we need to move the guest view from KVM to userspace, and we need
to do this for every IOMMU on a bus with VFIO devices.

This implements the callbacks for the sPAPR IOMMU - notify_started()
reallocates the guest view to userspace, notify_stopped() does
the opposite.

This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
path as the new callbacks do this better - they notify IOMMU at
the exact moment when the configuration is changed, and this also
includes the case of PCI hot unplug.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
Acked-by: Alex Williamson 
---
Changes:
v18:
* split into 2 patches

v17:
* replaced IOMMU users counting with simple QLIST_EMPTY()
* renamed the callbacks
* removed requirement for region_del() to be called on 
memory_listener_unregister()

v16:
* added a use counter in VFIOAddressSpace->VFIOIOMMUMR

v15:
* s/need_vfio/vfio-Users/g
---
 hw/ppc/spapr_iommu.c | 12 
 hw/ppc/spapr_pci.c   |  6 --
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index e230bac..d57b05d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -156,6 +156,16 @@ static uint64_t spapr_tce_get_min_page_size(MemoryRegion 
*iommu)
 return 1ULL << tcet->page_shift;
 }
 
+static void spapr_tce_notify_started(MemoryRegion *iommu)
+{
+spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
+}
+
+static void spapr_tce_notify_stopped(MemoryRegion *iommu)
+{
+spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
+}
+
 static int spapr_tce_table_post_load(void *opaque, int version_id)
 {
 sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
@@ -236,6 +246,8 @@ static const VMStateDescription vmstate_spapr_tce_table = {
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
 .translate = spapr_tce_translate_iommu,
 .get_min_page_size = spapr_tce_get_min_page_size,
+.notify_started = spapr_tce_notify_started,
+.notify_stopped = spapr_tce_notify_stopped,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 8c1e6b1..cbb7cdd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1087,12 +1087,6 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector 
*drc,
 void *fdt = NULL;
 int fdt_start_offset = 0, fdt_size;
 
-if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
-sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
-
-spapr_tce_set_need_vfio(tcet, true);
-}
-
 fdt = create_device_tree(&fdt_size);
 fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
 if (!fdt_start_offset) {
-- 
2.5.0.rc3




[Qemu-devel] [PATCH qemu v19 5/5] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-07-03 Thread Alexey Kardashevskiy
This adds support for the Dynamic DMA Windows (DDW) option defined by
the SPAPR specification, which allows having additional DMA window(s).

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.6 machine and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.

This implements DDW for PHB with emulated and VFIO devices. The host
kernel support is required. The advertised IOMMU page sizes are 4K and
64K; 16M pages are supported but not advertised by default, in order to
enable them, the user has to specify "pgsz" property for PHB and
enable huge pages for RAM.

The existing Linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to it. If this succeeds,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800... as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
and updates all references to dma_liobn. However this does not add
64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
rather pointless there (as it is a PHB property and the management
software can/should pass LIOBNs via CLI) but we keep it for the backward
migration support.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v19:
* updated commit log with a note about migration
* kept 32bit dma_liobn in the migration stream for backward migration

v18:
* fixed bug when ddw-create rtas call was always creating window at 1<<59
offset
* update minimum supported machine version
* s/dma64_window_addr/dma_win_addr/ to match dma_win_addr

v17:
* fixed: "query" did return non-page-shifted value when memory hotplug is 
enabled

v16:
* s/dma_liobn/dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS]/
* s/SPAPR_PCI_LIOBN()/dma_liobn[]/

v15:
* moved page mask filtering to PHB realize(), use "-mempath" to know
if there are huge pages
* fixed error reporting in RTAS handlers
* max window size accounts now hotpluggable memory boundaries
---
 hw/ppc/Makefile.objs|   1 +
 hw/ppc/spapr.c  |   7 +-
 hw/ppc/spapr_pci.c  |  75 ---
 hw/ppc/spapr_rtas_ddw.c | 295 
 hw/ppc/trace-events |   4 +
 include/hw/pci-host/spapr.h |   8 +-
 include/hw/ppc/spapr.h  |  16 ++-
 7 files changed, 385 insertions(+), 21 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 5cc6608..91a3420 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -8,6 +8,7 @@ obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
+obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
 # PowerPC 4xx boards
 obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
 obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 78ebd9e..9c1c2c1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2489,7 +2489,12 @@ DEFINE_SPAPR_MACHINE(2_7, "2.7", true);
  * pseries-2.6
  */
 #define SPAPR_COMPAT_2_6 \
-HW_COMPAT_2_6
+HW_COMPAT_2_6 \
+{ \
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "ddw",\
+.value= stringify(off),\
+},
 
 static void spapr_machine_2_6_instance_options(MachineState *machine)
 {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index cbb7cdd..949c44f 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -35,6 +35,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "exec/address-spaces.h"
+#include "exec/ram_addr.h"
 #include 
 #include "trace.h"
 #include "qemu/error-report.h"
@@ -45,6 +46,7 @@
 #include "hw/ppc/spapr_drc.h"
 #include "sysemu/device_tree.h"
 #include "sysemu/kvm.h"
+#include "sysemu/hostmem.h"
 
 #include "hw/vfio/vfio.h"
 
@@ -1304,11 +1306,14 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 PCIBus *bus;
 uint64_t msi_window_size = 4096;
 sPAPRTCETable *tcet;
+const unsigned windows_supported =
+sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
 
 if (sphb->index != (uint32_t)-1) {
 hwaddr windows_base;
 
-if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn != (uint32_t)-1)
+if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != 
(uint32_t)-1)
+|| (sphb->dma_liobn[1] != (uint32

[Qemu-devel] [PATCH qemu v19 3/5] vfio: Add host side DMA window capabilities

2016-07-03 Thread Alexey Kardashevskiy
There are going to be multiple IOMMUs per container. This moves
the single host IOMMU parameter set to a list of VFIOHostDMAWindow.

This should cause no behavioral change and will be used later by
the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper.
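
As a rough illustration of the two checks this list needs (overlap when adding a window, containment when mapping a section), here is a minimal standalone sketch. It is not the code from this patch: struct ex_window and both helpers are hypothetical names, and the bounds are compared as inclusive [min, max] pairs instead of going through ranges_overlap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a host DMA window: inclusive IOVA bounds. */
struct ex_window {
    uint64_t min_iova;
    uint64_t max_iova;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
bool ex_windows_overlap(const struct ex_window *a, const struct ex_window *b)
{
    return !(a->max_iova < b->min_iova || b->max_iova < a->min_iova);
}

/* A section [iova, end] can be mapped only if one window fully contains it. */
bool ex_window_contains(const struct ex_window *w, uint64_t iova, uint64_t end)
{
    return w->min_iova <= iova && end <= w->max_iova;
}
```

Comparing inclusive bounds directly avoids computing a length for a window that spans the whole 64-bit space, where max - min + 1 would wrap to 0.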

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
Changes:
v18:
* vfio_host_win_add() checks for non-overlapping windows instead of calling
vfio_host_win_lookup() which checks for inclusion
* inlined vfio_host_win_lookup() as I ended up using it just once
* put VFIOHostDMAWindow::max_iova in new line in include/hw/vfio/vfio-common.h

v17:
* vfio_host_win_add() uses vfio_host_win_lookup() for overlap check and
aborts if any found instead of returning an error (as recovery is not
possible anyway)
* hw_error() when overlapped iommu is detected

v16:
* adjusted commit log with changes from v15

v15:
* s/vfio_host_iommu_add/vfio_host_win_add/
* s/VFIOHostIOMMU/VFIOHostDMAWindow/
---
 hw/vfio/common.c  | 60 +++
 include/hw/vfio/vfio-common.h | 10 ++--
 2 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 46381e6..7c8a531 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -28,6 +28,7 @@
 #include "exec/memory.h"
 #include "hw/hw.h"
 #include "qemu/error-report.h"
+#include "qemu/range.h"
 #include "sysemu/kvm.h"
 #ifdef CONFIG_KVM
 #include "linux/kvm.h"
@@ -241,6 +242,29 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return -errno;
 }
 
+static void vfio_host_win_add(VFIOContainer *container,
+  hwaddr min_iova, hwaddr max_iova,
+  uint64_t iova_pgsizes)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (ranges_overlap(hostwin->min_iova,
+   hostwin->max_iova - hostwin->min_iova + 1,
+   min_iova,
+   max_iova - min_iova + 1)) {
+hw_error("%s: Overlapped IOMMU are not enabled", __func__);
+}
+}
+
+hostwin = g_malloc0(sizeof(*hostwin));
+
+hostwin->min_iova = min_iova;
+hostwin->max_iova = max_iova;
+hostwin->iova_pgsizes = iova_pgsizes;
+QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
+}
+
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
@@ -329,6 +353,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 Int128 llend, llsize;
 void *vaddr;
 int ret;
+VFIOHostDMAWindow *hostwin;
+bool hostwin_found;
 
 if (vfio_listener_skipped_section(section)) {
 trace_vfio_listener_region_add_skip(
@@ -354,7 +380,15 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 end = int128_get64(int128_sub(llend, int128_one()));
 
-if ((iova < container->min_iova) || (end > container->max_iova)) {
+hostwin_found = false;
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+hostwin_found = true;
+break;
+}
+}
+
+if (!hostwin_found) {
 error_report("vfio: IOMMU container %p can't map guest IOVA region"
  " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
  container, iova, end);
@@ -369,10 +403,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 
 trace_vfio_listener_region_add_iommu(iova, end);
 /*
- * FIXME: We should do some checking to see if the
- * capabilities of the host VFIO IOMMU are adequate to model
- * the guest IOMMU
- *
  * FIXME: For VFIO iommu types which have KVM acceleration to
  * avoid bouncing all map/unmaps through qemu this way, this
  * would be the right place to wire that up (tell the KVM
@@ -879,17 +909,14 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  * existing Type1 IOMMUs generally support any IOVA we're
  * going to actually try in practice.
  */
-container->min_iova = 0;
-container->max_iova = (hwaddr)-1;
-
-/* Assume just 4K IOVA page size */
-container->iova_pgsizes = 0x1000;
 info.argsz = sizeof(info);
 ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
 /* Ignore errors */
-if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
-container->iova_pgsizes = info.iova_pgsizes;
+if (ret || !(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+/* Assume 4k IOVA page size */
+info.iova_pgsizes = 4096;
 }
+vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
 } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
ioc

[Qemu-devel] [PATCH qemu v19 0/5] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-07-03 Thread Alexey Kardashevskiy
Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1 or 2GB big, mapped at zero
on a PCI bus.

PAPR defines a DDW RTAS API which allows pseries guests
to query the hypervisor about DDW support and capabilities (page size mask
for now). A pseries guest may request additional (to the default)
DMA windows using this RTAS API.
The existing pseries Linux guests request an additional window as big as
the guest RAM and map the entire guest window, which effectively creates
a direct mapping of the guest memory to a PCI bus.

This patchset reworks PPC64 IOMMU code and adds necessary structures
to support big windows on pseries.

This patchset is based on today's upstream sha1 9a48e36 and was tested
with today's upstream kernel sha1 0b295dd.

Please comment. Thanks!


Alexey Kardashevskiy (5):
  spapr_iommu: Realloc guest visible TCE table when starting/stopping
listening
  vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
  vfio: Add host side DMA window capabilities
  vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
  spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

 hw/ppc/Makefile.objs  |   1 +
 hw/ppc/spapr.c|   7 +-
 hw/ppc/spapr_iommu.c  |  12 ++
 hw/ppc/spapr_pci.c|  81 
 hw/ppc/spapr_rtas_ddw.c   | 295 ++
 hw/ppc/trace-events   |   4 +
 hw/vfio/Makefile.objs |   1 +
 hw/vfio/common.c  | 170 +++-
 hw/vfio/spapr.c   | 210 ++
 hw/vfio/trace-events  |   8 ++
 include/hw/pci-host/spapr.h   |   8 +-
 include/hw/ppc/spapr.h|  16 ++-
 include/hw/vfio/vfio-common.h |  20 ++-
 13 files changed, 774 insertions(+), 59 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c
 create mode 100644 hw/vfio/spapr.c

-- 
2.5.0.rc3




[Qemu-devel] [PATCH qemu v19 2/5] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)

2016-07-03 Thread Alexey Kardashevskiy
This makes use of the new "memory registering" feature. The idea is
to give userspace the ability to notify the host kernel about pages
which are going to be used for DMA. Having this information, the host
kernel can pin them all once per user process, do locked pages
accounting (once) and not spend time on doing that in real time with
possible failures which cannot be handled nicely in some cases.

This adds a prereg memory listener which listens on address_space_memory
and notifies a VFIO container about memory which needs to be
pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This enforces guest RAM blocks to be host page size aligned; however
this is not new as KVM already requires memory slots to be host page
size aligned.
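
The alignment requirement can be sketched as below. This is only an illustration under assumptions: ex_section_is_page_aligned() is a hypothetical helper, not a function from this patch, and the page size is passed in explicitly and must be a power of two (64K is a common host page size on POWER, but that is not assumed here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if both the start address and the size of a section are
 * multiples of page_size. page_size must be a power of two, so that
 * (page_size - 1) is a mask of the low "offset within page" bits.
 */
bool ex_section_is_page_aligned(uint64_t start, uint64_t size,
                                uint64_t page_size)
{
    uint64_t mask = page_size - 1;

    /* Any low bit set in either value means it is not page aligned. */
    return ((start | size) & mask) == 0;
}
```

With a 64K page size, a section starting at 0x10000 with size 0x20000 passes the check, while one starting at 0x1000 does not.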

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v19:
* moved and reduced trace point messages
* fixed wrong goto to listener_release_exit if VFIO_IOMMU_SPAPR_TCE_GET_INFO
failed

v18:
* made a copy of listener trace points in spapr.c
* fixed cleanup in vfio_connect_container
* removed assert in vfio_prereg_listener_region_add()
* created "prereg" copy of traces

v17:
* s/prereg\.c/spapr.c/
* s/vfio_prereg_gpa_to_ua/vfio_prereg_gpa_to_vaddr/
* vfio_prereg_listener_skipped_section does hw_error() on IOMMUs

v16:
* switched to 64bit math everywhere as there is no chance to see
region_add on RAM blocks even remotely close to 1<<64bytes.

v15:
* banned unaligned sections
* added a vfio_prereg_gpa_to_ua() helper

v14:
* s/free_container_exit/listener_release_exit/g
* added "if memory_region_is_iommu()" to vfio_prereg_listener_skipped_section
---
 hw/vfio/Makefile.objs |   1 +
 hw/vfio/common.c  |  40 +---
 hw/vfio/spapr.c   | 139 ++
 hw/vfio/trace-events  |   6 ++
 include/hw/vfio/vfio-common.h |   4 ++
 5 files changed, 181 insertions(+), 9 deletions(-)
 create mode 100644 hw/vfio/spapr.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index ceddbb8..c25e32b 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -4,4 +4,5 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
 obj-$(CONFIG_SOFTMMU) += amd-xgbe.o
+obj-$(CONFIG_SOFTMMU) += spapr.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7be638e..46381e6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -503,6 +503,9 @@ static const MemoryListener vfio_memory_listener = {
 static void vfio_listener_release(VFIOContainer *container)
 {
 memory_listener_unregister(&container->listener);
+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+memory_listener_unregister(&container->prereg_listener);
+}
 }
 
 static struct vfio_info_cap_header *
@@ -861,8 +864,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 goto free_container_exit;
 }
 
-ret = ioctl(fd, VFIO_SET_IOMMU,
-v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU);
+container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -887,8 +890,10 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
 container->iova_pgsizes = info.iova_pgsizes;
 }
-} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
+   ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
 struct vfio_iommu_spapr_tce_info info;
+bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);
 
 ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
 if (ret) {
@@ -896,7 +901,9 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 ret = -errno;
 goto free_container_exit;
 }
-ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+container->iommu_type =
+v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -908,11 +915,23 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  * when container fd is closed so we do not call it explicitly
  * in this file.
  */
-ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-if (ret) {
-error_report("vf

Re: [Qemu-devel] [PATCH v3 3/5] numa: reduce code duplication by adding helper numa_get_node_for_cpu()

2016-07-03 Thread David Gibson
On Fri, Jul 01, 2016 at 01:50:24PM +0200, Igor Mammedov wrote:
> Replace repeated pattern
> 
> for (i = 0; i < nb_numa_nodes; i++) {
> if (test_bit(idx, numa_info[i].node_cpu)) {
>...
>break;
> 
> with a helper function to lookup numa node index for cpu.
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Igor Mammedov 

Reviewed-by: David Gibson 
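
For readers following along, the contract described in the header comment (node index on success, nb_numa_nodes on failure) can be sketched standalone as below. This is not the numa.c implementation from the patch: the ex_ names, the fixed-size array and the 64-bit masks (so CPU indexes must be below 64) are all assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_NODES 4

/* Hypothetical stand-ins for numa_info[].node_cpu and nb_numa_nodes:
 * bit n of ex_node_cpu[i] is set when CPU n belongs to node i. */
uint64_t ex_node_cpu[EX_MAX_NODES];
int ex_nb_nodes;

/* Returns the node index for CPU idx (idx < 64), or ex_nb_nodes if the
 * CPU is not assigned to any node. */
int ex_get_node_for_cpu(int idx)
{
    int i;

    for (i = 0; i < ex_nb_nodes; i++) {
        if (ex_node_cpu[i] & (UINT64_C(1) << idx)) {
            break;
        }
    }
    return i;
}
```

Callers then only need the single "j < nb_numa_nodes" test seen in the hunks below, instead of repeating the loop.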

> ---
>  hw/arm/virt-acpi-build.c |  6 ++
>  hw/arm/virt.c|  7 +++
>  hw/i386/acpi-build.c |  7 ++-
>  hw/i386/pc.c |  8 +++-
>  hw/ppc/spapr_cpu_core.c  |  6 ++
>  include/sysemu/numa.h|  3 +++
>  numa.c   | 12 
>  7 files changed, 27 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 28fc59c..5923b3d 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -426,11 +426,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> VirtGuestInfo *guest_info)
>  uint32_t *cpu_node = g_malloc0(guest_info->smp_cpus * sizeof(uint32_t));
>  
>  for (i = 0; i < guest_info->smp_cpus; i++) {
> -for (j = 0; j < nb_numa_nodes; j++) {
> -if (test_bit(i, numa_info[j].node_cpu)) {
> +j = numa_get_node_for_cpu(i);
> +if (j < nb_numa_nodes) {
>  cpu_node[i] = j;
> -break;
> -}
>  }
>  }
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index c5c125e..b066f15 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -411,10 +411,9 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
>armcpu->mp_affinity);
>  }
>  
> -for (i = 0; i < nb_numa_nodes; i++) {
> -if (test_bit(cpu, numa_info[i].node_cpu)) {
> -qemu_fdt_setprop_cell(vbi->fdt, nodename, "numa-node-id", i);
> -}
> +i = numa_get_node_for_cpu(cpu);
> +if (i < nb_numa_nodes) {
> +qemu_fdt_setprop_cell(vbi->fdt, nodename, "numa-node-id", i);
>  }
>  
>  g_free(nodename);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 5a594be..60be550 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2344,18 +2344,15 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> MachineState *machine)
>  srat->reserved1 = cpu_to_le32(1);
>  
>  for (i = 0; i < apic_ids->len; i++) {
> -int j;
> +int j = numa_get_node_for_cpu(i);
>  int apic_id = apic_ids->cpus[i].arch_id;
>  
>  core = acpi_data_push(table_data, sizeof *core);
>  core->type = ACPI_SRAT_PROCESSOR_APIC;
>  core->length = sizeof(*core);
>  core->local_apic_id = apic_id;
> -for (j = 0; j < nb_numa_nodes; j++) {
> -if (test_bit(i, numa_info[j].node_cpu)) {
> +if (j < nb_numa_nodes) {
>  core->proximity_lo = j;
> -break;
> -}
>  }
>  memset(core->proximity_hi, 0, 3);
>  core->local_sapic_eid = 0;
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 44a8f3b..fef34e7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -780,11 +780,9 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, 
> PCMachineState *pcms)
>  for (i = 0; i < max_cpus; i++) {
>  unsigned int apic_id = x86_cpu_apic_id_from_index(i);
>  assert(apic_id < pcms->apic_id_limit);
> -for (j = 0; j < nb_numa_nodes; j++) {
> -if (test_bit(i, numa_info[j].node_cpu)) {
> -numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
> -break;
> -}
> +j = numa_get_node_for_cpu(i);
> +if (j < nb_numa_nodes) {
> +numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
>  }
>  }
>  for (i = 0; i < nb_numa_nodes; i++) {
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3a5da09..030016c 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -69,11 +69,9 @@ void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU 
> *cpu, Error **errp)
>  }
>  
>  /* Set NUMA node for the added CPUs  */
> -for (i = 0; i < nb_numa_nodes; i++) {
> -if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> +i = numa_get_node_for_cpu(cs->cpu_index);
> +if (i < nb_numa_nodes) {
>  cs->numa_node = i;
> -break;
> -}
>  }
>  
>  xics_cpu_setup(spapr->icp, cpu);
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index bb184c9..4da808a 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -32,4 +32,7 @@ void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, 
> uint32_t node);
>  void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
>  uint32_t numa_get_node(ram_addr_t addr, Error **errp);
>  
> +/* on success returns node index in numa_info,
> + * on failure returns nb_numa_nodes */

Re: [Qemu-devel] [PATCH v2 4/7] ppc: open code cpu creation for machine types

2016-07-03 Thread David Gibson
On Sat, Jul 02, 2016 at 10:33:33AM +0200, Greg Kurz wrote:
> On Sat, 2 Jul 2016 13:36:22 +0530
> Bharata B Rao  wrote:
> 
> > On Sat, Jul 02, 2016 at 12:41:48AM +0200, Greg Kurz wrote:
> > > If we want to generate cpu_dt_id in the machine code, this must occur
> > > before the cpu gets realized. We must open code the cpu creation to be
> > > able to do this.
> > > 
> > > This patch just does that. It borrows some lines from previous work
> > > from Bharata to handle the feature parsing.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/ppc/ppc.c |   39 ++-
> > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > > index dc3d214009c5..57f4ddd073d0 100644
> > > --- a/hw/ppc/ppc.c
> > > +++ b/hw/ppc/ppc.c
> > > @@ -32,6 +32,7 @@
> > >  #include "sysemu/cpus.h"
> > >  #include "hw/timer/m48t59.h"
> > >  #include "qemu/log.h"
> > > +#include "qapi/error.h"
> > >  #include "qemu/error-report.h"
> > >  #include "hw/loader.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -1353,5 +1354,41 @@ PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id)
> > > 
> > >  PowerPCCPU *ppc_cpu_init(const char *cpu_model)
> > >  {
> > > -return POWERPC_CPU(cpu_generic_init(TYPE_POWERPC_CPU, cpu_model));
> > > +PowerPCCPU *cpu;
> > > +CPUClass *cc;
> > > +ObjectClass *oc;
> > > +gchar **model_pieces;
> > > +Error *err = NULL;
> > > +
> > > +model_pieces = g_strsplit(cpu_model, ",", 2);
> > > +if (!model_pieces[0]) {
> > > +error_report("Invalid/empty CPU model name");
> > > +return NULL;
> > > +}
> > > +
> > > +oc = cpu_class_by_name(TYPE_POWERPC_CPU, model_pieces[0]);
> > > +if (oc == NULL) {
> > > +error_report("Unable to find CPU definition: %s", 
> > > model_pieces[0]);
> > > +return NULL;
> > > +}
> > > +
> > > +cpu = POWERPC_CPU(object_new(object_class_get_name(oc)));
> > > +
> > > +cc = CPU_CLASS(oc);
> > > +cc->parse_features(CPU(cpu), model_pieces[1], &err);  
> > 
> > Igor is working on a patchset to convert -cpu features into global 
> > properties.
> > IIUC, after that patchset, it is not recommended to parse the -cpu features
> > for every CPU but do it only once.
> > 
> 
> cpu_generic_init() in the current code also does the parsing, and as the title
> says, this patch is just about open coding the creation. I don't want to
> change behavior yet.
> 
> But yes, I agree that we should only parse features once and I'll be more than
> happy to fix this in a followup patch, based on Igor's work.
> 
> In the meantime, maybe I can add a comment stating that the parsing should go
> away ?

Right.  But the thing is by open coding here, you're making two copies
that need to be fixed instead of one, which increases the chances of
error.

It seems like it would be safer to change the generic code so there's
a new generic function which doesn't do the realize which we can use
on ppc (and other platforms when/if they need it).

Doing the change just on ppc by making our own copy of
cpu_generic_init() seems more likely to lead to future mistakes.

> > That is what I attempted here in the context of supporting compat cpu type
> > for pseries-2.7:
> > 
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg381660.html
> > 
> 
> Yeah and this is where I borrowed some lines. :)
> 
> > Regards,
> > Bharata.
> > 
> 
> Cheers.
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 2/7] ppc: simplify max_smt initialization in ppc_cpu_realizefn()

2016-07-03 Thread David Gibson
On Sat, Jul 02, 2016 at 12:41:32AM +0200, Greg Kurz wrote:
> kvmppc_smt_threads() returns 1 if KVM is not enabled.
> 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-2.7

> ---
>  target-ppc/translate_init.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 843f19b748fb..a06bf50b65d4 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9516,7 +9516,7 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>  Error *local_err = NULL;
>  #if !defined(CONFIG_USER_ONLY)
> -int max_smt = kvm_enabled() ? kvmppc_smt_threads() : 1;
> +int max_smt = kvmppc_smt_threads();
>  #endif
>  
>  #if !defined(CONFIG_USER_ONLY)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [V12 3/4] hw/i386: Introduce AMD IOMMU

2016-07-03 Thread David Kiarie
On Wed, Jun 22, 2016 at 11:24 PM, Jan Kiszka  wrote:
> On 2016-06-15 14:21, David Kiarie wrote:
>> +
>> +/* System Software might never read from some of this fields but anyways */
>
> No read-modify-write accesses observed in the field? And fields like
> AMDVI_MMIO_STATUS or AMDVI_MMIO_EXT_FEATURES sound a lot like they are
> rather about reading than writing. Misleading comment?

Yeah, misleading comment. AMDVI_MMIO_EXT_FEATURES is read-only while
some AMDVI_MMIO_STATUS bits are r/w1c and yes, I'm enforcing that in the
code.

>
>> +static uint64_t amdvi_mmio_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +AMDVIState *s = opaque;
>> +
>> +uint64_t val = -1;
>> +if (addr + size > AMDVI_MMIO_SIZE) {
>> +trace_amdvi_mmio_read("error: addr outside region: max ",
>> +(uint64_t)AMDVI_MMIO_SIZE, addr, size);
>> +return (uint64_t)-1;
>> +}
>> +
>> +if (size == 2) {
>> +val = amdvi_readw(s, addr);
>> +} else if (size == 4) {
>> +val = amdvi_readl(s, addr);
>> +} else if (size == 8) {
>> +val = amdvi_readq(s, addr);
>> +}
>> +
>> +switch (addr & ~0x07) {
>> +case AMDVI_MMIO_DEVICE_TABLE:
>> +trace_amdvi_mmio_read("MMIO_DEVICE_TABLE", addr, size, addr & 
>> ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_COMMAND_BASE:
>> +trace_amdvi_mmio_read("MMIO_COMMAND_BASE", addr, size, addr & 
>> ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EVENT_BASE:
>> +trace_amdvi_mmio_read("MMIO_EVENT_BASE", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_CONTROL:
>> +trace_amdvi_mmio_read("MMIO_MMIO_CONTROL", addr, size, addr & 
>> ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EXCL_BASE:
>> +trace_amdvi_mmio_read("MMIO_EXCL_BASE", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EXCL_LIMIT:
>> +trace_amdvi_mmio_read("MMIO_EXCL_LIMIT", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_COMMAND_HEAD:
>> +trace_amdvi_mmio_read("MMIO_COMMAND_HEAD", addr, size, addr & 
>> ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_COMMAND_TAIL:
>> +trace_amdvi_mmio_read("MMIO_COMMAND_TAIL", addr, size, addr & 
>> ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EVENT_HEAD:
>> +trace_amdvi_mmio_read("MMIO_EVENT_HEAD", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EVENT_TAIL:
>> +trace_amdvi_mmio_read("MMIO_EVENT_TAIL", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_STATUS:
>> +trace_amdvi_mmio_read("MMIO_STATUS", addr, size, addr & ~0x07);
>> +break;
>> +
>> +case AMDVI_MMIO_EXT_FEATURES:
>> +trace_amdvi_mmio_read("MMIO_EXT_FEATURES", addr, size, addr & 
>> ~0x07);
>> +break;
>
> What about a lookup table for that name?

I can't find an obvious way to index a table given the register address.
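
One possible shape for such a table, if a linear search is acceptable, is an array of (offset, name) pairs that is scanned instead of indexed. The sketch below is standalone and purely illustrative: the EX_ offsets are placeholder values chosen for the example, not the AMDVI_MMIO_* definitions from the patch, and ex_reg_name() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder register offsets, for illustration only. */
enum {
    EX_MMIO_DEVICE_TABLE = 0x0000,
    EX_MMIO_COMMAND_BASE = 0x0008,
    EX_MMIO_CONTROL      = 0x0018,
    EX_MMIO_COMMAND_HEAD = 0x2000,
    EX_MMIO_STATUS       = 0x2020,
};

struct ex_reg_name {
    uint64_t offset;
    const char *name;
};

/* Sparse offsets cannot index an array directly, so pair each offset
 * with its name and search the pairs. */
static const struct ex_reg_name ex_reg_names[] = {
    { EX_MMIO_DEVICE_TABLE, "MMIO_DEVICE_TABLE" },
    { EX_MMIO_COMMAND_BASE, "MMIO_COMMAND_BASE" },
    { EX_MMIO_CONTROL,      "MMIO_CONTROL" },
    { EX_MMIO_COMMAND_HEAD, "MMIO_COMMAND_HEAD" },
    { EX_MMIO_STATUS,       "MMIO_STATUS" },
};

/* Maps an access address to the name of the 8-byte register holding it. */
const char *ex_reg_name(uint64_t addr)
{
    uint64_t base = addr & ~(uint64_t)0x07;
    size_t i;

    for (i = 0; i < sizeof(ex_reg_names) / sizeof(ex_reg_names[0]); i++) {
        if (ex_reg_names[i].offset == base) {
            return ex_reg_names[i].name;
        }
    }
    return "UNHANDLED READ";
}
```

The read handler would then make a single trace call with ex_reg_name(addr) in place of the switch; a dozen entries is small enough that the linear scan should not matter on a trace path.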

>
>> +
>> +default:
>> +trace_amdvi_mmio_read("UNHANDLED READ", addr, size, addr & ~0x07);
>> +}
>> +return val;
>> +}
>> +
>> +static void amdvi_handle_control_write(AMDVIState *s)
>> +{
>> +/*
>> + * read whatever is already written in case
>> + * software is writing in chucks less than 8 bytes
>> + */
>> +unsigned long control = amdvi_readq(s, AMDVI_MMIO_CONTROL);
>> +s->enabled = !!(control & AMDVI_MMIO_CONTROL_AMDVIEN);
>> +
>> +s->ats_enabled = !!(control & AMDVI_MMIO_CONTROL_HTTUNEN);
>> +s->evtlog_enabled = s->enabled && !!(control &
>> +AMDVI_MMIO_CONTROL_EVENTLOGEN);
>> +
>> +s->evtlog_intr = !!(control & AMDVI_MMIO_CONTROL_EVENTINTEN);
>> +s->completion_wait_intr = !!(control & AMDVI_MMIO_CONTROL_COMWAITINTEN);
>> +s->cmdbuf_enabled = s->enabled && !!(control &
>> +AMDVI_MMIO_CONTROL_CMDBUFLEN);
>> +
>> +/* update the flags depending on the control register */
>> +if (s->cmdbuf_enabled) {
>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_CMDBUF_RUN);
>> +} else {
>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, 
>> ~AMDVI_MMIO_STATUS_CMDBUF_RUN);
>> +}
>> +if (s->evtlog_enabled) {
>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_RUN);
>> +} else {
>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, ~AMDVI_MMIO_STATUS_EVT_RUN);
>> +}
>> +
>> +trace_amdvi_control_status(control);
>> +
>> +amdvi_cmdbuf_run(s);
>> +}
>> +
>> +static inline void amdvi_handle_devtab_write(AMDVIState *s)
>> +
>> +{
>> +uint64_t val = amdvi_readq(s, AMDVI_MMIO_DEVICE_TABLE);
>> +s->devtab = (val & AMDVI_MMIO_DEVTAB_BASE_MASK);
>> +
>> +/* set device table length */
>> +s->devtab_len = ((val & AMDVI_MMIO_DEVTAB_SIZE_MASK) + 1 *
>> +(AMDVI_MMIO_DEVTAB_SIZE_UNIT /
>> + AMDVI_MMIO_DEVTAB_ENTRY_SIZE));
>> +}
>> +
>> +static inline void amdvi_handle_cmdhead_write(AMDV

[Qemu-devel] [PATCH] ppc: Fix xsrdpi, xvrdpi and xvrspi rounding

2016-07-03 Thread Anton Blanchard
From: Anton Blanchard 

xsrdpi, xvrdpi and xvrspi use the round ties away method, not round
nearest even.

Signed-off-by: Anton Blanchard 
---
 target-ppc/fpu_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 4ef893b..d9795d0 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2689,19 +2689,19 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) 
   \
 helper_float_check_status(env);\
 }
 
-VSX_ROUND(xsrdpi, 1, float64, VsrD(0), float_round_nearest_even, 1)
+VSX_ROUND(xsrdpi, 1, float64, VsrD(0), float_round_ties_away, 1)
 VSX_ROUND(xsrdpic, 1, float64, VsrD(0), FLOAT_ROUND_CURRENT, 1)
 VSX_ROUND(xsrdpim, 1, float64, VsrD(0), float_round_down, 1)
 VSX_ROUND(xsrdpip, 1, float64, VsrD(0), float_round_up, 1)
 VSX_ROUND(xsrdpiz, 1, float64, VsrD(0), float_round_to_zero, 1)
 
-VSX_ROUND(xvrdpi, 2, float64, VsrD(i), float_round_nearest_even, 0)
+VSX_ROUND(xvrdpi, 2, float64, VsrD(i), float_round_ties_away, 0)
 VSX_ROUND(xvrdpic, 2, float64, VsrD(i), FLOAT_ROUND_CURRENT, 0)
 VSX_ROUND(xvrdpim, 2, float64, VsrD(i), float_round_down, 0)
 VSX_ROUND(xvrdpip, 2, float64, VsrD(i), float_round_up, 0)
 VSX_ROUND(xvrdpiz, 2, float64, VsrD(i), float_round_to_zero, 0)
 
-VSX_ROUND(xvrspi, 4, float32, VsrW(i), float_round_nearest_even, 0)
+VSX_ROUND(xvrspi, 4, float32, VsrW(i), float_round_ties_away, 0)
 VSX_ROUND(xvrspic, 4, float32, VsrW(i), FLOAT_ROUND_CURRENT, 0)
 VSX_ROUND(xvrspim, 4, float32, VsrW(i), float_round_down, 0)
 VSX_ROUND(xvrspip, 4, float32, VsrW(i), float_round_up, 0)
-- 
2.7.4
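
The two rounding modes named in the commit message differ only on exact halfway cases. The following is a minimal standalone sketch of that difference, not QEMU's softfloat code: the helper names round_ties_away and round_ties_even are made up for illustration, and because they truncate through long long they only cover small finite values.

```c
#include <assert.h>

/* Illustrative only: NOT QEMU's softfloat implementation.
 * Valid for finite values whose integer part fits in a long long. */

/* Round to nearest, ties away from zero (the mode the patch selects). */
static double round_ties_away(double x)
{
    double t = (double)(long long)x;    /* truncate toward zero */
    double frac = x - t;

    if (frac >= 0.5) {
        return t + 1.0;
    }
    if (frac <= -0.5) {
        return t - 1.0;
    }
    return t;
}

/* Round to nearest, ties to even (the mode the patch replaces). */
static double round_ties_even(double x)
{
    long long i = (long long)x;         /* truncate toward zero */
    double frac = x - (double)i;
    int odd = (i % 2) != 0;

    if (frac > 0.5 || (frac == 0.5 && odd)) {
        return (double)(i + 1);
    }
    if (frac < -0.5 || (frac == -0.5 && odd)) {
        return (double)(i - 1);
    }
    return (double)i;
}

/* Worked example: the modes disagree on 2.5 and agree on 3.5. */
static void rounding_example(void)
{
    assert(round_ties_away(2.5) == 3.0);
    assert(round_ties_even(2.5) == 2.0);
    assert(round_ties_away(3.5) == 4.0);
    assert(round_ties_even(3.5) == 4.0);
}
```

A guest that rounds 2.5 with xsrdpi would therefore see 3.0 after this patch, where the old helper produced 2.0.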




Re: [Qemu-devel] [PATCH 1/4] ppc: simplify ppc_hash64_hpte_page_shift_noslb()

2016-07-03 Thread David Gibson
On Mon, Jul 04, 2016 at 02:46:51PM +1000, David Gibson wrote:
> On Fri, Jul 01, 2016 at 09:10:10AM +0200, Cédric Le Goater wrote:
> > The segment page shift parameter is never used. Let's remove it.
> 
> I think I did have a use case for this in mind when I made it, but I
> can't remember what it was now.  Oh well, we can always add it back
> when I remember.  I'll apply this to ppc-for-2.7.

Actually.. no I won't.  There are some problems in the later patches
in this series, and to fix this correctly we're going to need that
slb_pshift return value after all.

> 
> > 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  hw/ppc/spapr_hcall.c| 4 ++--
> >  target-ppc/mmu-hash64.c | 6 +-
> >  target-ppc/mmu-hash64.h | 3 +--
> >  3 files changed, 4 insertions(+), 9 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index e011ed4b664b..73af112e1d36 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -83,12 +83,12 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
> > sPAPRMachineState *spapr,
> >  target_ulong pte_index = args[1];
> >  target_ulong pteh = args[2];
> >  target_ulong ptel = args[3];
> > -unsigned apshift, spshift;
> > +unsigned apshift;
> >  target_ulong raddr;
> >  target_ulong index;
> >  uint64_t token;
> >  
> > -apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel, &spshift);
> > +apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel);
> >  if (!apshift) {
> >  /* Bad page size encoding */
> >  return H_PARAMETER;
> > diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> > index fa26ad2e875b..7d056c1e3b4a 100644
> > --- a/target-ppc/mmu-hash64.c
> > +++ b/target-ppc/mmu-hash64.c
> > @@ -610,14 +610,12 @@ static unsigned hpte_page_shift(const struct 
> > ppc_one_seg_page_size *sps,
> >  }
> >  
> >  unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
> > -  uint64_t pte0, uint64_t pte1,
> > -  unsigned *seg_page_shift)
> > +  uint64_t pte0, uint64_t pte1)
> >  {
> >  CPUPPCState *env = &cpu->env;
> >  int i;
> >  
> >  if (!(pte0 & HPTE64_V_LARGE)) {
> > -*seg_page_shift = 12;
> >  return 12;
> >  }
> >  
> > @@ -635,12 +633,10 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU 
> > *cpu,
> >  
> >  shift = hpte_page_shift(sps, pte0, pte1);
> >  if (shift) {
> > -*seg_page_shift = sps->page_shift;
> >  return shift;
> >  }
> >  }
> >  
> > -*seg_page_shift = 0;
> >  return 0;
> >  }
> >  
> > diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
> > index 13ad060cfefb..f625de03da44 100644
> > --- a/target-ppc/mmu-hash64.h
> > +++ b/target-ppc/mmu-hash64.h
> > @@ -17,8 +17,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
> > target_ulong pte_index,
> > target_ulong pte0, target_ulong pte1);
> >  unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
> > -  uint64_t pte0, uint64_t pte1,
> > -  unsigned *seg_page_shift);
> > +  uint64_t pte0, uint64_t pte1);
> >  #endif
> >  
> >  /*
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 1/4] ppc: simplify ppc_hash64_hpte_page_shift_noslb()

2016-07-03 Thread David Gibson
On Fri, Jul 01, 2016 at 09:10:10AM +0200, Cédric Le Goater wrote:
> The segment page shift parameter is never used. Let's remove it.

I think I did have a use case for this in mind when I made it, but I
can't remember what it was now.  Oh well, we can always add it back
when I remember.  I'll apply this to ppc-for-2.7.

> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/spapr_hcall.c| 4 ++--
>  target-ppc/mmu-hash64.c | 6 +-
>  target-ppc/mmu-hash64.h | 3 +--
>  3 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index e011ed4b664b..73af112e1d36 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -83,12 +83,12 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  target_ulong pte_index = args[1];
>  target_ulong pteh = args[2];
>  target_ulong ptel = args[3];
> -unsigned apshift, spshift;
> +unsigned apshift;
>  target_ulong raddr;
>  target_ulong index;
>  uint64_t token;
>  
> -apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel, &spshift);
> +apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel);
>  if (!apshift) {
>  /* Bad page size encoding */
>  return H_PARAMETER;
> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> index fa26ad2e875b..7d056c1e3b4a 100644
> --- a/target-ppc/mmu-hash64.c
> +++ b/target-ppc/mmu-hash64.c
> @@ -610,14 +610,12 @@ static unsigned hpte_page_shift(const struct 
> ppc_one_seg_page_size *sps,
>  }
>  
>  unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
> -  uint64_t pte0, uint64_t pte1,
> -  unsigned *seg_page_shift)
> +  uint64_t pte0, uint64_t pte1)
>  {
>  CPUPPCState *env = &cpu->env;
>  int i;
>  
>  if (!(pte0 & HPTE64_V_LARGE)) {
> -*seg_page_shift = 12;
>  return 12;
>  }
>  
> @@ -635,12 +633,10 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU 
> *cpu,
>  
>  shift = hpte_page_shift(sps, pte0, pte1);
>  if (shift) {
> -*seg_page_shift = sps->page_shift;
>  return shift;
>  }
>  }
>  
> -*seg_page_shift = 0;
>  return 0;
>  }
>  
> diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
> index 13ad060cfefb..f625de03da44 100644
> --- a/target-ppc/mmu-hash64.h
> +++ b/target-ppc/mmu-hash64.h
> @@ -17,8 +17,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
> target_ulong pte_index,
> target_ulong pte0, target_ulong pte1);
>  unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
> -  uint64_t pte0, uint64_t pte1,
> -  unsigned *seg_page_shift);
> +  uint64_t pte0, uint64_t pte1);
>  #endif
>  
>  /*

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [V12 3/4] hw/i386: Introduce AMD IOMMU

2016-07-03 Thread Jan Kiszka
On 2016-07-04 07:06, David Kiarie wrote:
> On Wed, Jun 22, 2016 at 11:24 PM, Jan Kiszka  wrote:
>> On 2016-06-15 14:21, David Kiarie wrote:
>>> +static uint64_t amdvi_mmio_read(void *opaque, hwaddr addr, unsigned size)
>>> +{
>>> +AMDVIState *s = opaque;
>>> +
>>> +uint64_t val = -1;
>>> +if (addr + size > AMDVI_MMIO_SIZE) {
>>> +trace_amdvi_mmio_read("error: addr outside region: max ",
>>> +(uint64_t)AMDVI_MMIO_SIZE, addr, size);
>>> +return (uint64_t)-1;
>>> +}
>>> +
>>> +if (size == 2) {
>>> +val = amdvi_readw(s, addr);
>>> +} else if (size == 4) {
>>> +val = amdvi_readl(s, addr);
>>> +} else if (size == 8) {
>>> +val = amdvi_readq(s, addr);
>>> +}
>>> +
>>> +switch (addr & ~0x07) {
>>> +case AMDVI_MMIO_DEVICE_TABLE:
>>> +trace_amdvi_mmio_read("MMIO_DEVICE_TABLE", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_COMMAND_BASE:
>>> +trace_amdvi_mmio_read("MMIO_COMMAND_BASE", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EVENT_BASE:
>>> +trace_amdvi_mmio_read("MMIO_EVENT_BASE", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_CONTROL:
>>> +trace_amdvi_mmio_read("MMIO_MMIO_CONTROL", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EXCL_BASE:
>>> +trace_amdvi_mmio_read("MMIO_EXCL_BASE", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EXCL_LIMIT:
>>> +trace_amdvi_mmio_read("MMIO_EXCL_LIMIT", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_COMMAND_HEAD:
>>> +trace_amdvi_mmio_read("MMIO_COMMAND_HEAD", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_COMMAND_TAIL:
>>> +trace_amdvi_mmio_read("MMIO_COMMAND_TAIL", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EVENT_HEAD:
>>> +trace_amdvi_mmio_read("MMIO_EVENT_HEAD", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EVENT_TAIL:
>>> +trace_amdvi_mmio_read("MMIO_EVENT_TAIL", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_STATUS:
>>> +trace_amdvi_mmio_read("MMIO_STATUS", addr, size, addr & ~0x07);
>>> +break;
>>> +
>>> +case AMDVI_MMIO_EXT_FEATURES:
>>> +trace_amdvi_mmio_read("MMIO_EXT_FEATURES", addr, size, addr & 
>>> ~0x07);
>>> +break;
>>
>> What about a lookup table for that name?
> 
> I can't find an obvious way to index a table given the register address.

Well, you would need a low ((addr & 0x2000) == 0) and a high table (addr
& 0x2000), and then do the indexing based on (addr & ~0x2000) / 8.
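
A rough standalone sketch of that two-table idea follows. The register offsets in the comments are assumptions (they are meant to mirror the AMDVI_MMIO_* constants, which are not shown in this mail), and amdvi_mmio_name is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: one 8-byte register per slot, low bank at 0x0000,
 * high bank at 0x2000. Check against the real header before reuse. */
static const char *const amdvi_mmio_low[] = {
    "MMIO_DEVICE_TABLE",    /* assumed 0x0000 */
    "MMIO_COMMAND_BASE",    /* assumed 0x0008 */
    "MMIO_EVENT_BASE",      /* assumed 0x0010 */
    "MMIO_CONTROL",         /* assumed 0x0018 */
    "MMIO_EXCL_BASE",       /* assumed 0x0020 */
    "MMIO_EXCL_LIMIT",      /* assumed 0x0028 */
    "MMIO_EXT_FEATURES",    /* assumed 0x0030 */
};

static const char *const amdvi_mmio_high[] = {
    "MMIO_COMMAND_HEAD",    /* assumed 0x2000 */
    "MMIO_COMMAND_TAIL",    /* assumed 0x2008 */
    "MMIO_EVENT_HEAD",      /* assumed 0x2010 */
    "MMIO_EVENT_TAIL",      /* assumed 0x2018 */
    "MMIO_STATUS",          /* assumed 0x2020 */
};

#define TABLE_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Map a register address to a name for tracing purposes. */
static const char *amdvi_mmio_name(uint64_t addr)
{
    uint64_t index = (addr & ~(uint64_t)0x2000) / 8;

    if (addr & 0x2000) {
        return index < TABLE_LEN(amdvi_mmio_high)
               ? amdvi_mmio_high[index] : "UNHANDLED";
    }
    return index < TABLE_LEN(amdvi_mmio_low)
           ? amdvi_mmio_low[index] : "UNHANDLED";
}

/* Usage: any byte offset inside a register resolves to that register. */
static void amdvi_mmio_name_example(void)
{
    assert(strcmp(amdvi_mmio_name(0x0018), "MMIO_CONTROL") == 0);
    assert(strcmp(amdvi_mmio_name(0x001c), "MMIO_CONTROL") == 0);
    assert(strcmp(amdvi_mmio_name(0x2020), "MMIO_STATUS") == 0);
}
```

With something like this the switch collapses to a single trace call that passes the looked-up name.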

> 
>>
>>> +
>>> +default:
>>> +trace_amdvi_mmio_read("UNHANDLED READ", addr, size, addr & ~0x07);
>>> +}
>>> +return val;
>>> +}
>>> +
>>> +static void amdvi_handle_control_write(AMDVIState *s)
>>> +{
>>> +/*
>>> + * read whatever is already written in case
>>> + * software is writing in chucks less than 8 bytes
>>> + */
>>> +unsigned long control = amdvi_readq(s, AMDVI_MMIO_CONTROL);
>>> +s->enabled = !!(control & AMDVI_MMIO_CONTROL_AMDVIEN);
>>> +
>>> +s->ats_enabled = !!(control & AMDVI_MMIO_CONTROL_HTTUNEN);
>>> +s->evtlog_enabled = s->enabled && !!(control &
>>> +AMDVI_MMIO_CONTROL_EVENTLOGEN);
>>> +
>>> +s->evtlog_intr = !!(control & AMDVI_MMIO_CONTROL_EVENTINTEN);
>>> +s->completion_wait_intr = !!(control & 
>>> AMDVI_MMIO_CONTROL_COMWAITINTEN);
>>> +s->cmdbuf_enabled = s->enabled && !!(control &
>>> +AMDVI_MMIO_CONTROL_CMDBUFLEN);
>>> +
>>> +/* update the flags depending on the control register */
>>> +if (s->cmdbuf_enabled) {
>>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_CMDBUF_RUN);
>>> +} else {
>>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, 
>>> ~AMDVI_MMIO_STATUS_CMDBUF_RUN);
>>> +}
>>> +if (s->evtlog_enabled) {
>>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_RUN);
>>> +} else {
>>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, 
>>> ~AMDVI_MMIO_STATUS_EVT_RUN);
>>> +}
>>> +
>>> +trace_amdvi_control_status(control);
>>> +
>>> +amdvi_cmdbuf_run(s);
>>> +}
>>> +
>>> +static inline void amdvi_handle_devtab_write(AMDVIState *s)
>>> +
>>> +{
>>> +uint64_t val = amdvi_readq(s, AMDVI_MMIO_DEVICE_TABLE);
>>> +s->devtab = (val & AMDVI_MMIO_DEVTAB_BASE_MASK);
>>> +
>>> +/* set device table length */
>>> +s->devtab_len = ((val & AMDVI_MMIO_DEVTAB_SIZE_MASK) + 1 *
>>> +(AMDVI_MMIO_DEVTAB_SIZE_UNIT /
>>> + AMDVI_MMIO_DEVTAB_ENTRY_SIZE));
>>> +}
>>> +
>>> +static inline void amdvi_handle_cmdhead_write(AMDVIState *s)
>>> +{
>>> +s->cmdbuf_head = amdvi_readq(s, AMDVI_MMIO_COMMAND_HEAD)
>>> +   

Re: [Qemu-devel] [V12 3/4] hw/i386: Introduce AMD IOMMU

2016-07-03 Thread David Kiarie
On Mon, Jul 4, 2016 at 8:41 AM, Jan Kiszka  wrote:
> On 2016-07-04 07:06, David Kiarie wrote:
>> On Wed, Jun 22, 2016 at 11:24 PM, Jan Kiszka  wrote:
>>> On 2016-06-15 14:21, David Kiarie wrote:
>>>> +static uint64_t amdvi_mmio_read(void *opaque, hwaddr addr, unsigned size)
>>>> +{
>>>> +AMDVIState *s = opaque;
>>>> +
>>>> +uint64_t val = -1;
>>>> +if (addr + size > AMDVI_MMIO_SIZE) {
>>>> +trace_amdvi_mmio_read("error: addr outside region: max ",
>>>> +(uint64_t)AMDVI_MMIO_SIZE, addr, size);
>>>> +return (uint64_t)-1;
>>>> +}
>>>> +
>>>> +if (size == 2) {
>>>> +val = amdvi_readw(s, addr);
>>>> +} else if (size == 4) {
>>>> +val = amdvi_readl(s, addr);
>>>> +} else if (size == 8) {
>>>> +val = amdvi_readq(s, addr);
>>>> +}
>>>> +
>>>> +switch (addr & ~0x07) {
>>>> +case AMDVI_MMIO_DEVICE_TABLE:
>>>> +trace_amdvi_mmio_read("MMIO_DEVICE_TABLE", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_COMMAND_BASE:
>>>> +trace_amdvi_mmio_read("MMIO_COMMAND_BASE", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EVENT_BASE:
>>>> +trace_amdvi_mmio_read("MMIO_EVENT_BASE", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_CONTROL:
>>>> +trace_amdvi_mmio_read("MMIO_MMIO_CONTROL", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EXCL_BASE:
>>>> +trace_amdvi_mmio_read("MMIO_EXCL_BASE", addr, size, addr & ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EXCL_LIMIT:
>>>> +trace_amdvi_mmio_read("MMIO_EXCL_LIMIT", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_COMMAND_HEAD:
>>>> +trace_amdvi_mmio_read("MMIO_COMMAND_HEAD", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_COMMAND_TAIL:
>>>> +trace_amdvi_mmio_read("MMIO_COMMAND_TAIL", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EVENT_HEAD:
>>>> +trace_amdvi_mmio_read("MMIO_EVENT_HEAD", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EVENT_TAIL:
>>>> +trace_amdvi_mmio_read("MMIO_EVENT_TAIL", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_STATUS:
>>>> +trace_amdvi_mmio_read("MMIO_STATUS", addr, size, addr & ~0x07);
>>>> +break;
>>>> +
>>>> +case AMDVI_MMIO_EXT_FEATURES:
>>>> +trace_amdvi_mmio_read("MMIO_EXT_FEATURES", addr, size, addr & 
>>>> ~0x07);
>>>> +break;
>>>
>>> What about a lookup table for that name?
>>
>> I can't find an obvious way to index a table given the register address.
>
> Well, you would need a low ((addr & 0x2000) == 0) and a high table (addr
> & 0x2000), and then do the indexing based on (addr & ~0x2000) / 8.
>
>>
>>>
>>>> +
>>>> +default:
>>>> +trace_amdvi_mmio_read("UNHANDLED READ", addr, size, addr & ~0x07);
>>>> +}
>>>> +return val;
>>>> +}
>>>> +
>>>> +static void amdvi_handle_control_write(AMDVIState *s)
>>>> +{
>>>> +/*
>>>> + * read whatever is already written in case
>>>> + * software is writing in chucks less than 8 bytes
>>>> + */
>>>> +unsigned long control = amdvi_readq(s, AMDVI_MMIO_CONTROL);
>>>> +s->enabled = !!(control & AMDVI_MMIO_CONTROL_AMDVIEN);
>>>> +
>>>> +s->ats_enabled = !!(control & AMDVI_MMIO_CONTROL_HTTUNEN);
>>>> +s->evtlog_enabled = s->enabled && !!(control &
>>>> +AMDVI_MMIO_CONTROL_EVENTLOGEN);
>>>> +
>>>> +s->evtlog_intr = !!(control & AMDVI_MMIO_CONTROL_EVENTINTEN);
>>>> +s->completion_wait_intr = !!(control & 
>>>> AMDVI_MMIO_CONTROL_COMWAITINTEN);
>>>> +s->cmdbuf_enabled = s->enabled && !!(control &
>>>> +AMDVI_MMIO_CONTROL_CMDBUFLEN);
>>>> +
>>>> +/* update the flags depending on the control register */
>>>> +if (s->cmdbuf_enabled) {
>>>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_CMDBUF_RUN);
>>>> +} else {
>>>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, 
>>>> ~AMDVI_MMIO_STATUS_CMDBUF_RUN);
>>>> +}
>>>> +if (s->evtlog_enabled) {
>>>> +amdvi_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_RUN);
>>>> +} else {
>>>> +amdvi_and_assignq(s, AMDVI_MMIO_STATUS, 
>>>> ~AMDVI_MMIO_STATUS_EVT_RUN);
>>>> +}
>>>> +
>>>> +trace_amdvi_control_status(control);
>>>> +
>>>> +amdvi_cmdbuf_run(s);
>>>> +}
>>>> +
>>>> +static inline void amdvi_handle_devtab_write(AMDVIState *s)
>>>> +
>>>> +{
>>>> +uint64_t val = amdvi_readq(s, AMDVI_MMIO_DEVICE_TABLE);
>>>> +s->devtab = (val & AMDVI_MMIO_DEVTAB_BASE_MASK);
>>>> +
>>>> +/* set device table length */
>>>> +s->devtab_len = ((val & AMDVI_MMIO_DEVTAB_SIZE_MASK) + 1 *
>>>> +(AMDVI_MMIO_DEVTAB_SIZE_UNIT /
>>>> +

Re: [Qemu-devel] [V12 3/4] hw/i386: Introduce AMD IOMMU

2016-07-03 Thread Jan Kiszka
On 2016-07-04 07:49, David Kiarie wrote:
>>>>> +/* FIXME: something might go wrong if System Software writes in chunks
>>>>> + * of one byte but linux writes in chunks of 4 bytes so currently it
>>>>> + * works correctly with linux but will definitely be busted if software
>>>>> + * reads/writes 8 bytes
>>>>
>>>> What does the spec tell us about non-dword accesses? If they aren't
>>>> allowed, filter them out completely at the beginning.
>>>
>>> Non-dword accesses are allowed. Spec 2.62
>>>
>>> " Accesses must be aligned to the size of the access and the size
>>> in bytes must be a power
>>> of two. Software may use accesses as small as one byte. "
>>>
>>> Linux uses dword accesses though but even in this case a change in the
>>> order of access whereby the high part of the register is accessed
>>> first then the lower accessed could cause a problem (??)
>>
>> I do not get yet what makes it tricky to support all allowed access
>> sizes. You are just missing byte access, and that will be easy to add once
>> the size dispatching is done in a single function like suggested below.
> 
> It is tricky because I need to read some values that may span across
> 1,2,4-byte boundaries and I don't have a way to tell that software is
> done writing. The current logic is based on the assumption that
> software writes the low bytes first which is mostly the case but might
> not always be the case.

According to the spec, software is allowed to read or write any byte,
word, dword and qword, provided it is naturally aligned. Just model that
access, nothing more. If a high-word/byte access triggers some side
effect, then that is what it does (and probably a reason why Linux
avoids it).

Jan
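
The access model described above could be sketched as follows. This is a hedged standalone illustration, not the patch's code: MMIOState, MMIO_SIZE, mmio_access_ok, mmio_read and mmio_write are hypothetical names, the register file is a plain little-endian byte array, and side effects on specific registers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MMIO_SIZE 0x4000        /* hypothetical region size */

typedef struct {
    uint8_t regs[MMIO_SIZE];    /* register file, little-endian layout */
} MMIOState;

/* Valid accesses are 1, 2, 4 or 8 bytes, naturally aligned, in range. */
static bool mmio_access_ok(uint64_t addr, unsigned size)
{
    if (size != 1 && size != 2 && size != 4 && size != 8) {
        return false;
    }
    if (addr & (size - 1)) {
        return false;
    }
    return addr < MMIO_SIZE && size <= MMIO_SIZE - addr;
}

/* One read path for every size; invalid accesses return all ones. */
static uint64_t mmio_read(const MMIOState *s, uint64_t addr, unsigned size)
{
    uint64_t val = 0;
    unsigned i;

    if (!mmio_access_ok(addr, size)) {
        return UINT64_MAX;
    }
    for (i = 0; i < size; i++) {
        val |= (uint64_t)s->regs[addr + i] << (8 * i);
    }
    return val;
}

/* One write path for every size; returns false if the access is dropped. */
static bool mmio_write(MMIOState *s, uint64_t addr, uint64_t val,
                       unsigned size)
{
    unsigned i;

    if (!mmio_access_ok(addr, size)) {
        return false;
    }
    for (i = 0; i < size; i++) {
        s->regs[addr + i] = (uint8_t)(val >> (8 * i));
    }
    return true;
}

/* Usage: two dword writes are visible as one qword, in either order. */
static void mmio_example(void)
{
    MMIOState s;

    memset(&s, 0, sizeof(s));
    assert(mmio_write(&s, 0x1c, 0x55667788u, 4));   /* high half first */
    assert(mmio_write(&s, 0x18, 0x11223344u, 4));
    assert(mmio_read(&s, 0x18, 8) == 0x5566778811223344ull);
}
```

Because every size goes through the same byte-wise path, the order in which software writes the halves of a register does not matter for the stored value; any register-specific side effect would hang off the write path.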






Re: [Qemu-devel] [PATCH 0/4] ppc: fixes for large page and VRMA support

2016-07-03 Thread Benjamin Herrenschmidt
On Fri, 2016-07-01 at 09:10 +0200, Cédric Le Goater wrote:
> Here is a little series with API cleanups and fixes for large page and
> VRMA. Previous patches which added the support did not take into
> account the segment page size attribute.

I've done a slightly different patch that subsumes 1...3 and is I think
now architecturally correct ;-) (unlike my original code).

Additionally we now avoid walking all the SLB sizes so things should be
a bit faster too.

The VRMA patch still applies, I'll include it in what I post.

Cheers,
Ben.




Re: [Qemu-devel] [PATCH 1/6] oslib-posix: add helpers for stack alloc and free

2016-07-03 Thread Peter Lieven

Am 01.07.2016 um 22:12 schrieb Richard Henderson:

On 06/30/2016 12:37 AM, Peter Lieven wrote:

+void *qemu_alloc_stack(size_t sz)
+{
+/* allocate sz bytes plus one extra page for a guard
+ * page at the bottom of the stack */
+void *ptr = mmap(NULL, sz + getpagesize(), PROT_NONE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+if (mmap(ptr + getpagesize(), sz, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
+abort();
+}


Why two mmap instead of mmap + mprotect?


r~


I was looking at qemu_ram_mmap too much. mprotect seems the cleaner solution.

Thanks,
Peter
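
For reference, the mmap + mprotect variant agreed on above might look roughly like this. It is a sketch under assumptions, not the follow-up patch: alloc_stack_with_guard and free_stack_with_guard are hypothetical names, the stack is assumed to grow down, and sz is assumed to be a multiple of the page size.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map sz usable bytes plus one guard page below them. Returns the base
 * of the mapping; the usable area starts one page above the base. */
static void *alloc_stack_with_guard(size_t sz)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *ptr;

    assert(sz % pagesz == 0);

    /* A single mapping, readable and writable everywhere at first. */
    ptr = mmap(NULL, sz + pagesz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        abort();
    }
    /* Revoke access to the lowest page so an overflow faults. */
    if (mprotect(ptr, pagesz, PROT_NONE) != 0) {
        abort();
    }
    return ptr;
}

/* Unmap the usable area together with its guard page. */
static void free_stack_with_guard(void *stack, size_t sz)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    munmap(stack, sz + pagesz);
}
```

Compared with the two-mmap version, this avoids MAP_FIXED and keeps the whole stack in one mapping that a single munmap can release.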




Re: [Qemu-devel] [PATCH 1/6] oslib-posix: add helpers for stack alloc and free

2016-07-03 Thread Peter Lieven

Am 01.07.2016 um 22:49 schrieb Richard Henderson:

On 07/01/2016 01:12 PM, Richard Henderson wrote:

On 06/30/2016 12:37 AM, Peter Lieven wrote:

+void *qemu_alloc_stack(size_t sz)
+{
+/* allocate sz bytes plus one extra page for a guard
+ * page at the bottom of the stack */
+void *ptr = mmap(NULL, sz + getpagesize(), PROT_NONE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+if (mmap(ptr + getpagesize(), sz, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
+abort();
+}


Rare platforms now, but fwiw, this is incorrect for hppa and ia64.

For hppa, stack grows up, so the guard page needs to be at the top.

For ia64, there are two stacks, the "normal" program stack (grows down) and the 
register window stack (grows up).  The guard page goes in between.

See e.g. glibc/nptl/allocatestack.c

#ifdef NEED_SEPARATE_REGISTER_STACK
  char *guard = mem + (((size - guardsize) / 2) & ~pagesize_m1);
#elif _STACK_GROWS_DOWN
  char *guard = mem;
#elif _STACK_GROWS_UP
  char *guard = (char *) (((uintptr_t) pd - guardsize) & ~pagesize_m1);
#endif
  if (mprotect (guard, guardsize, PROT_NONE) != 0)


It seems that ia64 needs even more care when allocating a stack, right?
Do you think it is OK to handle only _STACK_GROWS_DOWN and _STACK_GROWS_UP?

Peter
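
If only the two simple directions are handled, the placement logic reduces to choosing which end of the mapping holds the guard page. A small sketch of that choice follows; StackLayout and stack_layout are hypothetical names, the usable size is assumed to be page aligned, and the ia64 split-stack case is deliberately not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t guard_off;   /* offset of the guard page in the mapping */
    size_t usable_off;  /* offset of the usable stack area */
    size_t total;       /* total bytes to map */
} StackLayout;

/* The guard page sits at the end the stack grows towards:
 * the bottom for grows-down, the top for grows-up (e.g. hppa). */
static StackLayout stack_layout(size_t usable, size_t pagesz,
                                bool grows_down)
{
    StackLayout l;

    assert(pagesz != 0 && usable % pagesz == 0);

    l.total = usable + pagesz;
    if (grows_down) {
        l.guard_off = 0;
        l.usable_off = pagesz;
    } else {
        l.guard_off = usable;
        l.usable_off = 0;
    }
    return l;
}
```

The caller would then mprotect(base + guard_off, pagesz, PROT_NONE) and hand out base + usable_off as the stack.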



Re: [Qemu-devel] [PATCH 6/6] coroutine: reduce stack size to 64kB

2016-07-03 Thread Peter Lieven

Am 01.07.2016 um 23:13 schrieb Richard Henderson:

On 06/30/2016 12:37 AM, Peter Lieven wrote:

evaluation with the recently introduced maximum stack usage monitoring revealed
that the actual used stack size was never above 4kB so allocating 1MB stack
for each coroutine is a lot of wasted memory. So reduce the stack size to
64kB which should still give enough head room.

Signed-off-by: Peter Lieven 
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index eac323a..f84d777 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"

-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)

 typedef enum {
 COROUTINE_YIELD = 1,



Ought we check that this is not smaller than

sysconf(_SC_THREAD_STACK_MIN)

which (for glibc at least), is 192k for ia64, 128k for aarch64, mips and tile 
(though why it is quite so high in those latter cases I don't know).


For x86_64 it seems to be 16k. I would not mind adjusting the stack size either 
in qemu_alloc_stack or by changing the macro for the coroutine stack
size into a function returning MAX(1 << 16, sysconf(_SC_THREAD_STACK_MIN)).

Peter
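
The function variant mentioned above could be sketched like this. It is only an illustration: coroutine_stack_size and COROUTINE_STACK_SIZE_DEFAULT are hypothetical names, and the sketch assumes that a failing or unsupported sysconf() query should simply fall back to 64kB.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define COROUTINE_STACK_SIZE_DEFAULT ((size_t)1 << 16)

/* Return the larger of 64kB and the platform's minimum thread stack. */
static size_t coroutine_stack_size(void)
{
    size_t sz = COROUTINE_STACK_SIZE_DEFAULT;
#ifdef _SC_THREAD_STACK_MIN
    /* sysconf() returns -1 when the limit is indeterminate or unknown. */
    long min = sysconf(_SC_THREAD_STACK_MIN);

    if (min > 0 && (size_t)min > sz) {
        sz = (size_t)min;
    }
#endif
    assert(sz >= COROUTINE_STACK_SIZE_DEFAULT);
    return sz;
}
```

On a host where the minimum is 16k the result stays at 64kB, while a host reporting 192k would get 192k.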




Re: [Qemu-devel] [PATCH] virtio: revert host notifiers to old semantics

2016-07-03 Thread Peter Lieven

Am 30.06.2016 um 17:31 schrieb Cornelia Huck:

The host notifier rework tried both to unify host notifiers across
transports and plug a possible hole during host notifier
re-assignment. Unfortunately, this meant a change in semantics that
breaks vhost and iSCSI+dataplane.

As the minimal fix, keep the common host notifier code but revert
to the old semantics so that we have time to figure out the proper
fix.

Fixes: 6798e245a3 ("virtio-bus: common ioeventfd infrastructure")
Reported-by: Peter Lieven 
Reported-by: Jason Wang 
Reported-by: Marc-André Lureau 
Signed-off-by: Cornelia Huck 


Works for iSCSI + dataplane.

Peter




Re: [Qemu-devel] [PATCH 0/4] ppc: fixes for large page and VRMA support

2016-07-03 Thread Cédric Le Goater
On 07/04/2016 08:11 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2016-07-01 at 09:10 +0200, Cédric Le Goater wrote:
>> Here is a little series with API cleanups and fixes for large page and
>> VRMA. Previous patches which added the support did not take into
>> account the segment page size attribute.
> 
> I've done a slightly different patch that subsumes 1...3 and is I think 
> now architecturally correct ;-) (unlike my original code).
>
> Additionally we now avoid walking all the SLB sizes so things should be
> a bit faster too.

Good, that part felt like it needed some optimization.

> The VRMA patch still applies, I'll include it in what I post.

I will give it a try.

Thanks,

C. 




[Qemu-devel] [PULL 4/8] e1000e: add boot rom

2016-07-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 hw/net/e1000e.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 692283f..4778744 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -693,6 +693,7 @@ static void e1000e_class_init(ObjectClass *class, void 
*data)
 c->vendor_id = PCI_VENDOR_ID_INTEL;
 c->device_id = E1000_DEV_ID_82574L;
 c->revision = 0;
+c->romfile = "efi-e1000e.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
 c->is_express = 1;
 
-- 
1.8.3.1




[Qemu-devel] [PULL 6/8] ipxe: update prebuilt binaries

2016-07-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 pc-bios/efi-e1000.rom| Bin 196608 -> 209408 bytes
 pc-bios/efi-e1000e.rom   | Bin 0 -> 209408 bytes
 pc-bios/efi-eepro100.rom | Bin 197120 -> 209920 bytes
 pc-bios/efi-ne2k_pci.rom | Bin 195584 -> 208384 bytes
 pc-bios/efi-pcnet.rom| Bin 195584 -> 208384 bytes
 pc-bios/efi-rtl8139.rom  | Bin 199168 -> 211456 bytes
 pc-bios/efi-virtio.rom   | Bin 193024 -> 211456 bytes
 pc-bios/efi-vmxnet3.rom  | Bin 0 -> 205312 bytes
 8 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 pc-bios/efi-e1000e.rom
 create mode 100644 pc-bios/efi-vmxnet3.rom

diff --git a/pc-bios/efi-e1000.rom b/pc-bios/efi-e1000.rom
index 4bc89a3..4e61f9b 100644
Binary files a/pc-bios/efi-e1000.rom and b/pc-bios/efi-e1000.rom differ
diff --git a/pc-bios/efi-e1000e.rom b/pc-bios/efi-e1000e.rom
new file mode 100644
index 000..192a437
Binary files /dev/null and b/pc-bios/efi-e1000e.rom differ
diff --git a/pc-bios/efi-eepro100.rom b/pc-bios/efi-eepro100.rom
index 85b7f9b..66c5226 100644
Binary files a/pc-bios/efi-eepro100.rom and b/pc-bios/efi-eepro100.rom differ
diff --git a/pc-bios/efi-ne2k_pci.rom b/pc-bios/efi-ne2k_pci.rom
index ebafd84..8c3e5fd 100644
Binary files a/pc-bios/efi-ne2k_pci.rom and b/pc-bios/efi-ne2k_pci.rom differ
diff --git a/pc-bios/efi-pcnet.rom b/pc-bios/efi-pcnet.rom
index 6f19723..802e225 100644
Binary files a/pc-bios/efi-pcnet.rom and b/pc-bios/efi-pcnet.rom differ
diff --git a/pc-bios/efi-rtl8139.rom b/pc-bios/efi-rtl8139.rom
index 086551b..8827181 100644
Binary files a/pc-bios/efi-rtl8139.rom and b/pc-bios/efi-rtl8139.rom differ
diff --git a/pc-bios/efi-virtio.rom b/pc-bios/efi-virtio.rom
index 140c680..2fc0497 100644
Binary files a/pc-bios/efi-virtio.rom and b/pc-bios/efi-virtio.rom differ
diff --git a/pc-bios/efi-vmxnet3.rom b/pc-bios/efi-vmxnet3.rom
new file mode 100644
index 000..3d42635
Binary files /dev/null and b/pc-bios/efi-vmxnet3.rom differ
-- 
1.8.3.1




[Qemu-devel] [PULL 5/8] vmxnet3: add boot rom

2016-07-03 Thread Gerd Hoffmann
Disable for old machine types as this is a guest visible change.

Signed-off-by: Gerd Hoffmann 
---
 hw/net/vmxnet3.c | 1 +
 include/hw/i386/pc.h | 4 
 2 files changed, 5 insertions(+)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index d978976..25cee9f 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2700,6 +2700,7 @@ static void vmxnet3_class_init(ObjectClass *class, void 
*data)
 c->vendor_id = PCI_VENDOR_ID_VMWARE;
 c->device_id = PCI_DEVICE_ID_VMWARE_VMXNET3;
 c->revision = PCI_DEVICE_ID_VMWARE_VMXNET3_REVISION;
+c->romfile = "efi-vmxnet3.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
 c->subsystem_vendor_id = PCI_VENDOR_ID_VMWARE;
 c->subsystem_id = PCI_DEVICE_ID_VMWARE_VMXNET3;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 49566c8..a112efb 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -362,6 +362,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 .driver   = TYPE_X86_CPU,\
 .property = "cpuid-0xb",\
 .value= "off",\
+},{\
+.driver   = "vmxnet3",\
+.property = "romfile",\
+.value= "",\
 },
 
 #define PC_COMPAT_2_5 \
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2 4/7] ppc: open code cpu creation for machine types

2016-07-03 Thread Greg Kurz
On Mon, 4 Jul 2016 13:54:55 +1000
David Gibson  wrote:

> On Sat, Jul 02, 2016 at 10:33:33AM +0200, Greg Kurz wrote:
> > On Sat, 2 Jul 2016 13:36:22 +0530
> > Bharata B Rao  wrote:
> >   
> > > On Sat, Jul 02, 2016 at 12:41:48AM +0200, Greg Kurz wrote:  
> > > > If we want to generate cpu_dt_id in the machine code, this must occur
> > > > before the cpu gets realized. We must open code the cpu creation to be
> > > > able to do this.
> > > > 
> > > > This patch just does that. It borrows some lines from previous work
> > > > from Bharata to handle the feature parsing.
> > > > 
> > > > Signed-off-by: Greg Kurz 
> > > > ---
> > > >  hw/ppc/ppc.c |   39 ++-
> > > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > > > index dc3d214009c5..57f4ddd073d0 100644
> > > > --- a/hw/ppc/ppc.c
> > > > +++ b/hw/ppc/ppc.c
> > > > @@ -32,6 +32,7 @@
> > > >  #include "sysemu/cpus.h"
> > > >  #include "hw/timer/m48t59.h"
> > > >  #include "qemu/log.h"
> > > > +#include "qapi/error.h"
> > > >  #include "qemu/error-report.h"
> > > >  #include "hw/loader.h"
> > > >  #include "sysemu/kvm.h"
> > > > @@ -1353,5 +1354,41 @@ PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id)
> > > > 
> > > >  PowerPCCPU *ppc_cpu_init(const char *cpu_model)
> > > >  {
> > > > -return POWERPC_CPU(cpu_generic_init(TYPE_POWERPC_CPU, cpu_model));
> > > > +PowerPCCPU *cpu;
> > > > +CPUClass *cc;
> > > > +ObjectClass *oc;
> > > > +gchar **model_pieces;
> > > > +Error *err = NULL;
> > > > +
> > > > +model_pieces = g_strsplit(cpu_model, ",", 2);
> > > > +if (!model_pieces[0]) {
> > > > +error_report("Invalid/empty CPU model name");
> > > > +return NULL;
> > > > +}
> > > > +
> > > > +oc = cpu_class_by_name(TYPE_POWERPC_CPU, model_pieces[0]);
> > > > +if (oc == NULL) {
> > > > +error_report("Unable to find CPU definition: %s", 
> > > > model_pieces[0]);
> > > > +return NULL;
> > > > +}
> > > > +
> > > > +cpu = POWERPC_CPU(object_new(object_class_get_name(oc)));
> > > > +
> > > > +cc = CPU_CLASS(oc);
> > > > +cc->parse_features(CPU(cpu), model_pieces[1], &err);
> > > 
> > > Igor is working on a patchset to convert -cpu features into global 
> > > properties.
> > > IIUC, after that patchset, it is not recommended to parse the -cpu 
> > > features
> > > for every CPU but do it only once.
> > >   
> > 
> > cpu_generic_init() in the current code also does the parsing, and as the 
> > title
> > says, this patch is just about open coding the creation. I don't want to
> > change behavior yet.
> > 
> > But yes, I agree that we should only parse features once and I'll be more 
> > than
> > happy to fix this in a followup patch, based on Igor's work.
> > 
> > In the meantime, maybe I can add a comment stating that the parsing should 
> > go
> > away ?  
> 
> Right.  But the thing is by open coding here, you're making two copies
> that need to be fixed instead of one, which increases the chances of
> error.
> 
> It seems like it would be safer to change the generic code so there's
> a new generic function which doesn't do the realize which we can use
> on ppc (and other platforms when/if they need it).
> 
> Doing the change just on ppc by making our own copy of
> cpu_generic_init() seems more likely to lead to future mistakes.
> 

I had this in v1:

http://patchwork.ozlabs.org/patch/642216/

I'll repost it in v3.

> > > That is what I attempted here in the context of supporting compat cpu type
> > > for pseries-2.7:
> > > 
> > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg381660.html
> > >   
> > 
> > Yeah and this is where I borrowed some lines. :)
> >   
> > > Regards,
> > > Bharata.
> > >   
> > 
> > Cheers.
> >   
> 



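For readers following the thread, the quoted patch splits the -cpu string at
the first comma: the part before it is the model name and the remainder is the
feature list handed to parse_features(). A minimal sketch of that split in
plain C, without GLib, follows. The helper name split_cpu_model() and the
feature strings in the usage note are hypothetical and not QEMU code; unlike
the quoted patch, this sketch also rejects a name that is empty because the
string starts with a comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): copy the text before the first
 * comma into model[] and point *features at the rest of the string, or at
 * NULL when there is no comma. Returns 0 on success, -1 if the model name
 * is empty or does not fit into model[].
 */
static int split_cpu_model(const char *cpu_model, char *model,
                           size_t model_len, const char **features)
{
    const char *comma;
    size_t name_len;

    assert(cpu_model != NULL && model != NULL && features != NULL);

    comma = strchr(cpu_model, ',');
    name_len = comma ? (size_t)(comma - cpu_model) : strlen(cpu_model);

    if (name_len == 0 || name_len >= model_len) {
        return -1;
    }

    memcpy(model, cpu_model, name_len);
    model[name_len] = '\0';
    *features = comma ? comma + 1 : NULL;
    return 0;
}
```

With an input such as "POWER8,feat1=on,feat2=off" (made-up feature names),
model receives "POWER8" and *features points at "feat1=on,feat2=off", which
mirrors what g_strsplit(cpu_model, ",", 2) yields in the patch.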


[Qemu-devel] [PULL 2/8] ipxe: add e1000e rom

2016-07-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 roms/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/roms/Makefile b/roms/Makefile
index 7bd1252..e8133fe 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -1,11 +1,13 @@
 
 vgabios_variants := stdvga cirrus vmware qxl isavga virtio
 vgabios_targets  := $(subst -isavga,,$(patsubst %,vgabios-%.bin,$(vgabios_variants)))
-pxerom_variants  := e1000 eepro100 ne2k_pci pcnet rtl8139 virtio
-pxerom_targets   := 8086100e 80861209 10500940 10222000 10ec8139 1af41000
+pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio
+pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 1af41000
 
 pxe-rom-e1000    efi-rom-e1000    : VID := 8086
 pxe-rom-e1000    efi-rom-e1000    : DID := 100e
+pxe-rom-e1000e   efi-rom-e1000e   : VID := 8086
+pxe-rom-e1000e   efi-rom-e1000e   : DID := 10d3
 pxe-rom-eepro100 efi-rom-eepro100 : VID := 8086
 pxe-rom-eepro100 efi-rom-eepro100 : DID := 1209
 pxe-rom-ne2k_pci efi-rom-ne2k_pci : VID := 1050
-- 
1.8.3.1
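
Judging only from the VID/DID lines in this diff, each pxerom_targets entry
looks like the PCI vendor ID followed by the device ID (8086 and 10d3 give
808610d3 for e1000e). The C sketch below illustrates that naming under this
assumption; rom_target_id() and rom_target_name() are hypothetical helpers,
not part of roms/Makefile or iPXE.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: pack vendor ID (high 16 bits) and device ID (low 16 bits). */
static uint32_t rom_target_id(uint16_t vid, uint16_t did)
{
    return ((uint32_t)vid << 16) | did;
}

/*
 * Hypothetical: write the 8-hex-digit target name (e.g. "808610d3") into
 * buf, which must hold at least 9 bytes including the terminating NUL.
 */
static void rom_target_name(uint16_t vid, uint16_t did, char *buf, size_t len)
{
    assert(buf != NULL && len >= 9);
    snprintf(buf, len, "%08" PRIx32, rom_target_id(vid, did));
}
```

Calling rom_target_name(0x8086, 0x10d3, buf, sizeof(buf)) fills buf with the
target string that the patch adds for e1000e.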




[Qemu-devel] [PULL 3/8] ipxe: add vmxnet3 rom

2016-07-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 roms/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/roms/Makefile b/roms/Makefile
index e8133fe..88b3709 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -1,8 +1,8 @@
 
 vgabios_variants := stdvga cirrus vmware qxl isavga virtio
 vgabios_targets  := $(subst -isavga,,$(patsubst %,vgabios-%.bin,$(vgabios_variants)))
-pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio
-pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 1af41000
+pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio vmxnet3
+pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 1af41000 15ad07b0
 
 pxe-rom-e1000    efi-rom-e1000    : VID := 8086
 pxe-rom-e1000    efi-rom-e1000    : DID := 100e
@@ -18,6 +18,8 @@ pxe-rom-rtl8139  efi-rom-rtl8139  : VID := 10ec
 pxe-rom-rtl8139  efi-rom-rtl8139  : DID := 8139
 pxe-rom-virtio   efi-rom-virtio   : VID := 1af4
 pxe-rom-virtio   efi-rom-virtio   : DID := 1000
+pxe-rom-vmxnet3  efi-rom-vmxnet3  : VID := 15ad
+pxe-rom-vmxnet3  efi-rom-vmxnet3  : DID := 07b0
 
 #
 # cross compiler auto detection
-- 
1.8.3.1




[Qemu-devel] [PULL 0/8] ipxe: update submodule from 4e03af8ec to 041863191

2016-07-03 Thread Gerd Hoffmann
  Hi,

Here comes the ipxe update for 2.7, rebasing the ipxe module to latest
master and also adding boot roms for e1000e and vmxnet3.

v2: two incremental tweaks to make sure the two new roms are installed
properly.

please pull,
  Gerd

The following changes since commit c7288767523f6510cf557707d3eb5e78e519b90d:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' into staging (2016-06-23 11:53:14 +0100)

are available in the git repository at:


  git://git.kraxel.org/qemu tags/pull-ipxe-20160704-1

for you to fetch changes up to 8df42d855c38a1b23b6ba9f38ab71b9d7fb24216:

  build: add pc-bios to config-host.mak deps (2016-07-01 13:31:44 +0200)


ipxe: update submodule from 4e03af8ec to 041863191
e1000e+vmxnet3: add boot rom


Gerd Hoffmann (8):
  ipxe: update submodule from 4e03af8ec to 041863191
  ipxe: add e1000e rom
  ipxe: add vmxnet3 rom
  e1000e: add boot rom
  vmxnet3: add boot rom
  ipxe: update prebuilt binaries
  ipxe: add new roms to BLOBS
  build: add pc-bios to config-host.mak deps

 Makefile                 |   3 ++-
 hw/net/e1000e.c          |   1 +
 hw/net/vmxnet3.c         |   1 +
 include/hw/i386/pc.h     |   4 ++++
 pc-bios/efi-e1000.rom    | Bin 196608 -> 209408 bytes
 pc-bios/efi-e1000e.rom   | Bin 0 -> 209408 bytes
 pc-bios/efi-eepro100.rom | Bin 197120 -> 209920 bytes
 pc-bios/efi-ne2k_pci.rom | Bin 195584 -> 208384 bytes
 pc-bios/efi-pcnet.rom    | Bin 195584 -> 208384 bytes
 pc-bios/efi-rtl8139.rom  | Bin 199168 -> 211456 bytes
 pc-bios/efi-virtio.rom   | Bin 193024 -> 211456 bytes
 pc-bios/efi-vmxnet3.rom  | Bin 0 -> 205312 bytes
 roms/Makefile            |   8 ++++++--
 roms/ipxe                |   2 +-
 14 files changed, 15 insertions(+), 4 deletions(-)
 create mode 100644 pc-bios/efi-e1000e.rom
 create mode 100644 pc-bios/efi-vmxnet3.rom



[Qemu-devel] [PULL 7/8] ipxe: add new roms to BLOBS

2016-07-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index 7087fc2..5ea13bc 100644
--- a/Makefile
+++ b/Makefile
@@ -416,6 +416,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 efi-e1000.rom efi-eepro100.rom efi-ne2k_pci.rom \
 efi-pcnet.rom efi-rtl8139.rom efi-virtio.rom \
+efi-e1000e.rom efi-vmxnet3.rom \
 qemu-icon.bmp qemu_logo_no_text.svg \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 multiboot.bin linuxboot.bin kvmvapic.bin \
-- 
1.8.3.1




[Qemu-devel] [PULL 8/8] build: add pc-bios to config-host.mak deps

2016-07-03 Thread Gerd Hoffmann
... so configure re-runs on pc-bios updates such as new pxe roms.
Needed because configure symlinks the prebuilt roms from src
into build tree.

Signed-off-by: Gerd Hoffmann 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 5ea13bc..c1ac21d 100644
--- a/Makefile
+++ b/Makefile
@@ -30,7 +30,7 @@ CONFIG_ALL=y
 -include config-all-devices.mak
 -include config-all-disas.mak
 
-config-host.mak: $(SRC_PATH)/configure
+config-host.mak: $(SRC_PATH)/configure $(SRC_PATH)/pc-bios
 	@echo $@ is out-of-date, running configure
 	@# TODO: The next lines include code which supports a smooth
 	@# transition from old configurations without config.status.
-- 
1.8.3.1




[Qemu-devel] [PULL 1/8] ipxe: update submodule from 4e03af8ec to 041863191

2016-07-03 Thread Gerd Hoffmann
shortlog


Andrew Widdersheim (1):
  [netdevice] Add "ifname" setting

Carl Henrik Lunde (1):
  [vmxnet3] Avoid completely filling the TX descriptor ring

Christian Hesse (2):
  [golan] Fix build error on some versions of gcc
  [ath9k] Fix buffer overrun for ar9287

Christian Nilsson (2):
  [intel] Add PCI device ID for another I219-V
  [intel] Add PCI device ID for another I219-LM

Hummel Frank (1):
  [intel] Add INTEL_NO_PHY_RST for I218-LM

Kyösti Mälkki (1):
  [intel] Add PCI IDs for i210/i211 flashless operation

Ladi Prosek (6):
  [pci] Add pci_find_next_capability()
  [virtio] Add virtio 1.0 constants and data structures
  [virtio] Add virtio 1.0 PCI support
  [virtio] Add virtio-net 1.0 support
  [virtio] Renumber virtio_pci_region flags
  [virtio] Fix virtio-pci logging

Leendert van Doorn (2):
  [tg3] Fix address truncation bug on 64-bit machines
  [tg3] Add missing memory barrier

Michael Brown (287):
  [settings] Re-add "uristring" setting type
  [dhcp] Do not skip ProxyDHCPREQUEST if next-server is empty
  [efi] Add definitions of GUIDs observed when booting shim.efi and grub.efi
  [efi] Mark EFI debug transcription functions as __attribute__ (( pure ))
  [efi] Remove raw EFI_HANDLE values from debug messages
  [efi] Include installed protocol list in unknown handle names
  [efi] Improve efi_wrap debugging
  [pxe] Construct all fake DHCP packets before starting PXE NBP
  [efi] Add definitions of GUIDs observed when booting wdsmgfw.efi
  [efi] Fix debug directory size
  [efi] Populate debug directory entry FileOffset field
  [build] Search for ldlinux.c32 separately from isolinux.bin
  [tcpip] Allow supported address families to be detected at runtime
  [efi] Allow calls to efi_snp_claim() and efi_snp_release() to be nested
  [efi] Fix order of events on SNP removal path
  [efi] Do not return EFI_NOT_READY from our ReceiveFilters() method
  [pxe] Populate ciaddr in fake PXE Boot Server ACK packet
  [uri] Generalise tftp_uri() to pxe_uri()
  [efi] Implement the EFI_PXE_BASE_CODE_PROTOCOL
  [usb] Expose usb_find_driver()
  [usb] Add function to device's function list before attempting probe
  [efi] Add USB headers and GUID definitions
  [efi] Allow efidev_parent() to traverse multiple device generations
  [efi] Add a USB host controller driver based on EFI_USB_IO_PROTOCOL
  [tcpip] Avoid generating positive zero for transmitted UDP checksums
  [usb] Generalise zero-length packet generation logic
  [ehci] Do not treat zero-length NULL pointers as unreachable
  [ehci] Support arbitrarily large transfers
  [xhci] Support arbitrarily large transfers
  [efi] Provide efi_devpath_len()
  [efi] Include a copy of the device path within struct efi_device
  [usb] Select preferred USB device configuration based on driver score
  [usb] Allow for wildcard USB class IDs
  [efi] Expose unused USB devices via EFI_USB_IO_PROTOCOL
  [ncm] Support setting MAC address
  [build] Remove dependency on libiberty
  [efi] Minimise use of iPXE header files when building host utilities
  [pxe] Invoke INT 1a,564e when PXE stack is activated
  [pxe] Notify BIOS via INT 1a,564e for each new network device
  [efi] Work around broken 32-bit PE executable parsing in ImageHlp.dll
  [efi] Avoid infinite loops when asked to stop non-existent devices
  [efi] Expose an UNDI interface alongside the existing SNP interface
  [malloc] Avoid integer overflow for excessively large memory allocations
  [peerdist] Avoid NULL pointer dereference for plaintext blocks
  [http] Verify server port when reusing a pooled connection
  [efi] Reset root directory when installing EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
  [efi] Update to current EDK2 headers
  [efi] Import EFI_HII_FONT_PROTOCOL definitions
  [fbcon] Allow character height to be selected at runtime
  [fbcon] Move margin calculations to fbcon.c
  [console] Tidy up config/console.h
  [build] Generalise CONSOLE_VESAFB to CONSOLE_FRAMEBUFFER
  [efi] Add support for EFI_GRAPHICS_OUTPUT_PROTOCOL frame buffer consoles
  [dhcp] Reset start time when deferring discovery
  [dhcp] Limit maximum number of DHCP discovery deferrals
  [comboot] Reset console before starting COMBOOT executable
  [intel] Forcibly skip PHY reset on some models
  [intel] Correct definition of receive overrun bit
  [infiniband] Add definitions for FDR and EDR link speeds
  [infiniband] Add qword accessors for ib_guid and ib_gid
  [pci] Add definitions for PCI Express function level reset (FLR)
  [bitops] Fix definitions for big-endian devices
  [smsc95xx] Add driver for SMSC/Microchip LAN95xx USB Ethernet NICs
  [bitops] Provide BIT_QWORD_PTR()
  [efi] Add %.usb target for building EFI-bootable USB (or other