date:20170627

Re: [Qemu-devel] [PATCH v3 00/10] Clock framework API.

2017-06-27 Thread KONRAD Frederic




On 06/23/2017 at 03:58 PM, Peter Maydell wrote:

On 23 June 2017 at 14:07, KONRAD Frederic  wrote:

On 06/23/2017 at 02:47 PM, Peter Maydell wrote:

Each device "owns" its output clock objects, but input
clocks are just pointers to the clock object owned by the
device at the other end. In the board you wire up CI1 to C1,
and CI2 to C2 (using link properties I guess).
Then in device C you can implement the clock switching by
some kind of bind(s->CI1, &s->C3) call because you have
pointers to all the relevant clock objects.

As I understand it your current implementation makes not
just the output clocks C1 C2 C3 be clock objects, but also
the inputs CI1 CI2, so effectively each link from a clock
source to a clock sink has two objects involved.



Yes that makes sense but you won't have the name.


I don't think that's a big deal.


And one other thing I wanted to have is being able to refresh
the clock tree without refreshing everything. If you start a
refresh from the "pointer" and it is bound to other things, you
will refresh the other devices as well. Maybe we don't care, or we
can work around that, but...


The pointer is for your clock inputs -- when would you
want to start a refresh from that? I would expect
refreshes to only ever go downstream -- you update
the config of your clock outputs and things downstream
of them will update in turn.


I started with the goal in mind that the binding + the callback
can refresh themselves without user intervention.

So actually the user only has to change the binding in the case of a
clock selector. Everything else is done by the framework.

For example if we want to change a multiplier through a register:

void register_write(..)
{
  /* Refresh related clock input. */
}

void clock_cb(..)
{
  return register_value * rate_in;
}

If we drop the clock input object, first possibility:

void register_write(..)
{
  /* Refresh all dependent clocks with
   * rate_in * register value.
   * This can be tricky, as we either need to know exactly which
   * clocks need to be refreshed or we need to refresh everybody.
   * On the device example in the patch-set this can become a
   * mess.
   */
}

void clock_cb(..)
{
  return register_value * rate_in;
}

Second possibility:

void register_write(..)
{
  /* refresh the clock which is referenced as input.
   * This is easy BUT it will refresh all other devices bound to
   * this clock.
   */
}

void clock_cb(..)
{
  return register_value * rate_in;
}
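
For illustration only, here is a minimal, self-contained C sketch of the
pointer-based model discussed above: a device owns its output clock, keeps
only a pointer to its input clock, and a register write refreshes downstream
of its own output. Every name in it (Clock, clock_bind, clock_refresh,
MulDevice, mul_register_write) is hypothetical; none of it is taken from the
patch set or from QEMU, and it assumes the clock tree has no cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SINKS 4

typedef struct Clock Clock;
typedef uint64_t (*ClockRateFn)(void *opaque, uint64_t rate_in);

/* Hypothetical clock object: owned by the device that drives it. */
struct Clock {
    uint64_t rate;            /* current rate in Hz */
    unsigned updates;         /* number of refreshes seen, for inspection */
    ClockRateFn compute;      /* derives the rate from rate_in; NULL = pass through */
    void *opaque;
    Clock *sinks[MAX_SINKS];  /* clocks fed by this one */
    size_t nb_sinks;
};

/* Connect @sink downstream of @src. Returns 0 on success, -1 if full. */
static int clock_bind(Clock *src, Clock *sink)
{
    assert(src != NULL && sink != NULL);
    if (src->nb_sinks == MAX_SINKS) {
        return -1;
    }
    src->sinks[src->nb_sinks++] = sink;
    return 0;
}

/* Recompute @clk from @rate_in, then propagate downstream only. */
static void clock_refresh(Clock *clk, uint64_t rate_in)
{
    clk->rate = clk->compute ? clk->compute(clk->opaque, rate_in) : rate_in;
    clk->updates++;
    for (size_t i = 0; i < clk->nb_sinks; i++) {
        clock_refresh(clk->sinks[i], clk->rate);
    }
}

/* Hypothetical device with one input, one output and a multiplier register. */
typedef struct MulDevice {
    const Clock *clk_in;  /* points at the upstream device's output; not owned */
    Clock clk_out;        /* owned by this device */
    uint32_t reg;
} MulDevice;

static uint64_t mul_compute(void *opaque, uint64_t rate_in)
{
    MulDevice *d = opaque;
    return (uint64_t)d->reg * rate_in;
}

/* A register write refreshes this device's output, not the shared input. */
static void mul_register_write(MulDevice *d, uint32_t value)
{
    d->reg = value;
    clock_refresh(&d->clk_out, d->clk_in->rate);
}
```

In this sketch the refresh starts at the device's own output, so other
devices bound to the same upstream clock are not touched by the register
write.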

Thanks,
Fred



thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 2/4] net/socket: Convert error message to Error

2017-06-27 Thread Markus Armbruster

Suggest "net/socket: Convert several helper functions to Error".

Mao Zhongyi  writes:

> Currently, net_socket_mcast_create(), net_socket_fd_init_dgram() and
> net_socket_fd_init() use the function such as fprintf(), perror() to
> report an error message.
>
> Now, convert these functions to Error.
>
> Cc: jasow...@redhat.com
> Cc: arm...@redhat.com
> Signed-off-by: Mao Zhongyi 
> ---
>  net/socket.c | 76 ++--
>  1 file changed, 48 insertions(+), 28 deletions(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index 53765bd..e136f87 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -209,7 +209,9 @@ static void net_socket_send_dgram(void *opaque)
>  }
>  }
>  
> -static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr *localaddr)
> +static int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
> +   struct in_addr *localaddr,
> +   Error **errp)
>  {
>  struct ip_mreq imr;
>  int fd;
> @@ -221,8 +223,8 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  #endif
>  
>  if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
> -fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) "
> -"does not contain a multicast address\n",
> +error_setg(errp, "qemu: error: specified mcastaddr %s (0x%08x) "
> +"does not contain a multicast address",

Please drop the "qemu: error: " prefix.  More of the same below.

>  inet_ntoa(mcastaddr->sin_addr),
>  (int)ntohl(mcastaddr->sin_addr.s_addr));
>  return -1;
> @@ -230,7 +232,8 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  }
>  fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
>  if (fd < 0) {
> -perror("socket(PF_INET, SOCK_DGRAM)");
> +error_setg_errno(errp, errno, "Create socket(PF_INET, SOCK_DGRAM)"
> + " failed");

I'm not a native speaker, but "Create FOO failed" doesn't feel right to
me.  Where's the subject?  "Create" is a verb.  "Creation of FOO failed"
has a subject, but doesn't feel right.  What about "can't create
datagram socket"?

Same for many of the other new error messages.

>  return -1;
>  }
>  
> @@ -242,13 +245,15 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  val = 1;
>  ret = qemu_setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
>  if (ret < 0) {
> -perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
> +error_setg_errno(errp, errno, "Set the socket fd=%d attribute"
> + " SO_REUSEADDR failed", fd);

Why would the user want to know the value of @fd here?

Same for many of the other new error messages.

>  goto fail;
>  }
>  
>  ret = bind(fd, (struct sockaddr *)mcastaddr, sizeof(*mcastaddr));
>  if (ret < 0) {
> -perror("bind");
> +error_setg_errno(errp, errno, "Bind ip=%s to fd=%d failed",
> + inet_ntoa(mcastaddr->sin_addr), fd);
>  goto fail;
>  }
>  
> @@ -263,7 +268,8 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  ret = qemu_setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
>&imr, sizeof(struct ip_mreq));
>  if (ret < 0) {
> -perror("setsockopt(IP_ADD_MEMBERSHIP)");
> +error_setg_errno(errp, errno, "Add socket fd=%d to multicast group %s"
> + " failed", fd, inet_ntoa(imr.imr_multiaddr));
>  goto fail;
>  }
>  
> @@ -272,7 +278,8 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  ret = qemu_setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
>&loop, sizeof(loop));
>  if (ret < 0) {
> -perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> +error_setg_errno(errp, errno, "Force multicast message to loopback"
> + " failed");
>  goto fail;
>  }
>  
> @@ -281,7 +288,8 @@ static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr
>  ret = qemu_setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
>localaddr, sizeof(*localaddr));
>  if (ret < 0) {
> -perror("setsockopt(IP_MULTICAST_IF)");
> +error_setg_errno(errp, errno, "Set the default network send "
> + "interfaec failed");

s/interfaec/interface/

>  goto fail;
>  }
>  }
> @@ -320,7 +328,8 @@ static NetClientInfo net_dgram_socket_info = {
>  static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer,
>  const char *model,
>  const char *name,
> -

Re: [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters

2017-06-27 Thread Fam Zheng

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> +/*
> + * vmdk_handle_alloc
> + *
> + * Allocate new clusters for an area that either is yet unallocated or needs a
> + * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
> + * the new allocation can match the specified host offset.

I don't think this matches the function body: the passed-in *cluster_offset
value is ignored.

> + *
> + * Returns:
> + *   VMDK_OK:   if new clusters were allocated, *bytes may be decreased if
> + *  the new allocation doesn't cover all of the requested area.
> + *  *cluster_offset is updated to contain the offset of the
> + *  first newly allocated cluster.
> + *
> + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
> + *  unchanged.
> + *
> + *   VMDK_ERROR:in error cases
> + */
> +static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
> + uint64_t offset, uint64_t *cluster_offset,
> + int64_t *bytes, VmdkMetaData *m_data,
> + bool allocate, uint32_t *alloc_clusters_counter)
> +{
> +int l1_index, l2_offset, l2_index;
> +uint32_t *l2_table;
> +uint32_t cluster_sector;
> +uint32_t nb_clusters;
> +bool zeroed = false;
> +uint64_t skip_start_bytes, skip_end_bytes;
> +int ret;
> +
> +ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> +&l2_index, &l2_table);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +cluster_sector = le32_to_cpu(l2_table[l2_index]);
> +
> +skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
> +/* Calculate the number of clusters to look for. Here we truncate the last
> + * cluster, i.e. 1 less than the actual value calculated as we may need to
> + * perform COW for the last one. */
> +nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
> +   extent->cluster_sectors << BDRV_SECTOR_BITS) - 1;
> +
> +nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
> +assert(nb_clusters <= INT_MAX);
> +
> +/* update bytes according to final nb_clusters value */
> +if (nb_clusters != 0) {
> +*bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
> + - skip_start_bytes;
> +} else {
> +nb_clusters = 1;
> +}
> +*alloc_clusters_counter += nb_clusters;
> +skip_end_bytes = skip_start_bytes + MIN(*bytes,
> + extent->cluster_sectors * BDRV_SECTOR_SIZE
> +- skip_start_bytes);

I don't understand the MIN part: shouldn't skip_end_bytes simply be
skip_start_bytes + *bytes?
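
To make the question concrete, the arithmetic from the quoted hunk can be
replayed in a small standalone C program. This is only a sketch under stated
assumptions: SkipRange and compute_skip_range are hypothetical names,
SECTOR_BITS stands in for BDRV_SECTOR_BITS, the clamp against the L2 table
size is left out, and bytes is assumed to be greater than zero. It computes
skip_end_bytes both ways and does not say which one is intended.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9  /* stand-in for BDRV_SECTOR_BITS (512-byte sectors) */

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical result record, for illustration only. */
typedef struct SkipRange {
    uint64_t nb_clusters;  /* clusters to allocate, as in the quoted hunk */
    uint64_t bytes;        /* *bytes after the adjustment */
    uint64_t end_clamped;  /* skip_end_bytes with the MIN(), as in the patch */
    uint64_t end_simple;   /* skip_start_bytes + *bytes */
} SkipRange;

static SkipRange compute_skip_range(uint64_t cluster_sectors,
                                    uint64_t skip_start_bytes,
                                    uint64_t bytes)
{
    uint64_t cluster_bytes = cluster_sectors << SECTOR_BITS;
    SkipRange r;

    assert(cluster_sectors > 0 && bytes > 0);

    /* One less than the clusters touched: the last one may need COW. */
    r.nb_clusters = DIV_ROUND_UP(skip_start_bytes + bytes, cluster_bytes) - 1;
    if (r.nb_clusters != 0) {
        bytes = r.nb_clusters * cluster_bytes - skip_start_bytes;
    } else {
        r.nb_clusters = 1;
    }
    r.bytes = bytes;

    /* The patch clamps the end of the skipped range to the first cluster. */
    r.end_clamped = skip_start_bytes
                    + MIN(bytes, cluster_bytes - skip_start_bytes);
    /* The alternative asked about in the review. */
    r.end_simple = skip_start_bytes + bytes;
    return r;
}
```

With 64 KiB clusters (cluster_sectors = 128), skip_start_bytes = 4096 and
*bytes = 150000, the adjusted *bytes is 126976, the clamped end is 65536 and
the simple end is 131072. The two expressions agree whenever the adjusted
range fits in the first cluster and differ once it covers more than one.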

> +
> +if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
> +zeroed = true;
> +}
> +
> +if (!cluster_sector || zeroed) {
> +if (!allocate) {
> +return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> +}
> +
> +cluster_sector = extent->next_cluster_sector;
> +extent->next_cluster_sector += extent->cluster_sectors
> +* nb_clusters;
> +
> +ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> +   offset, skip_start_bytes,
> +   skip_end_bytes);
> +if (ret < 0) {
> +return ret;
> +}
> +if (m_data) {
> +m_data->valid = 1;
> +m_data->l1_index = l1_index;
> +m_data->l2_index = l2_index;
> +m_data->l2_offset = l2_offset;
> +m_data->l2_cache_entry = &l2_table[l2_index];
> +m_data->nb_clusters = nb_clusters;
> +}
> +}
> +*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> +return VMDK_OK;
> +}
> +
> +/*
> + * vmdk_alloc_clusters
> + *
> + * For a given offset on the virtual disk, find the cluster offset in vmdk
> + * file. If the offset is not found, allocate a new cluster.
> + *
> + * If the cluster is newly allocated, m_data->nb_clusters is set to the number
> + * of contiguous clusters that have been allocated. In this case, the other
> + * fields of m_data are valid and contain information about the first allocated
> + * cluster.
> + * cluster.
> + *
> + * Returns:
> + *
> + *   VMDK_OK:   on success and @cluster_offset was set
> + *
> + *   VMDK_UNALLOC:  if no clusters were allocated and @cluster_offset is
> + *  set to zero
> + *
> + *   VMDK_ERROR:in error cases
> + */
> +static int vmdk_alloc_clusters(BlockDriverState *bs,
> +   VmdkExtent *extent,
> +   VmdkMetaData *m_data, uint64_t offset,
> +   bool allocate, uint64_t *cluster_offset,
> +   int64_t bytes,
> +

Re: [Qemu-devel] [PATCH v5 3/4] net/net: Convert parse_host_port() to Error

2017-06-27 Thread Markus Armbruster

Mao Zhongyi  writes:

> Cc: berra...@redhat.com
> Cc: kra...@redhat.com
> Cc: pbonz...@redhat.com
> Cc: jasow...@redhat.com
> Cc: arm...@redhat.com
> Signed-off-by: Mao Zhongyi 
> ---
>  include/qemu/sockets.h |  3 ++-
>  net/net.c  | 22 +-
>  net/socket.c   | 19 ++-
>  3 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 5c326db..78e2b30 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -53,7 +53,8 @@ void socket_listen_cleanup(int fd, Error **errp);
>  int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
>  
>  /* Old, ipv4 only bits.  Don't use for new code. */
> -int parse_host_port(struct sockaddr_in *saddr, const char *str);
> +int parse_host_port(struct sockaddr_in *saddr, const char *str,
> +Error **errp);
>  int socket_init(void);
>  
>  /**
> diff --git a/net/net.c b/net/net.c
> index 6235aab..884e3ac 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -100,7 +100,8 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
>  return 0;
>  }
>  
> -int parse_host_port(struct sockaddr_in *saddr, const char *str)
> +int parse_host_port(struct sockaddr_in *saddr, const char *str,
> +Error **errp)
>  {
>  char buf[512];
>  struct hostent *he;
> @@ -108,24 +109,35 @@ int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  int port;
>  
>  p = str;
> -if (get_str_sep(buf, sizeof(buf), &p, ':') < 0)
> +if (get_str_sep(buf, sizeof(buf), &p, ':') < 0) {
> +error_setg(errp, "The address should contain ':', for example: "
> +   "=230.0.0.1:1234");

Suggest "Host address '%s' should ..." like you do in the next error message.

The = is confusing.  Do we need an example here?  The other error
messages in this function apparently don't.

What about "host address '%s' doesn't contain ':' separating host from
port" or "can't find ':' separating host from port in host address
'%s'"?


>  return -1;
> +}
>  saddr->sin_family = AF_INET;
>  if (buf[0] == '\0') {
>  saddr->sin_addr.s_addr = 0;
>  } else {
>  if (qemu_isdigit(buf[0])) {
> -if (!inet_aton(buf, &saddr->sin_addr))
> +if (!inet_aton(buf, &saddr->sin_addr)) {
> +error_setg(errp, "Host address '%s' is not a valid "
> +   "IPv4 address", buf);
>  return -1;
> +}
>  } else {
> -if ((he = gethostbyname(buf)) == NULL)
> +he = gethostbyname(buf);
> +if (he == NULL) {
> +error_setg(errp, "Specified hostname is error.");

No.  Suggest "can't resolve host address '%s'".  This error message still
lacks detail, but I'm not sure hstrerror() is sufficiently portable.

Outside this patch's scope: gethostbyname() is obsolete.  Applications
should use getaddrinfo() instead.  Comes with gai_strerror().
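
As a rough illustration of that remark, a standalone IPv4 lookup with
getaddrinfo() and gai_strerror() might look like the sketch below. The helper
name resolve_ipv4, its parameters and the plain error buffer are assumptions
made for this example; it is not QEMU code and does not use QEMU's Error API.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve @host to an IPv4 address.
 * @ai_flags is passed to getaddrinfo() (e.g. AI_NUMERICHOST to forbid
 * name lookups). On failure, a message is written to @errbuf.
 * Returns 0 on success, -1 on error.
 */
static int resolve_ipv4(const char *host, int ai_flags, struct in_addr *out,
                        char *errbuf, size_t errlen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL && out != NULL && errbuf != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* IPv4 only, like parse_host_port() */
    hints.ai_socktype = SOCK_DGRAM;  /* avoid one result per socket type */
    hints.ai_flags = ai_flags;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* gai_strerror() supplies the detail hstrerror() would have given */
        snprintf(errbuf, errlen, "can't resolve host address '%s': %s",
                 host, gai_strerror(rc));
        return -1;
    }

    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```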

>  return - 1;
> +}
>  saddr->sin_addr = *(struct in_addr *)he->h_addr;
>  }
>  }
>  port = strtol(p, (char **)&r, 0);
> -if (r == p)
> +if (r == p) {
> +error_setg(errp, "The port number is illegal");

Suggest "Port number '%s' is invalid".
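
A standalone sketch of port parsing that reports the offending string, as
suggested above. The helper name parse_port and the error buffer are
hypothetical. Note that this sketch is stricter than the quoted code, which
only checks r == p: it also rejects trailing characters and values outside
0..65535.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse @str as a port number (base auto-detected,
 * like the strtol(p, ..., 0) call in the quoted code).
 * Returns 0 on success, -1 on error with a message in @errbuf.
 */
static int parse_port(const char *str, unsigned short *port,
                      char *errbuf, size_t errlen)
{
    char *end;
    long val;

    assert(str != NULL && port != NULL && errbuf != NULL);

    errno = 0;
    val = strtol(str, &end, 0);
    if (end == str            /* no digits at all */
        || *end != '\0'       /* trailing garbage */
        || errno == ERANGE    /* overflowed long */
        || val < 0 || val > 65535) {
        snprintf(errbuf, errlen, "Port number '%s' is invalid", str);
        return -1;
    }

    *port = (unsigned short)val;
    return 0;
}
```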

>  return -1;
> +}
>  saddr->sin_port = htons(port);
>  return 0;
>  }
> diff --git a/net/socket.c b/net/socket.c
> index e136f87..a875205 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -501,9 +501,12 @@ static int net_socket_listen_init(NetClientState *peer,
>  NetSocketState *s;
>  struct sockaddr_in saddr;
>  int fd, ret;
> +Error *err = NULL;
>  
> -if (parse_host_port(&saddr, host_str) < 0)
> +if (parse_host_port(&saddr, host_str, &err) < 0) {
> +error_report_err(err);
>  return -1;
> +}
>  
>  fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>  if (fd < 0) {
> @@ -548,8 +551,10 @@ static int net_socket_connect_init(NetClientState *peer,
>  struct sockaddr_in saddr;
>  Error *err = NULL;
>  
> -if (parse_host_port(&saddr, host_str) < 0)
> +if (parse_host_port(&saddr, host_str, &err) < 0) {
> +error_report_err(err);
>  return -1;
> +}
>  
>  fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>  if (fd < 0) {
> @@ -601,8 +606,10 @@ static int net_socket_mcast_init(NetClientState *peer,
>  struct in_addr localaddr, *param_localaddr;
>  Error *err = NULL;
>  
> -if (parse_host_port(&saddr, host_str) < 0)
> +if (parse_host_port(&saddr, host_str, &err) < 0) {
> +error_report_err(err);
>  return -1;
> +}
>  
>  if (localaddr_str != NULL) {
>  if (inet_aton(localaddr_str, &localaddr) == 0)
> @@ -644,11 +651,13 @@ static int net_socket_udp_init(NetClientState *peer,
>  struct sockaddr_in laddr, raddr;
>  Error *err = NU

Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters

2017-06-27 Thread Fam Zheng

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> @@ -1876,6 +1942,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>  offset += n_bytes;
>  bytes_done += n_bytes;
>  
> +while (m_data->next != NULL) {

If you do

   while (m_data) {

> +VmdkMetaData *next;
> +next = m_data->next;
> +g_free(m_data);
> +m_data = next;
> +}
> +
>  /* update CID on the first write every time the virtual disk is
>   * opened */
>  if (!s->cid_updated) {
> @@ -1886,6 +1959,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>  s->cid_updated = true;
>  }
>  }
> +g_free(m_data);

then you can remove this line.
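
The suggestion can be sketched outside QEMU with a plain singly linked list.
MetaNode, push and free_all are hypothetical stand-ins (the real code uses
VmdkMetaData and g_free()); the point is only that looping on the node itself,
rather than on node->next, also frees the last element, so no trailing free
is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a metadata node with a next pointer. */
typedef struct MetaNode {
    struct MetaNode *next;
} MetaNode;

/* Prepend a freshly allocated node to @head and return the new head. */
static MetaNode *push(MetaNode *head)
{
    MetaNode *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->next = head;
    return n;
}

/* Free every node, including the last one. Returns the number freed. */
static unsigned free_all(MetaNode *node)
{
    unsigned freed = 0;

    while (node) {  /* not "node->next != NULL" */
        MetaNode *next = node->next;

        free(node);
        node = next;
        freed++;
    }
    return freed;
}
```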

>  return 0;
>  }
>  
> -- 
> 2.6.2
>

Re: [Qemu-devel] [RFC PATCH 0/4] travis: run all coccinelle scripts

2017-06-27 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> Another item from my 'automated testing' list: use travis-ci to run coccinelle
> scripts. This series is more of a PoC. The idea would be to run it once a day
> only on /master.
>
> Patch 1 is here only to speed up travis testing.
>
> Patch 3 adds a script which runs each cocci script sequentially. If the script
> modified any file from the repo, the changes are committed. If the script generated
> some output (i.e. using python), this output is logged in an empty commit.
> Spatch is run through a Debian-based Docker image.
>
> Patch 4 is the travis job: it calls the previous script. To respect the travis
> time limit, each script is limited to <10min. If any commits were
> generated, they are pushed to my gh-repo:
>
> https://github.com/philmd/qemu/compare/travis-cocci_v1...philmd:autogenerated-coccinelle-20170625-126
>
> Build output (Ran for 23 min 7 sec):
> https://travis-ci.org/philmd/qemu/builds/246848085
>
> Any idea is welcome :)

Are we sure all the Coccinelle scripts always produce desirable and
correct results?  I often use Coccinelle to do the tedious 95% of the
job, followed by a bit of manual cleanup.  What if manual cleanup is
required to make things compile cleanly?  Say because the transformation
leaves unused variables behind.  What if the transformation is wanted in
19 out of 20 cases?  Say because it replaces a bulky expression by the
helper call created for this purpose (good), including in the helper
function (bad)?

Re: [Qemu-devel] [RFC 1/5] vfio: introduce a new VFIO region for migration support

2017-06-27 Thread Zhang, Yulei



> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, June 27, 2017 4:19 AM
> To: Zhang, Yulei 
> Cc: qemu-devel@nongnu.org; Tian, Kevin ;
> joonas.lahti...@linux.intel.com; zhen...@linux.intel.com; Zheng, Xiao
> ; Wang, Zhi A 
> Subject: Re: [Qemu-devel] [RFC 1/5] vfio: introduce a new VFIO region for
> migration support
> 
> On Tue,  4 Apr 2017 18:26:57 +0800
> Yulei Zhang  wrote:
> 
> > New VFIO region VFIO_PCI_DEVICE_STATE_REGION_INDEX is added to fetch
> > and restore the pci device status during the live migration.
> >
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/pci.c  | 17 +
> >  hw/vfio/pci.h  |  1 +
> >  linux-headers/linux/vfio.h |  5 -
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 03a3d01..bf2e0ff 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2360,6 +2360,23 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
> >  QLIST_INIT(&vdev->bars[i].quirks);
> >  }
> >
> > +/* device state region setup */
> > +if (vbasedev->flags & VFIO_DEVICE_FLAGS_MIGRATABLE) {
> > +char *name = g_strdup_printf("%s BAR %d", vbasedev->name, VFIO_PCI_DEVICE_STATE_REGION_INDEX);
> > +
> > +ret = vfio_region_setup(OBJECT(vdev), vbasedev,
> > +&vdev->device_state.region, VFIO_PCI_DEVICE_STATE_REGION_INDEX, name);
> > +g_free(name);
> > +
> > +if (ret) {
> > +error_setg_errno(errp, -ret, "failed to get region %d info",
> > + VFIO_PCI_DEVICE_STATE_REGION_INDEX);
> > +return;
> > +}
> > +
> > +QLIST_INIT(&vdev->device_state.quirks);
> > +}
> > +
> >  ret = vfio_get_region_info(vbasedev,
> > VFIO_PCI_CONFIG_REGION_INDEX, &reg_info);
> >  if (ret) {
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index a8366bb..bd98618 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -115,6 +115,7 @@ typedef struct VFIOPCIDevice {
> >  int interrupt; /* Current interrupt type */
> >  VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
> >  VFIOVGA *vga; /* 0xa, 0x3b0, 0x3c0 */
> > +VFIOBAR device_state;
> 
> But it's not a BAR...
> 
> >  void *igd_opregion;
> >  PCIHostDeviceAddress host;
> >  EventNotifier err_notifier;
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 531cb2e..c87d05c 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -198,6 +198,8 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_FLAGS_PCI  (1 << 1)/* vfio-pci device */
> >  #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)/* vfio-platform device */
> >  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
> > +#define VFIO_DEVICE_FLAGS_CCW   (1 << 4)/* vfio-ccw device */
> > +#define VFIO_DEVICE_FLAGS_MIGRATABLE (1 << 5)  /* Device supports migrate */
> > __u32   num_regions;/* Max region index + 1 */
> > __u32   num_irqs;   /* Max IRQ index + 1 */
> >  };
> > @@ -433,7 +435,8 @@ enum {
> >  * between described ranges are unimplemented.
> >  */
> > VFIO_PCI_VGA_REGION_INDEX,
> > -   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> > +   VFIO_PCI_DEVICE_STATE_REGION_INDEX,
> > +   VFIO_PCI_NUM_REGIONS = 10 /* Fixed user ABI, region indexes >=9 use */
> >  /* device specific cap to define content. */
> >  };
> >
> 
> Nak, please read the comment on the line you changed.  We're not adding
> any more static region indexes, anything new should be added with
> device specific capabilities.  Look at how we do opregions and various
> other IGD related regions.  There's also no reason to add a flag, the
> existence of the device specific region should indicate that it
> supports migration.  Also, we call these device specific, but I would
> still encourage a shared format, userspace doesn't want to support a
> different mechanism for each device.  Thanks,
> 
> Alex

Thanks, Alex. I will revise it accordingly.

Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters

2017-06-27 Thread Fam Zheng

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> Include a next pointer in VmdkMetaData struct to point to the previous
> allocated L2 table. Modify vmdk_L2update to start updating metadata for
> allocation of multiple clusters at once.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  block/vmdk.c | 128 ++-
>  1 file changed, 101 insertions(+), 27 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index b671dc9..9fa2414 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
>  int valid;
>  uint32_t *l2_cache_entry;
>  uint32_t nb_clusters;
> +uint32_t offset;
> +struct VmdkMetaData *next;
>  } VmdkMetaData;
>  
>  typedef struct VmdkGrainMarker {
> @@ -1116,34 +1118,87 @@ exit:
>  return ret;
>  }
>  
> -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> - uint32_t offset)
> +static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
> +  VmdkMetaData *m_data, bool zeroed)
>  {
> -offset = cpu_to_le32(offset);
> +int i;
> +uint32_t offset, temp_offset;
> +int *l2_table_array;
> +int l2_array_size;
> +
> +if (zeroed) {
> +temp_offset = VMDK_GTE_ZEROED;
> +} else {
> +temp_offset = m_data->offset;
> +}
> +
> +l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
> +l2_table_array = qemu_try_blockalign(extent->file->bs,
> + QEMU_ALIGN_UP(l2_array_size,
> +   BDRV_SECTOR_SIZE));
> +if (l2_table_array == NULL) {
> +return VMDK_ERROR;
> +}
> +memset(l2_table_array, 0, QEMU_ALIGN_UP(l2_array_size, BDRV_SECTOR_SIZE));
>  /* update L2 table */
> +offset = temp_offset;
> +for (i = 0; i < m_data->nb_clusters; i++) {
> +l2_table_array[i] = cpu_to_le32(offset);
> +if (!zeroed) {
> +offset += 128;

s/128/extent->cluster_sectors/?

> +}
> +}
>  if (bdrv_pwrite_sync(extent->file,
> -((int64_t)m_data->l2_offset * 512)
> -+ (m_data->l2_index * sizeof(offset)),
> -&offset, sizeof(offset)) < 0) {
> + ((int64_t)m_data->l2_offset * 512)
> + + ((m_data->l2_index) * sizeof(offset)),
> + l2_table_array, l2_array_size) < 0) {
>  return VMDK_ERROR;
>  }
>  /* update backup L2 table */
>  if (extent->l1_backup_table_offset != 0) {
>  m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
>  if (bdrv_pwrite_sync(extent->file,
> -((int64_t)m_data->l2_offset * 512)
> -+ (m_data->l2_index * sizeof(offset)),
> -&offset, sizeof(offset)) < 0) {
> + ((int64_t)m_data->l2_offset * 512)
> + + ((m_data->l2_index) * sizeof(offset)),
> + l2_table_array, l2_array_size) < 0) {
>  return VMDK_ERROR;
>  }
>  }
> +
> +offset = temp_offset;
>  if (m_data->l2_cache_entry) {
> -*m_data->l2_cache_entry = offset;
> +for (i = 0; i < m_data->nb_clusters; i++) {
> +*m_data->l2_cache_entry = cpu_to_le32(offset);
> +m_data->l2_cache_entry++;
> +
> +if (!zeroed) {
> +offset += 128;

Ditto.

> +}
> +}
>  }
>  
> +qemu_vfree(l2_table_array);
>  return VMDK_OK;
>  }
>  
> +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> + bool zeroed)
> +{
> +int ret;
> +
> +while (m_data->next != NULL) {
> +
> +ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +m_data = m_data->next;
> + }
> +
> + return VMDK_OK;
> +}
> +
>  /*
>   * vmdk_l2load
>   *
> @@ -1261,9 +1316,10 @@ static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
>   *
>   *   VMDK_ERROR:in error cases
>   */
> +
>  static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>   uint64_t offset, uint64_t *cluster_offset,
> - int64_t *bytes, VmdkMetaData *m_data,
> + int64_t *bytes, VmdkMetaData **m_data,
>   bool allocate, uint32_t *alloc_clusters_counter)
>  {
>  int l1_index, l2_offset, l2_index;
> @@ -1272,6 +1328,7 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>  uint32_t nb_clusters;
>  bool zeroed = false;
>  uint64_t skip_start_bytes, skip_end_bytes;
> +VmdkMetaData *old_m_data;
>  int ret;
>  
>  ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> @@ -1323,13 +1380,21 @@

Re: [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only

2017-06-27 Thread Fam Zheng

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> vmdk_alloc_clusters() introduced earlier now handles the task of
> allocating clusters and performing COW when needed. Thus we can change
> vmdk_get_cluster_offset() to stick to the sole purpose of returning
> cluster offset using sector number. Update the changes at all call
> sites.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  block/vmdk.c | 56 
>  1 file changed, 12 insertions(+), 44 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 9fa2414..accf1c3 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1485,25 +1485,16 @@ static int vmdk_alloc_clusters(BlockDriverState *bs,
>   * For flat extents, the start offset as parsed from the description file is
>   * returned.
>   *
> - * For sparse extents, look up in L1, L2 table. If allocate is true, return an
> - * offset for a new cluster and update L2 cache. If there is a backing file,
> - * COW is done before returning; otherwise, zeroes are written to the allocated
> - * cluster. Both COW and zero writing skips the sector range
> - * [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
> - * has new data to write there.
> + * For sparse extents, look up the L1, L2 table.
>   *
>   * Returns: VMDK_OK if cluster exists and mapped in the image.
> - *  VMDK_UNALLOC if cluster is not mapped and @allocate is false.
> - *  VMDK_ERROR if failed.
> + *  VMDK_UNALLOC if cluster is not mapped.
> + *  VMDK_ERROR if failed
>   */
>  static int vmdk_get_cluster_offset(BlockDriverState *bs,
> VmdkExtent *extent,
> -   VmdkMetaData *m_data,
> uint64_t offset,
> -   bool allocate,
> -   uint64_t *cluster_offset,
> -   uint64_t skip_start_bytes,
> -   uint64_t skip_end_bytes)
> +   uint64_t *cluster_offset)
>  {
>  int l1_index, l2_offset, l2_index;
>  uint32_t *l2_table;
> @@ -1528,31 +1519,9 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
>  }
>  
>  if (!cluster_sector || zeroed) {
> -if (!allocate) {
> -return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> -}
> -
> -cluster_sector = extent->next_cluster_sector;
> -extent->next_cluster_sector += extent->cluster_sectors;
> -
> -/* First of all we write grain itself, to avoid race condition
> - * that may to corrupt the image.
> - * This problem may occur because of insufficient space on host disk
> - * or inappropriate VM shutdown.
> - */
> -ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> -offset, skip_start_bytes, skip_end_bytes);
> -if (ret) {
> -return ret;
> -}
> -if (m_data) {
> -m_data->valid = 1;
> -m_data->l1_index = l1_index;
> -m_data->l2_index = l2_index;
> -m_data->l2_offset = l2_offset;
> -m_data->l2_cache_entry = &l2_table[l2_index];
> -}
> +return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
>  }
> +
>  *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
>  return VMDK_OK;
>  }
> @@ -1595,9 +1564,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
>  return 0;
>  }
>  qemu_co_mutex_lock(&s->lock);
> -ret = vmdk_get_cluster_offset(bs, extent, NULL,
> -  sector_num * 512, false, &offset,
> -  0, 0);
> +ret = vmdk_get_cluster_offset(bs, extent, sector_num * 512, &offset);
>  qemu_co_mutex_unlock(&s->lock);
>  
>  index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> @@ -1788,13 +1755,14 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>  ret = -EIO;
>  goto fail;
>  }
> -ret = vmdk_get_cluster_offset(bs, extent, NULL,
> -  offset, false, &cluster_offset, 0, 0);
> +
>  offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
>  
>  n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
>   - offset_in_cluster);
>  
> +ret = vmdk_get_cluster_offset(bs, extent, offset, &cluster_offset);
> +
>  if (ret != VMDK_OK) {
>  /* if not allocated, try to read from parent image, if exist */
>  if (bs->backing && ret != VMDK_ZEROED) {
> @@ -2541,9 +2509,9 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
>  sector_num);
>  break;
>  }
> -ret = vmdk_get_cluster_offset(bs, extent, NULL,
> +

Re: [Qemu-devel] [PATCH v4 2/3] Add memfd based hostmem

2017-06-27 Thread Marc-André Lureau

Hi Eduardo

On Fri, Jun 23, 2017 at 11:09 PM Eduardo Habkost wrote:

> On Wed, Jun 21, 2017 at 04:02:18PM +0200, Marc-André Lureau wrote:
> > Add a new memory backend, similar to hostmem-file, except that it
> > doesn't need to create files. It also enforces memory sealing.
> >
> > This backend is mainly useful for sharing the memory with other
> > processes.
>
> How exactly can the memfd be used to share memory?  Is there an existing
> mechanism for sharing the memfd file descriptor with another process



Since there is no backing file, the traditional mechanism is to pass the fd,
via socket ancillary data, forking, etc. Both ivshmem and vhost-user have
such messages, possibly with details about the memory map usage.
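
For readers unfamiliar with fd passing over socket ancillary data, a minimal
sketch using SCM_RIGHTS on a connected UNIX socket follows. The helper names
send_fd and recv_fd are hypothetical; this is the generic POSIX/Linux
mechanism, not the ivshmem or vhost-user message format, and it works for any
file descriptor, a memfd included.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send @fd over the connected UNIX socket @sock. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';  /* at least one byte of real data must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor from @sock. Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    assert(sock >= 0);

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS
        || cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file
description, so it can mmap() the same memory as the sender.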


> >
> > Note that Linux supports transparent huge-pages of shmem/memfd memory
> > since 4.8. It is relatively easier to set up THP than a dedicate
> > hugepage mount point by using "madvise" in
> > /sys/kernel/mm/transparent_hugepage/shmem_enabled.
> >
> > Usage:
> > -object memory-backend-memfd,id=mem1,size=1G
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  backends/hostmem-memfd.c | 67
> >  backends/Makefile.objs   |  2 ++
> >  qemu-options.hx  | 11 
> >  3 files changed, 80 insertions(+)
> >  create mode 100644 backends/hostmem-memfd.c
> >
> [...]
>
> --
> Eduardo
>
> --
Marc-André Lureau

Re: [Qemu-devel] [RFC PATCH 0/2] Enhance block status when crossing EOF

2017-06-27 Thread Fam Zheng

On Thu, 05/04 21:14, Eric Blake wrote:
> Thus, this is a followup to that series, but I'm also okay if we
> think it is too much maintenance compared to the potential gains,
> and decide to drop it after all.

The comments are good enough and I like how this makes the function interface a
bit more powerful. Fixed the typo as pointed out by Stefan and applied. Thanks!

Fam

Re: [Qemu-devel] [PATCH v4 3/3] migration: add bitmap for received page

2017-06-27 Thread Alexey

On Tue, Jun 27, 2017 at 12:03:10PM +0800, Peter Xu wrote:
> On Mon, Jun 26, 2017 at 11:35:20AM +0300, Alexey Perevalov wrote:
> > This patch adds the ability to track pages that have already been
> > received. It is necessary for calculating vCPU block time in the
> > postcopy migration feature, and possibly for recovery after a
> > postcopy migration failure.
> > It is also necessary to solve the shared memory issue in
> > postcopy live migration. Information about received pages
> > will be transferred to the software virtual bridge
> > (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> > already received pages. The fallocate syscall is required for
> > remapped shared memory, because the remapping itself blocks
> > ioctl(UFFDIO_COPY): the ioctl in this case fails with an EEXIST
> > error (the struct page still exists after the remap).
> > 
> > The bitmap is placed in RAMBlock, like the other postcopy/precopy
> > related bitmaps.
> > 
> > Signed-off-by: Alexey Perevalov 
> 
> Mostly good to me, some minor nits only...
> 
> [...]
> 
> >  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> > -void *from_addr, uint64_t pagesize)
> > +   void *from_addr, uint64_t pagesize, 
> > RAMBlock *rb)
> >  {
> > +int ret;
> >  if (from_addr) {
> >  struct uffdio_copy copy_struct;
> >  copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
> >  copy_struct.src = (uint64_t)(uintptr_t)from_addr;
> >  copy_struct.len = pagesize;
> >  copy_struct.mode = 0;
> > -return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
> > +ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
> >  } else {
> >  struct uffdio_zeropage zero_struct;
> >  zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
> >  zero_struct.range.len = pagesize;
> >  zero_struct.mode = 0;
> > -return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> > +ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> > +}
> > +/* received page isn't feature of blocktime calculation,
> > + * it's more general entity, so keep it here,
> > + * but gup betwean two following operation could be high,
> > + * and in this case blocktime for such small interval will be lost */
> 
> I would drop this comment for this patch. It didn't make the code
> any clearer to me, just a bit messier... Maybe it suits some
> place in the blocktime series? Not sure.
Yes, there could be a problem in the stats here, so it is worth mentioning
in the stats series.
> 
> [...]
> 
> > +void ramblock_recv_map_init(void)
> > +{
> > +RAMBlock *rb;
> > +
> > +RAMBLOCK_FOREACH(rb) {
> > +unsigned long pages;
> > +pages = rb->max_length >> TARGET_PAGE_BITS;
> > +assert(!rb->receivedmap);
> > +rb->receivedmap = bitmap_new(pages);
> 
> I'd prefer removing the pages variable, since it is used only once.
No problem :)
> 
> [...]
> 
> > +static void ramblock_recv_bitmap_clear_range(uint64_t start, size_t length,
> > + RAMBlock *rb)
> > +{
> > +int i, range_count;
> > +long nr_bit = start >> TARGET_PAGE_BITS;
> > +range_count = length >> TARGET_PAGE_BITS;
> > +for (i = 0; i < range_count; i++) {
> > +clear_bit(nr_bit, rb->receivedmap);
> > +nr_bit += 1;
> 
> (Dave commented this one)
I agree with Dave, looks like I invented bitmap_clear ;)
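
For reference, the per-bit loop and a single range clear do the same thing. The sketch below is standalone C, not QEMU code: `demo_clear_bit` and `demo_bitmap_clear` are stand-ins written for this example and only mirror the idea behind the existing bitmap helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Clear a single bit, as the loop in the patch does once per page. */
void demo_clear_bit(unsigned long *map, size_t nr)
{
    map[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

/* Clear nr bits starting at bit start, handling one word per iteration. */
void demo_bitmap_clear(unsigned long *map, size_t start, size_t nr)
{
    assert(map != NULL);

    while (nr > 0) {
        size_t word = start / DEMO_BITS_PER_LONG;
        size_t off = start % DEMO_BITS_PER_LONG;
        size_t chunk = DEMO_BITS_PER_LONG - off;  /* bits left in this word */
        unsigned long mask;

        if (chunk > nr) {
            chunk = nr;
        }
        /* A full-word chunk only happens with off == 0; handle it
         * separately to avoid shifting by the full width of the type. */
        mask = (chunk == DEMO_BITS_PER_LONG)
               ? ~0UL
               : (((1UL << chunk) - 1) << off);
        map[word] &= ~mask;
        start += chunk;
        nr -= chunk;
    }
}
```

With a helper of this shape, the clear_range function in the patch reduces to one call with `start >> TARGET_PAGE_BITS` and `length >> TARGET_PAGE_BITS`.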
> 
> [...]
> 
> > @@ -2513,6 +2560,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > version_id)
> >  ram_addr_t addr, total_ram_bytes;
> >  void *host = NULL;
> >  uint8_t ch;
> > +RAMBlock *rb = NULL;
> >  
> >  addr = qemu_get_be64(f);
> >  flags = addr & ~TARGET_PAGE_MASK;
> > @@ -2520,15 +2568,15 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > version_id)
> >  
> >  if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
> >   RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> > -RAMBlock *block = ram_block_from_stream(f, flags);
> > +rb = ram_block_from_stream(f, flags);
> >  
> > -host = host_from_ram_block_offset(block, addr);
> > +host = host_from_ram_block_offset(rb, addr);
> >  if (!host) {
> >  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> >  ret = -EINVAL;
> >  break;
> >  }
> 
> IMHO it's ok to set the bit once here.  Thanks,
Yes, this is the common place for all the copy operations here.
> 
> > -trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
> > +trace_ram_load_loop(rb->idstr, (uint64_t)addr, flags, host);
> >  }
> >  
> >  switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> > @@ -2582,10 +2630,12 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > version_id)
> >  
> >  case RAM_SAVE_FLAG_ZERO:
> >  ch = qemu_get_byte(f);
> > +

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/4] cpu: Implement cpu_generic_new()

2017-06-27 Thread Igor Mammedov

On Mon, 26 Jun 2017 23:33:48 -0300
Eduardo Habkost  wrote:

> On Mon, Jun 26, 2017 at 02:28:13PM +0100, Alex Bennée wrote:
> > 
> > Peter Maydell  writes:
> >   
> > > This patchset adds a new function cpu_generic_new()
> > > which is similar to cpu_generic_init() except that it
> > > does not realize the created CPU object. This means that
> > > board code can do a "new cpu; set QOM properties; realize"
> > > sequence without having to do all the work of splitting
> > > the CPU model string and calling parse_features by hand.  
> > 
> > 
> > Just going through my review queue and I see this needs re-basing. Is
> > there going to be another rev or was a different approach suggested?  
> 
> The right way to go is not clear.  We know we want to remove duplication
> of CPU creation code, but probably we should first refactor the -cpu
> parsing code, so parsing happens: 1) only once; 2) earlier in main(),
> preferably before machine->init() runs; 3) inside generic code instead
> of arch-specific code; 4) preferably using the QemuOpts parser instead
> of the current strtok()-based custom parsers.
> 
> After the parsing code mess is sorted out, writing a generic CPU
> creation wrapper will probably be easier (and safer).

There is also legacy CPU feature parsing/handling in the sparc target,
so we might need to clean it up and convert it to property-based features
(as has been done for i386) before making CPU creation generic.

Re: [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> This avoids needing to test the index of a temp against nb_globals.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c | 15 ---
>  tcg/tcg.c  | 11 ---
>  tcg/tcg.h  | 12 
>  3 files changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d8c3a7e..55f9e83 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -116,25 +116,26 @@ static TCGOpcode op_to_movi(TCGOpcode op)
>  }
>  }
>
> -static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
> +static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>  {
> +TCGTemp *ts = arg_temp(arg);
>  TCGArg i;
>
>  /* If this is already a global, we can't do better. */
> -if (temp < s->nb_globals) {
> -return temp;
> +if (ts->temp_global) {
> +return arg;
>  }
>
>  /* Search for a global first. */
> -for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
> +for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
>  if (i < s->nb_globals) {
>  return i;
>  }
>  }
>
>  /* If it is a temp, search for a temp local. */
> -if (!arg_temp(temp)->temp_local) {
> -for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) 
> {
> +if (!ts->temp_local) {
> +for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
>  if (s->temps[i].temp_local) {
>  return i;
>  }
> @@ -142,7 +143,7 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
>  }
>
>  /* Failure to find a better representation, return the same temp. */
> -return temp;
> +return arg;
>  }
>
>  static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 068ac51..0bb88b1 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -489,9 +489,14 @@ static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
>
>  static inline TCGTemp *tcg_global_alloc(TCGContext *s)
>  {
> +TCGTemp *ts;
> +
>  tcg_debug_assert(s->nb_globals == s->nb_temps);
>  s->nb_globals++;
> -return tcg_temp_alloc(s);
> +ts = tcg_temp_alloc(s);
> +ts->temp_global = 1;
> +
> +return ts;
>  }
>
>  static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
> @@ -967,7 +972,7 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char 
> *buf, int buf_size,
>  {
>  int idx = temp_idx(s, ts);
>
> -if (idx < s->nb_globals) {
> +if (ts->temp_global) {
>  pstrcpy(buf, buf_size, ts->name);
>  } else if (ts->temp_local) {
>  snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> @@ -1905,7 +1910,7 @@ static void temp_free_or_dead(TCGContext *s, TCGTemp 
> *ts, int free_or_dead)
>  }
>  ts->val_type = (free_or_dead < 0
>  || ts->temp_local
> -|| temp_idx(s, ts) < s->nb_globals
> +|| ts->temp_global
>  ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>  }
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 70d9fda..3b35344 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -586,10 +586,14 @@ typedef struct TCGTemp {
>  unsigned int indirect_base:1;
>  unsigned int mem_coherent:1;
>  unsigned int mem_allocated:1;
> -unsigned int temp_local:1; /* If true, the temp is saved across
> -  basic blocks. Otherwise, it is not
> -  preserved across basic blocks. */
> -unsigned int temp_allocated:1; /* never used for code gen */
> +/* If true, the temp is saved across both basic blocks and
> +   translation blocks.  */
> +unsigned int temp_global:1;
> +/* If true, the temp is saved across basic blocks but dead
> +   at the end of translation blocks.  If false, the temp is
> +   dead at the end of basic blocks.  */
> +unsigned int temp_local:1;
> +unsigned int temp_allocated:1;

This is where my knowledge of the TCG internals gets slightly confused.
As far as I'm aware all our TranslationBlocks are Basic Blocks - they
don't have any branches until the end of the block. What is the
distinction here?

Is a temp_global truly global? I thought the guest state was fully
rectified by the time we leave the basic block.

>
>  tcg_target_long val;
>  struct TCGTemp *mem_base;


--
Alex Bennée

Re: [Qemu-devel] [PATCH 1/5] virtio-pci: use ioeventfd even when KVM is disabled

2017-06-27 Thread Fam Zheng

On Thu, 06/15 17:38, Stefan Hajnoczi wrote:
> Old kvm.ko versions only supported a tiny number of ioeventfds so
> virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.
> 
> Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
> always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c1d6462a6f
> ("memory: emulate ioeventfd") it has been possible to use ioeventfds in
> qtest or TCG mode.
> 
> This patch makes -device virtio-blk-pci,iothread=iothread0 work even
> when KVM is disabled.
> 
> I have tested that virtio-blk-pci works under TCG both with and without
> iothread.
> 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Stefan Hajnoczi 

This one was dropped from Kevin's pull request, but the update to iotest 068,
which depends on it, was merged. Now the test fails for me:

068 2s ... - output mismatch (see 068.out.bad)
--- /stor/work/qemu/tests/qemu-iotests/068.out  2017-06-27 16:22:55.003815188 +0800
+++ 068.out.bad 2017-06-27 16:41:37.903626275 +0800
@@ -12,9 +12,8 @@
 === Saving and reloading a VM state to/from a qcow2 image (-object iothread,id=iothread0 -set device.hba0.iothread=iothread0) ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
+qemu-system-x86_64: -device virtio-scsi-pci,id=hba0: ioeventfd is required for iothread
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) savevm 0
-(qemu) quit
+(qemu) qemu-system-x86_64: -device virtio-scsi-pci,id=hba0: ioeventfd is required for iothread
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) quit
-*** done
+(qemu) *** done
Failures: 068
Failed 1 of 1 tests

Fam

Re: [Qemu-devel] [RFC 3/5] vfio: introduce new VFIO ioctl VFIO_DEVICE_PCI_STATUS_SET

2017-06-27 Thread Zhang, Yulei



> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, June 27, 2017 4:19 AM
> To: Zhang, Yulei 
> Cc: qemu-devel@nongnu.org; Tian, Kevin ;
> joonas.lahti...@linux.intel.com; zhen...@linux.intel.com; Zheng, Xiao
> ; Wang, Zhi A 
> Subject: Re: [Qemu-devel] [RFC 3/5] vfio: introduce new VFIO ioctl
> VFIO_DEVICE_PCI_STATUS_SET
> 
> On Tue,  4 Apr 2017 18:27:30 +0800
> Yulei Zhang  wrote:
> 
> > New VFIO ioctl VFIO_DEVICE_PCI_STATUS_SET is added to change the
> > vfio pci device status during migration: stop the device on
> > the source side before fetching its status, and start the device on
> > the target side after restoring its status.
> >
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/pci.c  | 17 +
> >  linux-headers/linux/vfio.h | 15 +++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 7de4eb4..605a473 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -38,6 +38,7 @@
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> >  static VMStateDescription vfio_pci_vmstate;
> > +static void vfio_vm_change_state_handler(void *pv, int running,
> RunState state);
> >
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -2866,6 +2867,7 @@ static void vfio_realize(PCIDevice *pdev, Error
> **errp)
> >  vfio_register_err_notifier(vdev);
> >  vfio_register_req_notifier(vdev);
> >  vfio_setup_resetfn_quirk(vdev);
> > +qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
> vdev);
> >
> >  return;
> >
> > @@ -2948,6 +2950,21 @@ post_reset:
> >  vfio_pci_post_reset(vdev);
> >  }
> >
> > +static void vfio_vm_change_state_handler(void *pv, int running,
> RunState state)
> > +{
> > +VFIOPCIDevice *vdev = pv;
> > +struct vfio_pci_status_set *vfio_status;
> > +int argsz = sizeof(*vfio_status);
> > +
> > +vfio_status = g_malloc0(argsz);
> > +vfio_status->argsz = argsz;
> > +vfio_status->flags = running ? VFIO_DEVICE_PCI_START :
> > + VFIO_DEVICE_PCI_STOP;
> > +
> > +ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_STATUS_SET, vfio_status);
> > +g_free(vfio_status);
> > +}
> > +
> >  static int vfio_device_put(QEMUFile *f, void *pv, size_t size, VMStateField
> *field,
> >  QJSON *vmdesc)
> >  {
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index c87d05c..fa17848 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -487,6 +487,21 @@ struct vfio_pci_hot_reset {
> >
> >  #define VFIO_DEVICE_PCI_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 13)
> >
> > +/**
> > + * VFIO_DEVICE_PCI_STATUS_SET - _IOW(VFIO_TYPE, VFIO_BASE + 14,
> > + * struct vfio_pci_status_set)
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_pci_status_set{
> > +   __u32   argsz;
> > +   __u32   flags;
> > +#define VFIO_DEVICE_PCI_STOP  (1 << 0)
> > +#define VFIO_DEVICE_PCI_START (1 << 1)
> > +};
> > +
> > +#define VFIO_DEVICE_PCI_STATUS_SET _IO(VFIO_TYPE, VFIO_BASE + 14)
> > +
> >  /*  API for Type1 VFIO IOMMU  */
> >
> >  /**
> 
> Why does this need to be an ioctl?  We could simply define the first
> dword of the migration region as the device state and the user could
> read and write it.  Thanks,
> 
> Alex

Sure, we can remove this ioctl.
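
A rough sketch of what that alternative could look like from userspace, with plain pread()/pwrite() on the device fd in place of a new ioctl. Everything named here is an assumption made for illustration: the `DEMO_DEVICE_STATE_*` values, the helper names, and the idea that `region_offset` is the file offset of the migration region as reported by the existing region-info query. None of this is an agreed interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state values; a real layout would be defined in vfio.h. */
#define DEMO_DEVICE_STATE_STOPPED 0u
#define DEMO_DEVICE_STATE_RUNNING 1u

/* Write the device state into the first dword of the migration region.
 * Returns 0 on success, -1 on a failed or short write. */
int demo_set_device_state(int fd, off_t region_offset, uint32_t state)
{
    ssize_t n = pwrite(fd, &state, sizeof(state), region_offset);

    if (n != (ssize_t)sizeof(state)) {
        perror("pwrite(device state)");
        return -1;
    }
    return 0;
}

/* Read the device state back from the first dword of the region.
 * Returns 0 on success, -1 on a failed or short read. */
int demo_get_device_state(int fd, off_t region_offset, uint32_t *state)
{
    ssize_t n;

    assert(state != NULL);
    n = pread(fd, state, sizeof(*state), region_offset);
    if (n != (ssize_t)sizeof(*state)) {
        perror("pread(device state)");
        return -1;
    }
    return 0;
}
```

In the VM change state handler, the ioctl call would then become a single `demo_set_device_state()` call with the running or stopped value.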

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-27 Thread Will Deacon

Hi Eric,

On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
> > On 26/06/17 09:22, Auger Eric wrote:
> >> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> >>> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>>> I started adding replay in virtio-iommu and came across how MSI
> >>>> interrupts will work with VFIO.
> >>>> I understand that on intel this works differently but vsmmu will have
> >>>> same requirement.
> >>>> kvm-msi-irq-route are added using the msi-address to be translated by
> >>>> viommu and not the final translated address.
> >>>> While currently the irqfd framework does not know about emulated iommus
> >>>> (virtio-iommu, vsmmuv3/vintel-iommu).
> >>>> So in my view we have following options:
> >>>> - Programming with translated address when setting up kvm-msi-irq-route
> >>>> - Route the interrupts via QEMU, which is bad from performance
> >>>> - vhost-virtio-iommu may solve the problem in long term
> >>>>
> >>>> Is there any other better option I am missing?
> >>>
> >>> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >>> we'll handle MSIs in the nested translation mode, where the guest manages
> >>> S1 page tables and the host doesn't know about GVA->GPA translation.
> >>
> >> I have a question about the "nested translation mode" terminology. Do
> >> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> >> (which the ARM spec normally advises or was meant for) or do you mean
> >> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
> >> moment my understanding is for VFIO integration the pIOMMU uses a single
> >> stage combining both the stage 1 and stage2 mappings but the host is not
> >> aware of those 2 stages.
> > 
> > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> > in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
> > 
> > What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
> > I'm referring to the "Page Table Sharing" bit of the Future Work in the
> > initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
> > case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
> > by the guest, and the VMM only maps GPA->HPA.
> 
> OK I need to read that part more thoroughly. I was told in the past
> handling nested stages at pIOMMU was considered too complex and
> difficult to maintain. But definitively The SMMU architecture is devised
> for that. Michael asked why we did not use that already for vsmmu
> (nested stages are used on AMD IOMMU I think).

Curious -- but what gave you that idea? I worry that something I might have
said wasn't clear or has been misunderstood.

Will

Re: [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG

2017-06-27 Thread Alex Bennée

Richard Henderson  writes:

> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 3b35344..6c357e7 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -730,7 +730,7 @@ extern bool parallel_cpus;
>
>  static inline TCGTemp *arg_temp(TCGArg a)
>  {
> -return &tcg_ctx.temps[a];
> +return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
>  }

It doesn't look like a lot of calls to arg_temp are able to deal with a
NULL return and may well immediately deref the value. Are we sure the
cases the TCG_CALL_DUMMY arg is involved are narrowly defined?

>
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)

--
Alex Bennée

Re: [Qemu-devel] [RFC 5/5] vifo: introduce new VFIO ioctl VFIO_DEVICE_PCI_GET_DIRTY_BITMAP

2017-06-27 Thread Zhang, Yulei



> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+yulei.zhang=intel@nongnu.org] On Behalf Of Alex Williamson
> Sent: Tuesday, June 27, 2017 4:19 AM
> To: Zhang, Yulei 
> Cc: Tian, Kevin ; joonas.lahti...@linux.intel.com;
> qemu-devel@nongnu.org; zhen...@linux.intel.com; Zheng, Xiao
> ; Wang, Zhi A 
> Subject: Re: [Qemu-devel] [RFC 5/5] vifo: introduce new VFIO ioctl
> VFIO_DEVICE_PCI_GET_DIRTY_BITMAP
> 
> On Tue,  4 Apr 2017 18:28:04 +0800
> Yulei Zhang  wrote:
> 
> > New VFIO ioctl VFIO_DEVICE_PCI_GET_DIRTY_BITMAP is used to sync the
> > pci device dirty pages during the migration.
> 
> If this needs to exist, it needs a lot more documentation.  Why is this
> a PCI specific device ioctl?  Couldn't any vfio device need this?
>
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/pci.c  | 32 
> >  hw/vfio/pci.h  |  2 ++
> >  linux-headers/linux/vfio.h | 14 ++
> >  3 files changed, 48 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 833cd90..64c851f 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -32,6 +32,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "exec/ram_addr.h"
> >
> >  #define MSIX_CAP_LENGTH 12
> >
> > @@ -39,6 +40,7 @@ static void vfio_disable_interrupts(VFIOPCIDevice
> *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> >  static VMStateDescription vfio_pci_vmstate;
> >  static void vfio_vm_change_state_handler(void *pv, int running, RunState
> state);
> > +static void vfio_log_sync(MemoryListener *listener,
> MemoryRegionSection *section);
> >
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -2869,6 +2871,11 @@ static void vfio_realize(PCIDevice *pdev, Error
> **errp)
> >  vfio_setup_resetfn_quirk(vdev);
> >  qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
> vdev);
> >
> > +vdev->vfio_memory_listener = (MemoryListener) {
> > +   .log_sync = vfio_log_sync,
> > +};
> > +memory_listener_register(&vdev->vfio_memory_listener,
> &address_space_memory);
> > +
> >  return;
> >
> >  out_teardown:
> > @@ -2964,6 +2971,7 @@ static void vfio_vm_change_state_handler(void
> *pv, int running, RunState state)
> >  if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_STATUS_SET, vfio_status))
> {
> >  error_report("vfio: Failed to %s device\n", running ? "start" : 
> > "stop");
> >  }
> > +vdev->device_stop = running ? false : true;
> >  g_free(vfio_status);
> >  }
> >
> > @@ -3079,6 +3087,30 @@ static int vfio_device_get(QEMUFile *f, void *pv,
> size_t size, VMStateField *fie
> >  return 0;
> >  }
> >
> > +static void vfio_log_sync(MemoryListener *listener,
> MemoryRegionSection *section)
> > +{
> > +VFIOPCIDevice *vdev = container_of(listener, struct VFIOPCIDevice,
> vfio_memory_listener);
> > +
> > +if (vdev->device_stop) {
> > +struct vfio_pci_get_dirty_bitmap *d;
> > +ram_addr_t size = int128_get64(section->size);
> > +unsigned long page_nr = size >> TARGET_PAGE_BITS;
> > +unsigned long bitmap_size = (BITS_TO_LONGS(page_nr) + 1) *
> sizeof(unsigned long);
> > +d = g_malloc0(sizeof(*d) + bitmap_size);
> > +d->start_addr = section->offset_within_address_space;
> > +d->page_nr = page_nr;
> > +
> > +if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_GET_DIRTY_BITMAP, d))
> {
> > +error_report("vfio: Failed to fetch dirty pages for 
> > migration\n");
> > +goto exit;
> > +}
> > +cpu_physical_memory_set_dirty_lebitmap((unsigned long*)&d-
> >dirty_bitmap, d->start_addr, d->page_nr);
> > +
> > +exit:
> > +g_free(d);
> > +}
> > +}
> > +
> >  static void vfio_instance_init(Object *obj)
> >  {
> >  PCIDevice *pci_dev = PCI_DEVICE(obj);
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index bd98618..984391d 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -144,6 +144,8 @@ typedef struct VFIOPCIDevice {
> >  bool no_kvm_intx;
> >  bool no_kvm_msi;
> >  bool no_kvm_msix;
> > +bool device_stop;
> > +MemoryListener vfio_memory_listener;
> >  } VFIOPCIDevice;
> >
> >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index fa17848..aa73ee1 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -502,6 +502,20 @@ struct vfio_pci_status_set{
> >
> >  #define VFIO_DEVICE_PCI_STATUS_SET _IO(VFIO_TYPE, VFIO_BASE + 14)
> >
> > +/**
> > + * VFIO_DEVICE_PCI_GET_DIRTY_BITMAP - _IOW(VFIO_TYPE, VFIO_BASE +
> 15,
> > + * struct vfio_pci_get_dirty_bitmap)
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_pci_get_dirty_bitmap{
> > +   __u64  start_addr;
> > +   __u64

Re: [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> This avoids having to allocate external memory for each temporary.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg.c | 232 
> ++
>  tcg/tcg.h |   6 ++
>  2 files changed, 120 insertions(+), 118 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 0d758e4..e78140b 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1399,42 +1399,54 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp 
> *old_op,
>
>  /* liveness analysis: end of function: all temps are dead, and globals
> should be in memory. */
> -static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
> +static void tcg_la_func_end(TCGContext *s)
>  {
> -memset(temp_state, TS_DEAD | TS_MEM, s->nb_globals);
> -memset(temp_state + s->nb_globals, TS_DEAD, s->nb_temps - s->nb_globals);
> +int ng = s->nb_globals;
> +int nt = s->nb_temps;
> +int i;
> +
> +for (i = 0; i < ng; ++i) {
> +s->temps[i].state = TS_DEAD | TS_MEM;
> +}
> +for (i = ng; i < nt; ++i) {
> +s->temps[i].state = TS_DEAD;
> +}
>  }
>
>  /* liveness analysis: end of basic block: all temps are dead, globals
> and local temps should be in memory. */
> -static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
> +static void tcg_la_bb_end(TCGContext *s)
>  {
> -int i, n;
> +int ng = s->nb_globals;
> +int nt = s->nb_temps;
> +int i;
>
> -tcg_la_func_end(s, temp_state);
> -for (i = s->nb_globals, n = s->nb_temps; i < n; i++) {
> -if (s->temps[i].temp_local) {
> -temp_state[i] |= TS_MEM;
> -}
> +for (i = 0; i < ng; ++i) {
> +s->temps[i].state = TS_DEAD | TS_MEM;
> +}
> +for (i = ng; i < nt; ++i) {
> +s->temps[i].state = (s->temps[i].temp_local
> + ? TS_DEAD | TS_MEM
> + : TS_DEAD);
>  }
>  }
>
>  /* Liveness analysis : update the opc_arg_life array to tell if a
> given input arguments is dead. Instructions updating dead
> temporaries are removed. */
> -static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
> +static void liveness_pass_1(TCGContext *s)
>  {
>  int nb_globals = s->nb_globals;
>  int oi, oi_prev;
>
> -tcg_la_func_end(s, temp_state);
> +tcg_la_func_end(s);
>
>  for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
>  int i, nb_iargs, nb_oargs;
>  TCGOpcode opc_new, opc_new2;
>  bool have_opc_new2;
>  TCGLifeData arg_life = 0;
> -TCGArg arg;
> +TCGTemp *arg_ts;
>
>  TCGOp * const op = &s->gen_op_buf[oi];
>  TCGOpcode opc = op->opc;
> @@ -1454,8 +1466,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t 
> *temp_state)
>  /* pure functions can be removed if their result is unused */
>  if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
>  for (i = 0; i < nb_oargs; i++) {
> -arg = op->args[i];
> -if (temp_state[arg] != TS_DEAD) {
> +arg_ts = arg_temp(op->args[i]);
> +if (arg_ts->state != TS_DEAD) {
>  goto do_not_remove_call;
>  }
>  }
> @@ -1465,41 +1477,41 @@ static void liveness_pass_1(TCGContext *s, uint8_t 
> *temp_state)
>
>  /* output args are dead */
>  for (i = 0; i < nb_oargs; i++) {
> -arg = op->args[i];
> -if (temp_state[arg] & TS_DEAD) {
> +arg_ts = arg_temp(op->args[i]);
> +if (arg_ts->state & TS_DEAD) {
>  arg_life |= DEAD_ARG << i;
>  }
> -if (temp_state[arg] & TS_MEM) {
> +if (arg_ts->state & TS_MEM) {
>  arg_life |= SYNC_ARG << i;
>  }
> -temp_state[arg] = TS_DEAD;
> +arg_ts->state = TS_DEAD;
>  }
>
>  if (!(call_flags & (TCG_CALL_NO_WRITE_GLOBALS |
>  TCG_CALL_NO_READ_GLOBALS))) {
>  /* globals should go back to memory */
> -memset(temp_state, TS_DEAD | TS_MEM, nb_globals);
> +for (i = 0; i < nb_globals; i++) {
> +s->temps[i].state = TS_DEAD | TS_MEM;
> +}
>  } else if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
>  /* globals should be synced to memory */
>  for (i = 0; i < nb_globals; i++) {
> -temp_state[i] |= TS_MEM;
> +s->temps[i].state |= TS_MEM;
>  }
>

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-27 Thread Auger Eric

Hi,

On 27/06/2017 10:46, Will Deacon wrote:
> Hi Eric,
> 
> On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
>> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
>>> On 26/06/17 09:22, Auger Eric wrote:
>>>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>>>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>>>>> I started adding replay in virtio-iommu and came across how MSI
>>>>>> interrupts will work with VFIO.
>>>>>> I understand that on intel this works differently but vsmmu will have
>>>>>> same requirement.
>>>>>> kvm-msi-irq-route are added using the msi-address to be translated by
>>>>>> viommu and not the final translated address.
>>>>>> While currently the irqfd framework does not know about emulated iommus
>>>>>> (virtio-iommu, vsmmuv3/vintel-iommu).
>>>>>> So in my view we have following options:
>>>>>> - Programming with translated address when setting up kvm-msi-irq-route
>>>>>> - Route the interrupts via QEMU, which is bad from performance
>>>>>> - vhost-virtio-iommu may solve the problem in long term
>>>>>>
>>>>>> Is there any other better option I am missing?
>>>>>
>>>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>>>> we'll handle MSIs in the nested translation mode, where the guest manages
>>>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>>>
>>>> I have a question about the "nested translation mode" terminology. Do
>>>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>>>> (which the ARM spec normally advises or was meant for) or do you mean
>>>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
>>>> moment my understanding is for VFIO integration the pIOMMU uses a single
>>>> stage combining both the stage 1 and stage2 mappings but the host is not
>>>> aware of those 2 stages.
>>>
>>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
>>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
>>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
>>>
>>> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
>>> I'm referring to the "Page Table Sharing" bit of the Future Work in the
>>> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
>>> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
>>> by the guest, and the VMM only maps GPA->HPA.
>>
>> OK I need to read that part more thoroughly. I was told in the past
>> handling nested stages at pIOMMU was considered too complex and
>> difficult to maintain. But definitively The SMMU architecture is devised
>> for that. Michael asked why we did not use that already for vsmmu
>> (nested stages are used on AMD IOMMU I think).
> 
> Curious -- but what gave you that idea? I worry that something I might have
> said wasn't clear or has been misunderstood.

Lobby discussions I might not have correctly understood ;-) Anyway
that's a new direction that I am happy to investigate then.

Thanks

Eric
> 
> Will
>

Re: [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> Copy s->nb_globals or s->nb_temps to a local variable for the purposes
> of iteration.  This should allow the compiler to use low-overhead
> looping constructs on some hosts.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg.c | 27 ++-
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index e78140b..c228f1e 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -943,23 +943,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg 
> ret,
>
>  static void tcg_reg_alloc_start(TCGContext *s)
>  {
> -int i;
> +int i, n;
>  TCGTemp *ts;
> -for(i = 0; i < s->nb_globals; i++) {
> +
> +for (i = 0, n = s->nb_globals; i < n; i++) {
>  ts = &s->temps[i];
> -if (ts->fixed_reg) {
> -ts->val_type = TEMP_VAL_REG;
> -} else {
> -ts->val_type = TEMP_VAL_MEM;
> -}
> +ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
>  }
> -for(i = s->nb_globals; i < s->nb_temps; i++) {
> +for (n = s->nb_temps; i < n; i++) {

A one-line comment like /* i continues on from s->nb_globals above */
might prevent momentary confusion when reading through.
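
To illustrate the suggestion, here is a minimal sketch of the two-loop pattern with the comment in place. The struct and enum names are made-up stand-ins, not the real TCGContext/TCGTemp, and the fixed array size is an assumption for the example only.

```c
#include <assert.h>

/* Hypothetical stand-ins for TCGTemp/TCGContext; not the real QEMU types. */
enum { VAL_DEAD, VAL_REG, VAL_MEM };

struct temp {
    int fixed_reg;
    int temp_local;
    int val_type;
};

struct ctx {
    int nb_globals;
    int nb_temps;
    struct temp temps[8];
};

static void reg_alloc_start(struct ctx *s)
{
    int i, n;

    assert(s->nb_globals <= s->nb_temps && s->nb_temps <= 8);

    /* Copy the bound into a local so the loop limit is plainly invariant. */
    for (i = 0, n = s->nb_globals; i < n; i++) {
        s->temps[i].val_type = s->temps[i].fixed_reg ? VAL_REG : VAL_MEM;
    }
    /* i continues on from s->nb_globals above */
    for (n = s->nb_temps; i < n; i++) {
        s->temps[i].val_type = s->temps[i].temp_local ? VAL_MEM : VAL_DEAD;
        s->temps[i].fixed_reg = 0;
    }
}
```

Without the comment, the missing initialiser in the second for statement is easy to misread as a typo.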

>  ts = &s->temps[i];
> -if (ts->temp_local) {
> -ts->val_type = TEMP_VAL_MEM;
> -} else {
> -ts->val_type = TEMP_VAL_DEAD;
> -}
> +ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>  ts->mem_allocated = 0;
>  ts->fixed_reg = 0;
>  }
> @@ -2050,9 +2043,9 @@ static void temp_save(TCGContext *s, TCGTemp *ts, 
> TCGRegSet allocated_regs)
> temporary registers needs to be allocated to store a constant. */
>  static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
>  {
> -int i;
> +int i, n;
>
> -for (i = 0; i < s->nb_globals; i++) {
> +for (i = 0, n = s->nb_globals; i < n; i++) {
>  temp_save(s, &s->temps[i], allocated_regs);
>  }
>  }
> @@ -2062,9 +2055,9 @@ static void save_globals(TCGContext *s, TCGRegSet 
> allocated_regs)
> temporary registers needs to be allocated to store a constant. */
>  static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
>  {
> -int i;
> +int i, n;
>
> -for (i = 0; i < s->nb_globals; i++) {
> +for (i = 0, n = s->nb_globals; i < n; i++) {
>  TCGTemp *ts = &s->temps[i];
>  tcg_debug_assert(ts->val_type != TEMP_VAL_REG
>   || ts->fixed_reg

Otherwise:

Reviewed-by: Alex Bennée 

--
Alex Bennée

[Qemu-devel] [PATCH 0/8] VT-d: some enhancements on iotlb and tools

2017-06-27 Thread Peter Xu

Patch 1: fixes a very rare issue with the iova value on the PT
(pass-through) path. It didn't break anything since that path is
barely touched (only when the IOMMU is enabled and one device is then
set to PT), but it is still better to fix it.

Patches 2-5: add the "info iommu" HMP command and implement it for
VT-d. They also add some statistics for the iotlb.

Patch 6: introduces "x-iotlb-size" to tune the iotlb size, or to turn
the iotlb off (e.g., when we want to measure how the iotlb affects a
workload).

Patch 7: some refinement of the iotlb entry.

Patch 8: implements an MRU list algorithm for the iotlb.

The last patch logically makes more sense than the old algorithm, but
the performance is almost the same as before (as far as I tested with
simple netperf workloads: streaming, rr, reverse, etc.). In most normal
cases we cannot really make the iotlb overflow, especially when the
size is 1024 by default: the guest kernel driver releases buffers
after use, and a non-strict intel_iommu=on setup also sends periodic
global iotlb flushes, which reset the whole cache. If anyone has a
suggestion for a specific workload, please shoot. Anyway, I'm posting
this for review to collect any comments/suggestions.

Thanks,

Peter Xu (8):
  intel_iommu: fix VTD_PAGE_MASK
  hmp: add info iommu
  intel_iommu: support "info iommu"
  intel_iommu: add iotlb/context cache statistics
  intel_iommu: hmp: allow "-c" for "info iommu"
  intel_iommu: let iotlb size tunable
  intel_iommu: use access_flags for iotlb
  intel_iommu: implement mru list for iotlb

 hmp-commands-info.hx   |  14 
 hmp.c  |   6 ++
 hmp.h  |   1 +
 hw/i386/intel_iommu.c  | 169 +++--
 hw/i386/intel_iommu_internal.h |  11 +--
 hw/i386/trace-events   |   1 -
 hw/i386/x86-iommu.c|  17 +
 include/hw/i386/intel_iommu.h  |  20 -
 include/hw/i386/x86-iommu.h|   5 ++
 include/hw/iommu.h |   9 +++
 stubs/Makefile.objs|   1 +
 stubs/iommu.c  |   9 +++
 12 files changed, 209 insertions(+), 54 deletions(-)
 create mode 100644 include/hw/iommu.h
 create mode 100644 stubs/iommu.c

-- 
2.7.4

[Qemu-devel] [PATCH 2/8] hmp: add info iommu

2017-06-27 Thread Peter Xu

Introduce a new HMP interface, "info iommu", to dump IOMMU information.
This command is intended only for developers' debugging and is of no
use to end users, so no QMP interface is implemented.

This patch only implements the stub. Arch-dependent status dumps can be
provided in the future.

Signed-off-by: Peter Xu 
---
 hmp-commands-info.hx | 14 ++
 hmp.c|  6 ++
 hmp.h|  1 +
 include/hw/iommu.h   |  9 +
 stubs/Makefile.objs  |  1 +
 stubs/iommu.c|  9 +
 6 files changed, 40 insertions(+)
 create mode 100644 include/hw/iommu.h
 create mode 100644 stubs/iommu.c

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index ae16901..a39243d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -802,6 +802,20 @@ Dump all the ramblocks of the system.
 ETEXI
 
 {
+.name   = "iommu",
+.args_type  = "",
+.params = "",
+.help   = "Display system IOMMU information",
+.cmd= hmp_info_iommu,
+},
+
+STEXI
+@item info iommu
+@findex iommu
+Display system IOMMU information.
+ETEXI
+
+{
 .name   = "hotpluggable-cpus",
 .args_type  = "",
 .params = "",
diff --git a/hmp.c b/hmp.c
index 8c72c58..68994af 100644
--- a/hmp.c
+++ b/hmp.c
@@ -42,6 +42,7 @@
 #include "qemu/error-report.h"
 #include "exec/ramlist.h"
 #include "hw/intc/intc.h"
+#include "hw/iommu.h"
 #include "migration/snapshot.h"
 
 #ifdef CONFIG_SPICE
@@ -2817,3 +2818,8 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict 
*qdict)
 hmp_handle_error(mon, &err);
 qapi_free_GuidInfo(info);
 }
+
+void hmp_info_iommu(Monitor *mon, const QDict *qdict)
+{
+arch_iommu_info(mon, qdict);
+}
diff --git a/hmp.h b/hmp.h
index d8b94ce..ed01c49 100644
--- a/hmp.h
+++ b/hmp.h
@@ -143,5 +143,6 @@ void hmp_info_dump(Monitor *mon, const QDict *qdict);
 void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
+void hmp_info_iommu(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
new file mode 100644
index 000..5201a8d
--- /dev/null
+++ b/include/hw/iommu.h
@@ -0,0 +1,9 @@
+#ifndef __HW_IOMMU_H__
+#define __HW_IOMMU_H__
+
+#include "qemu/typedefs.h"
+#include "qapi/qmp/qdict.h"
+
+void arch_iommu_info(Monitor *mon, const QDict *qdict);
+
+#endif
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index f5b47bf..dfd5569 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-y += pc_madt_cpu_entry.o
 stub-obj-y += vmgenid.o
 stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
+stub-obj-y += iommu.o
diff --git a/stubs/iommu.c b/stubs/iommu.c
new file mode 100644
index 000..75b4f4c
--- /dev/null
+++ b/stubs/iommu.c
@@ -0,0 +1,9 @@
+#include "qemu/osdep.h"
+#include "monitor/monitor.h"
+#include "hw/iommu.h"
+
+void arch_iommu_info(Monitor *mon, const QDict *qdict)
+{
+monitor_printf(mon, "This command is not supported "
+   "on this platform.\n");
+}
-- 
2.7.4

[Qemu-devel] [PATCH 1/8] intel_iommu: fix VTD_PAGE_MASK

2017-06-27 Thread Peter Xu

IOMMUTLBEntry.iova is returned incorrectly in one PT path (though we can
rarely trigger this path, and even if we do, the value is mostly
discarded, so it didn't break anything). Fix it by converting
VTD_PAGE_MASK to the normal definition (normally it should be a pfn
mask, not an offset mask), then switch the other user of it.

Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()")
Signed-off-by: Peter Xu 
---
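
For reviewers, a small sketch of the two mask conventions this patch is about. The DEMO_ macro names are assumptions for illustration only; the real definitions are in intel_iommu_internal.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the shift value matches the VT-d 4K page. */
#define DEMO_PAGE_SHIFT   12
#define DEMO_PAGE_SIZE    (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK    (~(DEMO_PAGE_SIZE - 1))  /* keeps the page frame bits */
#define DEMO_OFFSET_MASK  (DEMO_PAGE_SIZE - 1)     /* keeps the in-page offset */

static uint64_t demo_page_base(uint64_t addr)
{
    return addr & DEMO_PAGE_MASK;
}

static uint64_t demo_page_offset(uint64_t addr)
{
    return addr & DEMO_OFFSET_MASK;
}

/* An addr_mask in the IOMMUTLBEntry style covers the offset bits. */
static uint64_t demo_entry_addr_mask(void)
{
    uint64_t mask = ~DEMO_PAGE_MASK;

    assert(mask == DEMO_OFFSET_MASK);
    return mask;
}
```

With the old offset-style definition, "addr & VTD_PAGE_MASK" kept only the low 12 bits, which is why the iova came out wrong on the PT path.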
 hw/i386/intel_iommu.c  | 2 +-
 hw/i386/intel_iommu_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a9b59bd..a5c83dd 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1141,7 +1141,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
 entry->iova = addr & VTD_PAGE_MASK;
 entry->translated_addr = entry->iova;
-entry->addr_mask = VTD_PAGE_MASK;
+entry->addr_mask = ~VTD_PAGE_MASK;
 entry->perm = IOMMU_RW;
 trace_vtd_translate_pt(source_id, entry->iova);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f50ecd8..d1d6290 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -384,7 +384,7 @@ typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 /* Pagesize of VTD paging structures, including root and context tables */
 #define VTD_PAGE_SHIFT  12
 #define VTD_PAGE_SIZE   (1ULL << VTD_PAGE_SHIFT)
-#define VTD_PAGE_MASK   (VTD_PAGE_SIZE - 1)
+#define VTD_PAGE_MASK   (~(VTD_PAGE_SIZE - 1))
 
 #define VTD_PAGE_SHIFT_4K   12
 #define VTD_PAGE_MASK_4K(~((1ULL << VTD_PAGE_SHIFT_4K) - 1))
-- 
2.7.4

[Qemu-devel] [PATCH 6/8] intel_iommu: let iotlb size tunable

2017-06-27 Thread Peter Xu

The IOTLB size was static at 1024 entries. Make it tunable. We can
also turn the IOTLB off if we want, by specifying the size as zero.

The tunable is named "x-iotlb-size" since it is not really meant for
users yet; for now it is mostly for debugging purposes.

Signed-off-by: Peter Xu 
---
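
As a rough sketch of the intended semantics (size 0 disables caching, otherwise the cache is reset when it overflows), using a made-up array-backed cache rather than the real GHashTable-based IOTLB. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature cache; not the real IntelIOMMUState. */
struct demo_cache {
    uint16_t max_size;   /* 0 means "cache disabled" */
    uint16_t used;
    uint64_t *slots;     /* max_size entries, or NULL when disabled */
};

static void demo_cache_init(struct demo_cache *c, uint16_t max_size)
{
    c->max_size = max_size;
    c->used = 0;
    c->slots = max_size ? calloc(max_size, sizeof(*c->slots)) : NULL;
    assert(max_size == 0 || c->slots != NULL);
}

/* Returns 1 if the value was cached, 0 if caching is turned off. */
static int demo_cache_insert(struct demo_cache *c, uint64_t value)
{
    /* Bail out before allocating or touching anything, so nothing leaks. */
    if (c->max_size == 0) {
        return 0;
    }
    if (c->used >= c->max_size) {
        c->used = 0;     /* crude "reset everything" on overflow */
    }
    c->slots[c->used++] = value;
    return 1;
}

static void demo_cache_destroy(struct demo_cache *c)
{
    free(c->slots);
    c->slots = NULL;
    c->used = 0;
}
```

One thing the sketch makes visible: in the vtd_update_iotlb() hunk below, the new early return appears to sit after the g_malloc() of the key (and of the entry declared just above it), so those allocations would probably need to be freed or moved after the check.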
 hw/i386/intel_iommu.c  | 14 --
 hw/i386/intel_iommu_internal.h |  1 -
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 72b39f0..fc05764 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -227,6 +227,10 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, 
uint16_t source_id,
 uint64_t key;
 int level;
 
+if (s->iotlb_size == 0) {
+return NULL;
+}
+
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
 key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
 source_id, level);
@@ -249,8 +253,12 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 uint64_t *key = g_malloc(sizeof(*key));
 uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
+if (s->iotlb_size == 0) {
+return;
+}
+
 trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
-if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
+if (g_hash_table_size(s->iotlb) >= s->iotlb_size) {
 trace_vtd_iotlb_reset("iotlb exceeds size limit");
 vtd_reset_iotlb(s);
 }
@@ -2388,6 +2396,7 @@ static Property vtd_properties[] = {
 ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
 DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
+DEFINE_PROP_UINT16("x-iotlb-size", IntelIOMMUState, iotlb_size, 1024),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -3047,7 +3056,8 @@ static void vtd_info_dump(X86IOMMUState *x86_iommu, 
Monitor *mon,
 DUMP("Caching-mode: %s\n", s->caching_mode ? "enabled" : "disabled");
 DUMP("Misc: next_frr=%d, context_gen=%d, buggy_eim=%d\n",
  s->next_frcd_reg, s->context_cache_gen, s->buggy_eim);
-DUMP("  iotlb_size=%d\n", g_hash_table_size(s->iotlb));
+DUMP("  iotlb_size=%d/%d\n", g_hash_table_size(s->iotlb),
+ s->iotlb_size);
 
 if (clear_stats) {
 vtd_reset_stats(s);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index d1d6290..dc0257c 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -116,7 +116,6 @@
 /* The shift of source_id in the key of IOTLB hash table */
 #define VTD_IOTLB_SID_SHIFT 36
 #define VTD_IOTLB_LVL_SHIFT 52
-#define VTD_IOTLB_MAX_SIZE  1024/* Max size of the hash table */
 
 /* IOTLB_REG */
 #define VTD_TLB_GLOBAL_FLUSH(1ULL << 60) /* Global invalidation */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index fc69ff3..947c153 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -295,6 +295,7 @@ struct IntelIOMMUState {
 
 uint32_t context_cache_gen; /* Should be in [1,MAX] */
 GHashTable *iotlb;  /* IOTLB */
+uint16_t iotlb_size;/* IOTLB max cache entries */
 
 MemoryRegionIOMMUOps iommu_ops;
 GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* 
reference */
-- 
2.7.4

[Qemu-devel] [PATCH 3/8] intel_iommu: support "info iommu"

2017-06-27 Thread Peter Xu

Dump critical information for VT-d. Sample output:

(qemu) info iommu
Version: 1
Cap: 0x12008c22260286
Extended Cap: 0xf00f5a
DMAR: enabled, root=0x7435f000 (extended=0)
IR: enabled, root=0x17a40, size=0x1 (eim=1)
QI: enabled, root=0x17aadf000, head=156, tail=156, size=256
Caching-mode: enabled
Misc: next_frr=0, context_gen=2, buggy_eim=0

Signed-off-by: Peter Xu 
---
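
The dispatch added here is an optional per-class hook. A minimal sketch of that pattern follows, with hypothetical demo_ types and a buffer in place of the monitor; none of this is the real X86IOMMUClass API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct demo_iommu;

/* Hypothetical class with an optional dump hook. */
struct demo_iommu_class {
    /* Returns the number of characters written into buf. */
    int (*info_dump)(const struct demo_iommu *iommu, char *buf, size_t len);
};

struct demo_iommu {
    const struct demo_iommu_class *klass;
    int version;
};

static int demo_vtd_info_dump(const struct demo_iommu *iommu,
                              char *buf, size_t len)
{
    return snprintf(buf, len, "Version: %d\n", iommu->version);
}

static int demo_iommu_info(const struct demo_iommu *iommu,
                           char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (!iommu) {
        return snprintf(buf, len, "No IOMMU is detected.\n");
    }
    if (!iommu->klass->info_dump) {
        /* Assumption: say so instead of printing nothing at all. */
        return snprintf(buf, len, "This IOMMU provides no info dump.\n");
    }
    return iommu->klass->info_dump(iommu, buf, len);
}
```

The posted arch_iommu_info() prints nothing when the hook is NULL; printing a short message, as the sketch does, is only one possible choice.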
 hw/i386/intel_iommu.c   | 39 +++
 hw/i386/x86-iommu.c | 17 +
 include/hw/i386/x86-iommu.h |  5 +
 3 files changed, 61 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a5c83dd..39f772a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2991,6 +2991,44 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 return true;
 }
 
+#define DUMP(...) monitor_printf(mon, ## __VA_ARGS__)
+static void vtd_info_dump(X86IOMMUState *x86_iommu, Monitor *mon,
+  const QDict *qdict)
+{
+IntelIOMMUState *s = INTEL_IOMMU_DEVICE(x86_iommu);
+
+DUMP("Version: %d\n", 1);
+DUMP("Cap: 0x%"PRIx64"\n", s->cap);
+DUMP("Extended Cap: 0x%"PRIx64"\n", s->ecap);
+
+DUMP("DMAR: %s", s->dmar_enabled ? "enabled" : "disabled");
+if (s->dmar_enabled) {
+DUMP(", root=0x%"PRIx64" (extended=%d)",
+ s->root, s->root_extended);
+}
+DUMP("\n");
+
+DUMP("IR: %s", s->intr_enabled ? "enabled" : "disabled");
+if (s->intr_enabled) {
+DUMP(", root=0x%"PRIx64", size=0x%"PRIx32" (eim=%d)",
+ s->intr_root, s->intr_size, s->intr_eime);
+}
+DUMP("\n");
+
+DUMP("QI: %s", s->qi_enabled ? "enabled" : "disabled");
+if (s->qi_enabled) {
+DUMP(", root=0x%"PRIx64", head=%u, tail=%u, size=%u",
+ s->iq, s->iq_head, s->iq_tail, s->iq_size);
+}
+DUMP("\n");
+
+DUMP("Caching-mode: %s\n", s->caching_mode ? "enabled" : "disabled");
+DUMP("Misc: next_frr=%d, context_gen=%d, buggy_eim=%d\n",
+ s->next_frcd_reg, s->context_cache_gen, s->buggy_eim);
+DUMP("  iotlb_size=%d\n", g_hash_table_size(s->iotlb));
+}
+#undef DUMP
+
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
 MachineState *ms = MACHINE(qdev_get_machine());
@@ -3042,6 +3080,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 dc->hotpluggable = false;
 x86_class->realize = vtd_realize;
 x86_class->int_remap = vtd_int_remap;
+x86_class->info_dump = vtd_info_dump;
 /* Supported by the pc-q35-* machine types */
 dc->user_creatable = true;
 }
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 293caf8..fed35b4 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -76,6 +76,23 @@ IommuType x86_iommu_get_type(void)
 return x86_iommu_default->type;
 }
 
+void arch_iommu_info(Monitor *mon, const QDict *qdict)
+{
+X86IOMMUState *iommu = x86_iommu_get_default();
+X86IOMMUClass *class;
+
+if (!iommu) {
+monitor_printf(mon, "No IOMMU is detected.\n");
+return;
+}
+
+class = X86_IOMMU_GET_CLASS(iommu);
+
+if (class->info_dump) {
+class->info_dump(iommu, mon, qdict);
+}
+}
+
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index ef89c0c..c414b65 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -22,6 +22,8 @@
 
 #include "hw/sysbus.h"
 #include "hw/pci/pci.h"
+#include "hw/iommu.h"
+#include "monitor/monitor.h"
 
 #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
 #define  X86_IOMMU_DEVICE(obj) \
@@ -50,6 +52,9 @@ struct X86IOMMUClass {
 /* MSI-based interrupt remapping */
 int (*int_remap)(X86IOMMUState *iommu, MSIMessage *src,
  MSIMessage *dst, uint16_t sid);
+/* Dump IOMMU information */
+void (*info_dump)(X86IOMMUState *iommu, Monitor *mon,
+  const QDict *qdict);
 };
 
 /**
-- 
2.7.4

[Qemu-devel] [PATCH 5/8] intel_iommu: hmp: allow "-c" for "info iommu"

2017-06-27 Thread Peter Xu

Add a new "-c" parameter to "info iommu" to clear the statistics. Other
platforms can selectively support this (though none do yet).

Signed-off-by: Peter Xu 
---
 hmp-commands-info.hx  | 4 ++--
 hw/i386/intel_iommu.c | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index a39243d..2add941 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -803,8 +803,8 @@ ETEXI
 
 {
 .name   = "iommu",
-.args_type  = "",
-.params = "",
+.args_type  = "clear_stats:-c",
+.params = "[-c]",
 .help   = "Display system IOMMU information",
 .cmd= hmp_info_iommu,
 },
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 45d0919..72b39f0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3008,6 +3008,7 @@ static void vtd_info_dump(X86IOMMUState *x86_iommu, 
Monitor *mon,
   const QDict *qdict)
 {
 IntelIOMMUState *s = INTEL_IOMMU_DEVICE(x86_iommu);
+bool clear_stats = qdict_get_try_bool(qdict, "clear_stats", false);
 
 DUMP("Version: %d\n", 1);
 DUMP("Cap: 0x%"PRIx64"\n", s->cap);
@@ -3047,6 +3048,10 @@ static void vtd_info_dump(X86IOMMUState *x86_iommu, 
Monitor *mon,
 DUMP("Misc: next_frr=%d, context_gen=%d, buggy_eim=%d\n",
  s->next_frcd_reg, s->context_cache_gen, s->buggy_eim);
 DUMP("  iotlb_size=%d\n", g_hash_table_size(s->iotlb));
+
+if (clear_stats) {
+vtd_reset_stats(s);
+}
 }
 #undef DUMP
 
-- 
2.7.4

[Qemu-devel] [PATCH 8/8] intel_iommu: implement mru list for iotlb

2017-06-27 Thread Peter Xu

It is not wise to discard the whole IOTLB cache when the cache size
reaches the maximum, but that's what we do now. A slightly better (but
still simple) approach is to throw away only the least recently used
cache entry.

This patch implements an MRU list algorithm for the VT-d IOTLB. The main
logic is to maintain a Most Recently Used list of the IOTLB entries. The
hash table is still used for lookup, and a new list field is added to
each IOTLB entry for the MRU list. Each active IOTLB entry is both in
the hash table s->iotlb and linked into the MRU list s->iotlb_head. The
hash table gives fast lookup, and the MRU list tells us which entries
are still hot.

Now that we have the MRU list, replace all iterations over IOTLB
entries with list iterations rather than hash table iterations.

Signed-off-by: Peter Xu 
---
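
The idea can be sketched in isolation as below. This is a toy: a hand-rolled doubly linked list with a linear search standing in for the hash table, and all mru_ names are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature MRU cache; names are illustrative, not QEMU's. */
struct mru_entry {
    uint64_t key;
    uint64_t value;
    struct mru_entry *prev, *next;
};

struct mru_cache {
    struct mru_entry *head;   /* most recently used */
    struct mru_entry *tail;   /* least recently used */
    unsigned size, max_size;
};

static void mru_unlink(struct mru_cache *c, struct mru_entry *e)
{
    if (e->prev) {
        e->prev->next = e->next;
    } else {
        c->head = e->next;
    }
    if (e->next) {
        e->next->prev = e->prev;
    } else {
        c->tail = e->prev;
    }
    e->prev = e->next = NULL;
}

static void mru_push_head(struct mru_cache *c, struct mru_entry *e)
{
    e->prev = NULL;
    e->next = c->head;
    if (c->head) {
        c->head->prev = e;
    } else {
        c->tail = e;
    }
    c->head = e;
}

/* Linear search stands in for the hash table lookup. */
static struct mru_entry *mru_lookup(struct mru_cache *c, uint64_t key)
{
    struct mru_entry *e;

    for (e = c->head; e; e = e->next) {
        if (e->key == key) {
            /* A hit makes the entry the most recently used one. */
            mru_unlink(c, e);
            mru_push_head(c, e);
            return e;
        }
    }
    return NULL;
}

static void mru_insert(struct mru_cache *c, uint64_t key, uint64_t value)
{
    struct mru_entry *e;

    assert(c->max_size > 0);

    e = mru_lookup(c, key);
    if (e) {
        e->value = value;     /* existing key: update in place */
        return;
    }
    if (c->size >= c->max_size) {
        /* Evict only the least recently used entry, not the whole cache. */
        struct mru_entry *last = c->tail;

        mru_unlink(c, last);
        free(last);
        c->size--;
    }
    e = calloc(1, sizeof(*e));
    assert(e != NULL);
    e->key = key;
    e->value = value;
    mru_push_head(c, e);
    c->size++;
}
```

In the patch itself the lookup stays O(1) through the hash table, and the list is only used for ordering and eviction.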
 hw/i386/intel_iommu.c  | 75 +-
 hw/i386/intel_iommu_internal.h |  8 -
 hw/i386/trace-events   |  1 -
 include/hw/i386/intel_iommu.h  |  6 +++-
 4 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c2b2683..8375fc3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,9 @@
 #include "kvm_i386.h"
 #include "trace.h"
 
+#define FOREACH_IOTLB_SAFE(entry, s, entry_n) \
+QTAILQ_FOREACH_SAFE(entry, &(s)->iotlb_head, link, entry_n)
+
 static void vtd_reset_stats(IntelIOMMUState *s)
 {
 memset(&s->cache_stat, 0, sizeof(s->cache_stat));
@@ -144,14 +147,6 @@ static guint vtd_uint64_hash(gconstpointer v)
 return (guint)*(const uint64_t *)v;
 }
 
-static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
-  gpointer user_data)
-{
-VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
-uint16_t domain_id = *(uint16_t *)user_data;
-return entry->domain_id == domain_id;
-}
-
 /* The shift of an addr for a certain level of paging structure */
 static inline uint32_t vtd_slpt_level_shift(uint32_t level)
 {
@@ -164,18 +159,6 @@ static inline uint64_t vtd_slpt_level_page_mask(uint32_t 
level)
 return ~((1ULL << vtd_slpt_level_shift(level)) - 1);
 }
 
-static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
-gpointer user_data)
-{
-VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
-VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
-uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
-uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
-return (entry->domain_id == info->domain_id) &&
-(((entry->gfn & info->mask) == gfn) ||
- (entry->gfn == gfn_tlb));
-}
-
 /* Reset all the gen of VTDAddressSpace to zero and set the gen of
  * IntelIOMMUState to 1.
  */
@@ -206,6 +189,7 @@ static void vtd_reset_iotlb(IntelIOMMUState *s)
 {
 assert(s->iotlb);
 g_hash_table_remove_all(s->iotlb);
+QTAILQ_INIT(&s->iotlb_head);
 }
 
 static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
@@ -236,6 +220,9 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, 
uint16_t source_id,
 source_id, level);
 entry = g_hash_table_lookup(s->iotlb, &key);
 if (entry) {
+/* Move the entry to the head of MRU list */
+QTAILQ_REMOVE(&s->iotlb_head, entry, link);
+QTAILQ_INSERT_HEAD(&s->iotlb_head, entry, link);
 goto out;
 }
 }
@@ -244,11 +231,23 @@ out:
 return entry;
 }
 
+static void vtd_iotlb_remove_entry(IntelIOMMUState *s, VTDIOTLBEntry *entry)
+{
+uint64_t key = entry->key;
+
+/*
+ * To remove an entry, we need to both remove it from the MRU
+ * list, and also from the hash table.
+ */
+QTAILQ_REMOVE(&s->iotlb_head, entry, link);
+g_hash_table_remove(s->iotlb, &key);
+}
+
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
  uint16_t domain_id, hwaddr addr, uint64_t slpte,
  uint8_t access_flags, uint32_t level)
 {
-VTDIOTLBEntry *entry = g_malloc(sizeof(*entry));
+VTDIOTLBEntry *entry = g_new0(VTDIOTLBEntry, 1), *last;
 uint64_t *key = g_malloc(sizeof(*key));
 uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
@@ -258,8 +257,9 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 
 trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
 if (g_hash_table_size(s->iotlb) >= s->iotlb_size) {
-trace_vtd_iotlb_reset("iotlb exceeds size limit");
-vtd_reset_iotlb(s);
+/* Remove the Least Recently Used cache */
+last = QTAILQ_LAST(&s->iotlb_head, VTDIOTLBEntryHead);
+vtd_iotlb_remove_entry(s, last);
 }
 
 entry->gfn = gfn;
@@ -268,7 +268,11 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 entry->access_flags = acces

[Qemu-devel] [PATCH 4/8] intel_iommu: add iotlb/context cache statistics

2017-06-27 Thread Peter Xu

Add statistics for the VT-d IOMMU DMA remapping.

Now "info iommu" additionally shows:

Statistics: iotlb=26.35% (6689/25388), context=99.99% (18697/18699)

Signed-off-by: Peter Xu 
---
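
For reference, a small sketch of how a hit rate like the one above can be computed from a hit/total counter pair. The demo_ names are hypothetical, and the zero-total guard is an assumption of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters mirroring the hit/total pairs in the patch. */
struct demo_stats {
    uint64_t hit;
    uint64_t total;
};

static void demo_stats_record(struct demo_stats *st, int was_hit)
{
    st->total++;
    if (was_hit) {
        st->hit++;
    }
    assert(st->hit <= st->total);
}

/* Hit rate in percent; reports 0 when nothing has been counted yet. */
static double demo_hit_rate(const struct demo_stats *st)
{
    if (st->total == 0) {
        return 0.0;
    }
    return (double)st->hit / st->total * 100;
}
```

Right after a reset (or "info iommu -c") both totals are zero, so the unguarded expression in the DUMP call below would presumably evaluate to NaN; a guard like the one in the sketch is one way to avoid printing that.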
 hw/i386/intel_iommu.c | 21 +
 include/hw/i386/intel_iommu.h | 10 ++
 2 files changed, 31 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 39f772a..45d0919 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,11 @@
 #include "kvm_i386.h"
 #include "trace.h"
 
+static void vtd_reset_stats(IntelIOMMUState *s)
+{
+memset(&s->cache_stat, 0, sizeof(s->cache_stat));
+}
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
 uint64_t wmask, uint64_t w1cmask)
 {
@@ -1095,9 +1100,12 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  */
 assert(!vtd_is_interrupt_addr(addr));
 
+s->cache_stat.iotlb_total++;
+
 /* Try to fetch slpte form IOTLB */
 iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
 if (iotlb_entry) {
+s->cache_stat.iotlb_hit++;
 trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
  iotlb_entry->domain_id);
 slpte = iotlb_entry->slpte;
@@ -1107,8 +1115,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 goto out;
 }
 
+s->cache_stat.context_total++;
+
 /* Try to fetch context-entry from cache first */
 if (cc_entry->context_cache_gen == s->context_cache_gen) {
+s->cache_stat.context_hit++;
 trace_vtd_iotlb_cc_hit(bus_num, devfn, cc_entry->context_entry.hi,
cc_entry->context_entry.lo,
cc_entry->context_cache_gen);
@@ -2875,6 +2886,7 @@ static void vtd_init(IntelIOMMUState *s)
 
 vtd_reset_context_cache(s);
 vtd_reset_iotlb(s);
+vtd_reset_stats(s);
 
 /* Define registers with default values and bit semantics */
 vtd_define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);
@@ -3022,6 +3034,15 @@ static void vtd_info_dump(X86IOMMUState *x86_iommu, 
Monitor *mon,
 }
 DUMP("\n");
 
+DUMP("Statistics: iotlb=%.2lf%% (%"PRIu64"/%"PRIu64"), "
+ "context=%.2lf%% (%"PRIu64"/%"PRIu64")\n",
+ (double)s->cache_stat.iotlb_hit /
+ s->cache_stat.iotlb_total * 100,
+ s->cache_stat.iotlb_hit, s->cache_stat.iotlb_total,
+ (double)s->cache_stat.context_hit /
+ s->cache_stat.context_total * 100,
+ s->cache_stat.context_hit, s->cache_stat.context_total);
+
 DUMP("Caching-mode: %s\n", s->caching_mode ? "enabled" : "disabled");
 DUMP("Misc: next_frr=%d, context_gen=%d, buggy_eim=%d\n",
  s->next_frcd_reg, s->context_cache_gen, s->buggy_eim);
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3e51876..fc69ff3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -255,6 +255,13 @@ struct IntelIOMMUNotifierNode {
 QLIST_ENTRY(IntelIOMMUNotifierNode) next;
 };
 
+typedef struct IOMMUCacheStats {
+uint64_t iotlb_hit;
+uint64_t iotlb_total;
+uint64_t context_hit;
+uint64_t context_total;
+} IOMMUCacheStats;
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
 X86IOMMUState x86_iommu;
@@ -302,6 +309,9 @@ struct IntelIOMMUState {
 bool intr_eime; /* Extended interrupt mode enabled */
 OnOffAuto intr_eim; /* Toggle for EIM cabability */
 bool buggy_eim; /* Force buggy EIM unless eim=off */
+
+/* For statistics */
+IOMMUCacheStats cache_stat;
 };
 
 /* Find the VTD Address space associated with the given bus pointer,
-- 
2.7.4

[Qemu-devel] [PATCH 7/8] intel_iommu: use access_flags for iotlb

2017-06-27 Thread Peter Xu

The permissions were cached as separate read/write flags. Merge them
into a single access_flags field.

Signed-off-by: Peter Xu 
---
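
A minimal sketch of folding two booleans into one flags byte, which is the shape of this change. The DEMO_ACCESS_ values and helper names are assumptions for illustration; the patch itself uses the existing IOMMU_ACCESS_FLAG() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen so that RW == RO | WO. */
enum {
    DEMO_ACCESS_NONE = 0,
    DEMO_ACCESS_RO   = 1,
    DEMO_ACCESS_WO   = 2,
    DEMO_ACCESS_RW   = 3,
};

static uint8_t demo_access_flag(bool reads, bool writes)
{
    uint8_t flags = (reads ? DEMO_ACCESS_RO : 0) |
                    (writes ? DEMO_ACCESS_WO : 0);

    assert(flags <= DEMO_ACCESS_RW);
    return flags;
}

static bool demo_can_read(uint8_t flags)
{
    return (flags & DEMO_ACCESS_RO) != 0;
}

static bool demo_can_write(uint8_t flags)
{
    return (flags & DEMO_ACCESS_WO) != 0;
}
```

Storing the combined value means the cached entry can be copied straight into entry->perm on an IOTLB hit.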
 hw/i386/intel_iommu.c | 15 +++
 include/hw/i386/intel_iommu.h |  3 +--
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fc05764..c2b2683 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -246,8 +246,7 @@ out:
 
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
  uint16_t domain_id, hwaddr addr, uint64_t slpte,
- bool read_flags, bool write_flags,
- uint32_t level)
+ uint8_t access_flags, uint32_t level)
 {
 VTDIOTLBEntry *entry = g_malloc(sizeof(*entry));
 uint64_t *key = g_malloc(sizeof(*key));
@@ -266,8 +265,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 entry->gfn = gfn;
 entry->domain_id = domain_id;
 entry->slpte = slpte;
-entry->read_flags = read_flags;
-entry->write_flags = write_flags;
+entry->access_flags = access_flags;
 entry->mask = vtd_slpt_level_page_mask(level);
 *key = vtd_get_iotlb_key(gfn, source_id, level);
 g_hash_table_replace(s->iotlb, key, entry);
@@ -1100,6 +1098,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 bool is_fpd_set = false;
 bool reads = true;
 bool writes = true;
+uint8_t access_flags;
 VTDIOTLBEntry *iotlb_entry;
 
 /*
@@ -1117,8 +1116,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
  iotlb_entry->domain_id);
 slpte = iotlb_entry->slpte;
-reads = iotlb_entry->read_flags;
-writes = iotlb_entry->write_flags;
+access_flags = iotlb_entry->access_flags;
 page_mask = iotlb_entry->mask;
 goto out;
 }
@@ -1191,13 +1189,14 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 }
 
 page_mask = vtd_slpt_level_page_mask(level);
+access_flags = IOMMU_ACCESS_FLAG(reads, writes);
 vtd_update_iotlb(s, source_id, VTD_CONTEXT_ENTRY_DID(ce.hi), addr, slpte,
- reads, writes, level);
+ access_flags, level);
 out:
 entry->iova = addr & page_mask;
 entry->translated_addr = vtd_get_slpte_addr(slpte) & page_mask;
 entry->addr_mask = ~page_mask;
-entry->perm = IOMMU_ACCESS_FLAG(reads, writes);
+entry->perm = access_flags;
 return true;
 
 error:
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 947c153..4960f8d 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -101,8 +101,7 @@ struct VTDIOTLBEntry {
 uint16_t domain_id;
 uint64_t slpte;
 uint64_t mask;
-bool read_flags;
-bool write_flags;
+uint8_t access_flags;
 };
 
 /* VT-d Source-ID Qualifier types */
-- 
2.7.4

Re: [Qemu-devel] [PATCH 16/19] block: protect modification of dirty bitmaps with a mutex

2017-06-27 Thread Vladimir Sementsov-Ogievskiy


26.06.2017 19:54, Paolo Bonzini wrote:


> On 26/06/2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
>> HI!
>>
>> One question here, should not 'bdrv_undo_clear_dirty_bitmap' be under
>> lock too?
> Any call to dirty bitmap functions between bdrv_clear_dirty_bitmap and
> bdrv_undo_clear_dirty_bitmap is problematic anyway, so
> bdrv_clear_dirty_bitmap really only needs the lock in the !out case;
> bdrv_undo_clear_dirty_bitmap is only called when out != NULL.
>
> However, I agree it would be cleaner to add the lock there, too.
>
> Paolo



Also, you've added the comment "Called with BQL taken" both to functions
that call bdrv_dirty_bitmaps_lock and to functions that do not... What is BQL?



--
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v6 2/6] queue: Add macro for incremental traversal

2017-06-27 Thread Lluís Vilanova

Richard Henderson writes:

> On 06/26/2017 05:33 AM, Lluís Vilanova wrote:
>> Richard Henderson writes:
>> 
>>> On 06/12/2017 07:54 AM, Lluís Vilanova wrote:
>>>> Adds macro QTAILQ_FOREACH_CONTINUE to support incremental list
>>>> traversal.
>>>>
>>>> Signed-off-by: Lluís Vilanova
>>>> ---
>>>> include/qemu/queue.h |   12
>>>> 1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
>>>> index 35292c3155..eb2bf9cb1c 100644
>>>> --- a/include/qemu/queue.h
>>>> +++ b/include/qemu/queue.h
>>>> @@ -415,6 +415,18 @@ struct {  \
>>>> (var);  \
>>>> (var) = ((var)->field.tqe_next))
>>>> +/**
>>>> + * QTAILQ_FOREACH_CONTINUE:
>>>> + * @var: Variable to resume iteration from.
>>>> + * @field: Field in @var holding a QTAILQ_ENTRY for this queue.
>>>> + *
>>>> + * Resumes iteration on a queue from the element in @var.
>>>> + */
>>>> +#define QTAILQ_FOREACH_CONTINUE(var, field) \
>>>> +for ((var) = ((var)->field.tqe_next);   \
>>>> +(var);  \
>>>> +(var) = ((var)->field.tqe_next))
>>>> +
>>>> #define QTAILQ_FOREACH_SAFE(var, head, field, next_var) \
>>>> for ((var) = ((head)->tqh_first);   \
>>>> (var) && ((next_var) = ((var)->field.tqe_next), 1); \
>> 
>>> I still say this isn't required if the breakpoint loop is better structured.
>> 
>> I can embed the use of QTAILQ into translate-block.c, but I wanted to keep
>> the implementation of breakpoint lists hidden behind the cpu_breakpoint API.

> I think using QTAILQ in the common main loop is better than twisting the logic
> so that the loop is unnaturally split into a subroutine.

Ok, then I'll integrate that into the new series.
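
For the record, the resume-from-element traversal discussed above can be sketched on a plain next-pointer list as follows. The DEMO_ macros and struct are hypothetical analogues, not the QTAILQ macros themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical breakpoint node on a simple singly linked list. */
struct demo_bp {
    int pc;
    struct demo_bp *next;
};

#define DEMO_FOREACH(var, head) \
    for ((var) = (head); (var); (var) = (var)->next)

/* Resumes iteration from the element after the one currently in var. */
#define DEMO_FOREACH_CONTINUE(var) \
    for ((var) = (var)->next; (var); (var) = (var)->next)

/*
 * Find the next node matching pc. Starts at head when from is NULL,
 * otherwise starts just after from.
 */
static struct demo_bp *demo_next_match(struct demo_bp *head,
                                       struct demo_bp *from, int pc)
{
    struct demo_bp *bp = from;

    assert(head != NULL || from == NULL);

    if (bp == NULL) {
        DEMO_FOREACH(bp, head) {
            if (bp->pc == pc) {
                return bp;
            }
        }
        return NULL;
    }
    DEMO_FOREACH_CONTINUE(bp) {
        if (bp->pc == pc) {
            return bp;
        }
    }
    return NULL;
}
```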

Thanks,
  Lluis

Re: [Qemu-devel] [PATCH 0/8] VT-d: some enhancements on iotlb and tools

2017-06-27 Thread Peter Xu

On Tue, Jun 27, 2017 at 05:03:31PM +0800, Peter Xu wrote:
> Patch 1: fixes a very rare PT path issue on iova value. It didn't
> break anything since it's merely not touched (only if when IOMMU
> enabled, then set one device to PT), but still better fix it.
> 
> Patch 2-5: added "info iommu" hmp command, and implemented for VT-d.
> Meanwhile, added some statistics for iotlb.

For patch 2,3,5 I should CC Dave. Sorry to have forgotten.

> 
> Patch 6: introduce "x-iotlb-size" to tune iotlb size, or to turn it
> off (e.g., when we want to measure how iotlb affects one payload).
> 
> Patch 7: some refine on iotlb entry.
> 
> Patch 8: implemented MRU list algorithm for iotlb.
> 
> For the last patch, it's logically making more sense than the old
> algo, however the performance is merely the same as before (as far as
> I tested with simple netperf payloads, in either streaming, rr,
> reverse, etc.) since in most normal cases we cannot really let iotlb
> overflow especially when size is 1024 by default, e.g., guest kernel
> driver will release buffer when after used, and unstrict
> intel_iommu=on parameter will also send periodic global iotlb flush
> which will reset the whole cache. If anyone has suggestion on specific
> workload, please shoot. Anyway, I'm posting this out for review to see
> any possible comments/suggestions.
> 
> Thanks,
> 
> Peter Xu (8):
>   intel_iommu: fix VTD_PAGE_MASK
>   hmp: add info iommu
>   intel_iommu: support "info iommu"
>   intel_iommu: add iotlb/context cache statistics
>   intel_iommu: hmp: allow "-c" for "info iommu"
>   intel_iommu: let iotlb size tunable
>   intel_iommu: use access_flags for iotlb
>   intel_iommu: implement mru list for iotlb
> 
>  hmp-commands-info.hx   |  14 
>  hmp.c  |   6 ++
>  hmp.h  |   1 +
>  hw/i386/intel_iommu.c  | 169 +++--
>  hw/i386/intel_iommu_internal.h |  11 +--
>  hw/i386/trace-events   |   1 -
>  hw/i386/x86-iommu.c|  17 +
>  include/hw/i386/intel_iommu.h  |  20 -
>  include/hw/i386/x86-iommu.h|   5 ++
>  include/hw/iommu.h |   9 +++
>  stubs/Makefile.objs|   1 +
>  stubs/iommu.c  |   9 +++
>  12 files changed, 209 insertions(+), 54 deletions(-)
>  create mode 100644 include/hw/iommu.h
>  create mode 100644 stubs/iommu.c
> 
> -- 
> 2.7.4
> 
> 

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v9 04/26] target: [tcg] Add generic translation framework

2017-06-27 Thread Peter Maydell

On 27 June 2017 at 04:22, Richard Henderson  wrote:
> On 06/26/2017 11:21 AM, Peter Maydell wrote:
>>
>> x86 definitely gets this totally wrong. I would be unsurprised
>> to find that other variable-length-insn targets do too.
>
>
> For x86, doing this optimally is difficult.  We *could* fix it simply by
> single-stepping when executing within the last 15 bytes of the page.

My feeling is that the "longjump out of translate.c on insn aborts"
approach is pretty confusing and ideally we should get rid of that
entirely in favour of having the translate code handle an error
return from the "load byte/short/word" functions it calls. That might
then make it easier to bail out on page-crossing instructions.
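
To make the suggestion concrete, here is a minimal sketch of a decoder whose byte loader returns an error instead of longjmp-ing out. All names (FetchWindow, fetch_u8, decode_insn) and the toy instruction encoding are hypothetical and are not taken from any QEMU target; the point is only that a page-crossing instruction shows up as an ordinary error return that the caller can act on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fetch window: the code bytes of the one page being translated. */
typedef struct FetchWindow {
    const uint8_t *bytes;
    size_t len;   /* number of valid bytes in this page */
    size_t pos;   /* index of the next byte to read */
} FetchWindow;

/* Load one code byte; report failure to the caller instead of longjmp-ing. */
static bool fetch_u8(FetchWindow *w, uint8_t *out)
{
    if (w->pos >= w->len) {
        return false;
    }
    *out = w->bytes[w->pos++];
    return true;
}

/*
 * Decode a toy variable-length instruction: the first byte gives the
 * number of operand bytes that follow.  Returns the instruction length,
 * or -1 if the instruction runs off the end of the page, in which case
 * the read position is restored so the caller can end the block there.
 */
static int decode_insn(FetchWindow *w)
{
    size_t start = w->pos;
    uint8_t n, operand;

    if (!fetch_u8(w, &n)) {
        return -1;
    }
    for (uint8_t i = 0; i < n; i++) {
        if (!fetch_u8(w, &operand)) {
            w->pos = start;   /* bail out cleanly on a page-crossing insn */
            return -1;
        }
    }
    return (int)(w->pos - start);
}
```

With this shape the translator loop can stop the block before a page-crossing instruction and handle it separately, rather than relying on an abort in the middle of decode.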

thanks
-- PMM

Re: [Qemu-devel] [PATCH 16/19] block: protect modification of dirty bitmaps with a mutex

2017-06-27 Thread Paolo Bonzini

On 27/06/2017 11:07, Vladimir Sementsov-Ogievskiy wrote:
> 26.06.2017 19:54, Paolo Bonzini wrote:
>>
>> On 26/06/2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
>>> HI!
>>>
>>> One question here, should not 'bdrv_undo_clear_dirty_bitmap' be under
>>> lock too?
>> Any call to dirty bitmap functions between bdrv_clear_dirty_bitmap and
>> bdrv_undo_clear_dirty_bitmap is problematic anyway, so
>> bdrv_clear_dirty_bitmap really only needs the lock in the !out case;
>> bdrv_undo_clear_dirty_bitmap is only called when out != NULL.
>>
>> However, I agree it would be cleaner to add the lock there, too.
> 
> Also, you've added comment "Called with BQL taken" both to functions
> that calls bdrv_dirty_bitmaps_lock and that do not... What is BQL?

It's the "big QEMU lock", also known as "iothread lock".  The locking policy
is documented in block_int.h:

/* Writing to the list requires the BQL _and_ the dirty_bitmap_mutex.
 * Reading from the list can be done with either the BQL or the
 * dirty_bitmap_mutex.  Modifying a bitmap only requires
 * dirty_bitmap_mutex.  */
QemuMutex dirty_bitmap_mutex;
QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;

and the comments in block/dirty-bitmap.c reflect the above comment.
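
As a rough illustration of that policy (not the actual QEMU code), here is a sketch using plain pthread mutexes. The names big_lock, bitmap_list_lock, Bitmap, list_insert, bitmap_set_bit and list_count are all made up for the example; big_lock stands in for the BQL and bitmap_list_lock for dirty_bitmap_mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BQL and for dirty_bitmap_mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bitmap_list_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct Bitmap {
    struct Bitmap *next;
    unsigned long bits;
} Bitmap;

static Bitmap *bitmap_list;

/*
 * Writing to the list needs both locks: the caller must already hold
 * big_lock, and the list mutex is taken here.
 */
static void list_insert(Bitmap *bm)
{
    pthread_mutex_lock(&bitmap_list_lock);
    bm->next = bitmap_list;
    bitmap_list = bm;
    pthread_mutex_unlock(&bitmap_list_lock);
}

/* Modifying a bitmap only needs the list mutex. */
static void bitmap_set_bit(Bitmap *bm, unsigned bit)
{
    pthread_mutex_lock(&bitmap_list_lock);
    bm->bits |= 1UL << bit;
    pthread_mutex_unlock(&bitmap_list_lock);
}

/* Reading the list needs either lock; this reader takes the list mutex. */
static size_t list_count(void)
{
    size_t n = 0;

    pthread_mutex_lock(&bitmap_list_lock);
    for (Bitmap *bm = bitmap_list; bm; bm = bm->next) {
        n++;
    }
    pthread_mutex_unlock(&bitmap_list_lock);
    return n;
}
```

A reader that already holds big_lock could walk the list without taking the mutex, because every writer holds both.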

Paolo

Re: [Qemu-devel] [PATCH v3 00/10] Clock framework API.

2017-06-27 Thread Peter Maydell

On 27 June 2017 at 08:04, KONRAD Frederic  wrote:
> Le 06/23/2017 à 03:58 PM, Peter Maydell a écrit :
>> The pointer is for your clock inputs -- when would you
>> want to start a refresh from that? I would expect
>> refreshes to only ever go downstream -- you update
>> the config of your clock outputs and things downstream
>> of them will update in turn.
>
>
> I started with the goal in mind that the binding + the callback
> can refresh themself without user intervention.
>
> So actually the user only have to change the binding in case of a
> clock selector.. Everything else is done by the framework.

I agree you only need to change the bindings for clock selectors,
that should be true whatever approach we take.

> For example if we want to change a multiplier through a register:
>
> void register_write(..)
> {
>   /* Refresh related clock input. */
> }
>
> void clock_cb(..)
> {
>   return register_value * rate_in;
> }
>
> If we drop the clock first possibility:
>
> void register_write(..)
> {
>   /* refresh all depending clock with
>* rate_in * register value.
>* Either this can be tricky as we need to know exactly which
>* clock need to be refreshed or we need to refresh everybody.
>* On the device example in the patch-set this can become a
>* mess.
>*/
> }
>
> void clock_cb(..)
> {
>   return register_value * rate_in;
> }
>
> Second possibility:
>
> void register_write(..)
> {
>   /* refresh the clock which is referenced as input.
>* This is easy BUT it will refresh all other devices bound to
>* this clock.
>*/

If this device can change the multiplier for a clock
then it must be changing the multiplier for one of
its *output* clocks. The input clock is owned by
whatever device controls that clock. (In hardware,
a device can't change the frequency of a clock signal
that is an input to it.)

Eg you might have a fixed clock (output) representing
the system clock, and a PLL device which takes that
clock as an input, and has one or more output clocks
at various rates whose multipliers can be changed.
Changing the multiplier value in a register in the
PLL device should make it change and refresh the
relevant output clock. (If a device has a builtin
clock divider then you'd model that as it having an
input clock, plus an "output" clock which it doesn't
actually expose to the rest of the world but just
consumes internally. The builtin divider would
change the settings on the 'output' clock.)
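
A minimal sketch of that model, assuming hypothetical names (Clock, PLLState, clock_set_rate, pll_write_multiplier, pll_input_changed) that are not the API from the patch set, and a single sink per clock for brevity:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock object; one downstream sink only, to keep it short. */
typedef void (*ClockCallback)(void *opaque, uint64_t rate_hz);

typedef struct Clock {
    uint64_t rate_hz;
    ClockCallback notify;   /* invoked on the sink when the rate changes */
    void *opaque;
} Clock;

/* Update a clock's rate and propagate the change downstream. */
static void clock_set_rate(Clock *clk, uint64_t rate_hz)
{
    clk->rate_hz = rate_hz;
    if (clk->notify) {
        clk->notify(clk->opaque, rate_hz);
    }
}

typedef struct PLLState {
    Clock *clk_in;          /* pointer to a clock owned by the upstream device */
    Clock clk_out;          /* output clock owned by this device */
    uint32_t multiplier;
} PLLState;

/* Register write: only the output clock this device owns is refreshed. */
static void pll_write_multiplier(PLLState *s, uint32_t value)
{
    s->multiplier = value;
    clock_set_rate(&s->clk_out, s->clk_in->rate_hz * value);
}

/* Called by the framework when the upstream (input) clock changes. */
static void pll_input_changed(void *opaque, uint64_t rate_hz)
{
    PLLState *s = opaque;

    clock_set_rate(&s->clk_out, rate_hz * s->multiplier);
}
```

The register write never touches clk_in, so other devices bound to the same system clock are not refreshed; updates only flow downstream from the clock whose configuration changed.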

thanks
-- PMM

Re: [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  tcg/tcg.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 1eeeca5..4f69d0c 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -503,7 +503,6 @@ static inline intptr_t QEMU_ARTIFICIAL 
> GET_TCGV_PTR(TCGv_ptr t)
>  #define TCG_CALL_NO_WG_SE   (TCG_CALL_NO_WG | TCG_CALL_NO_SE)
>
>  /* used to align parameters */
> -#define TCG_CALL_DUMMY_TCGV MAKE_TCGV_I32(-1)
>  #define TCG_CALL_DUMMY_ARG  ((TCGArg)(-1))
>
>  /* Conditions.  Note that these are laid out for easy manipulation by


--
Alex Bennée

Re: [Qemu-devel] [PATCH v4 5/5] tests: add functional test validating ipv4/ipv6 address flag handling

2017-06-27 Thread Daniel P. Berrange

On Mon, Jun 26, 2017 at 09:30:38PM -0500, Eric Blake wrote:
> On 06/26/2017 09:10 PM, Eric Blake wrote:
> > On 06/16/2017 05:12 AM, Daniel P. Berrange wrote:
> >> The semantics around handling ipv4=on|off & ipv6=on|off are quite
> >> subtle to understand in combination with the various hostname addresses
> >> and backend types. Introduce a massive test matrix that launches QEMU
> >> and validates the ability to connect a client on each protocol as
> >> appropriate.
> >>
> >> The test requires that the host has ability to bind to both :: and
> >> 0.0.0.0, on port 9000. If either protocol is not available, or if
> >> something is already listening on that port the test will skip.
> >>
> >> Although it isn't using the QTest APIs, it expects the
> >> QTEST_QEMU_BINARY env variable to be set.
> > 
> > I note that on failure, v3 created test-sockets-proto.pid in the current
> > working directory (the top level, if I ran 'make check-qtest') rather
> > than under the tests/ subdirectory or even better under a scratch
> > location that gets automatically cleaned up regardless of failure mode.
> > 
> > Since v3 failed for me, but v4 passes (and cleans up on success), I
> > can't say if that is still a problem in v4.
> 
> Scratch that - I just confirmed that 'test-sockets-proto.pid' is still
> created and now listed by 'git status' as a stray file after my in-tree
> build 'make check', even though the test passed.

Yep, I notice that I completely forgot to delete the pid file after
reading it.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> At the same time, drop the TCGContext argument and use tcg_ctx instead.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg.c | 15 ---
>  tcg/tcg.h |  7 ++-
>  2 files changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index f8d96fa..26931a7 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -473,13 +473,6 @@ void tcg_func_start(TCGContext *s)
>  s->be = tcg_malloc(sizeof(TCGBackendData));
>  }
>
> -static inline int temp_idx(TCGContext *s, TCGTemp *ts)
> -{
> -ptrdiff_t n = ts - s->temps;
> -tcg_debug_assert(n >= 0 && n < s->nb_temps);
> -return n;
> -}
> -
>  static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
>  {
>  int n = s->nb_temps++;
> @@ -516,7 +509,7 @@ static int tcg_global_reg_new_internal(TCGContext *s, 
> TCGType type,
>  ts->name = name;
>  tcg_regset_set_reg(s->reserved_regs, reg);
>
> -return temp_idx(s, ts);
> +return temp_idx(ts);
>  }
>
>  void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
> @@ -605,7 +598,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr 
> base,
>  ts->mem_offset = offset;
>  ts->name = name;
>  }
> -return temp_idx(s, ts);
> +return temp_idx(ts);
>  }
>
>  static int tcg_temp_new_internal(TCGType type, int temp_local)
> @@ -645,7 +638,7 @@ static int tcg_temp_new_internal(TCGType type, int 
> temp_local)
>  ts->temp_allocated = 1;
>  ts->temp_local = temp_local;
>  }
> -idx = temp_idx(s, ts);
> +idx = temp_idx(ts);
>  }
>
>  #if defined(CONFIG_DEBUG_TCG)
> @@ -963,7 +956,7 @@ static void tcg_reg_alloc_start(TCGContext *s)
>  static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>   TCGTemp *ts)
>  {
> -int idx = temp_idx(s, ts);
> +int idx = temp_idx(ts);
>
>  if (ts->temp_global) {
>  pstrcpy(buf, buf_size, ts->name);
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 4f69d0c..b75a745 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -733,13 +733,18 @@ struct TCGContext {
>  extern TCGContext tcg_ctx;
>  extern bool parallel_cpus;
>
> -static inline TCGArg temp_arg(TCGTemp *ts)
> +static inline size_t temp_idx(TCGTemp *ts)
>  {
>  ptrdiff_t n = ts - tcg_ctx.temps;
>  tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
>  return n;
>  }
>
> +static inline TCGArg temp_arg(TCGTemp *ts)
> +{
> +return temp_idx(ts);
> +}

I'm confused at the dropping of TCGArg in favour of size_t only for
temp_arg to implicitly cast it back. Was this meant to be part of
another patch?

> +
>  static inline TCGTemp *arg_temp(TCGArg a)
>  {
>  return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];


--
Alex Bennée

Re: [Qemu-devel] [PATCH 16/19] block: protect modification of dirty bitmaps with a mutex

2017-06-27 Thread Vladimir Sementsov-Ogievskiy


27.06.2017 12:27, Paolo Bonzini wrote:
>
> On 27/06/2017 11:07, Vladimir Sementsov-Ogievskiy wrote:
>> 26.06.2017 19:54, Paolo Bonzini wrote:
>>>
>>> On 26/06/2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
>>>> HI!
>>>>
>>>> One question here, should not 'bdrv_undo_clear_dirty_bitmap' be under
>>>> lock too?
>>> Any call to dirty bitmap functions between bdrv_clear_dirty_bitmap and
>>> bdrv_undo_clear_dirty_bitmap is problematic anyway, so
>>> bdrv_clear_dirty_bitmap really only needs the lock in the !out case;
>>> bdrv_undo_clear_dirty_bitmap is only called when out != NULL.
>>>
>>> However, I agree it would be cleaner to add the lock there, too.
>>
>> Also, you've added comment "Called with BQL taken" both to functions
>> that calls bdrv_dirty_bitmaps_lock and that do not... What is BQL?
>
> It's the "big QEMU lock", also known as "iothread lock".  The locking policy
> is documented in block_int.h:
>
>     /* Writing to the list requires the BQL _and_ the dirty_bitmap_mutex.
>      * Reading from the list can be done with either the BQL or the
>      * dirty_bitmap_mutex.  Modifying a bitmap only requires
>      * dirty_bitmap_mutex.  */
>     QemuMutex dirty_bitmap_mutex;
>     QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
>
> and the comments in block/dirty-bitmap.c reflect the above comment.
>
> Paolo


bdrv_enable_dirty_bitmap - writes to the list, it changes 'disabled' 
field. So it requires both BQL and dirty_bitmap_mutex? But the comment 
says only about BQL.


Also, for example, if I want to create a new bitmap and then somehow
change it, should I do it like this:


bdrv_create_dirty_bitmap(...)

bdrv_dirty_bitmaps_lock(bs)

bitmap  = bdrv_find_dirty_bitmap(bs, name)



bdrv_dirty_bitmaps_unlock(bs)

- because we can't now trust the pointer returned by 
bdrv_create_dirty_bitmap, as it releases bitmap lock before return.





--
Best regards,
Vladimir

[Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches

2017-06-27 Thread Lluís Vilanova

Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (i.e., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).

This makes dynamic event state changes on vCPUs very efficient, since they can
use TBs produced by other vCPUs while on the same event state combination (or
produced by the same vCPU, earlier).
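
The idea can be sketched as follows. This is only an illustration under stated assumptions: TBKey, mix32, tb_key_hash and tb_key_equal are hypothetical names, and the mixing function is a simple stand-in rather than the hash QEMU actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup key for a translation block. */
typedef struct TBKey {
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_dstate;  /* bitmap of per-vCPU enabled 'vcpu' events */
} TBKey;

/* Simple invertible mixing step; a stand-in for a real hash function. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9e3779b1u;
    return (h << 13) | (h >> 19);
}

/* The tracing state is hashed together with the usual TB identity. */
static uint32_t tb_key_hash(const TBKey *k)
{
    uint32_t h = 0;

    h = mix32(h, (uint32_t)k->pc);
    h = mix32(h, (uint32_t)(k->pc >> 32));
    h = mix32(h, k->flags);
    h = mix32(h, k->trace_dstate);
    return h;
}

/* Two TBs are interchangeable only if the tracing state matches too. */
static bool tb_key_equal(const TBKey *a, const TBKey *b)
{
    return a->pc == b->pc &&
           a->flags == b->flags &&
           a->trace_dstate == b->trace_dstate;
}
```

Two vCPUs with the same event state produce equal keys and can share a TB, while a vCPU that toggles an event simply starts looking up (or generating) TBs under a different key.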

Discarded alternatives:

* Emitting TCG code to check if an event needs tracing, where we should still
  move the tracing call code to either a cold path (making tracing performance
  worse), or leave it inlined (making non-tracing performance worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova 
---

Changes in v9
=============

* Rebase on 931892e8a6.
* Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds.
* Add measurements to commit enabling all guest events.


Changes in v8
=============

[Emilio G. Cota]

* Ported to current dev tree.

* Allocate cpu->trace_dstate statically. This
  * allows us to drop the event_count inline patch.
  * simplifies and improves the performance of accessing cpu->trace_dstate:
we just need to dereference, instead of going through bitmap_copy and
an intermediate unsigned long.

* If we try to register more CPU events than the max we support (there's a
  constant for it), drop the event and tell the user with error_report. But
  really this is a bug, since we control what CPU events are traceable. Should
  we abort() as well?

* Added rth's R-b tag

* Addressed my own comments:
  * rename tb->trace_vcpu_dstate to the shorter tb->trace_ds
  * use uint32_t for tb->trace_ds instead of a typedef
  * add BUILD_BUG_ON check to make sure tb->trace_ds is big enough
  * fix xxhash

* Do not add trace_dstate to tb_htable_lookup, since we can grab it from
  cpu->trace_dstate.

This patchset applies cleanly on top of rth's tcg-next (a01792e1e).


Changes in v7
=============

* Fix delayed dstate changes (now uses async_run_on_cpu() as suggested by Paolo
  Bonzini).

* Note to Richard: patch 4 has been adapted to the new patch 3 async callback,
  but is essentially the same.


Changes in v6
=============

* Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].


Changes in v5
=============

* Move define into "qemu-common.h" to allow compilation of tests.


Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].

Lluís Vilanova (7):
  exec: [tcg] Refactor flush of per-CPU virtual TB cache
  trace: Allocate cpu->trace_dstate in place
  trace: [tcg] Delay changes to dynamic state when translating
  exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
  trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  trace: [tcg,trivial] Re-align generated code
  trace: [trivial]

[Qemu-devel] [PATCH v5 0/3] Add bitmap for received pages in postcopy migration

2017-06-27 Thread Alexey Perevalov

This is 5th version of
[PATCH v1 0/2] Add bitmap for copied pages in postcopy migration
cover message from there

This is a separate patch set, it derived from
https://www.mail-archive.com/qemu-devel@nongnu.org/msg456004.html

There are several possible use cases:
1. solve issue with postcopy live migration and shared memory.
OVS-VSWITCH requires information about copied pages, to fallocate
newly allocated pages.
2. calculating vCPU blocktime
for more details see
https://www.mail-archive.com/qemu-devel@nongnu.org/msg456004.html
3. recovery after a failure in the middle of postcopy migration

Declaration is placed in two places include/migration/migration.h and into
migration/postcopy-ram.h, because some functions are required in virtio and
into public function include/exec/ram_addr.h.


V4 -> V5
- remove ramblock_recv_bitmap_clear_range in favor of bitmap_clear (comment
  from David)
- single invocation place for ramblock_recv_bitmap_set (comment from Peter)
- minor changes like removing comment from qemu_ufd_copy_ioctl and local
  variable from ramblock_recv_map_init (comment from Peter)

V3 -> V4
- clear_bit instead of ramblock_recv_bitmap_clear in
  ramblock_recv_bitmap_clear_range, which reduces the number of operations
  (comment from Juan)
- for postcopy, ramblock_recv_bitmap_set is called after the page was copied,
  only in case of success (comment from David)
- indentation fixes (comment from Juan)

V2 -> V3
- ramblock_recv_map_init call is placed into migration_incoming_get_current,
  which looks like the general place for both the precopy and postcopy cases.
- releasing of the received bitmap memory is placed into ram_load_cleanup;
  unfortunately, it is called only in case of precopy.
- precopy case and discard ram block case
- function renaming, and another minor cleanups

V1 -> V2
- change in terminology s/copied/received/g
- granularity became TARGET_PAGE_SIZE, but not actual page size of the
ramblock
- movecopiedmap & get_copiedmap_size were removed, until patch set where
it will be necessary
- releasing memory of receivedmap was added into ram_load_cleanup
- new patch "migration: introduce qemu_ufd_copy_ioctl helper"

Patchset is based on Juan's patchset:
[PATCH v2 0/5] Create setup/cleanup methods for migration incoming side

Alexey Perevalov (3):
  migration: postcopy_place_page factoring out
  migration: introduce qemu_ufd_copy_ioctl helper
  migration: add bitmap for received page

 include/exec/ram_addr.h  | 10 +
 migration/migration.c|  1 +
 migration/postcopy-ram.c | 53 +++-
 migration/postcopy-ram.h |  4 ++--
 migration/ram.c  | 46 -
 migration/ram.h  |  6 ++
 6 files changed, 94 insertions(+), 26 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v5 2/3] migration: introduce qemu_ufd_copy_ioctl helper

2017-06-27 Thread Alexey Perevalov

Introduce a helper so that auxiliary operations, such as tracking
received pages and notifying about copy operations, can be placed
inside it in further patches.

Reviewed-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Peter Xu 
Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dae41b5..293db97 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -561,6 +561,25 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 return 0;
 }
 
+static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
+void *from_addr, uint64_t pagesize)
+{
+if (from_addr) {
+struct uffdio_copy copy_struct;
+copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
+copy_struct.src = (uint64_t)(uintptr_t)from_addr;
+copy_struct.len = pagesize;
+copy_struct.mode = 0;
+return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
+} else {
+struct uffdio_zeropage zero_struct;
+zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
+zero_struct.range.len = pagesize;
+zero_struct.mode = 0;
+return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+}
+}
+
 /*
  * Place a host page (from) at (host) atomically
  * returns 0 on success
@@ -568,20 +587,14 @@ int postcopy_ram_enable_notify(MigrationIncomingState 
*mis)
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 RAMBlock *rb)
 {
-struct uffdio_copy copy_struct;
 size_t pagesize = qemu_ram_pagesize(rb);
 
-copy_struct.dst = (uint64_t)(uintptr_t)host;
-copy_struct.src = (uint64_t)(uintptr_t)from;
-copy_struct.len = pagesize;
-copy_struct.mode = 0;
-
 /* copy also acks to the kernel waking the stalled thread up
  * TODO: We can inhibit that ack and only do it if it was requested
  * which would be slightly cheaper, but we'd have to be careful
  * of the order of updating our page state.
  */
-if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
+if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize)) {
 int e = errno;
 error_report("%s: %s copy host: %p from: %p (size: %zd)",
  __func__, strerror(e), host, from, pagesize);
@@ -603,12 +616,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, 
void *host,
 trace_postcopy_place_page_zero(host);
 
 if (qemu_ram_pagesize(rb) == getpagesize()) {
-struct uffdio_zeropage zero_struct;
-zero_struct.range.start = (uint64_t)(uintptr_t)host;
-zero_struct.range.len = getpagesize();
-zero_struct.mode = 0;
-
-if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
+if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize())) 
{
 int e = errno;
 error_report("%s: %s zero host: %p",
  __func__, strerror(e), host);
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 1/3] migration: postcopy_place_page factoring out

2017-06-27 Thread Alexey Perevalov

Copied pages need to be marked as close as possible to the place where
they are tracked. That will be necessary in a further patch.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 13 +++--
 migration/postcopy-ram.h |  4 ++--
 migration/ram.c  |  4 ++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c8c4500..dae41b5 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -566,9 +566,10 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 struct uffdio_copy copy_struct;
+size_t pagesize = qemu_ram_pagesize(rb);
 
 copy_struct.dst = (uint64_t)(uintptr_t)host;
 copy_struct.src = (uint64_t)(uintptr_t)from;
@@ -597,11 +598,11 @@ int postcopy_place_page(MigrationIncomingState *mis, void 
*host, void *from,
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize)
+ RAMBlock *rb)
 {
 trace_postcopy_place_page_zero(host);
 
-if (pagesize == getpagesize()) {
+if (qemu_ram_pagesize(rb) == getpagesize()) {
 struct uffdio_zeropage zero_struct;
 zero_struct.range.start = (uint64_t)(uintptr_t)host;
 zero_struct.range.len = getpagesize();
@@ -631,7 +632,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, 
void *host,
 memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
 }
 return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
-   pagesize);
+   rb);
 }
 
 return 0;
@@ -694,14 +695,14 @@ int postcopy_ram_enable_notify(MigrationIncomingState 
*mis)
 }
 
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
 }
 
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 52d51e8..78a3591 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -72,14 +72,14 @@ void postcopy_discard_send_finish(MigrationState *ms,
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize);
+RAMBlock *rb);
 
 /*
  * Place a zero page at (host) atomically
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize);
+ RAMBlock *rb);
 
 /* The current postcopy state is read/set by postcopy_state_get/set
  * which update it atomically.
diff --git a/migration/ram.c b/migration/ram.c
index 8dbdfdb..f50479d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2465,10 +2465,10 @@ static int ram_load_postcopy(QEMUFile *f)
 
 if (all_zero) {
 ret = postcopy_place_page_zero(mis, place_dest,
-   block->page_size);
+   block);
 } else {
 ret = postcopy_place_page(mis, place_dest,
-  place_source, block->page_size);
+  place_source, block);
 }
 }
 if (!ret) {
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 3/3] migration: add bitmap for received page

2017-06-27 Thread Alexey Perevalov

This patch adds the ability to track already received
pages. It is necessary for calculating vCPU block time in
the postcopy migration feature, and possibly for recovery
after a postcopy migration failure.
It is also necessary to solve the shared memory issue in
postcopy live migration. Information about received pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already received pages. The fallocate syscall is required
for remapped shared memory, because remapping itself blocks
ioctl(UFFDIO_COPY); the ioctl in this case will end with an
EEXIST error (the struct page exists after remap).

The bitmap is placed into RAMBlock like the other
postcopy/precopy related bitmaps.
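
For illustration only, the bookkeeping can be sketched as below. Block, PAGE_BITS, recv_bitmap_offset, recv_bitmap_set and recv_bitmap_test are hypothetical names and a 4 KiB page is assumed; the real patch uses RAMBlock, TARGET_PAGE_BITS and the QEMU bitmap helpers instead.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12    /* assumption: 4 KiB pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a RAM block with its received-pages bitmap. */
typedef struct Block {
    uint8_t *host;              /* start of the block in host memory */
    unsigned long *receivedmap; /* one bit per page */
} Block;

/* Page index of a host address inside the block. */
static size_t recv_bitmap_offset(const Block *b, const void *host_addr)
{
    return (size_t)((const uint8_t *)host_addr - b->host) >> PAGE_BITS;
}

/* Mark the page containing host_addr as received. */
static void recv_bitmap_set(Block *b, const void *host_addr)
{
    size_t nr = recv_bitmap_offset(b, host_addr);

    b->receivedmap[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Has the page containing host_addr already been received? */
static bool recv_bitmap_test(const Block *b, const void *host_addr)
{
    size_t nr = recv_bitmap_offset(b, host_addr);

    return (b->receivedmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

A consumer such as the vhost-user side could then skip pages for which the test returns true before deciding whether fallocate is needed.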

Signed-off-by: Alexey Perevalov 
---
 include/exec/ram_addr.h  | 10 ++
 migration/migration.c|  1 +
 migration/postcopy-ram.c | 16 +++-
 migration/ram.c  | 42 +++---
 migration/ram.h  |  6 ++
 5 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 140efa8..4170656 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
  * of the postcopy phase
  */
 unsigned long *unsentmap;
+/* bitmap of already received pages in postcopy */
+unsigned long *receivedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t 
offset)
 return (char *)block->host + offset;
 }
 
+static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
+RAMBlock *rb)
+{
+uint64_t host_addr_offset =
+(uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
+return host_addr_offset >> TARGET_PAGE_BITS;
+}
+
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
diff --git a/migration/migration.c b/migration/migration.c
index 71e38bc..53fbd41 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -143,6 +143,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
 qemu_mutex_init(&mis_current.rp_mutex);
 qemu_event_init(&mis_current.main_thread_load_event, false);
 once = true;
+ramblock_recv_map_init();
 }
 return &mis_current;
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 293db97..f980d93 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -562,22 +562,27 @@ int postcopy_ram_enable_notify(MigrationIncomingState 
*mis)
 }
 
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
-void *from_addr, uint64_t pagesize)
+   void *from_addr, uint64_t pagesize, RAMBlock 
*rb)
 {
+int ret;
 if (from_addr) {
 struct uffdio_copy copy_struct;
 copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
 copy_struct.src = (uint64_t)(uintptr_t)from_addr;
 copy_struct.len = pagesize;
 copy_struct.mode = 0;
-return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
+ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
 } else {
 struct uffdio_zeropage zero_struct;
 zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
 zero_struct.range.len = pagesize;
 zero_struct.mode = 0;
-return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+}
+if (!ret) {
+ramblock_recv_bitmap_set(host_addr, rb);
 }
+return ret;
 }
 
 /*
@@ -594,7 +599,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void 
*host, void *from,
  * which would be slightly cheaper, but we'd have to be careful
  * of the order of updating our page state.
  */
-if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize)) {
+if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize, rb)) {
 int e = errno;
 error_report("%s: %s copy host: %p from: %p (size: %zd)",
  __func__, strerror(e), host, from, pagesize);
@@ -616,7 +621,8 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, 
void *host,
 trace_postcopy_place_page_zero(host);
 
 if (qemu_ram_pagesize(rb) == getpagesize()) {
-if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize())) 
{
+if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
+rb)) {
 int e = errno;
 error_report("%s: %s zero host: %p",
  __func__, strerror(e), host);
diff --git a/migration/ram.c b/migration/ram.c
index f50479d..95962a0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -151,6 +151,32 @@ out:
 return ret;
 }
 
+void ramblock_recv_map_init(void)
+

[Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache

2017-06-27 Thread Lluís Vilanova

The function is reused in later patches.

Signed-off-by: Lluís Vilanova 
Reviewed-by: Richard Henderson 
---
 accel/tcg/cputlb.c|2 +-
 accel/tcg/translate-all.c |   15 ++-
 include/exec/exec-all.h   |6 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 743776ae19..6a2b762325 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -118,7 +118,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+tb_flush_jmp_cache_all(cpu);
 
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b613..03820e3aeb 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -928,11 +928,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data 
tb_flush_count)
 }
 
 CPU_FOREACH(cpu) {
-int i;
-
-for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-atomic_set(&cpu->tb_jmp_cache[i], NULL);
-}
+tb_flush_jmp_cache_all(cpu);
 }
 
 tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -949,6 +945,15 @@ done:
 tb_unlock();
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+int i;
+
+for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+atomic_set(&cpu->tb_jmp_cache[i], NULL);
+}
+}
+
 void tb_flush(CPUState *cpu)
 {
 if (tcg_enabled()) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 724ec73dce..b0281b000f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -366,6 +366,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *env);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
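
The helper factored out by this patch can be sketched in isolation as below. This is only an illustration under stated assumptions: it uses C11 atomics with a relaxed store in place of QEMU's atomic_set(), and the names struct vcpu, JMP_CACHE_SIZE and vcpu_flush_jmp_cache_all are hypothetical stand-ins rather than QEMU's own definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical size; QEMU defines its own TB_JMP_CACHE_SIZE elsewhere. */
#define JMP_CACHE_SIZE 4096

struct tb;  /* opaque stand-in for TranslationBlock */

struct vcpu {
    /* Per-vCPU cache of translated blocks, indexed by a hash of the
     * guest virtual PC. */
    _Atomic(struct tb *) jmp_cache[JMP_CACHE_SIZE];
};

/*
 * Clear every slot with an atomic store, so a concurrent reader sees
 * either the old pointer or NULL, never a torn value.
 */
static void vcpu_flush_jmp_cache_all(struct vcpu *cpu)
{
    size_t i;

    assert(cpu != NULL);
    for (i = 0; i < JMP_CACHE_SIZE; ++i) {
        atomic_store_explicit(&cpu->jmp_cache[i], NULL,
                              memory_order_relaxed);
    }
}
```

Keeping the loop in one function means both callers in the diff (the TLB flush and the global TB flush) clear the cache the same way.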

Re: [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize

2017-06-27 Thread Alex Bennée


Richard Henderson  writes:

> While we're touching many of the lines anyway, adjust the naming
> of the functions to better distinguish when "TCGArg" vs "TCGTemp"
> should be used.

Could we add definitions of TCGArg and TCGTemp into tcg/README?

>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c | 424 
> +
>  tcg/tcg.h  |   5 +
>  2 files changed, 249 insertions(+), 180 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 55f9e83..eb09ae5 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -34,34 +34,63 @@
>
>  struct tcg_temp_info {
>  bool is_const;
> -uint16_t prev_copy;
> -uint16_t next_copy;
> +TCGTemp *prev_copy;
> +TCGTemp *next_copy;
>  tcg_target_ulong val;
>  tcg_target_ulong mask;
>  };
>
> -static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> +static struct tcg_temp_info temps_[TCG_MAX_TEMPS];
>  static TCGTempSet temps_used;
>
> -static inline bool temp_is_const(TCGArg arg)
> +static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
>  {
> -return temps[arg].is_const;
> +return ts->state_ptr;
>  }
>
> -static inline bool temp_is_copy(TCGArg arg)
> +static inline struct tcg_temp_info *arg_info(TCGArg arg)
>  {
> -return temps[arg].next_copy != arg;
> +return ts_info(arg_temp(arg));
> +}
> +
> +static inline bool ts_is_const(TCGTemp *ts)
> +{
> +return ts_info(ts)->is_const;
> +}
> +
> +static inline bool arg_is_const(TCGArg arg)
> +{
> +return ts_is_const(arg_temp(arg));
> +}
> +
> +static inline bool ts_is_copy(TCGTemp *ts)
> +{
> +return ts_info(ts)->next_copy != ts;
> +}
> +
> +static inline bool arg_is_copy(TCGArg arg)
> +{
> +return ts_is_copy(arg_temp(arg));
>  }
>
>  /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
> -static void reset_temp(TCGArg temp)
> +static void reset_ts(TCGTemp *ts)
>  {
> -temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> -temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> -temps[temp].next_copy = temp;
> -temps[temp].prev_copy = temp;
> -temps[temp].is_const = false;
> -temps[temp].mask = -1;
> +struct tcg_temp_info *ti = ts_info(ts);
> +struct tcg_temp_info *pi = ts_info(ti->prev_copy);
> +struct tcg_temp_info *ni = ts_info(ti->next_copy);
> +
> +ni->prev_copy = ti->prev_copy;
> +pi->next_copy = ti->next_copy;
> +ti->next_copy = ts;
> +ti->prev_copy = ts;
> +ti->is_const = false;
> +ti->mask = -1;
> +}
> +
> +static void reset_temp(TCGArg arg)
> +{
> +reset_ts(arg_temp(arg));
>  }
>
>  /* Reset all temporaries, given that there are NB_TEMPS of them.  */
> @@ -71,17 +100,26 @@ static void reset_all_temps(int nb_temps)
>  }
>
>  /* Initialize and activate a temporary.  */
> -static void init_temp_info(TCGArg temp)
> +static void init_ts_info(TCGTemp *ts)
>  {
> -if (!test_bit(temp, temps_used.l)) {
> -temps[temp].next_copy = temp;
> -temps[temp].prev_copy = temp;
> -temps[temp].is_const = false;
> -temps[temp].mask = -1;
> -set_bit(temp, temps_used.l);
> +size_t idx = temp_idx(ts);
> +if (!test_bit(idx, temps_used.l)) {
> +struct tcg_temp_info *ti = &temps_[idx];
> +
> +ts->state_ptr = ti;
> +ti->next_copy = ts;
> +ti->prev_copy = ts;
> +ti->is_const = false;
> +ti->mask = -1;
> +set_bit(idx, temps_used.l);
>  }
>  }
>
> +static void init_arg_info(TCGArg arg)
> +{
> +init_ts_info(arg_temp(arg));
> +}
> +
>  static int op_bits(TCGOpcode op)
>  {
>  const TCGOpDef *def = &tcg_op_defs[op];
> @@ -119,7 +157,7 @@ static TCGOpcode op_to_movi(TCGOpcode op)
>  static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>  {
>  TCGTemp *ts = arg_temp(arg);
> -TCGArg i;
> +TCGTemp *i;
>
>  /* If this is already a global, we can't do better. */
>  if (ts->temp_global) {
> @@ -127,17 +165,17 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg 
> arg)
>  }
>
>  /* Search for a global first. */
> -for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
> -if (i < s->nb_globals) {
> -return i;
> +for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> +if (i->temp_global) {
> +return temp_arg(i);
>  }
>  }
>
>  /* If it is a temp, search for a temp local. */
>  if (!ts->temp_local) {
> -for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
> -if (s->temps[i].temp_local) {
> -return i;
> +for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) 
> {
> +if (ts->temp_local) {
> +return temp_arg(i);
>  }
>  }
>  }
> @@ -146,20 +184,20 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg 
> arg)
>  return arg;
>  }
>
> -static bool temps_ar
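
For readers following the quoted hunk, the copy list it manipulates is a circular doubly-linked ring threaded through per-temp state. Below is a minimal sketch of that structure using pointers, as the patch does. The types struct temp and struct temp_info and the helpers init_ts and link_copy are simplified, hypothetical stand-ins, not the real TCG definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct temp;

/* Hypothetical per-temp optimizer state, modelled on tcg_temp_info. */
struct temp_info {
    bool is_const;
    struct temp *prev_copy;
    struct temp *next_copy;
};

struct temp {
    struct temp_info *state_ptr;
};

static struct temp_info *ts_info(struct temp *ts)
{
    return ts->state_ptr;
}

/* A temp has copies iff its ring contains more than itself. */
static bool ts_is_copy(struct temp *ts)
{
    return ts_info(ts)->next_copy != ts;
}

/* Attach state to a temp and make it a ring of one. */
static void init_ts(struct temp *ts, struct temp_info *ti)
{
    ts->state_ptr = ti;
    ti->next_copy = ts;
    ti->prev_copy = ts;
    ti->is_const = false;
}

/* Insert DST (which must be a ring of one) right after SRC. */
static void link_copy(struct temp *dst, struct temp *src)
{
    struct temp_info *di = ts_info(dst);
    struct temp_info *si = ts_info(src);

    assert(di->next_copy == dst && di->prev_copy == dst);
    di->next_copy = si->next_copy;
    di->prev_copy = src;
    ts_info(si->next_copy)->prev_copy = dst;
    si->next_copy = dst;
}

/* Unlink TS from its ring and return it to a ring of one. */
static void reset_ts(struct temp *ts)
{
    struct temp_info *ti = ts_info(ts);

    ts_info(ti->next_copy)->prev_copy = ti->prev_copy;
    ts_info(ti->prev_copy)->next_copy = ti->next_copy;
    ti->next_copy = ts;
    ti->prev_copy = ts;
    ti->is_const = false;
}
```

With pointers in the ring, walking the copies no longer needs the temps[] array index, which is the point of the ts_* versus arg_* naming split discussed above.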

[Qemu-devel] [PATCH v9 2/7] trace: Allocate cpu->trace_dstate in place

2017-06-27 Thread Lluís Vilanova

There's little point in dynamically allocating the bitmap if we
know at compile-time the max number of events we want to support.
Thus, make room in the struct for the bitmap, which will make things
easier later: this paves the way for upcoming changes, in which
we'll use a u32 to fully capture cpu->trace_dstate.

This change also increases performance by saving a dereference and
improving locality--note that this is important since upcoming work
makes reading this bitmap fairly common.

Signed-off-by: Emilio G. Cota 
Reviewed-by: Lluís Vilanova 
---
 include/qom/cpu.h |9 +++--
 qom/cpu.c |8 
 trace/control.c   |9 -
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 89ddb686fb..bc6e20f056 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -259,6 +259,7 @@ typedef void (*run_on_cpu_func)(CPUState *cpu, 
run_on_cpu_data data);
 struct qemu_work_item;
 
 #define CPU_UNSET_NUMA_NODE_ID -1
+#define CPU_TRACE_DSTATE_MAX_EVENTS 32
 
 /**
  * CPUState:
@@ -373,12 +374,8 @@ struct CPUState {
 struct KVMState *kvm_state;
 struct kvm_run *kvm_run;
 
-/*
- * Used for events with 'vcpu' and *without* the 'disabled' properties.
- * Dynamically allocated based on bitmap requried to hold up to
- * trace_get_vcpu_event_count() entries.
- */
-unsigned long *trace_dstate;
+/* Used for events with 'vcpu' and *without* the 'disabled' properties */
+DECLARE_BITMAP(trace_dstate, CPU_TRACE_DSTATE_MAX_EVENTS);
 
 /* TODO Move common fields from CPUArchState here. */
 int cpu_index; /* used by alpha TCG */
diff --git a/qom/cpu.c b/qom/cpu.c
index 50698767dd..69fbb9cc95 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -382,7 +382,6 @@ static void cpu_common_unrealizefn(DeviceState *dev, Error 
**errp)
 
 static void cpu_common_initfn(Object *obj)
 {
-uint32_t count;
 CPUState *cpu = CPU(obj);
 CPUClass *cc = CPU_GET_CLASS(obj);
 
@@ -397,18 +396,11 @@ static void cpu_common_initfn(Object *obj)
 QTAILQ_INIT(&cpu->breakpoints);
 QTAILQ_INIT(&cpu->watchpoints);
 
-count = trace_get_vcpu_event_count();
-if (count) {
-cpu->trace_dstate = bitmap_new(count);
-}
-
 cpu_exec_initfn(cpu);
 }
 
 static void cpu_common_finalize(Object *obj)
 {
-CPUState *cpu = CPU(obj);
-g_free(cpu->trace_dstate);
 }
 
 static int64_t cpu_common_get_arch_id(CPUState *cpu)
diff --git a/trace/control.c b/trace/control.c
index 9b157b0ca7..83740aa7ee 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -65,8 +65,15 @@ void trace_event_register_group(TraceEvent **events)
 size_t i;
 for (i = 0; events[i] != NULL; i++) {
 events[i]->id = next_id++;
-if (events[i]->vcpu_id != TRACE_VCPU_EVENT_NONE) {
+if (events[i]->vcpu_id == TRACE_VCPU_EVENT_NONE) {
+continue;
+}
+
+if (likely(next_vcpu_id < CPU_TRACE_DSTATE_MAX_EVENTS)) {
 events[i]->vcpu_id = next_vcpu_id++;
+} else {
+error_report("WARNING: too many vcpu trace events; dropping '%s'",
+ events[i]->name);
 }
 }
 event_groups = g_renew(TraceEventGroup, event_groups, nevent_groups + 1);
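
The in-place bitmap described in the commit message can be sketched roughly as follows. This is a simplified illustration, not QEMU's code: MAX_EVENTS mirrors CPU_TRACE_DSTATE_MAX_EVENTS, but struct vcpu, the dstate_* helpers and alloc_vcpu_event_id (including its -1 return for a dropped event) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_EVENTS 32  /* mirrors CPU_TRACE_DSTATE_MAX_EVENTS */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct vcpu {
    /* Storage lives inside the struct: nothing to allocate or free,
     * and no pointer to chase when reading a bit. */
    unsigned long trace_dstate[BITS_TO_LONGS(MAX_EVENTS)];
};

static void dstate_set(struct vcpu *cpu, unsigned int id)
{
    assert(id < MAX_EVENTS);
    cpu->trace_dstate[id / BITS_PER_LONG] |= 1UL << (id % BITS_PER_LONG);
}

static void dstate_clear(struct vcpu *cpu, unsigned int id)
{
    assert(id < MAX_EVENTS);
    cpu->trace_dstate[id / BITS_PER_LONG] &= ~(1UL << (id % BITS_PER_LONG));
}

static bool dstate_test(const struct vcpu *cpu, unsigned int id)
{
    assert(id < MAX_EVENTS);
    return (cpu->trace_dstate[id / BITS_PER_LONG] >> (id % BITS_PER_LONG)) & 1UL;
}

/* Hand out per-vCPU event ids until the fixed bitmap is full.
 * Returning -1 for a dropped event is an assumption of this sketch. */
static int next_vcpu_id;

static int alloc_vcpu_event_id(void)
{
    if (next_vcpu_id < MAX_EVENTS) {
        return next_vcpu_id++;
    }
    return -1;
}
```

The trade-off is the one the patch makes explicit: the bitmap size becomes a compile-time limit, so registration has to handle running out of ids.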

[Qemu-devel] [PATCH v9 3/7] trace: [tcg] Delay changes to dynamic state when translating

2017-06-27 Thread Lluís Vilanova

This keeps consistency across all decisions taken during translation
when the dynamic state of a vCPU is changed in the middle of translating
some guest code.

Signed-off-by: Lluís Vilanova 
Reviewed-by: Richard Henderson 
---
 include/qom/cpu.h  |3 +++
 trace/control-target.c |   20 +---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index bc6e20f056..29f4a32572 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -303,6 +303,8 @@ struct qemu_work_item;
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @trace_dstate_delayed: Delayed changes to trace_dstate (includes all changes
+ *to @trace_dstate).
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -375,6 +377,7 @@ struct CPUState {
 struct kvm_run *kvm_run;
 
 /* Used for events with 'vcpu' and *without* the 'disabled' properties */
+DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
 DECLARE_BITMAP(trace_dstate, CPU_TRACE_DSTATE_MAX_EVENTS);
 
 /* TODO Move common fields from CPUArchState here. */
diff --git a/trace/control-target.c b/trace/control-target.c
index 6266e6380d..9d44566c95 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -1,7 +1,7 @@
 /*
  * Interface for configuring and controlling the state of tracing events.
  *
- * Copyright (C) 2014-2016 Lluís Vilanova 
+ * Copyright (C) 2014-2017 Lluís Vilanova 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -57,6 +57,13 @@ void trace_event_set_state_dynamic(TraceEvent *ev, bool 
state)
 }
 }
 
+static void trace_event_synchronize_vcpu_state_dynamic(
+CPUState *vcpu, run_on_cpu_data ignored)
+{
+bitmap_copy(vcpu->trace_dstate, vcpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
 TraceEvent *ev, bool state)
 {
@@ -69,13 +76,20 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
 if (state_pre != state) {
 if (state) {
 trace_events_enabled_count++;
-set_bit(vcpu_id, vcpu->trace_dstate);
+set_bit(vcpu_id, vcpu->trace_dstate_delayed);
 (*ev->dstate)++;
 } else {
 trace_events_enabled_count--;
-clear_bit(vcpu_id, vcpu->trace_dstate);
+clear_bit(vcpu_id, vcpu->trace_dstate_delayed);
 (*ev->dstate)--;
 }
+/*
+ * Delay changes until next TB; we want all TBs to be built from a
+ * single set of dstate values to ensure consistency of generated
+ * tracing code.
+ */
+async_run_on_cpu(vcpu, trace_event_synchronize_vcpu_state_dynamic,
+ RUN_ON_CPU_NULL);
 }
 }
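
The delayed-update scheme in this patch can be illustrated with a small single-threaded sketch. It assumes at most 32 events, so a uint32_t stands in for the bitmap, and sync_event_state() stands in for the work item the patch queues with async_run_on_cpu(). All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EVENTS 32

struct vcpu {
    uint32_t trace_dstate_delayed; /* updated by the control path at any time */
    uint32_t trace_dstate;         /* read during translation; only changes at a sync point */
};

/* Record a requested change without touching the state used by
 * translation. */
static void set_event_state(struct vcpu *cpu, unsigned int id, bool on)
{
    assert(id < MAX_EVENTS);
    if (on) {
        cpu->trace_dstate_delayed |= UINT32_C(1) << id;
    } else {
        cpu->trace_dstate_delayed &= ~(UINT32_C(1) << id);
    }
}

/* Runs between translation blocks: publish all pending changes at once. */
static void sync_event_state(struct vcpu *cpu)
{
    cpu->trace_dstate = cpu->trace_dstate_delayed;
}
```

Because trace_dstate only changes at the sync point, every decision made while translating one block sees the same set of values.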

[Qemu-devel] [PATCH v9 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2017-06-27 Thread Lluís Vilanova

Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change to the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).

Signed-off-by: Lluís Vilanova 
Reviewed-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c  |8 ++--
 accel/tcg/translate-all.c |   11 +--
 include/exec/exec-all.h   |6 ++
 include/exec/tb-hash-xx.h |7 +--
 include/exec/tb-hash.h|5 +++--
 tcg/tcg-runtime.c |3 ++-
 tests/qht-bench.c |2 +-
 trace/control-target.c|1 +
 trace/control.h   |3 +++
 9 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 3581618bc0..d84b01d1b8 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -280,6 +280,7 @@ struct tb_desc {
 CPUArchState *env;
 tb_page_addr_t phys_page1;
 uint32_t flags;
+uint32_t trace_vcpu_dstate;
 };
 
 static bool tb_cmp(const void *p, const void *d)
@@ -291,6 +292,7 @@ static bool tb_cmp(const void *p, const void *d)
 tb->page_addr[0] == desc->phys_page1 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
 !atomic_read(&tb->invalid)) {
 /* check next page if needed */
 if (tb->page_addr[1] == -1) {
@@ -319,10 +321,11 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 desc.env = (CPUArchState *)cpu->env_ptr;
 desc.cs_base = cs_base;
 desc.flags = flags;
+desc.trace_vcpu_dstate = *cpu->trace_dstate;
 desc.pc = pc;
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-h = tb_hash_func(phys_pc, pc, flags);
+h = tb_hash_func(phys_pc, pc, flags, *cpu->trace_dstate);
 return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -342,7 +345,8 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
- tb->flags != flags)) {
+ tb->flags != flags ||
+ tb->trace_vcpu_dstate != *cpu->trace_dstate)) {
 tb = tb_htable_lookup(cpu, pc, cs_base, flags);
 if (!tb) {
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 03820e3aeb..71badaaa6b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -54,6 +54,7 @@
 #include "exec/tb-hash.h"
 #include "translate-all.h"
 #include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 #include "qemu/timer.h"
 #include "qemu/main-loop.h"
 #include "exec/log.h"
@@ -112,6 +113,11 @@ typedef struct PageDesc {
 #define V_L2_BITS 10
 #define V_L2_SIZE (1 << V_L2_BITS)
 
+/* Make sure all possible CPU event bits fit in tb->trace_ds */
+QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
+  sizeof(((TranslationBlock *)0)->trace_vcpu_dstate)
+  * BITS_PER_BYTE);
+
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
 
@@ -1102,7 +1108,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
-h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
 qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
 /* remove the TB from the page list */
@@ -1247,7 +1253,7 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 }
 
 /* add in the hash table */
-h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
 qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
 #ifdef DEBUG_TB_CHECK
@@ -1293,6 +1299,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->cs_base = cs_base;
 tb->flags = flags;
 tb->cflags = cflags;
+tb->trace_vcpu_dstate = *cpu->trace_dstate;
 tb->invalid = false;
 
 #ifdef CONFIG_PROFILER
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b0281b000f..d918410944 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -310,6 +310,10 @@ static inline void 
tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
 #defin
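
To illustrate why the tracing state has to be part of the lookup key, here is a toy table keyed by (pc, flags, dstate). It is a sketch only: the mixer below is a placeholder and not tb_hash_func, the table is a fixed-size linear-probing array rather than QEMU's qht, and struct tb, tb_insert and tb_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tb {
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_vcpu_dstate;
};

#define TABLE_SIZE 64  /* must be a power of two */

static struct tb *table[TABLE_SIZE];

/* Placeholder mixer: folds the tracing state into the hash together
 * with the other key fields. */
static uint32_t tb_key_hash(uint64_t pc, uint32_t flags, uint32_t dstate)
{
    uint64_t h = pc * UINT64_C(0x9E3779B97F4A7C15);

    h ^= ((uint64_t)flags << 32) | dstate;
    h *= UINT64_C(0xBF58476D1CE4E5B9);
    return (uint32_t)(h >> 32);
}

static bool tb_matches(const struct tb *tb, uint64_t pc,
                       uint32_t flags, uint32_t dstate)
{
    return tb->pc == pc && tb->flags == flags &&
           tb->trace_vcpu_dstate == dstate;
}

/* Insert with linear probing; returns false if the table is full. */
static bool tb_insert(struct tb *tb)
{
    uint32_t h;
    size_t i;

    assert(tb != NULL);
    h = tb_key_hash(tb->pc, tb->flags, tb->trace_vcpu_dstate);
    for (i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (h + i) & (TABLE_SIZE - 1);
        if (table[slot] == NULL) {
            table[slot] = tb;
            return true;
        }
    }
    return false;
}

/* Look up the block translated for this exact tracing state. */
static struct tb *tb_lookup(uint64_t pc, uint32_t flags, uint32_t dstate)
{
    uint32_t h = tb_key_hash(pc, flags, dstate);
    size_t i;

    for (i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (h + i) & (TABLE_SIZE - 1);
        if (table[slot] == NULL) {
            return NULL;
        }
        if (tb_matches(table[slot], pc, flags, dstate)) {
            return table[slot];
        }
    }
    return NULL;
}
```

Two vCPUs with the same tracing state resolve to the same entry and share the block; a vCPU with a different state misses and gets its own translation.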

[Qemu-devel] [PATCH v9 5/7] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events

2017-06-27 Thread Lluís Vilanova

If an event is dynamically disabled, the TCG code that calls the
execution-time tracer is not generated.

This removes the overhead of execution-time tracers for dynamically
disabled events. As a bonus, it also avoids checking the event state when
the execution-time tracer is called from TCG-generated code (since
otherwise TCG would simply not call it).

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/__init__.py|3 ++-
 scripts/tracetool/format/h.py|   26 +++---
 scripts/tracetool/format/tcg_h.py|   21 +
 scripts/tracetool/format/tcg_helper_c.py |5 +++--
 4 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 1ffbc1dc40..d4c204a472 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -6,7 +6,7 @@ Machinery for generating tracing-related intermediate files.
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -268,6 +268,7 @@ class Event(object):
 return self._FMT.findall(self.fmt)
 
 QEMU_TRACE   = "trace_%(name)s"
+QEMU_TRACE_NOCHECK   = "_nocheck__" + QEMU_TRACE
 QEMU_TRACE_TCG   = QEMU_TRACE + "_tcg"
 QEMU_DSTATE  = "_TRACE_%(NAME)s_DSTATE"
 QEMU_EVENT   = "_TRACE_%(NAME)s_EVENT"
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3682f4e6a8..aecf249d66 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -6,7 +6,7 @@ trace/generated-tracers.h
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -49,6 +49,19 @@ def generate(events, backend, group):
 backend.generate_begin(events, group)
 
 for e in events:
+# tracer without checks
+out('',
+'static inline void %(api)s(%(args)s)',
+'{',
+api=e.api(e.QEMU_TRACE_NOCHECK),
+args=e.args)
+
+if "disable" not in e.properties:
+backend.generate(e, group)
+
+out('}')
+
+# tracer wrapper with checks (per-vCPU tracing)
 if "vcpu" in e.properties:
 trace_cpu = next(iter(e.args))[1]
 cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -63,16 +76,15 @@ def generate(events, backend, group):
 'static inline void %(api)s(%(args)s)',
 '{',
 'if (%(cond)s) {',
+'%(api_nocheck)s(%(names)s);',
+'}',
+'}',
 api=e.api(),
+api_nocheck=e.api(e.QEMU_TRACE_NOCHECK),
 args=e.args,
+names=", ".join(e.args.names()),
 cond=cond)
 
-if "disable" not in e.properties:
-backend.generate(e, group)
-
-out('}',
-'}')
-
 backend.generate_end(events, group)
 
 out('#endif /* TRACE_%s_GENERATED_TRACERS_H */' % group.upper())
diff --git a/scripts/tracetool/format/tcg_h.py 
b/scripts/tracetool/format/tcg_h.py
index db55f52eb5..1651cc3f71 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -6,7 +6,7 @@ Generate .h file for TCG code generation.
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -46,7 +46,7 @@ def generate(events, backend, group):
 
 for e in events:
 # just keep one of them
-if "tcg-trans" not in e.properties:
+if "tcg-exec" not in e.properties:
 continue
 
 out('static inline void %(name_tcg)s(%(args)s)',
@@ -58,12 +58,25 @@ def generate(events, backend, group):
 args_trans = e.original.event_trans.args
 args_exec = tracetool.vcpu.transform_args(
 "tcg_helper_c", e.original.event_exec, "wrapper")
+if "vcpu" in e.properties:
+trace_cpu = e.args.names()[0]
+cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+   " TRACE_%(id)s)"\
+   % dict(
+   cpu=trace_cpu,
+   id=e.original.event_exec.name.upper())
+else:
+cond = "true"
+
 out('%(name_trans)s(%(argnames_trans)s);',
-'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'if (%(cond)s) {',
+'gen_h
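
The split between an unchecked tracer and a checked wrapper can be modelled in plain C as below. This is a toy model under stated assumptions, not tracetool output: EV_MEM, event_enabled, nocheck_trace_mem, trace_mem, translate and execute are all hypothetical names, and a counter stands in for the tracing backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { EV_MEM = 0 };  /* hypothetical per-vCPU event id */

struct vcpu {
    uint32_t trace_dstate;
};

static unsigned long nocheck_calls;  /* stands in for backend output */

static bool event_enabled(const struct vcpu *cpu, unsigned int id)
{
    assert(id < 32);
    return (cpu->trace_dstate >> id) & 1u;
}

/* Unchecked tracer: emits unconditionally. */
static void nocheck_trace_mem(struct vcpu *cpu, uint64_t vaddr)
{
    (void)cpu;
    (void)vaddr;
    nocheck_calls++;
}

/* Checked wrapper: what ordinary C callers use. */
static void trace_mem(struct vcpu *cpu, uint64_t vaddr)
{
    if (event_enabled(cpu, EV_MEM)) {
        nocheck_trace_mem(cpu, vaddr);
    }
}

/* A "translated block" records whether a tracer call was emitted. */
struct tb {
    bool calls_tracer;
};

/* Translation time: the event state is checked once, here. */
static struct tb translate(const struct vcpu *cpu)
{
    struct tb tb = { event_enabled(cpu, EV_MEM) };
    return tb;
}

/* Execution time: the branch only models "was the call emitted";
 * real generated code would contain the call or not, with no test. */
static void execute(const struct tb *tb, struct vcpu *cpu, uint64_t vaddr)
{
    if (tb->calls_tracer) {
        nocheck_trace_mem(cpu, vaddr);
    }
}
```

The state check moves from every execution to the single translation, which is where the saving described in the commit message comes from.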

[Qemu-devel] [PATCH v9 6/7] trace: [tcg, trivial] Re-align generated code

2017-06-27 Thread Lluís Vilanova

The previous patch removed a nesting level in the generated code. Re-align
all code generated by backends to be 4-column aligned.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/backend/dtrace.py |4 ++--
 scripts/tracetool/backend/ftrace.py |   20 ++--
 scripts/tracetool/backend/log.py|   19 ++-
 scripts/tracetool/backend/simple.py |4 ++--
 scripts/tracetool/backend/syslog.py |6 +++---
 scripts/tracetool/backend/ust.py|4 ++--
 6 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py 
b/scripts/tracetool/backend/dtrace.py
index c469cbd1a3..c6812b70a2 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -6,7 +6,7 @@ DTrace/SystemTAP backend.
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -46,6 +46,6 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('QEMU_%(uppername)s(%(argnames)s);',
+out('QEMU_%(uppername)s(%(argnames)s);',
 uppername=event.name.upper(),
 argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py 
b/scripts/tracetool/backend/ftrace.py
index db9fe7ad57..dd0eda4441 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -29,17 +29,17 @@ def generate_h(event, group):
 if len(event.args) > 0:
 argnames = ", " + argnames
 
-out('{',
-'char ftrace_buf[MAX_TRACE_STRLEN];',
-'int unused __attribute__ ((unused));',
-'int trlen;',
-'if (trace_event_get_state(%(event_id)s)) {',
-'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
-'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-'unused = write(trace_marker_fd, ftrace_buf, trlen);',
-'}',
+out('{',
+'char ftrace_buf[MAX_TRACE_STRLEN];',
+'int unused __attribute__ ((unused));',
+'int trlen;',
+'if (trace_event_get_state(%(event_id)s)) {',
+'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
+'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+'unused = write(trace_marker_fd, ftrace_buf, trlen);',
 '}',
+'}',
 name=event.name,
 args=event.args,
 event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index 4f4a4d38b1..54f0a69886 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -6,7 +6,7 @@ Stderr built-in backend.
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -35,14 +35,15 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'struct timeval _now;',
-'gettimeofday(&_now, NULL);',
-'qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " 
%(fmt)s "\\n",',
-'  getpid(),',
-'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-'  %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'struct timeval _now;',
+'gettimeofday(&_now, NULL);',
+'qemu_log_mask(LOG_TRACE,',
+'  "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+'  getpid(),',
+'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+'  %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py 
b/scripts/tracetool/backend/simple.py
index 4acc06e81c..f983670ee1 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -6,7 +6,7 @@ Simple built-in backend.
 """
 
 __author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2014, Lluís Vilanova "
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova "
 __license__= "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi

Re: [Qemu-devel] [PATCH v5 3/3] migration: add bitmap for received page

2017-06-27 Thread Peter Xu

On Tue, Jun 27, 2017 at 05:50:27AM -0400, Alexey Perevalov wrote:

[...]

> @@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, 
> ram_addr_t offset)
>  return (char *)block->host + offset;
>  }
>  
> +static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
> +RAMBlock *rb)
> +{
> +uint64_t host_addr_offset =
> +(uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
> +return host_addr_offset >> TARGET_PAGE_BITS;
> +}
> +
>  long qemu_getrampagesize(void);
>  unsigned long last_ram_page(void);
>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> diff --git a/migration/migration.c b/migration/migration.c
> index 71e38bc..53fbd41 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -143,6 +143,7 @@ MigrationIncomingState 
> *migration_incoming_get_current(void)
>  qemu_mutex_init(&mis_current.rp_mutex);
>  qemu_event_init(&mis_current.main_thread_load_event, false);
>  once = true;
> +ramblock_recv_map_init();

One more tiny comment: shall we init this at the beginning of incoming
migration? Maybe in migration_fd_process_incoming(), before entering
the coroutine?

Then, for the destruction of it below...

[...]

> @@ -2324,8 +2352,14 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
>  
>  static int ram_load_cleanup(void *opaque)
>  {
> +RAMBlock *rb;
>  xbzrle_load_cleanup();
>  compress_threads_load_cleanup();
> +
> +RAMBLOCK_FOREACH(rb) {
> +g_free(rb->receivedmap);
> +rb->receivedmap = NULL;
> +}

... maybe move to migration_incoming_state_destroy()?

And, I didn't really find ram_load_cleanup() in my repo. Am I missing
something?

Other than above, this patch looks good to me.  Thanks,

-- 
Peter Xu
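
For reference, the offset computation in the quoted ramblock_recv_bitmap_offset() can be sketched stand-alone as follows. Assumptions: 4 KiB target pages (PAGE_BITS is a stand-in for TARGET_PAGE_BITS), and struct ram_block and recv_bitmap_offset are hypothetical simplified names. The sketch subtracts byte pointers because arithmetic on void * is a GNU C extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12  /* assumption: 4 KiB target pages */

struct ram_block {
    uint8_t *host;        /* start of the block's host mapping */
    size_t   used_length; /* size of the mapping in bytes */
};

/* Map a host address inside the block to its page index, i.e. the bit
 * position in a per-block "received" bitmap. */
static unsigned long recv_bitmap_offset(const void *host_addr,
                                        const struct ram_block *rb)
{
    const uint8_t *p = host_addr;

    assert(p >= rb->host && (size_t)(p - rb->host) < rb->used_length);
    return (unsigned long)((uintptr_t)(p - rb->host) >> PAGE_BITS);
}
```

Any address within the same target page maps to the same bit, so marking a page as received does not require the caller to align the address first.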

[Qemu-devel] [PATCH v9 7/7] trace: [trivial] Statically enable all guest events

2017-06-27 Thread Lluís Vilanova

The optimizations in this series make it feasible to have guest events
available on all builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s


Now enabling all events when booting an ARM system that immediately shuts down 
(https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg04085.html):

* vanilla, statically disabled
real    0m32,153s
user    0m31,276s
sys     0m0,108s

* vanilla, statically enabled (overhead: 1.35x)
real    0m43,507s
user    0m42,680s
sys     0m0,168s

* multi-tb, statically disabled (overhead: 1.03x)
real    0m32,993s
user    0m32,516s
sys     0m0,104s

* multi-tb, statically enabled (overhead: 1.00x) [within noise range]
real    0m32,110s
user    0m31,176s
sys     0m0,156s

Signed-off-by: Lluís Vilanova 
---
 trace-events |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index bae63fdb1d..f9dbd7f509 100644
--- a/trace-events
+++ b/trace-events
@@ -106,7 +106,7 @@ vcpu guest_cpu_reset(void)
 #
 # Mode: user, softmmu
 # Targets: TCG(all)
-disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
+vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
 
 # @num: System call number.
 # @arg*: System call argument value.
@@ -115,7 +115,7 @@ disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) 
"info=%d", "vaddr=0x
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, 
uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, 
uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
+vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t 
arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t 
arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
 
 # @num: System call number.
 # @ret: System call result value.
@@ -124,4 +124,4 @@ disable vcpu guest_user_syscall(uint64_t num, uint64_t 
arg1, uint64_t arg2, uint
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) 
"num=0x%016"PRIx64" ret=0x%016"PRIx64
+vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" 
ret=0x%016"PRIx64

Re: [Qemu-devel] [RFC 0/5] vfio: Introduce Live migration capability to vfio_mdev device

2017-06-27 Thread Dr. David Alan Gilbert

* Yulei Zhang (yulei.zh...@intel.com) wrote:
> Summary
> 
> This RFC series introduces live migration capability for the
> vfio_mdev device.
> 
> As the vfio_mdev device currently doesn't support migration, we introduce a
> device flag VFIO_DEVICE_FLAGS_MIGRATABLE to help determine whether the
> mdev device can be migrated or not. The flag is checked during
> device initialization to decide whether to init the new vfio region
> VFIO_PCI_DEVICE_STATE_REGION_INDEX.
> 
> The new region is intended for saving and restoring the vfio_mdev device
> status during migration. Accesses to this region
> will be trapped and forwarded to the vfio_mdev device driver. An
> alternative way to achieve this is to add a new vfio ioctl to help fetch
> and save the device status.
> 
> This series also includes two new vfio ioctls:
> #define VFIO_DEVICE_PCI_STATUS_SET _IO(VFIO_TYPE, VFIO_BASE + 14)
> #define VFIO_DEVICE_PCI_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 15)
> 
> The first one is used to control the device running status: we want to
> stop the mdev device before querying the status from its device driver and
> restart the device after migration.
> The second one is used to do the mdev device dirty page synchronization.
> 
> So the vfio_mdev device migration sequence would be:
> Source VM side:
>   start migration
>   |
>   V
>   get the cpu state change callback
>   use status set ioctl to stop the mdev device
>   |
>   V
>   save the device status into QEMUFile, which is
>   read from the new vfio device status region
>   |
>   V
>   query the dirty page bitmap from the device
>   and add it to the qemu dirty list for sync

That ordering is interesting; I think the main migration flow
is normally to complete migration of RAM and then migrate the
devices; so I worry about that order.

Dave

> Target VM side:
>   restore the mdev device after getting the
>   saved status context from QEMUFile
>   |
>   V
>   get the cpu state change callback
>   use status set ioctl to start the mdev
>   device to put it in running status
>   |
>   V
>   finish migration
> 
> Yulei Zhang (5):
>   vfio: introduce a new VFIO region for migration support
>   vfio: Add struct vfio_vmstate_info to introduce vfio device put/get
> funtion
>   vfio: introduce new VFIO ioctl VFIO_DEVICE_PCI_STATUS_SET
>   vfio: use vfio_device_put/vfio_device_get for device status
> save/restore
>   vifo: introduce new VFIO ioctl VFIO_DEVICE_PCI_GET_DIRTY_BITMAP
> 
>  hw/vfio/pci.c  | 204 
> -
>  hw/vfio/pci.h  |   3 +
>  linux-headers/linux/vfio.h |  34 +++-
>  3 files changed, 239 insertions(+), 2 deletions(-)
> 
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v5 3/3] migration: add bitmap for received page

2017-06-27 Thread Alexey

On Tue, Jun 27, 2017 at 06:17:40PM +0800, Peter Xu wrote:
> On Tue, Jun 27, 2017 at 05:50:27AM -0400, Alexey Perevalov wrote:
> 
> [...]
> 
> > @@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, 
> > ram_addr_t offset)
> >  return (char *)block->host + offset;
> >  }
> >  
> > +static inline unsigned long int ramblock_recv_bitmap_offset(void 
> > *host_addr,
> > +RAMBlock *rb)
> > +{
> > +uint64_t host_addr_offset =
> > +(uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
> > +return host_addr_offset >> TARGET_PAGE_BITS;
> > +}
> > +
> >  long qemu_getrampagesize(void);
> >  unsigned long last_ram_page(void);
> >  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 71e38bc..53fbd41 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -143,6 +143,7 @@ MigrationIncomingState 
> > *migration_incoming_get_current(void)
> >  qemu_mutex_init(&mis_current.rp_mutex);
> >  qemu_event_init(&mis_current.main_thread_load_event, false);
> >  once = true;
> > +ramblock_recv_map_init();
> 
> One tiny more comment: shall we init this at the beginning of incoming
> migration? Maybe into migration_fd_process_incoming(), before entering
> the coroutine?
Maybe this function (migration_incoming_get_current) is not the best place
to initialize something in the ramblock list, from a maintainability point
of view.
> 
> Then, for the destruction of it below...
> 
> [...]
> 
> > @@ -2324,8 +2352,14 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
> >  
> >  static int ram_load_cleanup(void *opaque)
> >  {
> > +RAMBlock *rb;
> >  xbzrle_load_cleanup();
> >  compress_threads_load_cleanup();
> > +
> > +RAMBLOCK_FOREACH(rb) {
> > +g_free(rb->receivedmap);
> > +rb->receivedmap = NULL;
> > +}
> 
> ... maybe move to migration_incoming_state_destroy()?
I'll think about it, because in Juan's current patch set
ram_load_cleanup is not called in the postcopy scenario.

> 
> And, I didn't really find ram_load_cleanup() in my repo. Am I missing
> something?
you need Juan's [PATCH v2 0/5] Create setup/cleanup methods for
migration incoming side

> 
> Other than above, this patch looks good to me.  Thanks,
> 
> -- 
> Peter Xu
> 

-- 

BR
Alexey

Re: [Qemu-devel] [RFC 4/5] vfio: use vfio_device_put/vfio_device_get for device status save/restore

2017-06-27 Thread Dr. David Alan Gilbert

* Yulei Zhang (yulei.zh...@intel.com) wrote:
> For VFIO pci device status migration, the source side uses the
> function vfio_device_put to save the following states:
> 1. pci configuration space addr0~addr5
> 2. pci configuration space msi_addr msi_data
> 3. pci device status fetch from device driver
> 
> On the target side, the function vfio_device_get restores
> the same states:
> 1. re-setup the pci bar configuration
> 2. re-setup the pci device msi configuration
> 3. restore the pci device status
> 
> Signed-off-by: Yulei Zhang 
> ---
>  hw/vfio/pci.c | 105 
> +-
>  1 file changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 605a473..833cd90 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2961,18 +2961,121 @@ static void vfio_vm_change_state_handler(void *pv, 
> int running, RunState state)
>  vfio_status->flags = running ? VFIO_DEVICE_PCI_START :
> VFIO_DEVICE_PCI_STOP;
>  
> -ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_STATUS_SET, vfio_status);
> +if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_STATUS_SET, vfio_status)) {
> +error_report("vfio: Failed to %s device\n", running ? "start" : 
> "stop");
> +}
>  g_free(vfio_status);
>  }
>  
>  static int vfio_device_put(QEMUFile *f, void *pv, size_t size, VMStateField 
> *field,
>  QJSON *vmdesc)
>  {
> +VFIOPCIDevice *vdev = pv;
> +PCIDevice *pdev = &vdev->pdev;
> +VFIORegion *region = &vdev->device_state.region;
> +int sz = region->size;
> +uint8_t *buf = NULL;
> +uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
> +bool msi_64bit;


Please use vmstate rather than qemu_get/qemu_put - we're trying to
get rid of all the qemu_put/qemu_get throughout the devices.
Probably using a pre_save/post_load is the right way to then tie the
data you've loaded to call the code that uploads it to the device.

> +for (i = 0; i < PCI_ROM_SLOT; i++) {
> +bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i*4, 4);
> +qemu_put_be32(f, bar_cfg);
> +}

> +msi_cfg = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 
> 2);
> +msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
> +
> +msi_lo = pci_default_read_config(pdev, pdev->msi_cap + 
> PCI_MSI_ADDRESS_LO, 4);
> +qemu_put_be32(f, msi_lo);
> +
> +if (msi_64bit) {
> +msi_hi = pci_default_read_config(pdev, pdev->msi_cap + 
> PCI_MSI_ADDRESS_HI, 4);
> +qemu_put_be32(f, msi_hi);
> +}
> +
> +msi_data = pci_default_read_config(pdev,
> + pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
> PCI_MSI_DATA_32), 2);
> +qemu_put_be32(f, msi_data);

Isn't all this stuff standard PCI config data that's already migrated?

> +buf = g_malloc(sz);
> +if (buf == NULL) {
> +error_report("vfio: Failed to allocate memory for migrate\n");
> +goto exit;
> +}

g_malloc aborts rather than returning NULL if it fails to allocate.
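
As a rough illustration of why the NULL check is dead code, here is a minimal sketch of an abort-on-failure allocator in plain C. xmalloc_or_abort() and alloc_state_buffer() are hypothetical names standing in for the pattern; this is not GLib's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an abort-on-failure allocator (not GLib code). */
static void *xmalloc_or_abort(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        /* Allocation failure terminates the program here. */
        fprintf(stderr, "failed to allocate %zu bytes\n", size);
        abort();
    }
    return p;
}

/* Callers can use the result directly; a NULL branch would never run. */
static unsigned char *alloc_state_buffer(size_t sz)
{
    assert(sz > 0);
    return xmalloc_or_abort(sz);
}
```

With an allocator of this kind, the caller's error path for a failed allocation can simply be dropped.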

> +if (pread(vdev->vbasedev.fd, buf, sz, region->fd_offset) != sz) {
> +error_report("vfio: Failed to read Device State Region\n");

Note that error_report() messages shouldn't end with \n.

> +goto exit;
> +}
> +
> +qemu_put_buffer(f, buf, sz);

OK, so this is an opaque blob coming from the device.  Hmm.
Is it versioned?  Does it really need to be a closed blob?
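
To make the versioning question concrete, one possible shape is a small fixed header in front of the device blob. The sketch below is only an assumption about how that could look: the magic value, the field layout, and the names dev_state_pack() and dev_state_check() are all hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEV_STATE_MAGIC   0x44535442u  /* arbitrary tag, hypothetical */
#define DEV_STATE_VERSION 1u
#define DEV_STATE_HDR_LEN 12u          /* magic + version + payload length */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write header plus payload into out; returns bytes written, 0 if too small. */
static size_t dev_state_pack(uint8_t *out, size_t out_len,
                             const uint8_t *payload, uint32_t payload_len)
{
    assert(payload != NULL || payload_len == 0);
    if (out_len < DEV_STATE_HDR_LEN ||
        payload_len > out_len - DEV_STATE_HDR_LEN) {
        return 0;
    }
    put_be32(out, DEV_STATE_MAGIC);
    put_be32(out + 4, DEV_STATE_VERSION);
    put_be32(out + 8, payload_len);
    if (payload_len) {
        memcpy(out + DEV_STATE_HDR_LEN, payload, payload_len);
    }
    return DEV_STATE_HDR_LEN + payload_len;
}

/* Validate the header on the destination; returns 0 on success, -1 otherwise. */
static int dev_state_check(const uint8_t *in, size_t in_len,
                           uint32_t *payload_len)
{
    if (in_len < DEV_STATE_HDR_LEN ||
        get_be32(in) != DEV_STATE_MAGIC ||
        get_be32(in + 4) != DEV_STATE_VERSION) {
        return -1;
    }
    *payload_len = get_be32(in + 8);
    return *payload_len > in_len - DEV_STATE_HDR_LEN ? -1 : 0;
}
```

A destination that sees an unexpected version can then fail the migration cleanly instead of feeding an incompatible blob to the device.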

Dave

> +exit:
> +if (buf)
> +g_free(buf);
> +
>  return 0;
>  }
>  
>  static int vfio_device_get(QEMUFile *f, void *pv, size_t size, VMStateField 
> *field)
>  {
> +VFIOPCIDevice *vdev = pv;
> +PCIDevice *pdev = &vdev->pdev;
> +VFIORegion *region = &vdev->device_state.region;
> +int sz = region->size;
> +uint8_t *buf = NULL;
> +uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
> +bool msi_64bit;
> +
> +/* retore pci bar configuration */
> +ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
> +vfio_pci_write_config(pdev, PCI_COMMAND,
> +  ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> +for (i = 0; i < PCI_ROM_SLOT; i++) {
> +bar_cfg = qemu_get_be32(f);
> +vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i*4, bar_cfg, 4);
> +}
> +vfio_pci_write_config(pdev, PCI_COMMAND,
> +  ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
> +
> +/* restore msi configuration */
> +ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
> +msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
> +
> +vfio_pci_write_config(&vdev->pdev,
> +  pdev->msi_cap + PCI_MSI_FLAGS,
> +  ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
> +
> +msi_lo = qemu_get_be32(f);
> +vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 
> 4);
> +
> +if (msi_64bit) {
> +msi_hi = qemu_get_be32(f);
> +vfio_pci_write_

Re: [Qemu-devel] [PATCH v5 3/3] migration: add bitmap for received page

2017-06-27 Thread Juan Quintela

Peter Xu  wrote:
> On Tue, Jun 27, 2017 at 05:50:27AM -0400, Alexey Perevalov wrote:
>
> [...]
>
>> @@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, 
>> ram_addr_t offset)
>>  return (char *)block->host + offset;
>>  }
>>  
>> +static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
>> +RAMBlock *rb)
>> +{
>> +uint64_t host_addr_offset =
>> +(uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
>> +return host_addr_offset >> TARGET_PAGE_BITS;
>> +}
>> +
>>  long qemu_getrampagesize(void);
>>  unsigned long last_ram_page(void);
>>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 71e38bc..53fbd41 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -143,6 +143,7 @@ MigrationIncomingState 
>> *migration_incoming_get_current(void)
>>  qemu_mutex_init(&mis_current.rp_mutex);
>>  qemu_event_init(&mis_current.main_thread_load_event, false);
>>  once = true;
>> +ramblock_recv_map_init();
>
> One tiny more comment: shall we init this at the beginning of incoming
> migration? Maybe into migration_fd_process_incoming(), before entering
> the coroutine?
>
> Then, for the destruction of it below...
>
> [...]
>
>> @@ -2324,8 +2352,14 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
>>  
>>  static int ram_load_cleanup(void *opaque)
>>  {
>> +RAMBlock *rb;
>>  xbzrle_load_cleanup();
>>  compress_threads_load_cleanup();
>> +
>> +RAMBLOCK_FOREACH(rb) {
>> +g_free(rb->receivedmap);
>> +rb->receivedmap = NULL;
>> +}
>
> ... maybe move to migration_incoming_state_destroy()?
>
> And, I didn't really find ram_load_cleanup() in my repo. Am I missing
> something?

On top of my load_setup patches.

>
> Other than above, this patch looks good to me.  Thanks,

Later, Juan.

Re: [Qemu-devel] [Qemu-block] [PATCH] docs: add qemu-block-drivers(7) man page

2017-06-27 Thread Stefan Hajnoczi

On Mon, Jun 26, 2017 at 04:40:58PM -0400, John Snow wrote:
> 
> 
> On 06/22/2017 08:17 AM, Stefan Hajnoczi wrote:
> > Block driver documentation is available in qemu-doc.html.  It would be
> > convenient to have documentation for formats, protocols, and filter
> > drivers in a man page.
> > 
> > Extract the relevant part of qemu-doc.html into a new file called
> > docs/qemu-block-drivers.texi.  This file can also be built as a
> > stand-alone document (man, html, etc).
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Thanks, this is really useful information to have. Looks good to me, but
> Paolo's changes prevent it from applying now.
> 
> So, as of last week:
> 
> Reviewed-by: John Snow 

Thanks for the reminder.  Will rebase and resend.

Stefan



Re: [Qemu-devel] [Intel-gfx][RFC 9/9] drm/i915/gvt: Add support to VFIO region VFIO_PCI_DEVICE_STATE_REGION_INDEX

2017-06-27 Thread Dr. David Alan Gilbert

* Yulei Zhang (yulei.zh...@intel.com) wrote:
> Add support for the new VFIO region VFIO_PCI_DEVICE_STATE_REGION_INDEX in
> vGPU. Through this new region, the status of the mdev device can be fetched
> for migration; on the target side, the device status can be retrieved and
> the device reconfigured to continue running after the guest resumes.
> 
> Signed-off-by: Yulei Zhang 

This is a HUGE patch.
I can't really tell how it wires into the rest of migration.
It would probably be best to split it up into chunks
to make it easier to review.

Dave

> ---
>  drivers/gpu/drm/i915/gvt/Makefile  |   2 +-
>  drivers/gpu/drm/i915/gvt/gvt.c |   1 +
>  drivers/gpu/drm/i915/gvt/gvt.h |   5 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c   |  19 +
>  drivers/gpu/drm/i915/gvt/migrate.c | 715 
> +
>  drivers/gpu/drm/i915/gvt/migrate.h |  82 +
>  drivers/gpu/drm/i915/gvt/mmio.c|  14 +
>  drivers/gpu/drm/i915/gvt/mmio.h|   1 +
>  include/uapi/linux/vfio.h  |   3 +-
>  9 files changed, 840 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/migrate.c
>  create mode 100644 drivers/gpu/drm/i915/gvt/migrate.h
> 
> diff --git a/drivers/gpu/drm/i915/gvt/Makefile 
> b/drivers/gpu/drm/i915/gvt/Makefile
> index f5486cb9..a7e2e34 100644
> --- a/drivers/gpu/drm/i915/gvt/Makefile
> +++ b/drivers/gpu/drm/i915/gvt/Makefile
> @@ -1,7 +1,7 @@
>  GVT_DIR := gvt
>  GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o 
> firmware.o \
>   interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \
> - execlist.o scheduler.o sched_policy.o render.o cmd_parser.o
> + execlist.o scheduler.o sched_policy.o render.o cmd_parser.o migrate.o
>  
>  ccflags-y+= -I$(src) -I$(src)/$(GVT_DIR)
>  i915-y   += $(addprefix $(GVT_DIR)/, 
> $(GVT_SOURCE))
> diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
> index c27c683..e40af70 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.c
> +++ b/drivers/gpu/drm/i915/gvt/gvt.c
> @@ -54,6 +54,7 @@ static const struct intel_gvt_ops intel_gvt_ops = {
>   .vgpu_reset = intel_gvt_reset_vgpu,
>   .vgpu_activate = intel_gvt_activate_vgpu,
>   .vgpu_deactivate = intel_gvt_deactivate_vgpu,
> + .vgpu_save_restore = intel_gvt_save_restore,
>  };
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
> index 23eeb7c..12aa3b8 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.h
> +++ b/drivers/gpu/drm/i915/gvt/gvt.h
> @@ -46,6 +46,7 @@
>  #include "sched_policy.h"
>  #include "render.h"
>  #include "cmd_parser.h"
> +#include "migrate.h"
>  
>  #define GVT_MAX_VGPU 8
>  
> @@ -431,6 +432,8 @@ void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, 
> bool dmlr,
>  void intel_gvt_reset_vgpu(struct intel_vgpu *vgpu);
>  void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu);
>  void intel_gvt_deactivate_vgpu(struct intel_vgpu *vgpu);
> +int intel_gvt_save_restore(struct intel_vgpu *vgpu, char *buf,
> + size_t count, uint64_t off, bool restore);
>  
>  /* validating GM functions */
>  #define vgpu_gmadr_is_aperture(vgpu, gmadr) \
> @@ -513,6 +516,8 @@ struct intel_gvt_ops {
>   void (*vgpu_reset)(struct intel_vgpu *);
>   void (*vgpu_activate)(struct intel_vgpu *);
>   void (*vgpu_deactivate)(struct intel_vgpu *);
> + int  (*vgpu_save_restore)(struct intel_vgpu *, char *buf,
> +   size_t count, uint64_t off, bool restore);
>  };
>  
>  
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index e9f11a9..d4ede29 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -670,6 +670,9 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, 
> char *buf,
>   bar0_start + pos, buf, count);
>   }
>   break;
> + case VFIO_PCI_DEVICE_STATE_REGION_INDEX:
> + ret = intel_gvt_ops->vgpu_save_restore(vgpu, buf, count, pos, 
> is_write);
> + break;
>   case VFIO_PCI_BAR2_REGION_INDEX:
>   case VFIO_PCI_BAR3_REGION_INDEX:
>   case VFIO_PCI_BAR4_REGION_INDEX:
> @@ -688,6 +691,10 @@ static ssize_t intel_vgpu_read(struct mdev_device *mdev, 
> char __user *buf,
>  {
>   unsigned int done = 0;
>   int ret;
> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +
> + if (index == VFIO_PCI_DEVICE_STATE_REGION_INDEX)
> + return intel_vgpu_rw(mdev, (char *)buf, count, ppos, false);
>  
>   while (count) {
>   size_t filled;
> @@ -748,6 +755,10 @@ static ssize_t intel_vgpu_write(struct mdev_device *mdev,
>  {
>   unsigned int done = 0;
>   int ret;
> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +
> + if (index == VFIO_PCI_DEVICE_STATE_REGION_INDEX)
> + return intel_vgpu_rw(mdev, (char *)buf, count, pp

Re: [Qemu-devel] [PATCH 1/5] virtio-pci: use ioeventfd even when KVM is disabled

2017-06-27 Thread Kevin Wolf

Am 27.06.2017 um 10:43 hat Fam Zheng geschrieben:
> On Thu, 06/15 17:38, Stefan Hajnoczi wrote:
> > Old kvm.ko versions only supported a tiny number of ioeventfds so
> > virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.
> > 
> > Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
> > always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c1d6462a6f
> > ("memory: emulate ioeventfd") it has been possible to use ioeventfds in
> > qtest or TCG mode.
> > 
> > This patch makes -device virtio-blk-pci,iothread=iothread0 work even
> > when KVM is disabled.
> > 
> > I have tested that virtio-blk-pci works under TCG both with and without
> > iothread.
> > 
> > Cc: Michael S. Tsirkin 
> > Signed-off-by: Stefan Hajnoczi 
> 
> This one was dropped from Kevin's pull request, but the iotest case
> update for 068, which depends on it, was merged. Now the test fails for
> me:

Whoops, sorry about that. Anyway, I think if we can, the way to fix it
is to find out why this patch is failing qtest, and merge a fixed v2,
rather than reverting the test cases.

Stefan, can you reproduce the failure?

Kevin

> 068 2s ... - output mismatch (see 068.out.bad)
> --- /stor/work/qemu/tests/qemu-iotests/068.out   2017-06-27 16:22:55.003815188 +0800
> +++ 068.out.bad   2017-06-27 16:41:37.903626275 +0800
> @@ -12,9 +12,8 @@
>  === Saving and reloading a VM state to/from a qcow2 image (-object 
> iothread,id=iothread0 -set device.hba0.iothread=iothread0) ===
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
> +qemu-system-x86_64: -device virtio-scsi-pci,id=hba0: ioeventfd is required 
> for iothread
>  QEMU X.Y.Z monitor - type 'help' for more information
> -(qemu) savevm 0
> -(qemu) quit
> +(qemu) qemu-system-x86_64: -device virtio-scsi-pci,id=hba0: ioeventfd is 
> required for iothread
>  QEMU X.Y.Z monitor - type 'help' for more information
> -(qemu) quit
> -*** done
> +(qemu) *** done
> Failures: 068
> Failed 1 of 1 tests
> 
> Fam

Re: [Qemu-devel] [PATCH v6 06/10] migration: move only_migratable to MigrationState

2017-06-27 Thread Eric Blake

On 06/26/2017 11:10 PM, Peter Xu wrote:
> One less global variable, and it only matters for migration.
> 
> We keep the old "--only-migratable" option, but also now we support:
> 
>   -global migration.only-migratable=true

Does command line introspection work to see that this new spelling is
available?  If not, what else can we do to let libvirt learn that the
new form is preferred?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [PATCH v9 04/26] target: [tcg] Add generic translation framework

2017-06-27 Thread Lluís Vilanova

Emilio G Cota writes:

> On Sun, Jun 25, 2017 at 11:59:54 +0300, Lluís Vilanova wrote:
>> Signed-off-by: Lluís Vilanova 
>> ---
>> Makefile.target|1 
>> include/exec/gen-icount.h  |2 
>> include/exec/translate-block.h |  125 +++
>> include/qom/cpu.h  |   22 +
>> translate-block.c  |  185 
>> 
>> 5 files changed, 334 insertions(+), 1 deletion(-)
>> create mode 100644 include/exec/translate-block.h
>> create mode 100644 translate-block.c
> (snip)
>> diff --git a/include/exec/translate-block.h b/include/exec/translate-block.h
>> new file mode 100644
>> index 00..d14d23f2cb
>> --- /dev/null
>> +++ b/include/exec/translate-block.h
> (snip)
>> +/**
>> + * DisasJumpType:
>> + * @DJ_NEXT: Next instruction in program order.
>> + * @DJ_TOO_MANY: Too many instructions translated.
>> + * @DJ_TARGET: Start of target-specific conditions.
>> + *
>> + * What instruction to disassemble next.
>> + */
>> +typedef enum DisasJumpType {
>> +DJ_NEXT,
>> +DJ_TOO_MANY,
>> +DJ_TARGET,
>> +} DisasJumpType;

> I'd give up on the enum to avoid unnecessary casts. Just define DJ_TARGET
> (or rather, DISAS_TARGET :>) and let the architecture code add more define's
> using it.

I'm all for restoring the original name (haven't checked if it will produce any
redefine errors).

But using an enum makes the API more explicit about the intended values. Still,
if the churn of casting outweighs the API clarity, I can revert this.

Another option previously suggested on the list is defining DISAS_TARGET_[0..N]
on the enum, and letting targets simply define their own name when mapped to
those. I'll try that one before completely dropping the enum. That is, unless
someone is strongly for going back to defines.
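
To make the DISAS_TARGET_[0..N] option concrete, here is a minimal sketch in C. The number of reserved slots, the target-side names DISAS_JUMP and DISAS_UPDATE, and the helper disas_jump_name() are illustrative assumptions, not taken from the series.

```c
#include <assert.h>

/* Generic values shared by all targets, plus reserved target slots. */
typedef enum DisasJumpType {
    DISAS_NEXT,       /* next instruction in program order */
    DISAS_TOO_MANY,   /* too many instructions translated */
    DISAS_TARGET_0,   /* slots whose meaning each target defines */
    DISAS_TARGET_1,
    DISAS_TARGET_2,
} DisasJumpType;

/* A hypothetical target maps its own names onto the reserved slots. */
#define DISAS_JUMP   DISAS_TARGET_0
#define DISAS_UPDATE DISAS_TARGET_1

/* Target code keeps using the enum type, so no casts are needed. */
static const char *disas_jump_name(DisasJumpType t)
{
    assert(t <= DISAS_TARGET_2);
    switch (t) {
    case DISAS_NEXT:
        return "next";
    case DISAS_TOO_MANY:
        return "too-many";
    case DISAS_JUMP:
        return "jump";
    case DISAS_UPDATE:
        return "update";
    default:
        return "unused";
    }
}
```

The generic code only ever sees DisasJumpType values, while each target reads its own names, which is the point of keeping the enum.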

Cheers,
  Lluis

Re: [Qemu-devel] [PATCH 3/5] qemu-iotests: 068: extract _qemu() function

2017-06-27 Thread Eric Blake

On 06/15/2017 11:38 AM, Stefan Hajnoczi wrote:
> Avoid duplicating the QEMU command-line.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/qemu-iotests/068 | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 

> +# Give qemu some time to boot before saving the VM state
> +bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu

Are we sure that 'bash' on PATH is the same as the /bin/bash running the
script?

Also, while bash has more deterministic behavior for 'echo -e' (it is
non-portable if you are porting a script to other shells), it is still
possible to set bash to a mode where it does not work (see xhopt
xpg_echo).  I'd much prefer you use 'printf' instead of 'echo -e'.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH 3/5] qemu-iotests: 068: extract _qemu() function

2017-06-27 Thread Eric Blake

On 06/27/2017 06:40 AM, Eric Blake wrote:
> On 06/15/2017 11:38 AM, Stefan Hajnoczi wrote:
>> Avoid duplicating the QEMU command-line.
>>
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>>  tests/qemu-iotests/068 | 13 -
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
> 
>> +# Give qemu some time to boot before saving the VM state
>> +bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu
> 
> Are we sure that 'bash' on PATH is the same as the /bin/bash running the
> script?
> 
> Also, while bash has more deterministic behavior for 'echo -e' (it is
> non-portable if you are porting a script to other shells), it is still
> possible to set bash to a mode where it does not work (see xhopt

shopt (I can't type first thing in the morning)

> xpg_echo).  I'd much prefer you use 'printf' instead of 'echo -e'.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [RFC PATCH 02/14] pc-bios/s390-ccw: Start using the libc from SLOF

2017-06-27 Thread Thomas Huth

Change the Makefiles to make the libc compilable within the
s390-ccw firmware build system, link it and start using it by
switching the implementations of the memset() and memcpy()
functions to the ones from the libc.

Signed-off-by: Thomas Huth 
---
 configure |  5 ++--
 pc-bios/s390-ccw/Makefile |  8 ++-
 pc-bios/s390-ccw/bootmap.h|  1 +
 pc-bios/s390-ccw/libc/Makefile| 38 ++-
 pc-bios/s390-ccw/libc/ctype/Makefile.inc  |  3 ++-
 pc-bios/s390-ccw/libc/stdio/Makefile.inc  |  9 
 pc-bios/s390-ccw/libc/stdlib/Makefile.inc |  3 ++-
 pc-bios/s390-ccw/libc/string/Makefile.inc |  3 ++-
 pc-bios/s390-ccw/s390-ccw.h   | 30 +++-
 9 files changed, 37 insertions(+), 63 deletions(-)

diff --git a/configure b/configure
index c571ad1..954c286 100755
--- a/configure
+++ b/configure
@@ -6377,7 +6377,8 @@ fi
 # build tree in object directory in case the source is not in the current 
directory
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos 
tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
 DIRS="$DIRS docs docs/interop fsdev"
-DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas pc-bios/s390-ccw"
+DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas"
+DIRS="$DIRS pc-bios/s390-ccw pc-bios/s390-ccw/libc"
 DIRS="$DIRS roms/seabios roms/vgabios"
 DIRS="$DIRS qapi-generated"
 FILES="Makefile tests/tcg/Makefile qdict-test-data.txt"
@@ -6385,7 +6386,7 @@ FILES="$FILES tests/tcg/cris/Makefile 
tests/tcg/cris/.gdbinit"
 FILES="$FILES tests/tcg/lm32/Makefile tests/tcg/xtensa/Makefile po/Makefile"
 FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps"
 FILES="$FILES pc-bios/spapr-rtas/Makefile"
-FILES="$FILES pc-bios/s390-ccw/Makefile"
+FILES="$FILES pc-bios/s390-ccw/Makefile pc-bios/s390-ccw/libc/Makefile"
 FILES="$FILES roms/seabios/Makefile roms/vgabios/Makefile"
 FILES="$FILES pc-bios/qemu-icon.bmp"
 FILES="$FILES .gdbinit scripts" # scripts needed by relative path in .gdbinit
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index fb88c13..3371c5b 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -7,12 +7,14 @@ include $(SRC_PATH)/rules.mak
 
 $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 
-.PHONY : all clean build-all
+.PHONY : all clean build-all libc.a
 
 OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o
+OBJECTS += libc.a
 QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
 QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
 QEMU_CFLAGS += -march=z900 -fPIE -fno-strict-aliasing
+QEMU_CFLAGS += -I$(SRC_PATH)/pc-bios/s390-ccw/libc/include
 QEMU_CFLAGS += $(call cc-option, $(QEMU_CFLAGS), -fno-stack-protector)
 LDFLAGS += -Wl,-pie -nostdlib
 
@@ -21,6 +23,9 @@ build-all: s390-ccw.img
 s390-ccw.elf: $(OBJECTS)
$(call quiet-command,$(CC) $(LDFLAGS) -o $@ 
$(OBJECTS),"BUILD","$(TARGET_DIR)$@")
 
+libc.a:
+   @$(MAKE) -C libc V="$(V)"
+
 STRIP ?= strip
 
 s390-ccw.img: s390-ccw.elf
@@ -30,3 +35,4 @@ $(OBJECTS): Makefile
 
 clean:
rm -f *.o *.d *.img *.elf *~
+   @$(MAKE) -C libc clean
diff --git a/pc-bios/s390-ccw/bootmap.h b/pc-bios/s390-ccw/bootmap.h
index 7f36782..608bb5d 100644
--- a/pc-bios/s390-ccw/bootmap.h
+++ b/pc-bios/s390-ccw/bootmap.h
@@ -11,6 +11,7 @@
 #ifndef _PC_BIOS_S390_CCW_BOOTMAP_H
 #define _PC_BIOS_S390_CCW_BOOTMAP_H
 
+#include 
 #include "s390-ccw.h"
 #include "virtio.h"
 
diff --git a/pc-bios/s390-ccw/libc/Makefile b/pc-bios/s390-ccw/libc/Makefile
index 0c762ec..12f57e8 100644
--- a/pc-bios/s390-ccw/libc/Makefile
+++ b/pc-bios/s390-ccw/libc/Makefile
@@ -10,52 +10,38 @@
 # * IBM Corporation - initial implementation
 # /
 
-TOPCMNDIR ?= ../..
+include ../../../config-host.mak
+include $(SRC_PATH)/rules.mak
 
-LIBCCMNDIR = $(shell pwd)
+LIBCCMNDIR = $(SRC_PATH)/pc-bios/s390-ccw/libc
 STRINGCMNDIR = $(LIBCCMNDIR)/string
 CTYPECMNDIR = $(LIBCCMNDIR)/ctype
 STDLIBCMNDIR = $(LIBCCMNDIR)/stdlib
 STDIOCMNDIR = $(LIBCCMNDIR)/stdio
-GETOPTCMNDIR = $(LIBCCMNDIR)/getopt
 
-include $(TOPCMNDIR)/make.rules
-
-
-CPPFLAGS = -I$(LIBCCMNDIR)/include
-LDFLAGS= -nostdlib
+QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
+QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
+QEMU_CFLAGS += -march=z900 -fPIE -fno-strict-aliasing
+QEMU_CFLAGS += $(call cc-option, $(QEMU_CFLAGS), -fno-stack-protector)
+QEMU_CFLAGS += -I$(LIBCCMNDIR)/include
+LDFLAGS += -Wl,-pie -nostdlib
 
 TARGET = ../libc.a
 
-
 all: $(TARGET)
 
-# Use the following target to build a native version of the lib
-# (for example for debugging purposes):
-native:
-   $(MAKE) CROSS="" CC=$(HOSTCC) NATIVEBUILD=1
-
-
 include $(STRINGCMNDIR)/Makefile.inc
 include $(CTYPECMNDIR)/Makefile.inc
 include $(STDLIBCMNDIR)/Makefile.inc
 include $(STDIOCMNDIR)/Makefile.inc
-include $(GETOPTCMNDIR)/Makefile.inc
-
-OBJS

[Qemu-devel] [RFC PATCH 00/14] Implement network booting directly into the s390-ccw BIOS

2017-06-27 Thread Thomas Huth

It's already possible to do a network boot of an s390x guest with an
external netboot image (based on a Linux installation), but it would
be much more convenient if the s390-ccw firmware supported network
booting right out of the box, without the need to assemble such an
external image first.

This patch series now introduces network booting via DHCP and TFTP
directly into the s390-ccw firmware by re-using the networking stack
from the SLOF firmware (see https://github.com/aik/SLOF/ for details),
and adds a driver for virtio-net-ccw devices.

Once the patches have been applied, you can download an .INS file
via TFTP which contains the information about the further files
that have to be loaded - kernel, initrd, etc. For example, you can
use the built-in TFTP and DHCP server of QEMU for this by starting
QEMU with:

 qemu-system-s390x ... -device virtio-net,netdev=n1,bootindex=1 \
   -netdev user,id=n1,tftp=/path/to/tftp,bootfile=generic.ins

The .INS file has to use the same syntax as the .INS files that can
already be found on s390x Linux distribution installation CD-ROMs.

The patches are still in rough shape, but before I continue here,
I thought I'd get some feedback first. Specifically:

- This adds a lot of additional code to the s390-ccw firmware (and
  the binary is afterwards three times as big as before, 75k instead
  of 25k) ... is that still acceptable?

- Is it OK to require loading an .INS file first? Or does anybody
  have a better idea how to load multiple files (kernel, initrd,
  etc. ...)?

- The code from SLOF uses a different coding style (TABs instead
  of spaces) ... is it OK to keep that coding style here so we
  can share patches between SLOF and s390-ccw more easily?

- The code only supports TFTP (via UDP) ... I think that is OK for
  most use-cases, but if we ever want to support network booting
  via HTTP or something else that is based on TCP, we would need to
  use something else instead... Should we maybe rather head towards
  grub2, petitboot or something different instead?

 Thomas


Thomas Huth (14):
  pc-bios/s390-ccw: Add the libc from the SLOF firmware
  pc-bios/s390-ccw: Start using the libc from SLOF
  pc-bios/s390-ccw: Add a write() function for stdio
  pc-bios/s390-ccw: Add implementation of sbrk()
  pc-bios/s390-ccw: Add the TFTP network loading stack from SLOF
  libnet: Remove remainders of netsave code
  libnet: Rework error message printing
  libnet: Refactor some code of netload() into a separate function
  pc-bios/s390-ccw: Make the basic libnet code compilable
  pc-bios/s390-ccw: Add timer code for the libnet
  pc-bios/s390-ccw: Add virtio-net driver code
  pc-bios/s390-ccw: Load file via an intermediate .INS file
  pc-bios/s390-ccw: Allow loading to address 0
  pc-bios/s390-ccw: Wire up the netload code

 configure  |   6 +-
 hw/s390x/ipl.c |   2 +-
 pc-bios/s390-ccw/Makefile  |  14 +-
 pc-bios/s390-ccw/bootmap.c |  10 +-
 pc-bios/s390-ccw/bootmap.h |   1 +
 pc-bios/s390-ccw/libc/Makefile |  47 ++
 pc-bios/s390-ccw/libc/README.txt   |  49 ++
 pc-bios/s390-ccw/libc/ctype/Makefile.inc   |  21 +
 pc-bios/s390-ccw/libc/ctype/isdigit.c  |  25 +
 pc-bios/s390-ccw/libc/ctype/isprint.c  |  18 +
 pc-bios/s390-ccw/libc/ctype/isspace.c  |  29 +
 pc-bios/s390-ccw/libc/ctype/isxdigit.c |  21 +
 pc-bios/s390-ccw/libc/ctype/tolower.c  |  18 +
 pc-bios/s390-ccw/libc/ctype/toupper.c  |  21 +
 pc-bios/s390-ccw/libc/include/ctype.h  |  24 +
 pc-bios/s390-ccw/libc/include/errno.h  |  34 +
 pc-bios/s390-ccw/libc/include/limits.h |  32 +
 pc-bios/s390-ccw/libc/include/stdarg.h |  22 +
 pc-bios/s390-ccw/libc/include/stdbool.h|  20 +
 pc-bios/s390-ccw/libc/include/stddef.h |  25 +
 pc-bios/s390-ccw/libc/include/stdint.h |  28 +
 pc-bios/s390-ccw/libc/include/stdio.h  |  63 ++
 pc-bios/s390-ccw/libc/include/stdlib.h |  34 +
 pc-bios/s390-ccw/libc/include/string.h |  37 ++
 pc-bios/s390-ccw/libc/include/sys/socket.h |  53 ++
 pc-bios/s390-ccw/libc/include/unistd.h |  28 +
 pc-bios/s390-ccw/libc/stdio/Makefile.inc   |  24 +
 pc-bios/s390-ccw/libc/stdio/fileno.c   |  19 +
 pc-bios/s390-ccw/libc/stdio/fprintf.c  |  26 +
 pc-bios/s390-ccw/libc/stdio/printf.c   |  27 +
 pc-bios/s390-ccw/libc/stdio/putc.c |  25 +
 pc-bios/s390-ccw/libc/stdio/putchar.c  |  21 +
 pc-bios/s390-ccw/libc/stdio/puts.c |  28 +
 pc-bios/s390-ccw/libc/stdio/setvbuf.c  |  28 +
 pc-bios/s390-ccw/libc/stdio/sprintf.c  |  30 +
 pc-bios/s390-ccw/libc/stdio/stdchnls.c |  23 +
 pc-bios/s390-ccw/libc/stdio/vfprintf.c |  27 +
 pc-bios/s390-ccw/libc/stdio/vsnprintf.c| 298 +
 pc-bios/s390-ccw/libc/stdio/vsprintf.c |  19 +
 pc-bios/s390-ccw/libc/stdlib/Makefile.inc  |  23 +
 pc-bios/s390-ccw/libc/stdlib/atoi.c|  18 +
 pc-bios/s390-ccw/libc/stdlib/atol.c|

[Qemu-devel] [RFC PATCH 08/14] libnet: Refactor some code of netload() into a separate function

2017-06-27 Thread Thomas Huth

netload() is a huge function, it's easy to lose track here. So
let's refactor the TFTP-related loading and error printing code
into a separate function instead.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/netload.c | 177 --
 1 file changed, 93 insertions(+), 84 deletions(-)

diff --git a/pc-bios/s390-ccw/libnet/netload.c 
b/pc-bios/s390-ccw/libnet/netload.c
index 8afe341..f872884 100644
--- a/pc-bios/s390-ccw/libnet/netload.c
+++ b/pc-bios/s390-ccw/libnet/netload.c
@@ -403,13 +403,101 @@ static void seed_rng(uint8_t mac[])
srand(seed);
 }
 
+static int tftp_load(filename_ip_t *fnip, unsigned char *buffer, int len,
+unsigned int retries, int ip_vers)
+{
+   tftp_err_t tftp_err;
+   int rc;
+
+   rc = tftp(fnip, buffer, len, retries, &tftp_err, 1, 1428, ip_vers);
+
+   if (rc > 0) {
+   printf("  TFTP: Received %s (%d KBytes)\n", fnip->filename,
+  rc / 1024);
+   } else if (rc == -1) {
+   netload_error(0x3003, "unknown TFTP error");
+   return -103;
+   } else if (rc == -2) {
+   netload_error(0x3004, "TFTP buffer of %d bytes "
+   "is too small for %s",
+   len, fnip->filename);
+   return -104;
+   } else if (rc == -3) {
+   netload_error(0x3009, "file not found: %s",
+   fnip->filename);
+   return -108;
+   } else if (rc == -4) {
+   netload_error(0x3010, "TFTP access violation");
+   return -109;
+   } else if (rc == -5) {
+   netload_error(0x3011, "illegal TFTP operation");
+   return -110;
+   } else if (rc == -6) {
+   netload_error(0x3012, "unknown TFTP transfer ID");
+   return -111;
+   } else if (rc == -7) {
+   netload_error(0x3013, "no such TFTP user");
+   return -112;
+   } else if (rc == -8) {
+   netload_error(0x3017, "TFTP blocksize negotiation failed");
+   return -116;
+   } else if (rc == -9) {
+   netload_error(0x3018, "file exceeds maximum TFTP transfer size");
+   return -117;
+   } else if (rc <= -10 && rc >= -15) {
+   char *icmp_err_str;
+   switch (rc) {
+   case -ICMP_NET_UNREACHABLE - 10:
+   icmp_err_str = "net unreachable";
+   break;
+   case -ICMP_HOST_UNREACHABLE - 10:
+   icmp_err_str = "host unreachable";
+   break;
+   case -ICMP_PROTOCOL_UNREACHABLE - 10:
+   icmp_err_str = "protocol unreachable";
+   break;
+   case -ICMP_PORT_UNREACHABLE - 10:
+   icmp_err_str = "port unreachable";
+   break;
+   case -ICMP_FRAGMENTATION_NEEDED - 10:
+   icmp_err_str = "fragmentation needed and DF set";
+   break;
+   case -ICMP_SOURCE_ROUTE_FAILED - 10:
+   icmp_err_str = "source route failed";
+   break;
+   default:
+   icmp_err_str = " UNKNOWN";
+   break;
+   }
+   netload_error(0x3005, "ICMP ERROR \"%s\"", icmp_err_str);
+   return -105;
+   } else if (rc == -40) {
+   netload_error(0x3014, "TFTP error occurred after "
+   "%d bad packets received",
+   tftp_err.bad_tftp_packets);
+   return -113;
+   } else if (rc == -41) {
+   netload_error(0x3015, "TFTP error occurred after "
+   "missing %d responses",
+   tftp_err.no_packets);
+   return -114;
+   } else if (rc == -42) {
+   netload_error(0x3016, "TFTP error missing block %d, "
+   "expected block was %d",
+   tftp_err.blocks_missed,
+   tftp_err.blocks_received);
+   return -115;
+   }
+
+   return rc;
+}
+
 int netload(char *buffer, int len, char *ret_buffer, int huge_load,
int block_size, char *args_fs, int alen)
 {
int rc;
filename_ip_t fn_ip;
int fd_device;
-   tftp_err_t tftp_err;
obp_tftp_args_t obp_tftp_args;
char null_ip[4] = { 0x00, 0x00, 0x00, 0x00 };
char null_ip6[16] = { 0x00, 0x00, 0x00, 0x00,
@@ -633,94 +721,15 @@ int netload(char *buffer, int len, char *ret_buffer, int huge_load,
printf("%s\n", ip6_str);
}
 
-   // accept at most 20 bad packets
-   // wait at most for 40 packets
-   rc = tftp(&fn_ip, (unsigned char *) buffer,
- len, obp_tftp_args.tftp_retries,
- &tftp_err, huge_load, block_size, i

[Qemu-devel] [RFC PATCH 06/14] libnet: Remove remainders of netsave code

2017-06-27 Thread Thomas Huth

The code does not exist in the repository, so it does not make
sense to keep the prototypes and the Forth wrapper around.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/netapps.h | 1 -
 pc-bios/s390-ccw/libnet/tftp.h| 2 --
 2 files changed, 3 deletions(-)

diff --git a/pc-bios/s390-ccw/libnet/netapps.h b/pc-bios/s390-ccw/libnet/netapps.h
index 91c1ebd..2fea4a7 100644
--- a/pc-bios/s390-ccw/libnet/netapps.h
+++ b/pc-bios/s390-ccw/libnet/netapps.h
@@ -20,7 +20,6 @@ struct filename_ip;
 
 extern int netload(char *buffer, int len, char *ret_buffer, int huge_load,
   int block_size, char *args_fs, int alen);
-extern int netsave(int argc, char *argv[]);
 extern int ping(char *args_fs, int alen);
 extern int dhcp(char *ret_buffer, struct filename_ip *fn_ip,
unsigned int retries, int flags);
diff --git a/pc-bios/s390-ccw/libnet/tftp.h b/pc-bios/s390-ccw/libnet/tftp.h
index 303feaf..b1dbc21 100644
--- a/pc-bios/s390-ccw/libnet/tftp.h
+++ b/pc-bios/s390-ccw/libnet/tftp.h
@@ -42,8 +42,6 @@ typedef struct {
 
 int tftp(filename_ip_t *, unsigned char  *, int, unsigned int,
  tftp_err_t *, int32_t mode, int32_t blocksize, int ip_version);
-int tftp_netsave(filename_ip_t *, uint8_t * buffer, int len,
-int use_ci, unsigned int retries, tftp_err_t * tftp_err);
 
 int32_t handle_tftp(int fd, uint8_t *, int32_t);
 void handle_tftp_dun(uint8_t err_code);
-- 
1.8.3.1

[Qemu-devel] [RFC PATCH 03/14] pc-bios/s390-ccw: Add a write() function for stdio

2017-06-27 Thread Thomas Huth

The stdio functions from the SLOF libc need a write() function for
printing text to stdout/stderr. Let's implement this function by
refactoring the code from sclp_print().

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/sclp.c | 32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/pc-bios/s390-ccw/sclp.c b/pc-bios/s390-ccw/sclp.c
index a1639ba..a23bdd3 100644
--- a/pc-bios/s390-ccw/sclp.c
+++ b/pc-bios/s390-ccw/sclp.c
@@ -8,6 +8,7 @@
  * directory.
  */
 
+#include 
 #include "s390-ccw.h"
 #include "sclp.h"
 
@@ -51,34 +52,29 @@ void sclp_setup(void)
 sclp_set_write_mask();
 }
 
-static int _strlen(const char *str)
+ssize_t write(int fd, const void *buf, size_t len)
 {
-int i;
-for (i = 0; *str; i++)
-str++;
-return i;
-}
-
-static void _memcpy(char *dest, const char *src, int len)
-{
-int i;
-for (i = 0; i < len; i++)
-dest[i] = src[i];
-}
-
-void sclp_print(const char *str)
-{
-int len = _strlen(str);
 WriteEventData *sccb = (void *)_sccb;
 
+if (fd != 1 && fd != 2) {
+return -EIO;
+}
+
 sccb->h.length = sizeof(WriteEventData) + len;
 sccb->h.function_code = SCLP_FC_NORMAL_WRITE;
 sccb->ebh.length = sizeof(EventBufferHeader) + len;
 sccb->ebh.type = SCLP_EVENT_ASCII_CONSOLE_DATA;
 sccb->ebh.flags = 0;
-_memcpy(sccb->data, str, len);
+memcpy(sccb->data, buf, len);
 
 sclp_service_call(SCLP_CMD_WRITE_EVENT_DATA, sccb);
+
+return len;
+}
+
+void sclp_print(const char *str)
+{
+write(1, str, strlen(str));
 }
 
 void sclp_get_loadparm_ascii(char *loadparm)
-- 
1.8.3.1
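
For illustration, here is a minimal hosted-C sketch of the layering this patch introduces: a write()-style primitive that accepts only file descriptors 1 and 2, with the string printer reduced to a thin wrapper around it. The names console_write, console_print, console_buf and CONSOLE_BUF_SIZE are hypothetical; the in-memory buffer merely stands in for the SCLP console so the sketch can run outside the firmware. The length clamp is an assumption added for the sketch and does not appear in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

#define CONSOLE_BUF_SIZE 256

/* Hypothetical stand-in for the SCLP console: bytes land in this buffer. */
static char console_buf[CONSOLE_BUF_SIZE];
static size_t console_used;

/* write()-style primitive: only fd 1 (stdout) and fd 2 (stderr) are accepted. */
static ssize_t console_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    if (fd != 1 && fd != 2) {
        return -EIO;
    }
    /* Assumption for this sketch: clamp to the remaining space (short write). */
    if (len > CONSOLE_BUF_SIZE - console_used) {
        len = CONSOLE_BUF_SIZE - console_used;
    }
    memcpy(console_buf + console_used, buf, len);
    console_used += len;

    return (ssize_t)len;
}

/* The string helper becomes a one-line wrapper, like sclp_print() above. */
static void console_print(const char *str)
{
    console_write(1, str, strlen(str));
}
```

Calling console_print("abc") appends three bytes to console_buf, while console_write(3, "x", 1) returns -EIO and leaves the buffer untouched.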

[Qemu-devel] [RFC PATCH 01/14] pc-bios/s390-ccw: Add the libc from the SLOF firmware

2017-06-27 Thread Thomas Huth

To be able to use some more advanced libc functions in the s390-ccw
firmware, like printf() and malloc(), we need a better libc here.
This patch adds the C library from the SLOF firmware (taken from
the SLOF commit ID 62674aabe20612a9786fa03e87cf6916ba97a99a). The
files are copied without modifications here and will be adapted for
the s390-ccw firmware by the next patch. I just removed the getopt()
and scanf()-like functions from the libc since we likely do not need
them in the s390-ccw firmware.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libc/Makefile |  61 ++
 pc-bios/s390-ccw/libc/README.txt   |  49 +
 pc-bios/s390-ccw/libc/ctype/Makefile.inc   |  20 ++
 pc-bios/s390-ccw/libc/ctype/isdigit.c  |  25 +++
 pc-bios/s390-ccw/libc/ctype/isprint.c  |  18 ++
 pc-bios/s390-ccw/libc/ctype/isspace.c  |  29 +++
 pc-bios/s390-ccw/libc/ctype/isxdigit.c |  21 ++
 pc-bios/s390-ccw/libc/ctype/tolower.c  |  18 ++
 pc-bios/s390-ccw/libc/ctype/toupper.c  |  21 ++
 pc-bios/s390-ccw/libc/include/ctype.h  |  24 +++
 pc-bios/s390-ccw/libc/include/errno.h  |  34 
 pc-bios/s390-ccw/libc/include/limits.h |  32 
 pc-bios/s390-ccw/libc/include/stdarg.h |  22 +++
 pc-bios/s390-ccw/libc/include/stdbool.h|  20 ++
 pc-bios/s390-ccw/libc/include/stddef.h |  25 +++
 pc-bios/s390-ccw/libc/include/stdint.h |  28 +++
 pc-bios/s390-ccw/libc/include/stdio.h  |  63 ++
 pc-bios/s390-ccw/libc/include/stdlib.h |  34 
 pc-bios/s390-ccw/libc/include/string.h |  37 
 pc-bios/s390-ccw/libc/include/sys/socket.h |  53 +
 pc-bios/s390-ccw/libc/include/unistd.h |  28 +++
 pc-bios/s390-ccw/libc/stdio/Makefile.inc   |  23 +++
 pc-bios/s390-ccw/libc/stdio/fileno.c   |  19 ++
 pc-bios/s390-ccw/libc/stdio/fprintf.c  |  26 +++
 pc-bios/s390-ccw/libc/stdio/printf.c   |  27 +++
 pc-bios/s390-ccw/libc/stdio/putc.c |  25 +++
 pc-bios/s390-ccw/libc/stdio/putchar.c  |  21 ++
 pc-bios/s390-ccw/libc/stdio/puts.c |  28 +++
 pc-bios/s390-ccw/libc/stdio/setvbuf.c  |  28 +++
 pc-bios/s390-ccw/libc/stdio/sprintf.c  |  30 +++
 pc-bios/s390-ccw/libc/stdio/stdchnls.c |  23 +++
 pc-bios/s390-ccw/libc/stdio/vfprintf.c |  27 +++
 pc-bios/s390-ccw/libc/stdio/vsnprintf.c| 298 +
 pc-bios/s390-ccw/libc/stdio/vsprintf.c |  19 ++
 pc-bios/s390-ccw/libc/stdlib/Makefile.inc  |  22 +++
 pc-bios/s390-ccw/libc/stdlib/atoi.c|  18 ++
 pc-bios/s390-ccw/libc/stdlib/atol.c|  18 ++
 pc-bios/s390-ccw/libc/stdlib/error.c   |  15 ++
 pc-bios/s390-ccw/libc/stdlib/free.c|  26 +++
 pc-bios/s390-ccw/libc/stdlib/malloc.c  | 157 +++
 pc-bios/s390-ccw/libc/stdlib/malloc_defs.h |  16 ++
 pc-bios/s390-ccw/libc/stdlib/memalign.c|  26 +++
 pc-bios/s390-ccw/libc/stdlib/rand.c|  29 +++
 pc-bios/s390-ccw/libc/stdlib/realloc.c |  40 
 pc-bios/s390-ccw/libc/stdlib/strtol.c  | 115 +++
 pc-bios/s390-ccw/libc/stdlib/strtoul.c | 105 ++
 pc-bios/s390-ccw/libc/string/Makefile.inc  |  22 +++
 pc-bios/s390-ccw/libc/string/memchr.c  |  29 +++
 pc-bios/s390-ccw/libc/string/memcmp.c  |  30 +++
 pc-bios/s390-ccw/libc/string/memcpy.c  |  27 +++
 pc-bios/s390-ccw/libc/string/memmove.c |  42 
 pc-bios/s390-ccw/libc/string/memset.c  |  25 +++
 pc-bios/s390-ccw/libc/string/strcasecmp.c  |  28 +++
 pc-bios/s390-ccw/libc/string/strcat.c  |  24 +++
 pc-bios/s390-ccw/libc/string/strchr.c  |  28 +++
 pc-bios/s390-ccw/libc/string/strcmp.c  |  28 +++
 pc-bios/s390-ccw/libc/string/strcpy.c  |  25 +++
 pc-bios/s390-ccw/libc/string/strlen.c  |  27 +++
 pc-bios/s390-ccw/libc/string/strncasecmp.c |  32 
 pc-bios/s390-ccw/libc/string/strncmp.c |  31 +++
 pc-bios/s390-ccw/libc/string/strncpy.c |  33 
 pc-bios/s390-ccw/libc/string/strstr.c  |  37 
 pc-bios/s390-ccw/libc/string/strtok.c  |  45 +
 63 files changed, 2356 insertions(+)
 create mode 100644 pc-bios/s390-ccw/libc/Makefile
 create mode 100644 pc-bios/s390-ccw/libc/README.txt
 create mode 100644 pc-bios/s390-ccw/libc/ctype/Makefile.inc
 create mode 100644 pc-bios/s390-ccw/libc/ctype/isdigit.c
 create mode 100644 pc-bios/s390-ccw/libc/ctype/isprint.c
 create mode 100644 pc-bios/s390-ccw/libc/ctype/isspace.c
 create mode 100644 pc-bios/s390-ccw/libc/ctype/isxdigit.c
 create mode 100644 pc-bios/s390-ccw/libc/ctype/tolower.c
 create mode 100644 pc-bios/s390-ccw/libc/ctype/toupper.c
 create mode 100644 pc-bios/s390-ccw/libc/include/ctype.h
 create mode 100644 pc-bios/s390-ccw/libc/include/errno.h
 create mode 100644 pc-bios/s390-ccw/libc/include/limits.h
 create mode 100644 pc-bios/s390-ccw/libc/include/stdarg.h
 create mode 100644 pc-bios/s390-ccw/libc/include/stdbool.h
 create mode 100644 pc-bios/s390-ccw/libc/include/stddef.h
 create mode 100644 pc-bios/s390-ccw/libc/include/stdint.h
 create mode 100644 pc-bios/s390-ccw/

[Qemu-devel] [RFC PATCH 04/14] pc-bios/s390-ccw: Add implementation of sbrk()

2017-06-27 Thread Thomas Huth

To be able to use malloc() and friends from the SLOF libc, we need to
provide an implementation of the sbrk() function. This patch adds such
an implementation, which has also been taken from the SLOF firmware.
Since the sbrk() function uses a big array as the heap, we now also
have to lower the fwbase in hw/s390x/ipl.c accordingly.

Signed-off-by: Thomas Huth 
---
 hw/s390x/ipl.c|  2 +-
 pc-bios/s390-ccw/Makefile |  2 +-
 pc-bios/s390-ccw/sbrk.c   | 39 +++
 3 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/s390-ccw/sbrk.c

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 4e6469d..913eee5 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -113,7 +113,7 @@ static void s390_ipl_realize(DeviceState *dev, Error **errp)
  * even if an external kernel has been defined.
  */
 if (!ipl->kernel || ipl->enforce_bios) {
-uint64_t fwbase = (MIN(ram_size, 0x8000U) - 0x20) & ~0xUL;
+uint64_t fwbase = (MIN(ram_size, 0x8000U) - 0x40) & ~0xUL;
 
 if (bios_name == NULL) {
 bios_name = ipl->firmware;
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 3371c5b..8fbefe8 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -10,7 +10,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 .PHONY : all clean build-all libc.a
 
 OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o
-OBJECTS += libc.a
+OBJECTS += libc.a sbrk.o
 QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
 QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
 QEMU_CFLAGS += -march=z900 -fPIE -fno-strict-aliasing
diff --git a/pc-bios/s390-ccw/sbrk.c b/pc-bios/s390-ccw/sbrk.c
new file mode 100644
index 000..2ec1b5f
--- /dev/null
+++ b/pc-bios/s390-ccw/sbrk.c
@@ -0,0 +1,39 @@
+/**
+ * Copyright (c) 2004, 2008 IBM Corporation
+ * All rights reserved.
+ * This program and the accompanying materials
+ * are made available under the terms of the BSD License
+ * which accompanies this distribution, and is available at
+ * http://www.opensource.org/licenses/bsd-license.php
+ *
+ * Contributors:
+ * IBM Corporation - initial implementation
+ */
+
+#include 
+
+#define HEAP_SIZE 0x20
+
+
+static char heap[HEAP_SIZE];
+static char *actptr;
+
+void *sbrk(int increment)
+{
+   char *oldptr;
+
+   /* Called for the first time? Then init the actual pointer */
+   if (!actptr) {
+   actptr = heap;
+   }
+
+   if (actptr + increment > heap + HEAP_SIZE) {
+   /* Out of memory */
+   return (void *)-1;
+   }
+
+   oldptr = actptr;
+   actptr += increment;
+
+   return oldptr;
+}
-- 
1.8.3.1
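
As a rough illustration of the bump-pointer scheme used by sbrk() above, the following hosted-C sketch serves requests from a fixed static array and reports exhaustion with (void *)-1. The names demo_sbrk, demo_heap, demo_used and DEMO_HEAP_SIZE are hypothetical, and the heap is deliberately tiny so the failure path is easy to reach. Unlike the patch, the sketch tracks an offset instead of a pointer and also rejects shrinking below the start of the heap; both are assumptions of this example, not behaviour of the posted code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HEAP_SIZE 64   /* deliberately tiny for demonstration */

static char demo_heap[DEMO_HEAP_SIZE];
static size_t demo_used;    /* offset of the current break within demo_heap */

/* Returns the previous break on success, (void *)-1 if the request fails. */
static void *demo_sbrk(int increment)
{
    char *old = demo_heap + demo_used;

    assert(demo_used <= DEMO_HEAP_SIZE);

    if (increment >= 0) {
        if ((size_t)increment > DEMO_HEAP_SIZE - demo_used) {
            return (void *)-1;          /* out of memory */
        }
        demo_used += (size_t)increment;
    } else {
        size_t shrink = (size_t)(-(long long)increment);

        if (shrink > demo_used) {
            return (void *)-1;          /* cannot shrink below the heap start */
        }
        demo_used -= shrink;
    }

    return old;
}
```

Comparing sizes rather than computing "pointer + increment" avoids forming a pointer far outside the array before the bounds check, which standard C does not define.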

[Qemu-devel] [RFC PATCH 07/14] libnet: Rework error message printing

2017-06-27 Thread Thomas Huth

There is a repetitive pattern of code in netload.c to print out
error messages: snprintf(buf, ...) + bootmsg_error() + write_mm_log().
The code can be simplified / shortened quite a bit by consolidating
this pattern in a helper function.
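
The consolidation can be sketched in hosted C as follows. This is only an illustration of the "fixed prefix plus caller-supplied format" pattern, not the helper from the patch: demo_error and last_msg are hypothetical names, and the sketch stores the message in a static buffer instead of calling bootmsg_error() and write_mm_log(). It uses the return value of snprintf() as the offset for the formatted part, where the patch uses a fixed offset of 13.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sink: the most recent message is kept here for inspection. */
static char last_msg[256];

/* Formats "E<code>: (net) " followed by the printf-style message. */
static void demo_error(int errcode, const char *format, ...)
{
    va_list ap;
    int off;

    off = snprintf(last_msg, sizeof(last_msg), "E%04X: (net) ", errcode);
    assert(off > 0 && (size_t)off < sizeof(last_msg));

    va_start(ap, format);
    vsnprintf(last_msg + off, sizeof(last_msg) - (size_t)off, format, ap);
    va_end(ap);
}
```

A call such as demo_error(0x3004, "buffer of %d bytes", 512) then replaces the three-step sequence at each call site.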

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/netload.c | 126 +-
 1 file changed, 44 insertions(+), 82 deletions(-)

diff --git a/pc-bios/s390-ccw/libnet/netload.c b/pc-bios/s390-ccw/libnet/netload.c
index cd3720a..8afe341 100644
--- a/pc-bios/s390-ccw/libnet/netload.c
+++ b/pc-bios/s390-ccw/libnet/netload.c
@@ -52,6 +52,23 @@ typedef struct {
int  tftp_retries;
 } obp_tftp_args_t;
 
+/**
+ * Print error with preceding error code
+ */
+static void netload_error(int errcode, const char *format, ...)
+{
+   va_list vargs;
+   char buf[256];
+
+   sprintf(buf, "E%04X: (net) ", errcode);
+
+   va_start(vargs, format);
+   vsnprintf(&buf[13], sizeof(buf) - 13, format, vargs);
+   va_end(vargs);
+
+   bootmsg_error(errcode, &buf[7]);
+   write_mm_log(buf, strlen(buf), 0x91);
+}
 
 /**
  * Parses a argument string for IPv6 booting, extracts all
@@ -389,7 +406,6 @@ static void seed_rng(uint8_t mac[])
 int netload(char *buffer, int len, char *ret_buffer, int huge_load,
int block_size, char *args_fs, int alen)
 {
-   char buf[256];
int rc;
filename_ip_t fn_ip;
int fd_device;
@@ -427,17 +443,11 @@ int netload(char *buffer, int len, char *ret_buffer, int huge_load,
}
 
if (fd_device == -1) {
-   strcpy(buf,"E3000: (net) Could not read MAC address");
-   bootmsg_error(0x3000, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
+   netload_error(0x3000, "Could not read MAC address");
return -100;
}
else if (fd_device == -2) {
-   strcpy(buf,"E3006: (net) Could not initialize network device");
-   bootmsg_error(0x3006, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
+   netload_error(0x3006, "Could not initialize network device");
return -101;
}
 
@@ -567,10 +577,7 @@ int netload(char *buffer, int len, char *ret_buffer, int huge_load,
memcpy(&fn_ip.server_ip6.addr, &obp_tftp_args.si6addr.addr, 16);
}
if (rc == -1) {
-   strcpy(buf,"E3001: (net) Could not get IP address");
-   bootmsg_error(0x3001, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
+   netload_error(0x3001, "Could not get IP address");
close(fn_ip.fd);
return -101;
}
@@ -586,29 +593,21 @@ int netload(char *buffer, int len, char *ret_buffer, int huge_load,
}
 
if (rc == -2) {
-   sprintf(buf,
-   "E3002: (net) ARP request to TFTP server "
+   netload_error(0x3002, "ARP request to TFTP server "
"(%d.%d.%d.%d) failed",
((fn_ip.server_ip >> 24) & 0xFF),
((fn_ip.server_ip >> 16) & 0xFF),
((fn_ip.server_ip >>  8) & 0xFF),
( fn_ip.server_ip& 0xFF));
-   bootmsg_error(0x3002, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
close(fn_ip.fd);
return -102;
}
if (rc == -4 || rc == -3) {
-   strcpy(buf,"E3008: (net) Can't obtain TFTP server IP address");
-   bootmsg_error(0x3008, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
+   netload_error(0x3008, "Can't obtain TFTP server IP address");
close(fn_ip.fd);
return -107;
}
 
-
/***
 *
 * Load file via TFTP into buffer provided by OpenFirmware
@@ -649,114 +648,77 @@ int netload(char *buffer, int len, char *ret_buffer, int huge_load,
printf("  TFTP: Received %s (%d KBytes)\n", fn_ip.filename,
   rc / 1024);
} else if (rc == -1) {
-   bootmsg_error(0x3003, "(net) unknown TFTP error");
+   netload_error(0x3003, "unknown TFTP error");
return -103;
} else if (rc == -2) {
-   sprintf(buf,
-   "E3004: (net) TFTP buffer of %d bytes "
+   netload_error(0x3004, "TFTP buffer of %d bytes "
"is too small for %s",
len, fn_ip.filename);
-   bootmsg_error(0x3004, &buf[7]);
-
-   write_mm_log(buf, strlen(buf), 0x91);
return -104;
} else if (rc == -3) {
-   sprintf(buf,"E3009: (net) file not found: %s",
-  fn_ip.filename);
-   bootmsg_error(0x3009, &buf[7]);
-
-

[Qemu-devel] [RFC PATCH 13/14] pc-bios/s390-ccw: Allow loading to address 0

2017-06-27 Thread Thomas Huth

Kernels are normally loaded to address 0, so the check for NULL
in the TFTP code has to be replaced.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/tftp.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/pc-bios/s390-ccw/libnet/tftp.c b/pc-bios/s390-ccw/libnet/tftp.c
index 34f448c..108092b 100644
--- a/pc-bios/s390-ccw/libnet/tftp.c
+++ b/pc-bios/s390-ccw/libnet/tftp.c
@@ -26,6 +26,7 @@
 
 #define MAX_BLOCKSIZE 1428
 #define BUFFER_LEN 256
+#define INVALID_BUFFER ((void *)-1L)
 
 #define ENOTFOUND 1
 #define EACCESS   2
@@ -45,7 +46,7 @@
 
 /* Local variables */
 static unsigned char packet[BUFFER_LEN];
-static unsigned char  *buffer = NULL;
+static unsigned char  *buffer = INVALID_BUFFER;
 static unsigned short block;
 static unsigned short blocksize;
 static char blocksize_str[6];/* Blocksize string for read request */
@@ -337,7 +338,7 @@ int32_t handle_tftp(int fd, uint8_t *pkt, int32_t packetsize)
struct tftphdr *tftp;
 
/* buffer is only set if we are handling TFTP */
-   if (buffer == NULL )
+   if (buffer == INVALID_BUFFER)
return 0;
 
 #ifndef __DEBUG__
@@ -536,7 +537,7 @@ int tftp(filename_ip_t * _fn_ip, unsigned char *_buffer, int _len,
printf("  Receiving data:  ");
print_progress(-1, 0);
 
-   // Setting buffer to a non-zero address enabled handling of received TFTP packets.
+   /* Set buffer to a valid address, enables handling of received packets */
buffer = _buffer;
 
set_timer(TICKS_SEC);
@@ -579,8 +580,8 @@ int tftp(filename_ip_t * _fn_ip, unsigned char *_buffer, int _len,
}
}
 
-   // Setting buffer to NULL disables handling of received TFTP packets.
-   buffer = NULL;
+   /* Setting buffer invalid to disable handling of received packets */
+   buffer = INVALID_BUFFER;
 
if (tftp_errno)
return tftp_errno;
-- 
1.8.3.1
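
The idea behind INVALID_BUFFER can be shown with a small sketch: when address 0 is a legitimate load target, NULL can no longer double as the "not receiving" marker, so a distinct sentinel is needed. The helper names rx_start, rx_stop and rx_active are hypothetical and only mirror the state changes made in tftp() and checked in handle_tftp(); the sketch compares pointers and never dereferences them.

```c
#include <assert.h>
#include <stddef.h>

/* Same sentinel value as in the patch: an address no load is expected to use. */
#define INVALID_BUFFER ((void *)-1L)

static unsigned char *rx_buffer = INVALID_BUFFER;

/* Begin a transfer; any address other than the sentinel, including 0, is valid. */
static void rx_start(unsigned char *dest)
{
    assert(dest != INVALID_BUFFER);
    rx_buffer = dest;
}

/* End a transfer by restoring the sentinel. */
static void rx_stop(void)
{
    rx_buffer = INVALID_BUFFER;
}

/* Packet handlers would bail out early while this returns 0. */
static int rx_active(void)
{
    return rx_buffer != INVALID_BUFFER;
}
```

With a NULL-based check, rx_start((unsigned char *)0) would leave the receiver looking idle; with the sentinel it is reported as active.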

[Qemu-devel] [RFC PATCH 10/14] pc-bios/s390-ccw: Add timer code for the libnet

2017-06-27 Thread Thomas Huth

The libnet expects certain timer functions to exist, so that it
is able to deal with timeouts etc.
This patch implements these timer functions via the STORE CLOCK (stck)
CPU instruction.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/Makefile |  2 +-
 pc-bios/s390-ccw/libnet/timer.c  | 40 
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 pc-bios/s390-ccw/libnet/timer.c

diff --git a/pc-bios/s390-ccw/libnet/Makefile b/pc-bios/s390-ccw/libnet/Makefile
index 72e12d7..c8235f3 100644
--- a/pc-bios/s390-ccw/libnet/Makefile
+++ b/pc-bios/s390-ccw/libnet/Makefile
@@ -24,7 +24,7 @@ QEMU_CFLAGS += $(call cc-option, $(QEMU_CFLAGS), -fno-stack-protector)
 LDFLAGS += -Wl,-pie -nostdlib
 
 SRCS = ethernet.c ipv4.c udp.c tcp.c dns.c dhcp.c tftp.c \
-   ipv6.c dhcpv6.c icmpv6.c ndp.c netload.c args.c
+   ipv6.c dhcpv6.c icmpv6.c ndp.c netload.c args.c timer.c
 
 OBJS = $(SRCS:%.c=%.o)
 
diff --git a/pc-bios/s390-ccw/libnet/timer.c b/pc-bios/s390-ccw/libnet/timer.c
new file mode 100644
index 000..ddbd7a2
--- /dev/null
+++ b/pc-bios/s390-ccw/libnet/timer.c
@@ -0,0 +1,40 @@
+/*
+ * Timer functions for libnet
+ *
+ * Copyright 2017 Thomas Huth, Red Hat Inc.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include "time.h"
+
+static uint64_t dest_timer;
+
+static uint64_t get_timer_ms(void)
+{
+   uint64_t clk;
+
+   asm volatile(" stck %0 " : : "Q"(clk) : "memory");
+
+   /* Bit 51 is incremented each microsecond */
+   return (clk >> (63 - 51)) / 1000;
+}
+
+void set_timer(int val)
+{
+   dest_timer = get_timer_ms() + val;
+}
+
+int get_timer(void)
+{
+   return dest_timer - get_timer_ms();
+}
+
+int get_sec_ticks(void)
+{
+   return 1000;/* number of ticks in 1 second */
+}
-- 
1.8.3.1
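
To make the arithmetic in get_timer_ms() easier to follow, here is a hosted-C sketch of the same conversion with the clock value injected instead of read via stck. Bit 51 in the patch's comment is counted from the most significant bit, so shifting right by 63 - 51 = 12 yields microseconds, and dividing by 1000 yields milliseconds. The names tod_to_ms, fake_tod, demo_set_timer and demo_get_timer are hypothetical, and the signed subtraction in demo_get_timer is a choice made for this sketch so that an expired timer reads as a negative value.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw TOD clock value to milliseconds. */
static uint64_t tod_to_ms(uint64_t tod)
{
    /* The low 12 bits are sub-microsecond fractions. */
    return (tod >> (63 - 51)) / 1000;
}

static uint64_t fake_tod;      /* hypothetical stand-in for the stck result */
static uint64_t deadline_ms;

/* Arm the countdown: it expires "ms" milliseconds from now. */
static void demo_set_timer(int ms)
{
    assert(ms >= 0);
    deadline_ms = tod_to_ms(fake_tod) + (uint64_t)ms;
}

/* Remaining milliseconds; zero or negative once the deadline has passed. */
static int demo_get_timer(void)
{
    return (int)((int64_t)deadline_ms - (int64_t)tod_to_ms(fake_tod));
}
```

Advancing fake_tod by 20000 << 12 (20 ms worth of ticks) after demo_set_timer(50) leaves 30 ms on the countdown.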

[Qemu-devel] [RFC PATCH 09/14] pc-bios/s390-ccw: Make the basic libnet code compilable

2017-06-27 Thread Thomas Huth

Adjust the Makefiles, remove non-required code and fix some spots that
generated compiler warnings / errors with the s390-ccw firmware CFLAGS.

Signed-off-by: Thomas Huth 
---
 configure |   3 +-
 pc-bios/s390-ccw/Makefile |   6 +-
 pc-bios/s390-ccw/libnet/Makefile  |  38 ++---
 pc-bios/s390-ccw/libnet/netapps.h |   4 +-
 pc-bios/s390-ccw/libnet/netload.c | 319 ++
 pc-bios/s390-ccw/libnet/tftp.c|   2 +-
 6 files changed, 33 insertions(+), 339 deletions(-)

diff --git a/configure b/configure
index 954c286..0ac761e 100755
--- a/configure
+++ b/configure
@@ -6378,7 +6378,7 @@ fi
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
 DIRS="$DIRS docs docs/interop fsdev"
 DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas"
-DIRS="$DIRS pc-bios/s390-ccw pc-bios/s390-ccw/libc"
+DIRS="$DIRS pc-bios/s390-ccw pc-bios/s390-ccw/libc pc-bios/s390-ccw/libnet"
 DIRS="$DIRS roms/seabios roms/vgabios"
 DIRS="$DIRS qapi-generated"
 FILES="Makefile tests/tcg/Makefile qdict-test-data.txt"
@@ -6387,6 +6387,7 @@ FILES="$FILES tests/tcg/lm32/Makefile tests/tcg/xtensa/Makefile po/Makefile"
 FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps"
 FILES="$FILES pc-bios/spapr-rtas/Makefile"
 FILES="$FILES pc-bios/s390-ccw/Makefile pc-bios/s390-ccw/libc/Makefile"
+FILES="$FILES pc-bios/s390-ccw/libnet/Makefile"
 FILES="$FILES roms/seabios/Makefile roms/vgabios/Makefile"
 FILES="$FILES pc-bios/qemu-icon.bmp"
 FILES="$FILES .gdbinit scripts" # scripts needed by relative path in .gdbinit
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 8fbefe8..02b9b08 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -7,7 +7,7 @@ include $(SRC_PATH)/rules.mak
 
 $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 
-.PHONY : all clean build-all libc.a
+.PHONY : all clean build-all libc.a libnet.a
 
 OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o
 OBJECTS += libc.a sbrk.o
@@ -26,6 +26,9 @@ s390-ccw.elf: $(OBJECTS)
 libc.a:
@$(MAKE) -C libc V="$(V)"
 
+libnet.a:
+   @$(MAKE) -C libnet V="$(V)"
+
 STRIP ?= strip
 
 s390-ccw.img: s390-ccw.elf
@@ -36,3 +39,4 @@ $(OBJECTS): Makefile
 clean:
rm -f *.o *.d *.img *.elf *~
@$(MAKE) -C libc clean
+   @$(MAKE) -C libnet clean
diff --git a/pc-bios/s390-ccw/libnet/Makefile b/pc-bios/s390-ccw/libnet/Makefile
index 83ac1e5..72e12d7 100644
--- a/pc-bios/s390-ccw/libnet/Makefile
+++ b/pc-bios/s390-ccw/libnet/Makefile
@@ -10,16 +10,21 @@
 # * IBM Corporation - initial implementation
 # /
 
-ifndef TOP
-  TOP = $(shell while ! test -e make.rules; do cd ..  ; done; pwd)
-  export TOP
-endif
-include $(TOP)/make.rules
+include ../../../config-host.mak
+include $(SRC_PATH)/rules.mak
 
-CFLAGS += -I. -I.. -I../libc/include -I$(TOP)/include
+$(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw/libnet)
 
-SRCS = ethernet.c ipv4.c udp.c tcp.c dns.c bootp.c dhcp.c tftp.c \
-   ipv6.c dhcpv6.c icmpv6.c ndp.c netload.c ping.c args.c
+QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
+QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
+QEMU_CFLAGS += -march=z900 -fPIE -fno-strict-aliasing -Wno-redundant-decls
+QEMU_CFLAGS += -I$(SRC_PATH)/pc-bios/s390-ccw/libnet
+QEMU_CFLAGS += -I$(SRC_PATH)/pc-bios/s390-ccw/libc/include
+QEMU_CFLAGS += $(call cc-option, $(QEMU_CFLAGS), -fno-stack-protector)
+LDFLAGS += -Wl,-pie -nostdlib
+
+SRCS = ethernet.c ipv4.c udp.c tcp.c dns.c dhcp.c tftp.c \
+   ipv6.c dhcpv6.c icmpv6.c ndp.c netload.c args.c
 
 OBJS = $(SRCS:%.c=%.o)
 
@@ -28,23 +33,10 @@ TARGET = ../libnet.a
 all: $(TARGET)
 
 $(TARGET): $(OBJS)
-   $(AR) -rc $@ $(OBJS)
-   $(RANLIB) $@
+   $(call quiet-command,$(AR) -rc $@ $(OBJS),"AR","$(TARGET_DIR)$@")
 
 clean:
-   $(RM) $(TARGET) $(OBJS)
+   rm -f $(TARGET) $(OBJS)
 
 distclean: clean
-   $(RM) Makefile.dep
-
-
-# Rules for creating the dependency file:
-depend:
-   $(RM) Makefile.dep
-   $(MAKE) Makefile.dep
-
-Makefile.dep: Makefile
-   $(CC) -M $(CPPFLAGS) $(CFLAGS) $(SRCS) > Makefile.dep
 
-# Include dependency file if available:
--include Makefile.dep
diff --git a/pc-bios/s390-ccw/libnet/netapps.h b/pc-bios/s390-ccw/libnet/netapps.h
index 2fea4a7..d2283af 100644
--- a/pc-bios/s390-ccw/libnet/netapps.h
+++ b/pc-bios/s390-ccw/libnet/netapps.h
@@ -18,9 +18,7 @@
 
 struct filename_ip;
 
-extern int netload(char *buffer, int len, char *ret_buffer, int huge_load,
-  int block_size, char *args_fs, int alen);
-extern int ping(char *args_fs, int alen);
+extern int netload(char *buffer, int len, char *ret_buffer);
 extern int dhcp(char *ret_buffer, struct filename_ip *fn_ip,
unsigned int retries, int flags);
 
diff --git a/pc-bios/s390-ccw/libnet/netload.c 
b/pc-bios/s390-ccw/libnet/netload.

[Qemu-devel] [RFC PATCH 11/14] pc-bios/s390-ccw: Add virtio-net driver code

2017-06-27 Thread Thomas Huth

This virtio-net driver contains the recv() and send() functions
that are required by libnet to receive and send packets.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/Makefile |   2 +-
 pc-bios/s390-ccw/virtio-net.c | 125 ++
 pc-bios/s390-ccw/virtio.c |  16 --
 pc-bios/s390-ccw/virtio.h |  11 
 4 files changed, 149 insertions(+), 5 deletions(-)
 create mode 100644 pc-bios/s390-ccw/virtio-net.c

diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 02b9b08..369bd65 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -9,7 +9,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 
 .PHONY : all clean build-all libc.a libnet.a
 
-OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o
+OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o virtio-net.o
 OBJECTS += libc.a sbrk.o
 QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
 QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
new file mode 100644
index 000..5c2f439
--- /dev/null
+++ b/pc-bios/s390-ccw/virtio-net.c
@@ -0,0 +1,125 @@
+/*
+ * Virtio-net driver for the s390-ccw firmware
+ *
+ * Copyright 2017 Thomas Huth, Red Hat Inc.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "libnet/ethernet.h"
+#include "virtio.h"
+
+#define VQ_RX 0 /* Receive queue */
+#define VQ_TX 1 /* Transmit queue */
+
+struct VirtioNetHdr {
+uint8_t flags;
+uint8_t gso_type;
+uint16_t hdr_len;
+uint16_t gso_size;
+uint16_t csum_start;
+uint16_t csum_offset;
+/*uint16_t num_buffers;*/ /* Only with VIRTIO_NET_F_MRG_RXBUF or VIRTIO1 */
+};
+typedef struct VirtioNetHdr VirtioNetHdr;
+
+static uint16_t rx_last_idx;  /* Last index in receive queue "used" ring */
+
+int socket(int domain, int type, int proto, char *mac_addr)
+{
+VDev *vdev = virtio_get_device();
+VRing *rxvq = &vdev->vrings[VQ_RX];
+void *buf;
+int i;
+
+memcpy(mac_addr, vdev->config.net.mac, ETH_ALEN);
+
+for (i = 0; i < 64; i++) {
+buf = malloc(ETH_MTU_SIZE + sizeof(VirtioNetHdr));
+IPL_assert(buf != NULL, "Can not allocate memory for receive buffers");
+vring_send_buf(rxvq, buf, ETH_MTU_SIZE + sizeof(VirtioNetHdr),
+   VRING_DESC_F_WRITE);
+}
+vring_notify(rxvq);
+
+return 0;
+}
+
+int send(int fd, const void *buf, int len, int flags)
+{
+VirtioNetHdr tx_hdr;
+VDev *vdev = virtio_get_device();
+VRing *txvq = &vdev->vrings[VQ_TX];
+
+/* Set up header - we do not use anything special, so simply clear it */
+memset(&tx_hdr, 0, sizeof(tx_hdr));
+
+vring_send_buf(txvq, &tx_hdr, sizeof(tx_hdr), VRING_DESC_F_NEXT);
+vring_send_buf(txvq, (void *)buf, len, VRING_HIDDEN_IS_CHAIN);
+while (!vr_poll(txvq)) {
+yield();
+}
+if (drain_irqs(txvq->schid)) {
+puts("send: drain irqs failed");
+return -1;
+}
+
+return len;
+}
+
+int recv(int fd, void *buf, int maxlen, int flags)
+{
+VDev *vdev = virtio_get_device();
+VRing *rxvq = &vdev->vrings[VQ_RX];
+int len, id;
+uint8_t *pkt;
+
+if (rx_last_idx == rxvq->used->idx) {
+return 0;
+}
+
+len = rxvq->used->ring[rx_last_idx % rxvq->num].len - sizeof(VirtioNetHdr);
+if (len > maxlen) {
+puts("virtio-net: Receive buffer too small");
+len = maxlen;
+}
+id = rxvq->used->ring[rx_last_idx % rxvq->num].id % rxvq->num;
+pkt = (uint8_t *)(rxvq->desc[id].addr + sizeof(VirtioNetHdr));
+
+#if 0
+/* Dump packet */
+printf("\nbuf %p: len=%i\n", (void*)rxvq->desc[id].addr, len);
+int i;
+for (i = 0; i < 64; i++) {
+printf(" %02x", pkt[i]);
+if ((i%16)==15)
+printf("\n");
+}
+printf("\n");
+#endif
+
+/* Copy data to destination buffer */
+memcpy(buf, pkt, len);
+
+/* Mark buffer as available to the host again */
+rxvq->avail->ring[rxvq->avail->idx % rxvq->num] = id;
+rxvq->avail->idx = rxvq->avail->idx + 1;
+vring_notify(rxvq);
+
+/* Move index to next entry */
+rx_last_idx = rx_last_idx + 1;
+
+return len;
+}
+
+int close(int fd)
+{
+return 0;
+}
diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index 6ee93d5..b5219aa 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -69,7 +69,7 @@ static long virtio_notify(SubChannelId schid, int vq_idx, long cookie)
  * Virtio functions*
  ***/
 
-static int drain_irqs(SubChannelId schid)
+int drain_irqs(SubChannelId schid)

[Qemu-devel] [RFC PATCH 12/14] pc-bios/s390-ccw: Load file via an intermediate .INS file

2017-06-27 Thread Thomas Huth

We need to know which files have to be loaded at which address.
Normally you need at least two files, a kernel image and an initrd
image. Since the normal DHCP information only provides one file,
we now load an intermediate .INS file first which has to contain
the information about the other files. The .INS file has the
same syntax as the .INS files that are already used on s390x
Linux distribution CD-ROMs: First line is the title (starting
with "* "), and in the following lines you can find the file
names followed by the address where they should be loaded.
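
The file layout described above can be sketched with a small stand-alone parser. This is an illustration only and not the code from this patch: ins_parse and struct ins_entry are hypothetical names, the sketch collects entries into an array instead of loading each file via TFTP, and it parses addresses with strtoul() in base 0 on the assumption that they may be written in hexadecimal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ins_entry {
    const char *name;        /* points into the caller's buffer */
    unsigned long addr;      /* load address from the second column */
};

/*
 * Parses a NUL-terminated .INS buffer in place (separators are overwritten
 * with NUL bytes). Returns the number of entries, or -1 on malformed input.
 */
static int ins_parse(char *buf, struct ins_entry *out, int max)
{
    char *line, *next, *sep;
    int n = 0;

    assert(buf != NULL && out != NULL);

    next = strchr(buf, '\n');
    if (!next || buf[0] != '*' || buf[1] != ' ') {
        return -1;                       /* title line missing */
    }

    for (line = next + 1; *line; line = next) {
        next = strchr(line, '\n');
        if (next) {
            *next++ = '\0';
        } else {
            next = line + strlen(line);  /* last line without a newline */
        }
        if (*line == '\0') {
            continue;                    /* skip empty lines */
        }
        sep = strchr(line, ' ');
        if (!sep || n >= max) {
            return -1;                   /* no separator, or too many entries */
        }
        *sep = '\0';
        out[n].name = line;
        out[n].addr = strtoul(sep + 1, NULL, 0);
        n++;
    }

    return n;
}
```

Given a buffer holding "* Demo title", "kernel.img 0x00000000" and "initrd.img 0x02000000" on separate lines (made-up file names), ins_parse() yields two entries with the addresses 0 and 0x02000000.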

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/libnet/netapps.h |  2 +-
 pc-bios/s390-ccw/libnet/netload.c | 72 ---
 2 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/pc-bios/s390-ccw/libnet/netapps.h b/pc-bios/s390-ccw/libnet/netapps.h
index d2283af..61e8a11 100644
--- a/pc-bios/s390-ccw/libnet/netapps.h
+++ b/pc-bios/s390-ccw/libnet/netapps.h
@@ -18,7 +18,7 @@
 
 struct filename_ip;
 
-extern int netload(char *buffer, int len, char *ret_buffer);
+extern int netload(void);
 extern int dhcp(char *ret_buffer, struct filename_ip *fn_ip,
unsigned int retries, int flags);
 
diff --git a/pc-bios/s390-ccw/libnet/netload.c 
b/pc-bios/s390-ccw/libnet/netload.c
index eae8333..28e8711 100644
--- a/pc-bios/s390-ccw/libnet/netload.c
+++ b/pc-bios/s390-ccw/libnet/netload.c
@@ -26,6 +26,8 @@
 #include "args.h"
 #include "netapps.h"
 
+#define MAX_INS_FILE_LEN 16384
+
 #define IP_INIT_DEFAULT 5
 #define IP_INIT_NONE0
 #define IP_INIT_BOOTP   1
@@ -138,7 +140,7 @@ static void seed_rng(uint8_t mac[])
srand(seed);
 }
 
-static int tftp_load(filename_ip_t *fnip, unsigned char *buffer, int len,
+static int tftp_load(filename_ip_t *fnip, void *buffer, int len,
 unsigned int retries, int ip_vers)
 {
tftp_err_t tftp_err;
@@ -227,7 +229,56 @@ static int tftp_load(filename_ip_t *fnip, unsigned char 
*buffer, int len,
return rc;
 }
 
-int netload(char *buffer, int len, char *ret_buffer)
+static int load_from_ins_file(char *insbuf, filename_ip_t *fn_ip, int retries,
+ int ip_version)
+{
+   char *ptr;
+   int rc = -1, llen;
+   void *destaddr;
+
+   ptr = strchr(insbuf, '\n');
+
+   if (!ptr || insbuf[0] != '*' || insbuf[1] != ' ') {
+   puts("Does not seem to be a valid .INS file");
+   return -1;
+   }
+
+   *ptr = 0;
+   printf("\nParsing .INS file:\n  %s\n", &insbuf[2]);
+
+   insbuf = ptr + 1;
+   while (*insbuf) {
+   ptr = strchr(insbuf, '\n');
+   if (ptr) {
+   *ptr = 0;
+   }
+   llen = strlen(insbuf);
+   if (!llen) {
+   insbuf = ptr + 1;
+   continue;
+   }
+   ptr = strchr(insbuf, ' ');
+   if (!ptr) {
+   puts("Missing space separator in .INS file");
+   return -1;
+   }
+   *ptr = 0;
+   strncpy((char *)fn_ip->filename, insbuf,
+   sizeof(fn_ip->filename));
+   destaddr = (char *)atol(ptr + 1);
+   printf("\n  Loading file \"%s\" via TFTP to %p\n", insbuf,
+  destaddr);
+   rc = tftp_load(fn_ip, destaddr, 5000, retries, ip_version);
+   if (rc <= 0) {
+   break;
+   }
+   insbuf += llen + 1;
+   }
+
+   return rc;
+}
+
+int netload(void)
 {
int rc;
filename_ip_t fn_ip;
@@ -239,6 +290,7 @@ int netload(char *buffer, int len, char *ret_buffer)
 0x00, 0x00, 0x00, 0x00, 
 0x00, 0x00, 0x00, 0x00 };
uint8_t own_mac[6];
+   char *ins_buf, *ret_buffer = NULL;
 
puts("\n Initializing NIC");
memset(&fn_ip, 0, sizeof(filename_ip_t));
@@ -420,9 +472,19 @@ int netload(char *buffer, int len, char *ret_buffer)
printf("%s\n", ip6_str);
}
 
-   /* Do the TFTP load and print error message if necessary */
-   rc = tftp_load(&fn_ip, (unsigned char *)buffer, len,
-  obp_tftp_args.tftp_retries, ip_version);
+   ins_buf = malloc(MAX_INS_FILE_LEN);
+   if (!ins_buf) {
+   puts("Failed to allocate memory for the .INS file");
+   return -1;
+   }
+   memset(ins_buf, 0, MAX_INS_FILE_LEN);
+   rc = tftp_load(&fn_ip, ins_buf, MAX_INS_FILE_LEN - 1,
+  obp_tftp_args.tftp_retries, ip_version);
+   if (rc > 0) {
+   rc = load_from_ins_file(ins_buf, &fn_ip,
+   obp_tftp_args.tftp_retries, ip_version);
+   }
+   free(ins_buf);
 
if (obp_tftp_args.ip_init == IP_INIT_DHCP)
dhcp_send_release(fn_ip.fd);
-- 
1.8.3.1
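
The commit message of 12/14 describes the payload lines of an .INS file as a file name followed by a load address. A minimal sketch of parsing one such line is shown below. It is not the code from the patch: the names `ins_entry`, `parse_ins_line` and `INS_NAME_MAX` are hypothetical, and the use of strtoul() with base 0 (so that 0x-prefixed addresses are accepted) is an assumption, whereas the patch itself uses atol().

```c
#include <stdlib.h>
#include <string.h>

#define INS_NAME_MAX 256  /* hypothetical limit, not taken from the patch */

struct ins_entry {
    char name[INS_NAME_MAX];  /* file name to request via TFTP */
    unsigned long addr;       /* address the file should be loaded to */
};

/* Parse one "filename address" line. Returns 0 on success and -1 on a
 * malformed line. The input string is left unmodified. */
static int parse_ins_line(const char *line, struct ins_entry *out)
{
    const char *sep = strchr(line, ' ');
    char *end;
    size_t nlen;

    if (!sep || sep == line) {
        return -1;              /* missing separator or empty file name */
    }
    nlen = (size_t)(sep - line);
    if (nlen >= sizeof(out->name)) {
        return -1;              /* name would not fit with its terminator */
    }
    memcpy(out->name, line, nlen);
    out->name[nlen] = '\0';     /* always terminated, unlike bare strncpy */

    /* Assumption: base 0 accepts decimal and 0x-prefixed hex addresses. */
    out->addr = strtoul(sep + 1, &end, 0);
    if (end == sep + 1) {
        return -1;              /* no digits after the separator */
    }
    return 0;
}
```

For example, parsing the line `kernel.img 0x10000` would fill `name` with `kernel.img` and `addr` with 0x10000. Copying the name with an explicit terminator avoids relying on the destination buffer having been zeroed beforehand.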

[Qemu-devel] [RFC PATCH 14/14] pc-bios/s390-ccw: Wire up the netload code

2017-06-27 Thread Thomas Huth

Link the libnet.a to the s390-ccw.elf file, call netload() instead of
the external network boot image, and jump to loaded kernel at address 0,
so that we can finally do a full network boot with the s390-ccw firmware
now.

Signed-off-by: Thomas Huth 
---
 pc-bios/s390-ccw/Makefile  |  2 +-
 pc-bios/s390-ccw/bootmap.c | 10 +-
 pc-bios/s390-ccw/main.c|  3 +--
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 369bd65..827d7be 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -10,7 +10,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 .PHONY : all clean build-all libc.a libnet.a
 
 OBJECTS = start.o main.o bootmap.o sclp.o virtio.o virtio-scsi.o virtio-net.o
-OBJECTS += libc.a sbrk.o
+OBJECTS += libnet.a libc.a sbrk.o
 QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
 QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
 QEMU_CFLAGS += -march=z900 -fPIE -fno-strict-aliasing
diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index 523fa78..32befda 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -11,6 +11,7 @@
 #include "s390-ccw.h"
 #include "bootmap.h"
 #include "virtio.h"
+#include "libnet/netapps.h"
 
 #ifdef DEBUG
 /* #define DEBUG_FALLBACK */
@@ -744,7 +745,14 @@ void zipl_load(void)
 }
 
 if (virtio_get_device_type() == VIRTIO_ID_NET) {
-jump_to_IPL_code(vdev->netboot_start_addr);
+long len;
+
+len = netload();
+if (len < 0) {
+panic("Network loading failed");
+}
+sclp_print("Netload done, starting kernel...\n");
+asm volatile (" lpsw 0(%0) " : : "r"(0) : "memory");
 }
 
 ipl_scsi();
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 1cacc1b..b058e1b 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -150,12 +150,11 @@ static void virtio_setup(void)
 
 IPL_assert(found, "No virtio device found");
 
+virtio_setup_device(blk_schid);
 if (virtio_get_device_type() == VIRTIO_ID_NET) {
 sclp_print("Network boot device detected\n");
 vdev->netboot_start_addr = iplb.ccw.netboot_start_addr;
 } else {
-virtio_setup_device(blk_schid);
-
 IPL_assert(virtio_ipl_disk_is_valid(), "No valid IPL device detected");
 }
 }
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH RFC v3 1/8] block: move ThrottleGroup membership to ThrottleGroupMember

2017-06-27 Thread Alberto Garcia

On Fri 23 Jun 2017 02:46:53 PM CEST, Manos Pitsidianakis wrote:
> This commit gathers ThrottleGroup membership details from
> BlockBackendPublic into ThrottleGroupMember and refactors existing code
> to use the structure.
>
> Signed-off-by: Manos Pitsidianakis 

Hey Manos, thanks for the patch. It looks good to me in general, I only
have a couple of comments:

>  /* If no IO are queued for scheduling on the next round robin token
> - * then decide the token is the current bs because chances are
> - * the current bs get the current request queued.
> + * then decide the token is the current tgm because chances are
> + * the current tgm get the current request queued.

I wonder if it's not better to use 'member' instead of 'tgm' in general.
My impression is that the former is clearer and not too long, but I
don't have a very strong opinion so you can keep it like this if you
want.

> -/* Check if an I/O request needs to be throttled, wait and set a timer
> - * if necessary, and schedule the next request using a round robin
> - * algorithm.
> +/* Check if an I/O request needs to be throttled, wait and set a timer if
> + * necessary, and schedule the next request using a round robin algorithm.
>   *

There's a few cases like this when you reformat a paragraph but don't
actually change the text. I think it just adds unnecessary noise to the
diff...

> --- a/include/qemu/throttle.h
> +++ b/include/qemu/throttle.h
> @@ -27,6 +27,7 @@
>  
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
> +#include "qemu/coroutine.h"
>  
>  #define THROTTLE_VALUE_MAX 1000LL
>  
> @@ -153,4 +154,29 @@ bool throttle_schedule_timer(ThrottleState *ts,
>  
>  void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
>  
> +
> +/* The ThrottleGroupMember structure indicates membership in a ThrottleGroup
> + * and holds related data.
> + */
> +
> +typedef struct ThrottleGroupMember {
> +/* throttled_reqs_lock protects the CoQueues for throttled requests.  */
> +CoMutex  throttled_reqs_lock;
> +CoQueue  throttled_reqs[2];
> +
> +/* Nonzero if the I/O limits are currently being ignored; generally
> + * it is zero.  Accessed with atomic operations.
> + */
> +unsigned int io_limits_disabled;
> +
> +/* The following fields are protected by the ThrottleGroup lock.
> + * See the ThrottleGroup documentation for details.
> + * throttle_state tells us if I/O limits are configured. */
> +ThrottleState *throttle_state;
> +ThrottleTimers throttle_timers;
> +unsigned   pending_reqs[2];
> +QLIST_ENTRY(ThrottleGroupMember) round_robin;
> +
> +} ThrottleGroupMember;
> +

Any reason why you add this to throttle.h (which is generic throttling
code independent from the block layer) and not to throttle-groups.h?

Otherwise it all looks good to me, thanks!

Berto

Re: [Qemu-devel] [PATCH 0/3] hw/core: minor fixups

2017-06-27 Thread Philippe Mathieu-Daudé


Hi Eduardo,

On 06/23/2017 04:45 PM, Eduardo Habkost wrote:

Do you have a simple way to trigger the error paths addressed by patches 1/3
and 2/3?


For 1/3 "elf-loader: warn about invalid endianess":

$ wget -q https://people.debian.org/~aurel32/qemu/mips/vmlinux-3.2.0-4-4kc-malta


$ file vmlinux-3.2.0-4-4kc-malta
vmlinux-3.2.0-4-4kc-malta: ELF 32-bit MSB executable, MIPS, MIPS32 version 1 (SYSV), statically linked, BuildID[sha1]=66b8748075269e8aedb91d363050f74af8a0ebdd, not stripped


$ qemu-system-mipsel -version
QEMU emulator version 2.8.1(Debian 1:2.8+dfsg-6)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

$ qemu-system-mipsel -kernel vmlinux-3.2.0-4-4kc-malta
qemu: could not load kernel 'vmlinux-3.2.0-4-4kc-malta'

Once applied:

$ mipsel-softmmu/qemu-system-mipsel -kernel vmlinux-3.2.0-4-4kc-malta
vmlinux-3.2.0-4-4kc-malta: wrong endianess
qemu: could not load kernel 'vmlinux-3.2.0-4-4kc-malta'

It could be more verbose/nicer.

I'm doing some dual-endianness tests, and sometimes I only notice that I
was stupid enough to load the wrong ELF once I'm stepping in gdb...
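
The mismatch in the example above (an MSB kernel given to a little-endian target) can be spotted from the ELF identification bytes alone. The following is a minimal sketch, not the code from patch 1/3: the function and enum names are hypothetical, while the layout (magic in bytes 0..3, data encoding in byte 5, 1 for LSB and 2 for MSB) follows the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the numeric values match ELFDATA2LSB/ELFDATA2MSB. */
enum elf_endian {
    ELF_ENDIAN_INVALID = 0,
    ELF_ENDIAN_LITTLE  = 1,
    ELF_ENDIAN_BIG     = 2
};

/* Return the data encoding announced in e_ident[EI_DATA] (byte 5), or
 * ELF_ENDIAN_INVALID if the buffer is too short or is not an ELF file. */
static enum elf_endian elf_data_encoding(const unsigned char *buf, size_t len)
{
    if (len < 6 || memcmp(buf, "\177ELF", 4) != 0) {
        return ELF_ENDIAN_INVALID;
    }
    switch (buf[5]) {
    case 1:
        return ELF_ENDIAN_LITTLE;
    case 2:
        return ELF_ENDIAN_BIG;
    default:
        return ELF_ENDIAN_INVALID;
    }
}
```

A loader for a little-endian target could compare the result against ELF_ENDIAN_LITTLE and print a specific message before giving up, instead of only reporting a generic load failure.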



For 2/3 "fix missing return value in load_image_targphys_as()": I
extracted it from a WiP branch, "unify-arm-mips-loaders", thinking that
even if I never finish the branch, at least this patch can still be
useful to others. There have been no commits in this branch for 4 months,
so I don't really remember how it happens, but looking at rom_add_file()
I see:


fprintf(stderr, "Could not open option rom '%s': %s\n",
rom->path, strerror(errno));
goto err;
...
fprintf(stderr, "rom: file %-20s: get size error: %s\n",
rom->name, strerror(errno));
goto err;
...
fprintf(stderr, "rom: file %-20s: read error: rc=%d (expected %zd)\n",
rom->name, rc, rom->datasize);
goto err;

So my guess is that, again, I got something wrong on the command line
(an unfinished bash auto-completion that led to a directory? a zipped
ROM?) and QEMU still booted without using the specified ROM.


I do remember single-stepping there at least twice before realizing,
again, how stupid I was :)


Regards,

Phil.

Re: [Qemu-devel] [RFC PATCH v9 00/26] translate: [tcg] Generic translation framework

2017-06-27 Thread Lluís Vilanova

Eric Blake writes:

> On 06/25/2017 03:43 AM, Lluís Vilanova wrote:
>> This series proposes a generic (target-agnostic) instruction translation
>> framework.
>> 
>> It basically provides a generic main loop for instruction disassembly, which
>> calls target-specific functions when necessary. This generalization makes
>> inserting new code in the main loop easier, and helps in keeping all targets
>> in synch as to the contents of it.
>> 
>> This series also paves the way towards adding events to trace guest code
>> execution (BBLs and instructions).
>> 
>> I've ported i386/x86-64 and arm/aarch64 as an example to see how it fits in
>> the current organization, but will port the rest when this series gets merged.
>> 
>> Signed-off-by: Lluís Vilanova 
>> ---
>> 
>> Changes in v9
>> =============
>> 
>> * Further increase inter-mail sleep time during sending.
>> 
>> 
>> Changes in v8
>> =============
>> 
>> * Increase inter-mail sleep time during sending (list keeps refusing some
>> emails due to an excessive send rate).

> It's more likely that your rejection message was from your SMTP
> connection than from the list (I've had to deal with my ISP's SMTP
> server prohibiting me from sending more than 10 patches in a minute;
> while using my company's SMTP server did not have that rate-limiting
> restriction).

> But yes, it would be neat if 'git send-email' had a knob to easily tweak
> things to avoid flooding beyond a picky SMTP server's rate limits.

Yup, it's my SMTP, not the list. Since I'm using "stg mail --git" (uses git
send-email underneath), I can set up an inter-mail wait time.


Cheers,
  Lluis

Re: [Qemu-devel] [PATCH RFC v3 1/8] block: move ThrottleGroup membership to ThrottleGroupMember

2017-06-27 Thread Manos Pitsidianakis


On Tue, Jun 27, 2017 at 02:08:54PM +0200, Alberto Garcia wrote:

On Fri 23 Jun 2017 02:46:53 PM CEST, Manos Pitsidianakis wrote:

This commit gathers ThrottleGroup membership details from
BlockBackendPublic into ThrottleGroupMember and refactors existing code
to use the structure.

Signed-off-by: Manos Pitsidianakis 


Hey Manos, thanks for the patch. It looks good to me in general, I only
have a couple of comments:


 /* If no IO are queued for scheduling on the next round robin token
- * then decide the token is the current bs because chances are
- * the current bs get the current request queued.
+ * then decide the token is the current tgm because chances are
+ * the current tgm get the current request queued.


I wonder if it's not better to use 'member' instead of 'tgm' in general.
My impression is that the former is clearer and not too long, but I
don't have a very strong opinion so you can keep it like this if you
want.


I will change it, no problem.


-/* Check if an I/O request needs to be throttled, wait and set a timer
- * if necessary, and schedule the next request using a round robin
- * algorithm.
+/* Check if an I/O request needs to be throttled, wait and set a timer if
+ * necessary, and schedule the next request using a round robin algorithm.
  *


There's a few cases like this when you reformat a paragraph but don't
actually change the text. I think it just adds unnecessary noise to the
diff...


The wiki says "It's OK to fix coding style issues in the immediate area
(few lines) of the lines you're changing", so I left the reformats in.
Since they were noticed, they must be too noisy. I will either remove the
changes or do a restyling patch later.



--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -27,6 +27,7 @@

 #include "qemu-common.h"
 #include "qemu/timer.h"
+#include "qemu/coroutine.h"

 #define THROTTLE_VALUE_MAX 1000LL

@@ -153,4 +154,29 @@ bool throttle_schedule_timer(ThrottleState *ts,

 void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);

+
+/* The ThrottleGroupMember structure indicates membership in a ThrottleGroup
+ * and holds related data.
+ */
+
+typedef struct ThrottleGroupMember {
+/* throttled_reqs_lock protects the CoQueues for throttled requests.  */
+CoMutex  throttled_reqs_lock;
+CoQueue  throttled_reqs[2];
+
+/* Nonzero if the I/O limits are currently being ignored; generally
+ * it is zero.  Accessed with atomic operations.
+ */
+unsigned int io_limits_disabled;
+
+/* The following fields are protected by the ThrottleGroup lock.
+ * See the ThrottleGroup documentation for details.
+ * throttle_state tells us if I/O limits are configured. */
+ThrottleState *throttle_state;
+ThrottleTimers throttle_timers;
+unsigned   pending_reqs[2];
+QLIST_ENTRY(ThrottleGroupMember) round_robin;
+
+} ThrottleGroupMember;
+


Any reason why you add this to throttle.h (which is generic throttling
code independent from the block layer) and not to throttle-groups.h?


I put it there because it's not directly ThrottleGroup-related, but
you're right: if throttle.h is free of block-layer code, it should remain
that way.


Otherwise it all looks good to me, thanks!

Berto




[Qemu-devel] [PATCH 3/4] block/qcow2: add lzo compression algorithm

2017-06-27 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 block/qcow2-cluster.c | 65 +++---
 block/qcow2.c | 72 ++-
 block/qcow2.h |  1 +
 configure |  2 +-
 qemu-img.texi |  1 +
 5 files changed, 95 insertions(+), 46 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3d341fd..ecb059b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -24,6 +24,9 @@
 
 #include "qemu/osdep.h"
 #include <zlib.h>
+#ifdef CONFIG_LZO
+#include <lzo/lzo1x.h>
+#endif
 
 #include "qapi/error.h"
 #include "qemu-common.h"
@@ -1521,30 +1524,49 @@ again:
 }
 
 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
- const uint8_t *buf, int buf_size)
+ const uint8_t *buf, int buf_size,
+ uint32_t compression_algorithm_id)
 {
 z_stream strm1, *strm = &strm1;
-int ret, out_len;
-
-memset(strm, 0, sizeof(*strm));
-
-strm->next_in = (uint8_t *)buf;
-strm->avail_in = buf_size;
-strm->next_out = out_buf;
-strm->avail_out = out_buf_size;
-
-ret = inflateInit2(strm, -12);
-if (ret != Z_OK)
-return -1;
-ret = inflate(strm, Z_FINISH);
-out_len = strm->next_out - out_buf;
-if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) ||
-out_len != out_buf_size) {
+int ret = 0, out_len;
+
+switch (compression_algorithm_id) {
+case QCOW2_COMPRESSION_ZLIB:
+memset(strm, 0, sizeof(*strm));
+
+strm->next_in = (uint8_t *)buf;
+strm->avail_in = buf_size;
+strm->next_out = out_buf;
+strm->avail_out = out_buf_size;
+
+ret = inflateInit2(strm, -12);
+if (ret != Z_OK) {
+return -1;
+}
+ret = inflate(strm, Z_FINISH);
+out_len = strm->next_out - out_buf;
+ret = -(ret != Z_STREAM_END);
 inflateEnd(strm);
-return -1;
+break;
+#ifdef CONFIG_LZO
+case QCOW2_COMPRESSION_LZO:
+out_len = out_buf_size;
+ret = lzo1x_decompress_safe(buf, buf_size, out_buf,
+(lzo_uint *) &out_len, NULL);
+if (ret == LZO_E_INPUT_NOT_CONSUMED) {
+/* We always read up to the next sector boundary. Thus
+ * buf_size may be larger than the original compressed size. */
+ret = 0;
+}
+break;
+#endif
+default:
+abort(); /* should never reach this point */
 }
-inflateEnd(strm);
-return 0;
+if (out_len != out_buf_size) {
+ret = -1;
+}
+return ret;
 }
 
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
@@ -1565,7 +1587,8 @@ int qcow2_decompress_cluster(BlockDriverState *bs, 
uint64_t cluster_offset)
 return ret;
 }
 if (decompress_buffer(s->cluster_cache, s->cluster_size,
-  s->cluster_data + sector_offset, csize) < 0) {
+  s->cluster_data + sector_offset, csize,
+  s->compression_algorithm_id) < 0) {
 return -EIO;
 }
 s->cluster_cache_offset = coffset;
diff --git a/block/qcow2.c b/block/qcow2.c
index c91eb1f..bd65582 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -26,6 +26,9 @@
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
 #include <zlib.h>
+#ifdef CONFIG_LZO
+#include <lzo/lzo1x.h>
+#endif
 #include "block/qcow2.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qerror.h"
@@ -84,6 +87,10 @@ static uint32_t is_compression_algorithm_supported(char 
*algorithm)
 /* no algorithm means the old default of zlib compression
  * with 12 window bits */
 return QCOW2_COMPRESSION_ZLIB;
+#ifdef CONFIG_LZO
+} else if (!strcmp(algorithm, "lzo")) {
+return QCOW2_COMPRESSION_LZO;
+#endif
 }
 return 0;
 }
@@ -2715,8 +2722,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 QEMUIOVector hd_qiov;
 struct iovec iov;
 z_stream strm;
-int ret, out_len;
-uint8_t *buf, *out_buf, *local_buf = NULL;
+int ret, out_len = 0;
+uint8_t *buf, *out_buf = NULL, *local_buf = NULL, *work_buf = NULL;
 uint64_t cluster_offset;
 
 if (bytes == 0) {
@@ -2741,34 +2748,50 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 buf = qiov->iov[0].iov_base;
 }
 
-out_buf = g_malloc(s->cluster_size);
+switch (s->compression_algorithm_id) {
+case QCOW2_COMPRESSION_ZLIB:
+out_buf = g_malloc(s->cluster_size);
 
-/* best compression, small window, no zlib header */
-memset(&strm, 0, sizeof(strm));
-ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-   Z_DEFLATED, -12,
-   9, Z_DEFAULT_STRATEGY);
-if (ret != 0) {
-ret = -EINVAL;
-goto fail;
-}
+/* best compression, small window, no zlib header */
+

[Qemu-devel] [PATCH 2/4] block/qcow2: optimize qcow2_co_pwritev_compressed

2017-06-27 Thread Peter Lieven

If we specify exactly one iov of s->cluster_size bytes, we can avoid
the bounce buffer.

Signed-off-by: Peter Lieven 
---
 block/qcow2.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 893b145..c91eb1f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2716,7 +2716,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 struct iovec iov;
 z_stream strm;
 int ret, out_len;
-uint8_t *buf, *out_buf;
+uint8_t *buf, *out_buf, *local_buf = NULL;
 uint64_t cluster_offset;
 
 if (bytes == 0) {
@@ -2726,8 +2726,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 return bdrv_truncate(bs->file, cluster_offset, NULL);
 }
 
-buf = qemu_blockalign(bs, s->cluster_size);
-if (bytes != s->cluster_size) {
+if (bytes != s->cluster_size || qiov->niov != 1) {
+buf = local_buf = qemu_blockalign(bs, s->cluster_size);
 if (bytes > s->cluster_size ||
 offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
 {
@@ -2736,8 +2736,10 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 }
 /* Zero-pad last write if image size is not cluster aligned */
 memset(buf + bytes, 0, s->cluster_size - bytes);
+qemu_iovec_to_buf(qiov, 0, buf, bytes);
+} else {
+buf = qiov->iov[0].iov_base;
 }
-qemu_iovec_to_buf(qiov, 0, buf, bytes);
 
 out_buf = g_malloc(s->cluster_size);
 
@@ -2805,7 +2807,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 success:
 ret = 0;
 fail:
-qemu_vfree(buf);
+qemu_vfree(local_buf);
 g_free(out_buf);
 return ret;
 }
-- 
1.9.1

[Qemu-devel] [PATCH 0/4] block/qcow2: add compression_algorithm create option

2017-06-27 Thread Peter Lieven

This adds a create option for QCOW2 images to specify the compression algorithm
for compressed clusters. The series adds 3 algorithms to choose from:
zlib (default), zlib-fast and lzo. zlib-fast optimizes the zlib parameters
without the need for an additional compression library. lzo is chosen
as we already link against it and it's widely available. Further
libraries like zstd are in preparation.

Compression time vs. size of an uncompressed Debian 9 QCOW2 image (size 1148MB):

zlib       35.7s  339MB
zlib-fast  12.8s  348MB
lzo         4.2s  429MB

Peter Lieven (4):
  block/qcow2: add compression_algorithm create option
  block/qcow2: optimize qcow2_co_pwritev_compressed
  block/qcow2: add lzo compression algorithm
  block/qcow2: add zlib-fast compression algorithm

 block/qcow2-cluster.c |  66 ++--
 block/qcow2.c | 186 +-
 block/qcow2.h |  22 --
 configure |   2 +-
 docs/interop/qcow2.txt|   8 +-
 include/block/block_int.h |  35 -
 qemu-img.texi |  12 +++
 7 files changed, 251 insertions(+), 80 deletions(-)

-- 
1.9.1
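
The trade-off in the timing table of this cover letter can be made explicit with a little arithmetic: relative to zlib, zlib-fast is roughly 2.8x faster for an image about 2.7% larger, and lzo is about 8.5x faster for an image about 27% larger. A minimal sketch of that calculation follows; the helper names are hypothetical and the inputs are simply the figures quoted in the table.

```c
/* Hypothetical helpers for reading the cover letter's table. */

/* How many times faster `other_s` is than `base_s` (both in seconds). */
static double speedup(double base_s, double other_s)
{
    return base_s / other_s;
}

/* How much larger `other_mb` is than `base_mb`, in percent. */
static double size_overhead_pct(double base_mb, double other_mb)
{
    return (other_mb - base_mb) / base_mb * 100.0;
}
```

For instance, `speedup(35.7, 12.8)` compares zlib with zlib-fast, and `size_overhead_pct(339.0, 429.0)` gives the size penalty of lzo against zlib.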

Re: [Qemu-devel] BUG: KASAN: use-after-free in free_old_xmit_skbs

2017-06-27 Thread Jean-Philippe Menil


On 06/27/2017 04:13 AM, Jason Wang wrote:



On 2017年06月26日 15:35, Jean-Philippe Menil wrote:

On 06/26/2017 04:50 AM, Jason Wang wrote:



On 2017年06月24日 06:32, Cong Wang wrote:
On Fri, Jun 23, 2017 at 1:43 AM, Jason Wang wrote:


On 2017年06月23日 02:53, Michael S. Tsirkin wrote:

On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:

Hi Michael,

from what I see, the race appears when we hit virtnet_reset in
virtnet_xdp_set:

virtnet_reset
  _remove_vq_common
    virtnet_del_vqs
      virtnet_free_queues
        kfree(vi->sq)

when the xdp program (with two instances of the program to trigger it
faster) is added or removed.

It's easily repeatable, with 2 cpus and 4 queues on the qemu command
line,
running the xdp_ttl tool from Jesper.

For now, I'm able to continue my qualification by testing that xdp_qp is
not null, but that does not seem to be a sustainable trick:
if (xdp_qp && vi->xdp_queues_pairs != xdp_qp)

Maybe it will be clearer to you with this information.

Best regards.

Jean-Philippe


I'm pretty clear about the issue here, I was trying to figure out a fix.

Jason, any thoughts?



Hi Jean:

Does the following fix this issue? (I can't reproduce it locally through
xdp_ttl)

It is tricky here.

 From my understanding of the code base, the tx_lock is not sufficient
here, because in virtnet_del_vqs() all vqs are deleted and one vp
maps to one txq.

I am afraid you have to add a spinlock somewhere to serialize
free_old_xmit_skbs() vs. vring_del_virtqueue(). As you can see
they are in different layers, so it is hard to figure out where to add
it...

Also, make sure we don't sleep inside the spinlock, I see a
synchronize_net().


Looks like I'm missing something. I thought free_old_xmit_skbs() was
serialized in this case, since we disable all tx queues after
netif_tx_unlock_bh()?


Jean:

I thought this could be easily reproduced by e.g. producing some traffic
and at the same time trying to attach an xdp program, but apparently not.
How do you trigger this? What's your qemu command line for this?


Thanks


Hi Jason,

this is how I trigger the bug:
- on the guest, run tcpdump on the interface
- on the guest, run iperf against the host
- on the guest, cat /sys/kernel/debug/tracing/trace_pipe
- on the guest, run one or two instances of xdp_ttl compiled with
DEBUG uncommented, which I start and stop until I trigger the bug.


The qemu command line is as follows:

qemu-system-x86_64 -name ubuntu --enable-kvm -machine pc,accel=kvm \
  -smp 2 -drive file=/dev/LocalDisk/ubuntu,if=virtio,format=raw -m 2048 \
  -rtc base=localtime,clock=host -usbdevice tablet --balloon virtio \
  -netdev tap,id=ubuntu-0,ifname=ubuntu-0,script=/home/jenfi/WORK/jp/qemu/if-up,downscript=/home/jenfi/WORK/jp/qemu/if-down,vhost=on,queues=4 \
  -device virtio-net-pci,netdev=ubuntu-0,mac=de:ad:be:ef:01:03,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=2 \
  -vnc 127.0.0.1:3 -nographic -serial file:/home/jenfi/WORK/jp/qemu/ubuntu.out \
  -monitor unix:/home/jenfi/WORK/jp/qemu/ubuntu.sock,server,nowait


Notice the -smp 2, queues set to 4 and vectors set to 2.
It seems I forgot to mention that at the beginning of this thread,
sorry for that.


Best regards.

Jean-Philippe



Thanks Jean, I managed to reproduce the issue.

I thought netif_tx_unlock_bh() would do the tx lock, but it looks like it
doesn't, which is why the previous patch doesn't work.


Could you please try this patch? (At least I can't trigger the warning
after more than 20 rounds of xdp start/stop.)


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15c..a18f859 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1802,6 +1802,7 @@ static void virtnet_freeze_down(struct 
virtio_device *vdev)

 flush_work(&vi->config_work);

 netif_device_detach(vi->dev);
+   netif_tx_disable(vi->dev);
 cancel_delayed_work_sync(&vi->refill);

 if (netif_running(vi->dev)) {




Hi Jason,

Seems to do the trick!
With your patch, I'm unable to reproduce the problem anymore (running for
more than 2h without any issue).


Best regards.

Jean-Philippe

[Qemu-devel] [PATCH 1/4] block/qcow2: add compression_algorithm create option

2017-06-27 Thread Peter Lieven

This patch adds a new compression_algorithm option when creating qcow2 images.
The current default for the compression algorithm is zlib, and zlib will be
used when this option is omitted (as before).

If the option is specified e.g. with:

 qemu-img create -f qcow2 -o compression_algorithm=zlib image.qcow2 1G

then a new compression algorithm header extension is added and an incompatible
feature bit is set. This means that if the header extension is present, it must
be parsed by QEMU on qcow2_open, and QEMU must validate that the specified
compression algorithm is supported by the current build.

It also means that if the compression_algorithm option is specified, QEMU
versions prior to this commit will not be able to open the created image.

Signed-off-by: Peter Lieven 
---
 block/qcow2.c | 93 ---
 block/qcow2.h | 20 +++---
 docs/interop/qcow2.txt|  8 +++-
 include/block/block_int.h | 35 +-
 qemu-img.texi | 10 +
 5 files changed, 138 insertions(+), 28 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2f94f03..893b145 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -60,9 +60,11 @@ typedef struct {
 uint32_t len;
 } QEMU_PACKED QCowExtension;
 
-#define  QCOW2_EXT_MAGIC_END 0
-#define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
-#define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define QCOW2_EXT_MAGIC_END   0
+#define QCOW2_EXT_MAGIC_BACKING_FORMAT0xE2792ACA
+#define QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define QCOW2_EXT_MAGIC_COMPRESSION_ALGORITHM 0xc0318300
+/* 0xc03183xx reserved for further use of compression algorithm parameters */
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -76,6 +78,15 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
+static uint32_t is_compression_algorithm_supported(char *algorithm)
+{
+if (!algorithm[0] || !strcmp(algorithm, "zlib")) {
+/* no algorithm means the old default of zlib compression
+ * with 12 window bits */
+return QCOW2_COMPRESSION_ZLIB;
+}
+return 0;
+}
 
 /* 
  * read qcow2 extension and fill bs
@@ -148,6 +159,34 @@ static int qcow2_read_extensions(BlockDriverState *bs, 
uint64_t start_offset,
 #endif
 break;
 
+case QCOW2_EXT_MAGIC_COMPRESSION_ALGORITHM:
+if (ext.len >= sizeof(s->compression_algorithm)) {
+error_setg(errp, "ERROR: ext_compression_algorithm: len=%"
+   PRIu32 " too large (>=%zu)", ext.len,
+   sizeof(s->compression_algorithm));
+return 2;
+}
+ret = bdrv_pread(bs->file, offset, s->compression_algorithm,
+ ext.len);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "ERROR: 
ext_compression_algorithm:"
+ " Could not read algorithm name");
+return 3;
+}
+s->compression_algorithm[ext.len] = '\0';
+s->compression_algorithm_id =
+is_compression_algorithm_supported(s->compression_algorithm);
+if (!s->compression_algorithm_id) {
+error_setg(errp, "ERROR: compression algorithm '%s' is "
+   " unsupported", s->compression_algorithm);
+return 4;
+}
+#ifdef DEBUG_EXT
+printf("Qcow2: Got compression algorithm %s\n",
+   s->compression_algorithm);
+#endif
+break;
+
 case QCOW2_EXT_MAGIC_FEATURE_TABLE:
 if (p_feature_table != NULL) {
 void* feature_table = g_malloc0(ext.len + 2 * 
sizeof(Qcow2Feature));
@@ -1104,6 +1143,7 @@ static int qcow2_do_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 s->cluster_cache_offset = -1;
 s->flags = flags;
+s->compression_algorithm_id = QCOW2_COMPRESSION_ZLIB;
 
 ret = qcow2_refcount_init(bs);
 if (ret != 0) {
@@ -1981,6 +2021,21 @@ int qcow2_update_header(BlockDriverState *bs)
 buflen -= ret;
 }
 
+/* Compression Algorithm header extension */
+if (s->compression_algorithm[0]) {
+ret = header_ext_add(buf, QCOW2_EXT_MAGIC_COMPRESSION_ALGORITHM,
+ s->compression_algorithm,
+ strlen(s->compression_algorithm),
+ buflen);
+if (ret < 0) {
+goto fail;
+}
+buf += ret;
+buflen -= ret;
+header->incompatible_features |=
+cpu_to_be64(QCOW2_INCOMPAT_COMPRESSION);
+}
+
 /* Feature table */
 if (s->qcow_version >= 3) {
 Qcow2Feature features[] = {
@@ -1995,6 +2050,11 @@ int qcow2_update_header(BlockDriverState *bs)
 .name = "corrupt bit",
 },
 {
+.type = QC
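
The header extension handling added in 1/4 can be illustrated outside of QEMU. The sketch below is not the patch's code: it parses a single qcow2 header extension (32-bit big-endian magic, 32-bit big-endian length, then the data) from a memory buffer and maps the algorithm name to an ID. The magic and the zlib ID are the values used in this series; `ALGO_NAME_MAX`, `be32` and `parse_compression_ext` are hypothetical names, and the name-buffer size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_MAGIC_COMPRESSION_ALGORITHM 0xc0318300u  /* from this series */
#define COMPRESSION_ZLIB                0xC0318301u  /* from this series */
#define ALGO_NAME_MAX                   16           /* hypothetical size */

/* Read a 32-bit big-endian value, as used throughout the qcow2 header. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse one header extension located at buf. Returns the algorithm ID,
 * or 0 if the extension is not a compression extension, is malformed,
 * or names an algorithm this sketch does not know. */
static uint32_t parse_compression_ext(const uint8_t *buf, size_t len)
{
    char name[ALGO_NAME_MAX];
    uint32_t ext_len;

    if (len < 8 || be32(buf) != EXT_MAGIC_COMPRESSION_ALGORITHM) {
        return 0;
    }
    ext_len = be32(buf + 4);
    /* Reject names that do not fit (with terminator) or overrun buf. */
    if (ext_len >= sizeof(name) || ext_len > len - 8) {
        return 0;
    }
    memcpy(name, buf + 8, ext_len);
    name[ext_len] = '\0';

    /* As in the patch, an empty name means the old zlib default. */
    if (!name[0] || !strcmp(name, "zlib")) {
        return COMPRESSION_ZLIB;
    }
    return 0;
}
```

Returning 0 for an unknown name corresponds to the case where the patch refuses to open the image, which is the behaviour the incompatible feature bit is meant to enforce for builds that do not know the algorithm.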

[Qemu-devel] [PATCH 4/4] block/qcow2: add zlib-fast compression algorithm

2017-06-27 Thread Peter Lieven

This adds support for optimized zlib settings which almost
triple the compression speed while maintaining about
the same compressed size.

Signed-off-by: Peter Lieven 
---
 block/qcow2-cluster.c |  3 ++-
 block/qcow2.c | 11 +--
 block/qcow2.h |  1 +
 qemu-img.texi |  1 +
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ecb059b..d8e2378 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1532,6 +1532,7 @@ static int decompress_buffer(uint8_t *out_buf, int 
out_buf_size,
 
 switch (compression_algorithm_id) {
 case QCOW2_COMPRESSION_ZLIB:
+case QCOW2_COMPRESSION_ZLIB_FAST:
 memset(strm, 0, sizeof(*strm));
 
 strm->next_in = (uint8_t *)buf;
@@ -1539,7 +1540,7 @@ static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
 strm->next_out = out_buf;
 strm->avail_out = out_buf_size;
 
-ret = inflateInit2(strm, -12);
+ret = inflateInit2(strm, -15);
 if (ret != Z_OK) {
 return -1;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index bd65582..f07d8f0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -87,6 +87,8 @@ static uint32_t is_compression_algorithm_supported(char *algorithm)
 /* no algorithm means the old default of zlib compression
  * with 12 window bits */
 return QCOW2_COMPRESSION_ZLIB;
+} else if (!strcmp(algorithm, "zlib-fast")) {
+return QCOW2_COMPRESSION_ZLIB_FAST;
 #ifdef CONFIG_LZO
 } else if (!strcmp(algorithm, "lzo")) {
 return QCOW2_COMPRESSION_LZO;
@@ -2722,6 +2724,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 QEMUIOVector hd_qiov;
 struct iovec iov;
 z_stream strm;
+int z_level = Z_DEFAULT_COMPRESSION, z_windowBits = -12;
 int ret, out_len = 0;
 uint8_t *buf, *out_buf = NULL, *local_buf = NULL, *work_buf = NULL;
 uint64_t cluster_offset;
@@ -2749,13 +2752,17 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 }
 
 switch (s->compression_algorithm_id) {
+case QCOW2_COMPRESSION_ZLIB_FAST:
+z_level = Z_BEST_SPEED;
+z_windowBits = -15;
+/* fall-through */
 case QCOW2_COMPRESSION_ZLIB:
 out_buf = g_malloc(s->cluster_size);
 
 /* best compression, small window, no zlib header */
 memset(&strm, 0, sizeof(strm));
-ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-   Z_DEFLATED, -12,
+ret = deflateInit2(&strm, z_level,
+   Z_DEFLATED, z_windowBits,
9, Z_DEFAULT_STRATEGY);
 if (ret != 0) {
 ret = -EINVAL;
diff --git a/block/qcow2.h b/block/qcow2.h
index 716012c..a89f986 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -173,6 +173,7 @@ typedef struct Qcow2UnknownHeaderExtension {
 enum {
 QCOW2_COMPRESSION_ZLIB  = 0xC0318301,
 QCOW2_COMPRESSION_LZO   = 0xC0318302,
+QCOW2_COMPRESSION_ZLIB_FAST = 0xC0318303,
 };
 
 enum {
diff --git a/qemu-img.texi b/qemu-img.texi
index 043c1ba..83a5db2 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -627,6 +627,7 @@ The following options are available if support for the respective libraries
 has been enabled at compile time:
 
    zlib        Uses standard zlib compression (default)
+   zlib-fast   Uses zlib compression with optimized compression parameters
    lzo         Uses LZO1X compression
 
 The compression algorithm can only be defined at image create time and cannot
-- 
1.9.1

Re: [Qemu-devel] [PATCH] ARM: KVM: Enable in-kernel timers with user space gic

2017-06-27 Thread Andrew Jones

On Mon, Jun 26, 2017 at 11:30:56PM +0200, Alexander Graf wrote:
> When running with KVM enabled, you can choose between emulating the
> gic in kernel or user space. If the kernel supports in-kernel virtualization
> of the interrupt controller, it will default to that. If not, it will
> default to user space emulation.
> 
> Unfortunately when running in user mode gic emulation, we miss out on
> interrupt events which are only available from kernel space, such as the
> timer. This patch leverages the new kernel/user space pending line
> synchronization for timer events. It does not handle PMU events yet.
> 
> Signed-off-by: Alexander Graf 
> ---
>  accel/kvm/kvm-all.c|  5 +
>  accel/stubs/kvm-stub.c |  5 +
>  hw/intc/arm_gic.c  |  7 +++
>  include/sysemu/kvm.h   | 11 +++
>  target/arm/cpu.h   |  3 +++
>  target/arm/kvm.c   | 49 +
>  6 files changed, 80 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 75feffa..ade32ea 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2285,6 +2285,11 @@ int kvm_has_intx_set_mask(void)
>  return kvm_state->intx_set_mask;
>  }
>  
> +bool kvm_arm_supports_user_irq(void)
> +{
> +return kvm_check_extension(kvm_state, KVM_CAP_ARM_USER_IRQ);
> +}
> +
>  #ifdef KVM_CAP_SET_GUEST_DEBUG
>  struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *cpu,
>   target_ulong pc)
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index ef0c734..3965c52 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -155,4 +155,9 @@ void kvm_init_cpu_signals(CPUState *cpu)
>  {
>  abort();
>  }
> +
> +bool kvm_arm_supports_user_irq(void)
> +{
> +return false;
> +}
>  #endif
> diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> index b305d90..4ffd905 100644
> --- a/hw/intc/arm_gic.c
> +++ b/hw/intc/arm_gic.c
> @@ -25,6 +25,7 @@
>  #include "qom/cpu.h"
>  #include "qemu/log.h"
>  #include "trace.h"
> +#include "sysemu/kvm.h"
>  
>  /* #define DEBUG_GIC */
>  
> @@ -1412,6 +1413,12 @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
>  return;
>  }
>  
> +if (kvm_enabled() && !kvm_arm_supports_user_irq()) {
> +error_setg(errp, "KVM with user space irqchip only works when the "
> + "host kernel supports KVM_CAP_ARM_USER_IRQ");
> +return;

extra spaces here

> +}
> +
>  /* This creates distributor and main CPU interface (s->cpuiomem[0]) */
>  gic_init_irqs_and_mmio(s, gic_set_irq, gic_ops);
>  
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 1e91613..9f11fc0 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -227,6 +227,17 @@ int kvm_init_vcpu(CPUState *cpu);
>  int kvm_cpu_exec(CPUState *cpu);
>  int kvm_destroy_vcpu(CPUState *cpu);
>  
> +/**
> + * kvm_arm_supports_user_irq
> + *
> + * Not all KVM implementations support notifications for kernel generated
> + * interrupt events to user space. This function indicates whether the current
> + * KVM implementation does support them.
> + *
> + * Returns: true if KVM supports using kernel generated IRQs from user space
> + */
> +bool kvm_arm_supports_user_irq(void);
> +
>  #ifdef NEED_CPU_H
>  #include "cpu.h"
>  
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 16a1e59..88ec3ee 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -706,6 +706,9 @@ struct ARMCPU {
>  void *el_change_hook_opaque;
>  
>  int32_t node_id; /* NUMA node this CPU belongs to */
> +
> +/* Used to synchronize KVM and QEMU timer levels */

Eventually it won't only be for timer levels

> +uint8_t device_irq_level;
>  };
>  
>  static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 4555468..29ee72d 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -174,6 +174,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>   */
>  kvm_async_interrupts_allowed = true;
>  
> +/*
> + * PSCI wakes up secondary cores, so we always need to
> + * have vCPUs waiting in kernel space
> + */
> +kvm_halt_in_kernel_allowed = true;
> +
>  cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  
>  type_register_static(&host_arm_cpu_type_info);
> @@ -528,6 +534,49 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
>  
>  MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
>  {
> +ARMCPU *cpu;
> +uint32_t switched_level;
> +
> +if (kvm_irqchip_in_kernel()) {
> +/*
> + * We only need to sync timer states with user-space interrupt
> + * controllers, so return early and save cycles if we don't.
> + */
> +return MEMTXATTRS_UNSPECIFIED;
> +}
> +
> +cpu = ARM_CPU(cs);
> +
> +/* Synchronize our internal timer irq lines

Re: [Qemu-devel] [PATCH v2] main_loop: Make main_loop_wait() return void

2017-06-27 Thread Stefan Hajnoczi

On Mon, Jun 26, 2017 at 03:28:00PM +0100, Peter Maydell wrote:
> In commit e330c118f2a5a the last usage of main_loop_wait() that cared
> about the return value was changed to no longer use it. Drop the
> now-useless return value and make the function return void.
> 
> We avoid the awkwardness of ifdeffery to handle the 'ret'
> variable in main_loop_wait() only being wanted if CONFIG_SLIRP
> by simply dropping all the ifdefs. There are stub implementations
> of slirp_pollfds_poll() and slirp_pollfds_fill() already in
> stubs/slirp.c which do nothing, as required.
> 
> Signed-off-by: Peter Maydell 
> ---
> This will coincidentally satisfy Coverity, which currently complains
> in CID 1372464 that we call main_loop_wait() in vl.c and ignore the
> return value which may be reporting a poll() syscall failure.
> Essentially we don't expect poll() to fail, except perhaps with
> a transient EINTR -- if it ever did we'd spin retrying endlessly
> I think.
> ---
>  include/qemu/main-loop.h | 2 +-
>  util/main-loop.c | 8 ++--
>  2 files changed, 3 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH RFC v3 2/8] block: Add aio_context field in ThrottleGroupMember

2017-06-27 Thread Alberto Garcia

On Fri 23 Jun 2017 02:46:54 PM CEST, Manos Pitsidianakis wrote:
> timer_cb() needs to know about the current Aio context of the throttle
> request that is woken up. In order to make ThrottleGroupMember backend
> agnostic, this information is stored in an aio_context field instead of
> accessing it from BlockBackend.
>
> Signed-off-by: Manos Pitsidianakis 

I agree with Stefan's comments, otherwise the patch looks good to me.

Berto

Re: [Qemu-devel] [PATCH RFC v2 2/2] ARM: KVM: Enable in-kernel timers with user space gic

2017-06-27 Thread Andrew Jones

On Mon, Jun 26, 2017 at 11:32:10PM +0200, Alexander Graf wrote:
> On 06/26/2017 05:03 PM, Andrew Jones wrote:
> > On Tue, Dec 13, 2016 at 01:20:50PM +, Peter Maydell wrote:
> > > On 14 November 2016 at 14:32, Alexander Graf  wrote:
> > > > When running with KVM enabled, you can choose between emulating the
> > > > gic in kernel or user space. If the kernel supports in-kernel
> > > > virtualization of the interrupt controller, it will default to that.
> > > > If not, it will default to user space emulation.
> > > > 
> > > > Unfortunately when running in user mode gic emulation, we miss out on
> > > > timer events which are only available from kernel space. This patch
> > > > leverages the new kernel/user space pending line synchronization for
> > > > those timer events.
> > > > 
> > > > Signed-off-by: Alexander Graf 
> > > Reviewed-by: Peter Maydell 
> > > 
> > Hi everyone,
> > 
> > I probably missed a refresh of this patch, but as I didn't see anything,
> > I picked this one up today in order to test the KVM support recently
> > merged. Tweaking this patch a bit to fit the new ABI allowed me to
> > instantiate a KVM guest without the in-kernel irqchip (tested on a
> > mustang). So, FWIW, this is a report of a successful test. Is there a
> > refreshed version of this patch someone can point me to, which I should
> > test instead?
> 
> Sorry, this did fall through the cracks way too many times now. I've sent a respin
> that hopefully is slightly more future proof than this RFC :)
> 
> If your tests passed with this patch, please extend them to also cover SMP
> support, as that was broken with this RFC.

Indeed. I retested with the old version and now see that secondaries were
not booting, but they are with the new version.

Thanks,
drew

Re: [Qemu-devel] [Qemu-block] [PATCH RFC v3 3/8] block: add throttle block filter driver

2017-06-27 Thread Stefan Hajnoczi

On Mon, Jun 26, 2017 at 07:01:18PM +0300, Manos Pitsidianakis wrote:
> On Mon, Jun 26, 2017 at 03:30:55PM +0100, Stefan Hajnoczi wrote:
> > > +static BlockDriver bdrv_throttle = {
> > > +.format_name=   "throttle",
> > > +.protocol_name  =   "throttle",
> > > +.instance_size  =   sizeof(ThrottleGroupMember),
> > > +
> > > +.bdrv_file_open =   throttle_open,
> > > +.bdrv_close =   throttle_close,
> > > +.bdrv_co_flush  =   throttle_co_flush,
> > > +
> > > +.bdrv_child_perm=   bdrv_filter_default_perms,
> > > +
> > > +.bdrv_getlength =   throttle_getlength,
> > > +
> > > +.bdrv_co_preadv =   throttle_co_preadv,
> > > +.bdrv_co_pwritev=   throttle_co_pwritev,
> > > +
> > > +.bdrv_co_pwrite_zeroes  =   throttle_co_pwrite_zeroes,
> > > +.bdrv_co_pdiscard   =   throttle_co_pdiscard,
> > > +
> > > +.bdrv_recurse_is_first_non_filter   =   bdrv_recurse_is_first_non_filter,
> > > +
> > > +.bdrv_attach_aio_context=   throttle_attach_aio_context,
> > > +.bdrv_detach_aio_context=   throttle_detach_aio_context,
> > > +
> > > +.bdrv_reopen_prepare=   throttle_reopen_prepare,
> > > +.bdrv_reopen_commit =   throttle_reopen_commit,
> > > +.bdrv_reopen_abort  =   throttle_reopen_abort,
> > > +
> > > +.is_filter  =   true,
> > > +};
> > 
> > Missing:
> > bdrv_co_get_block_status()
> > bdrv_truncate()
> > bdrv_get_info()
> > bdrv_probe_blocksizes()
> > bdrv_probe_geometry()
> > bdrv_media_changed()
> > bdrv_eject()
> > bdrv_lock_medium()
> > bdrv_co_ioctl()
> > 
> > See block/raw-format.c.
> > 
> > I think most of these could be modified in block.c or block/io.c to
> > automatically call bs->file's function if drv doesn't implement them.
> > This way all block drivers would transparently pass them through by
> > default and block/raw-format.c code could be eliminated.
> 
> Are these truly necessary? Because other filter drivers (ie quorum,
> blkverify) don't implement them.

Both quorum and blkverify are rarely used.  This explains why the issue
hasn't been found yet.

These are the callbacks I identified which do not automatically forward
to bs->file.  Therefore the throttle driver will break these features
when bs->file supports them.

That's why I suggest forwarding to bs->file in block.c.  Then individual
drivers do not have to implement these callbacks just to forward to
bs->file.  And if the driver wishes to prohibit a feature, it can
implement the callback and return -ENOTSUP.

You can send this fix as a separate patch series, independent of the
throttle driver.  Once it has been merged the throttle driver will gain
support for these features.
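
The default-forwarding idea above can be sketched in isolation. This is only an illustration under stated assumptions: ExampleNode, ExampleDriver and example_eject() are hypothetical stand-ins, not the real BlockDriverState, BlockDriver or bdrv_eject() from block.c, and the int return value is chosen here just to make the fallback visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a block node and its driver table. */
typedef struct ExampleNode ExampleNode;

typedef struct ExampleDriver {
    /* NULL means "the driver does not implement this callback". */
    int (*eject)(ExampleNode *node, int eject_flag);
} ExampleDriver;

struct ExampleNode {
    const ExampleDriver *drv;
    ExampleNode *file;   /* child node, or NULL for a leaf */
    int ejected;         /* state touched by the leaf driver */
};

/*
 * Generic entry point: use the driver's callback when present,
 * otherwise pass the request through to the child node.
 */
static int example_eject(ExampleNode *node, int eject_flag)
{
    assert(node != NULL && node->drv != NULL);

    if (node->drv->eject) {
        return node->drv->eject(node, eject_flag);
    }
    if (node->file) {
        return example_eject(node->file, eject_flag);
    }
    return -ENOTSUP;
}

/* Leaf driver: really handles the request. */
static int example_leaf_eject(ExampleNode *node, int eject_flag)
{
    node->ejected = eject_flag;
    return 0;
}

/* Driver that wants to prohibit the feature explicitly. */
static int example_refuse_eject(ExampleNode *node, int eject_flag)
{
    (void)node;
    (void)eject_flag;
    return -ENOTSUP;
}

static const ExampleDriver example_leaf_driver   = { .eject = example_leaf_eject };
static const ExampleDriver example_filter_driver = { .eject = NULL };
static const ExampleDriver example_strict_driver = { .eject = example_refuse_eject };
```

With this shape, a filter that leaves the callback NULL is transparent, and a driver that must block the feature does so by implementing the callback and returning -ENOTSUP.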

Stefan



Re: [Qemu-devel] [Qemu-block] [PATCH RFC v3 3/8] block: add throttle block filter driver

2017-06-27 Thread Stefan Hajnoczi

On Mon, Jun 26, 2017 at 07:26:41PM +0300, Manos Pitsidianakis wrote:
> On Mon, Jun 26, 2017 at 03:30:55PM +0100, Stefan Hajnoczi wrote:
> > > +bs->file = bdrv_open_child(NULL, options, "file",
> > > +bs, &child_file, false, &local_err);
> > > +
> > > +if (local_err) {
> > > +error_propagate(errp, local_err);
> > > +return -EINVAL;
> > > +}
> > > +
> > > +qdict_flatten(options);
> > > +return throttle_configure_tgm(bs, tgm, options, errp);
> > 
> > Who destroys bs->file on error?
> 
> It is reaped by bdrv_open_inherit() on failure, if I'm not mistaken.
> That's how other drivers handle this as well. Some (eg block/qcow2.c)
> check if bs->file is NULL instead of the error pointer they pass, so
> this is not very consistent.

Maybe I'm missing it but I don't see relevant bs->file cleanup in
bdrv_open_inherit() or bdrv_open_common().

Please post the exact line where it happens.

Stefan



Re: [Qemu-devel] [PATCH 1/4] block/qcow2: add compression_algorithm create option

2017-06-27 Thread Eric Blake

On 06/27/2017 07:34 AM, Peter Lieven wrote:
> this patch adds a new compression_algorithm option when creating qcow2 images.
> The current default for the compresison algorithm is zlib and zlib will be

s/compresison/compression/

> used when this option is omitted (like before).
> 
> If the option is specified e.g. with:
> 
>  qemu-img create -f qcow2 -o compression_algorithm=zlib image.qcow2 1G
> 
> then a new compression algorithm header extension is added and an incompatible
> feature bit is set. This means that if the header is present it must be parsed
> by Qemu on qcow2_open and it must be validated if the specified compression
> algorithm is supported by the current build of Qemu.
> 
> This means if the compression_algorithm option is specified Qemu prior to this
> commit will not be able to open the created image.
> 
> Signed-off-by: Peter Lieven 
> ---
>  block/qcow2.c | 93 
> ---
>  block/qcow2.h | 20 +++---
>  docs/interop/qcow2.txt|  8 +++-

Focusing on just the spec change first:

> +++ b/docs/interop/qcow2.txt
> @@ -85,7 +85,11 @@ in the description of a field.
>  be written to (unless for regaining
>  consistency).
>  
> -Bits 2-63:  Reserved (set to 0)
> +Bit 2:  Compress algorithm bit.  If this bit is set then
> +the compress algorithm extension must be parsed
> +and checked for compatiblity.

s/compatiblity/compatibility/

> +
> +Bits 3-63:  Reserved (set to 0)
>  
>   80 -  87:  compatible_features
>  Bitmask of compatible features. An implementation can
> @@ -135,6 +139,8 @@ be stored. Each extension has a structure like the following:
>  0xE2792ACA - Backing file format name
>  0x6803f857 - Feature name table
>  0x23852875 - Bitmaps extension
> +0xC0318300 - Compression Algorithm
> +0xC03183xx - Reserved for compression algorithm params

s/params/parameters/

You have now introduced 256 different reserved headers, without
documenting any of their formats.  You absolutely MUST include a
documentation of how the new 0xC0318300 header is laid out (see, for
example, our section on "Bitmaps extension"), along with text mentioning
that the new header MUST be present if the incompatible-feature bit is set
and MUST be absent otherwise.  But I also think that with a bit of
proper design work, you only need ONE header for all possible algorithm
parameters, rather than burning an additional 255 unspecified
reservations.  That is, make sure your new header includes a common
prefix including a length field and the algorithm in use, and then the
length covers a variable-length suffix that can be parsed in a
per-algorithm-specific manner for whatever additional parameters are
needed for that algorithm.
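
To make the shape of such a single extension concrete, here is a minimal sketch of one possible payload layout and its parser. Everything in it is an assumption for illustration: the field order, the 32-bit widths, the EXAMPLE_* identifiers and example_parse_compression_ext() are hypothetical and are not taken from docs/interop/qcow2.txt or from the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm identifiers; not part of any qcow2 spec. */
enum {
    EXAMPLE_ALGO_ZLIB = 0,
    EXAMPLE_ALGO_LZO  = 1,
};

/*
 * Hypothetical payload layout (all fields big-endian):
 *   bytes 0 - 3:  algorithm identifier
 *   bytes 4 - 7:  length in bytes of the parameter suffix
 *   bytes 8 - ..: algorithm-specific parameters
 */
typedef struct ExampleCompressionExt {
    uint32_t algorithm;
    uint32_t params_len;
    const uint8_t *params;   /* points into the caller's buffer */
} ExampleCompressionExt;

static uint32_t example_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the payload is truncated or inconsistent. */
static int example_parse_compression_ext(const uint8_t *buf, size_t len,
                                         ExampleCompressionExt *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 8) {
        return -1;               /* common prefix is incomplete */
    }
    out->algorithm = example_load_be32(buf);
    out->params_len = example_load_be32(buf + 4);
    if (out->params_len > len - 8) {
        return -1;               /* suffix would run past the payload */
    }
    out->params = buf + 8;       /* parsed later, per algorithm */
    return 0;
}
```

The common prefix is parsed the same way for every algorithm; only the interpretation of the suffix varies, so a new algorithm does not need a new extension magic.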

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH 16/19] block: protect modification of dirty bitmaps with a mutex

2017-06-27 Thread Paolo Bonzini



On 27/06/2017 11:47, Vladimir Sementsov-Ogievskiy wrote:
> bdrv_enable_dirty_bitmap - writes to the list, it changes 'disabled'
> field. So it requires both BQL and dirty_bitmap_mutex? But the comment
> says only about BQL.

This one is interesting.  There could be a concurrent call to
bdrv_set_dirty, indeed.  I'll send a patch.

> Also, for example, if I want to create a new bitmap and then somehow
> change it, should I do it like this:
> 
> bdrv_create_dirty_bitmap(...)
> 
> bdrv_dirty_bitmaps_lock(bs)
> 
> bitmap  = bdrv_find_dirty_bitmap(bs, name)
> 
> 
> 
> bdrv_dirty_bitmaps_unlock(bs)
> 
> - because we can't now trust the pointer returned by
> bdrv_create_dirty_bitmap, as it releases bitmap lock before return.

If you have the big QEMU lock (you do if you are in the QEMU monitor),
you are protected from changes to the list of bitmaps.

Paolo

Re: [Qemu-devel] [PATCH 4/4] block/qcow2: add zlib-fast compression algorithm

2017-06-27 Thread Eric Blake

On 06/27/2017 07:34 AM, Peter Lieven wrote:
> this adds support for optimized zlib settings which almost

Start sentences with a capital.

> tripples the compression speed while maintaining about

s/tripples/triples/

> the same compressed size.
> 
> Signed-off-by: Peter Lieven 
> ---
>  block/qcow2-cluster.c |  3 ++-
>  block/qcow2.c | 11 +--
>  block/qcow2.h |  1 +
>  qemu-img.texi |  1 +
>  4 files changed, 13 insertions(+), 3 deletions(-)
> 

> +++ b/block/qcow2.h
> @@ -173,6 +173,7 @@ typedef struct Qcow2UnknownHeaderExtension {
>  enum {
>  QCOW2_COMPRESSION_ZLIB  = 0xC0318301,
>  QCOW2_COMPRESSION_LZO   = 0xC0318302,
> +QCOW2_COMPRESSION_ZLIB_FAST = 0xC0318303,

Back to my comments on 1/4 - we MUST first get the qcow2 specification
right, rather than adding undocumented headers in the code.  And I still
think you only need one variable-length header extension for covering
all possible algorithms, rather than one header per algorithm.  Let's
get the spec right first, before worrying about the code implementing
the spec.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [Qemu-block] [PATCH RFC v3 4/8] block: convert ThrottleGroup to object with QOM

2017-06-27 Thread Stefan Hajnoczi

On Mon, Jun 26, 2017 at 06:24:09PM +0300, Manos Pitsidianakis wrote:
> On Mon, Jun 26, 2017 at 03:52:34PM +0100, Stefan Hajnoczi wrote:
> > > +static void throttle_group_set(Object *obj, Visitor *v, const char * name,
> > > +void *opaque, Error **errp)
> > > +
> > > +{
> > > +ThrottleGroup *tg = THROTTLE_GROUP(obj);
> > > +ThrottleConfig cfg = tg->ts.cfg;
> > > +Error *local_err = NULL;
> > > +ThrottleParamInfo *info = opaque;
> > > +int64_t value;
> > 
> > What happens if this property is set while QEMU is already running?
> 
> I assume you mean setting a property while a group has active members and
> requests? My best answer would be "don't do that". I couldn't figure a way
> to do this cleanly. Individual limit changes may render a ThrottleConfig
> invalid, so it should not be allowed. ThrottleGroups and throttle nodes
> should be destroyed and recreated to change limits with this version, but in
> the next it will be done through block_set_io_throttle() which is the
> existing way to change limits and check for validity. This was discussed in
> the proposal about the new syntax we had on the list.

Please ask on IRC or the mailing list if you have questions.

If you are aware of a limitation and don't know the answer then
submitting the code without any comment or question is dangerous.
Reviewers may miss the problem :) and broken code gets merged.

UserCreatableClass has a ->complete() callback.  You can use this to
perform final initialization and then "freeze" the properties by setting
a bool flag.  The property accessor functions can do:

  if (tg->init_complete) {
  error_setg(errp, "Property modification is not allowed at run-time");
  return;
  }

Something like this is used by
backends/hostmem.c:host_memory_backend_set_size(), for example.
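
A self-contained sketch of the same freeze-after-complete pattern, outside of QOM, may help show the intended behaviour. The names here (ExampleGroup, example_group_complete(), example_group_set_bps_total()) and the -EPERM return value are assumptions made for the illustration; the real code would report the failure through error_setg() as in the snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object with one configurable limit. */
typedef struct ExampleGroup {
    int64_t bps_total;
    bool init_complete;   /* set once final initialization has run */
} ExampleGroup;

/* Plays the role of the ->complete() callback: freeze the properties. */
static void example_group_complete(ExampleGroup *g)
{
    assert(g != NULL);
    g->init_complete = true;
}

/* Property setter: accepted before completion, rejected afterwards. */
static int example_group_set_bps_total(ExampleGroup *g, int64_t value)
{
    assert(g != NULL);

    if (g->init_complete) {
        return -EPERM;    /* modification is not allowed at run-time */
    }
    g->bps_total = value;
    return 0;
}
```

Because the check lives in the setter, a rejected write leaves the previously configured value untouched.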

> > 
> > > +goto out;
> > > +}
> > 
> > This doesn't validate inputs properly for non int64_t types.
> > 
> > I'm also worried that the command-line bps=,iops=,... options do not
> > have unsigned or double types.  Input ranges and validation should match
> > the QEMU command-line (I know this is a bit of a pain with QOM since the
> > property types are different from QEMU option types).
> 
> I don't know what should be done about this, to be honest, except for
> manually checking the limits for each datatype in the QOM setters.

I believe all throttling parameter types are int64_t in QemuOpts.  If we
want to be compatible with the command-line parameters then they should
also be int64_t here instead of unsigned int or double.

This approach makes sense from the QMP user perspective.  QMP clients
shouldn't have to deal with different data types depending on which
throttling API they use.  Let's keep it consistent - there's no real
drawback to using int64_t.

