Re: [Qemu-devel] [PATCH 04/17] qapi: merge QInt and QFloat in QNum

2017-05-15 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Fri, 12 May 2017 08:30:36 +0200
> Markus Armbruster  wrote:
>
>> Question for Luiz...
>> 
>> Marc-André Lureau  writes:
>> 
>> [...]
>> > diff --git a/tests/check-qnum.c b/tests/check-qnum.c
>> > new file mode 100644
>> > index 00..d08d35e85a
>> > --- /dev/null
>> > +++ b/tests/check-qnum.c
>> > @@ -0,0 +1,131 @@
>> > +/*
>> > + * QNum unit-tests.
>> > + *
>> > + * Copyright (C) 2009 Red Hat Inc.
>> > + *
>> > + * Authors:
>> > + *  Luiz Capitulino 
>> > + *
>> > + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
>> > later.
>> > + * See the COPYING.LIB file in the top-level directory.
>> > + */
>> > +#include "qemu/osdep.h"
>> > +
>> > +#include "qapi/qmp/qnum.h"
>> > +#include "qapi/error.h"
>> > +#include "qemu-common.h"
>> > +
>> > +/*
>> > + * Public Interface test-cases
>> > + *
>> > + * (with some violations to access 'private' data)
>> > + */
>> > +
>> > +static void qnum_from_int_test(void)
>> > +{
>> > +QNum *qi;
>> > +const int value = -42;
>> > +
>> > +qi = qnum_from_int(value);
>> > +g_assert(qi != NULL);
>> > +g_assert_cmpint(qi->u.i64, ==, value);
>> > +g_assert_cmpint(qi->base.refcnt, ==, 1);
>> > +g_assert_cmpint(qobject_type(QOBJECT(qi)), ==, QTYPE_QNUM);
>> > +
>> > +// destroy doesn't exit yet
>> > +g_free(qi);
>> > +}  
>> 
>> The comment is enigmatic. 
>
> It was meant for future generations to figure it out :)

Hah!

>> It was first written in commit 33837ba
>> "Introduce QInt unit-tests", and got copied around since.  In
>> check-qlist.c, it's spelled "exist yet".
>
> Yes, "exit" is a typo it should be "exist".
>
>> What is "destroy", why doesn't it exit / exist now, but will exit /
>> exist later?  It can't be qnum_destroy_obj(), because that certainly
>> exists already, exits already in the sense of returning, and shouldn't
>> ever exit in the sense of terminating the program.
>> 
>> The comment applies to a g_free().  Why do we free directly instead
>> decrementing the reference count?  Perhaps the comment tries to explain
>> that (if it does, it fails).
>
> In my personal style of writing unit-tests, I never use a method
> in a test before testing it. So, as QDECREF() wasn't tested yet,
> I wasn't allowed to use it.

It's a good principle for organizing tests.

> While I keep this principle when writing unit-tests today, this
> particular case is very extreme and not useful at all. Today I'd just
> go ahead and use QDECREF().

Makes sense.

> The qint_destroy_test() in the original
> commit is also very bogus, it's not really doing an useful test.

It can demonstrate leaks under valgrind.  But pretty much every other
test can just as well, so...

Marc-André, care to stick a cleanup patch into your series?



Re: [Qemu-devel] [RFC PATCH v2] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-15 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Hi Markus,
>
> On 05/11/2017 06:03 AM, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:
>>
>>> Ok I just understood Richard explanation, so this patch is WRONG and I
>>> need to get some real rest :(
>>
>> Ha!  Get some sleep; we'll still be around in the morning ;)
>>
>>> On 05/10/2017 08:52 PM, Philippe Mathieu-Daudé wrote:
 Apply this script using:

 $ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
 --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
 --macro-file scripts/cocci-macro-file.h \
 --dir target \
 --in-place

 Signed-off-by: Philippe Mathieu-Daudé 
 ---

 This is a new version of the coccinelle script addressing Richard comments 
 and
 trying to do it correctly. Also changed license to GPLv2+.

 The first rule matches, it calls a python2 script that basically checks the
 target_ulong is not overflowed: (msk << ofs) >> sizeof(target_ulong) == 0
>>>
>>> WRONG
>> [...]
>>
>> Is this script likely to be rerun in the future?  If yes, keeping it in
>> scripts/coccinelle/ is a good idea.  If no, I recommend to store it in
>> the commit message instead.
>
> It is unlikely to be rerun in the future, at least for this specific
> pattern. But it can be easily adapted for another TCG optimization.
>
> I could not find much documentation about how to do a such script
> using Python, except on a thread [1]. If it is documented enough I
> think it is worth to keep it.
>
> About putting it in each commit message, it is now 3 times bigger than
> the patch it generates!

Consider putting it into the first commit message, and have the others
reference back.



Re: [Qemu-devel] [PATCH 1/2] tests: check-qom-proplist: add checks for cmdline-created objects

2017-05-15 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> I notice this pair of patches doesn't seem to have gone anywhere.
> WHile it's labelled as a monitor fix, it's all QOM stuff, so I don't
> think it should be going via me.

It's somewhere between QOM and QemuOpts.  Andreas, please have a look.
Feel free to ask me to take the patch through my tree.

> Dave
>
> * Michael Roth (mdr...@linux.vnet.ibm.com) wrote:
>> check-qom-proplist originally added tests for verifying that
>> object-creation helpers object_new_with_{props,propv} behaved in
>> similar fashion to the "traditional" method involving setting each
>> individual property separately after object creation rather than
>> via a single call.
>> 
>> Another similar "helper" for creating Objects exists in the form of
>> objects specified via -object command-line parameters. By that
>> rationale, we extend check-qom-proplist to include similar checks
>> for command-line-created objects by employing the same
>> qemu_opts_parse()-based parsing the vl.c employs.
>> 
>> This parser has a side-effect of parsing the object's options into
>> a QemuOpt structure and registering this in the global QemuOptsList
>> using the Object's ID. This can conflict with future Object instances
>> that attempt to use the same ID if we don't ensure this is cleaned
>> up as part of Object finalization, so we include a FIXME stub to test
>> for this case, which will then be resolved in a subsequent patch.
>> 
>> Suggested-by: Daniel Berrange 
>> Cc: "Dr. David Alan Gilbert" 
>> Cc: Markus Armbruster 
>> Cc: Eric Blake 
>> Cc: Daniel Berrange 
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Michael Roth 
>> Reviewed-by: Markus Armbruster 
>> ---
>>  tests/check-qom-proplist.c | 55 
>> ++
>>  1 file changed, 55 insertions(+)
>> 
>> diff --git a/tests/check-qom-proplist.c b/tests/check-qom-proplist.c
>> index a16cefc..e3f56ca 100644
>> --- a/tests/check-qom-proplist.c
>> +++ b/tests/check-qom-proplist.c
>> @@ -23,6 +23,9 @@
>>  #include "qapi/error.h"
>>  #include "qom/object.h"
>>  #include "qemu/module.h"
>> +#include "qemu/option.h"
>> +#include "qemu/config-file.h"
>> +#include "qom/object_interfaces.h"
>>  
>>  
>>  #define TYPE_DUMMY "qemu-dummy"
>> @@ -162,6 +165,10 @@ static const TypeInfo dummy_info = {
>>  .instance_finalize = dummy_finalize,
>>  .class_size = sizeof(DummyObjectClass),
>>  .class_init = dummy_class_init,
>> +.interfaces = (InterfaceInfo[]) {
>> +{ TYPE_USER_CREATABLE },
>> +{ }
>> +}
>>  };
>>  
>>  
>> @@ -320,6 +327,14 @@ static const TypeInfo dummy_backend_info = {
>>  .class_size = sizeof(DummyBackendClass),
>>  };
>>  
>> +static QemuOptsList qemu_object_opts = {
>> +.name = "object",
>> +.implied_opt_name = "qom-type",
>> +.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
>> +.desc = {
>> +{ }
>> +},
>> +};
>>  
>>  
>>  static void test_dummy_createv(void)
>> @@ -388,6 +403,45 @@ static void test_dummy_createlist(void)
>>  object_unparent(OBJECT(dobj));
>>  }
>>  
>> +static void test_dummy_createcmdl(void)
>> +{
>> +QemuOpts *opts;
>> +DummyObject *dobj;
>> +Error *err = NULL;
>> +const char *params = TYPE_DUMMY \
>> + ",id=dev0," \
>> + "bv=yes,sv=Hiss hiss hiss,av=platypus";
>> +
>> +qemu_add_opts(&qemu_object_opts);
>> +opts = qemu_opts_parse(&qemu_object_opts, params, true, &err);
>> +g_assert(err == NULL);
>> +g_assert(opts);
>> +
>> +dobj = DUMMY_OBJECT(user_creatable_add_opts(opts, &err));
>> +g_assert(err == NULL);
>> +g_assert(dobj);
>> +g_assert_cmpstr(dobj->sv, ==, "Hiss hiss hiss");
>> +g_assert(dobj->bv == true);
>> +g_assert(dobj->av == DUMMY_PLATYPUS);
>> +
>> +user_creatable_del("dev0", &err);
>> +g_assert(err == NULL);
>> +error_free(err);
>> +
>> +/* cmdline-parsing via qemu_opts_parse() results in a QemuOpts entry
>> + * corresponding to the Object's ID to be added to the QemuOptsList
>> + * for objects. To avoid having this entry conflict with future
>> + * Objects using the same ID (which can happen in cases where
>> + * qemu_opts_parse() is used to parse the object params, such as
>> + * with hmp_object_add() at the time of this comment), we need to
>> + * check for this in user_creatable_del() and remove the QemuOpts if
>> + * it is present.
>> + *
>> + * FIXME: add an assert to verify that the QemuOpts is cleaned up
>> + * once the corresponding cleanup code is added.
>> + */
>> +}
>> +
>>  static void test_dummy_badenum(void)
>>  {
>>  Error *err = NULL;
>> @@ -525,6 +579,7 @@ int main(int argc, char **argv)
>>  
>>  g_test_add_func("/qom/proplist/createlist", test_dummy_createlist);
>>  g_test_add_func("/qom/proplist/createv", test_dummy_createv);
>> +g_test_add_func("/qom/proplist/createcmdline", test_dummy_createcmdl);
>>  g_test_add_f

Re: [Qemu-devel] [PATCH 06/17] qnum: add uint type

2017-05-15 Thread Markus Armbruster
Marc-André Lureau  writes:

> In order to store integer values superior to INT64_MAX, add a u64
> internal representation.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  include/qapi/qmp/qnum.h |  4 
>  qobject/qnum.c  | 49 +++
>  tests/check-qnum.c  | 51 
> +
>  3 files changed, 104 insertions(+)
>
> diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h
> index 0e51427821..89c14e040f 100644
> --- a/include/qapi/qmp/qnum.h
> +++ b/include/qapi/qmp/qnum.h
> @@ -17,6 +17,7 @@
>  
>  typedef enum {
>  QNUM_I64,
> +QNUM_U64,
>  QNUM_DOUBLE
>  } QNumType;
>  
> @@ -25,14 +26,17 @@ typedef struct QNum {
>  QNumType type;
>  union {
>  int64_t i64;
> +uint64_t u64;
>  double dbl;
>  } u;
>  } QNum;
>  
>  QNum *qnum_from_int(int64_t value);
> +QNum *qnum_from_uint(uint64_t value);
>  QNum *qnum_from_double(double value);
>  
>  int64_t qnum_get_int(const QNum *qi, Error **errp);
> +uint64_t qnum_get_uint(const QNum *qi, Error **errp);
>  double qnum_get_double(QNum *qn);
>  
>  char *qnum_to_string(QNum *qn);
> diff --git a/qobject/qnum.c b/qobject/qnum.c
> index 8e9dd38350..be6307accf 100644
> --- a/qobject/qnum.c
> +++ b/qobject/qnum.c
> @@ -13,6 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "qemu/error-report.h"

Superfluous, please drop.

>  #include "qapi/qmp/qnum.h"
>  #include "qapi/qmp/qobject.h"
>  #include "qemu-common.h"
> @@ -34,6 +35,22 @@ QNum *qnum_from_int(int64_t value)
>  }
>  
>  /**
> + * qnum_from_uint(): Create a new QNum from an uint64_t
> + *
> + * Return strong reference.
> + */
> +QNum *qnum_from_uint(uint64_t value)
> +{
> +QNum *qn = g_new(QNum, 1);
> +
> +qobject_init(QOBJECT(qn), QTYPE_QNUM);
> +qn->type = QNUM_U64;
> +qn->u.u64 = value;
> +
> +return qn;
> +}
> +
> +/**
>   * qnum_from_double(): Create a new QNum from a double
>   *
>   * Return strong reference.
> @@ -57,6 +74,34 @@ int64_t qnum_get_int(const QNum *qn, Error **errp)
>  switch (qn->type) {
>  case QNUM_I64:
>  return qn->u.i64;
> +case QNUM_U64:
> +if (qn->u.u64 > INT64_MAX) {
> +error_setg(errp, "The number is too large, use qnum_get_uint()");

The user has no way to "use qnum_get_uint()".

> +return 0;
> +}
> +return qn->u.u64;
> +case QNUM_DOUBLE:
> +error_setg(errp, "The number is a float");
> +return 0;
> +}
> +
> +g_assert_not_reached();
> +}
> +
> +/**
> + * qnum_get_uint(): Get an unsigned integer from the number
> + */
> +uint64_t qnum_get_uint(const QNum *qn, Error **errp)
> +{
> +switch (qn->type) {
> +case QNUM_I64:
> +if (qn->u.i64 < 0) {
> +error_setg(errp, "The number is negative");

The new error messages are inadequate in the same way as the existing
one.

> +return 0;
> +}
> +return qn->u.i64;
> +case QNUM_U64:
> +return qn->u.u64;
>  case QNUM_DOUBLE:
>  error_setg(errp, "The number is a float");
>  return 0;
> @@ -73,6 +118,8 @@ double qnum_get_double(QNum *qn)
>  switch (qn->type) {
>  case QNUM_I64:
>  return qn->u.i64;
> +case QNUM_U64:
> +return qn->u.u64;
>  case QNUM_DOUBLE:
>  return qn->u.dbl;
>  }
> @@ -88,6 +135,8 @@ char *qnum_to_string(QNum *qn)
>  switch (qn->type) {
>  case QNUM_I64:
>  return g_strdup_printf("%" PRId64, qn->u.i64);
> +case QNUM_U64:
> +return g_strdup_printf("%" PRIu64, qn->u.u64);
>  case QNUM_DOUBLE:
>  /* FIXME: snprintf() is locale dependent; but JSON requires
>   * numbers to be formatted as if in the C locale. Dependence
> diff --git a/tests/check-qnum.c b/tests/check-qnum.c
> index d08d35e85a..8199546f99 100644
> --- a/tests/check-qnum.c
> +++ b/tests/check-qnum.c
> @@ -36,6 +36,21 @@ static void qnum_from_int_test(void)
>  g_free(qi);
>  }
>  
> +static void qnum_from_uint_test(void)
> +{
> +QNum *qu;
> +const int value = UINT_MAX;
> +
> +qu = qnum_from_int(value);
> +g_assert(qu != NULL);
> +g_assert(qu->u.u64 == value);
> +g_assert(qu->base.refcnt == 1);
> +g_assert(qobject_type(QOBJECT(qu)) == QTYPE_QNUM);
> +
> +// destroy doesn't exit yet
> +g_free(qu);
> +}
> +

Matches the other qnum_from_TYPE_test().  Let's clean them up before
this patch: use QDECREF() instead of g_free(), and drop the comment (see
Luiz's reply to PATCH 04).

>  static void qnum_from_double_test(void)
>  {
>  QNum *qf;
> @@ -73,6 +88,37 @@ static void qnum_get_int_test(void)
>  QDECREF(qi);
>  }
>  
> +static void qnum_get_uint_test(void)
> +{
> +QNum *qn;
> +const int value = 123456;
> +Error *err = NULL;
> +
> +qn = qnum_from_uint(value);
> +g_assert(qnum_get_uint(qn, &error_abort) == value);
> +QDECREF(qn);
> +
> +qn = 

Re: [Qemu-devel] [PATCH V4 03/12] net/filter-mirror.c: Make filter_mirror_send support vnet support.

2017-05-15 Thread Zhang Chen



On 05/15/2017 11:28 AM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

In this patch, if vnet_hdr=on we change the send packet format from
struct {int size; const uint8_t buf[];} to {int size; int 
vnet_hdr_len; const uint8_t buf[];}.
make other module(like colo-compare) know how to parse net packet 
correctly.


Signed-off-by: Zhang Chen 
---
  net/filter-mirror.c | 35 ++-
  1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 3766414..64323fc 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -44,10 +44,11 @@ typedef struct MirrorState {
  SocketReadState rs;
  } MirrorState;
  -static int filter_mirror_send(CharBackend *chr_out,
+static int filter_mirror_send(MirrorState *s,
const struct iovec *iov,
int iovcnt)
  {
+NetFilterState *nf = NETFILTER(s);
  int ret = 0;
  ssize_t size = 0;
  uint32_t len = 0;
@@ -59,14 +60,38 @@ static int filter_mirror_send(CharBackend *chr_out,
  }
len = htonl(size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, 
sizeof(len));

  if (ret != sizeof(len)) {
  goto err;
  }
  +if (s->vnet_hdr) {
+/*
+ * If vnet_hdr = on, we send vnet header len to make other
+ * module(like colo-compare) know how to parse net
+ * packet correctly.
+ */
+ssize_t vnet_hdr_len;
+
+if (nf->netdev->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->vnet_hdr_len;
+} else if (nf->netdev->peer->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->peer->vnet_hdr_len;


Still questionable here, may need a comment to explain.



OK, I will add comments in next version.


+} else {
+error_report("filter get vnet_hdr_len failed");


Why need error here, could we just use zero?


Yes,we can.

Thanks
Zhang Chen



Thanks


+goto err;
+}
+
+len = htonl(vnet_hdr_len);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, 
sizeof(len));

+if (ret != sizeof(len)) {
+goto err;
+}
+}
+
  buf = g_malloc(size);
  iov_to_buf(iov, iovcnt, 0, buf, size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
  g_free(buf);
  if (ret != size) {
  goto err;
@@ -142,7 +167,7 @@ static ssize_t 
filter_mirror_receive_iov(NetFilterState *nf,

  MirrorState *s = FILTER_MIRROR(nf);
  int ret;
  -ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
  if (ret) {
  error_report("filter_mirror_send failed(%s)", strerror(-ret));
  }
@@ -165,7 +190,7 @@ static ssize_t 
filter_redirector_receive_iov(NetFilterState *nf,

  int ret;
if (qemu_chr_fe_get_driver(&s->chr_out)) {
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
  if (ret) {
  error_report("filter_mirror_send failed(%s)", 
strerror(-ret));

  }




.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 04/12] net/filter-mirror.c: Add new option to enable vnet support for filter-redirector

2017-05-15 Thread Zhang Chen



On 05/15/2017 11:31 AM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

We add the vnet_hdr option for filter-redirector, default is disable.
If you use virtio-net-pci net driver, please enable it.


We need a better commit log here to explain e.g why we need this option.


OK. I will more comments in next version.

Thanks
Zhang Chen



Thanks


You can use it for example:
-object 
filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr=on


Signed-off-by: Zhang Chen 
---
  net/filter-mirror.c | 33 +
  qemu-options.hx |  5 +++--
  2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 64323fc..a65853c 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -377,6 +377,13 @@ static char *filter_redirector_get_outdev(Object 
*obj, Error **errp)

  return g_strdup(s->outdev);
  }
  +static char *filter_redirector_get_vnet_hdr(Object *obj, Error 
**errp)

+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
  static void
  filter_redirector_set_outdev(Object *obj, const char *value, Error 
**errp)

  {
@@ -386,6 +393,21 @@ filter_redirector_set_outdev(Object *obj, const 
char *value, Error **errp)

  s->outdev = g_strdup(value);
  }
  +static void filter_redirector_set_vnet_hdr(Object *obj,
+   const char *value,
+   Error **errp)
+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for filter-redirector 
vnet_hdr, "

+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
  static void filter_mirror_init(Object *obj)
  {
  MirrorState *s = FILTER_MIRROR(obj);
@@ -405,10 +427,21 @@ static void filter_mirror_init(Object *obj)
static void filter_redirector_init(Object *obj)
  {
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
  object_property_add_str(obj, "indev", filter_redirector_get_indev,
  filter_redirector_set_indev, NULL);
  object_property_add_str(obj, "outdev", 
filter_redirector_get_outdev,

  filter_redirector_set_outdev, NULL);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", 
filter_redirector_get_vnet_hdr,

+filter_redirector_set_vnet_hdr, NULL);
  }
static void filter_mirror_fini(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 1e08481..0f81c22 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4031,10 +4031,11 @@ filter-mirror on netdev @var{netdevid},mirror 
net packet to chardev

  with vnet_hdr_len.
@item -object 
filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},

-outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
filter-redirector on netdev @var{netdevid},redirect filter's net 
packet to chardev

-@var{chardevid},and redirect indev's packet to filter.
+@var{chardevid},and redirect indev's packet to filter.if vnet_hdr = on,
+filter-redirector will redirect packet with vnet_hdr_len.
  Create a filter-redirector we need to differ outdev id from indev 
id, id can not
  be the same. we can just use indev or outdev, but at least one of 
indev or outdev

  need to be specified.




.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 05/12] net/net.c: Add vnet_hdr support in SocketReadState

2017-05-15 Thread Zhang Chen



On 05/15/2017 12:02 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

Address Jason Wang's comments add vnet header length to SocketReadState.


Better to use "Suggested-by" instead of this :).


I got it~




We add a flag to dicide whether net_fill_rstate() to read
struct  {int size; int vnet_hdr_len; const uint8_t buf[];} or not.


Few words to explain the format is better than e.g a struct.


OK, will fix it in next version.





Signed-off-by: Zhang Chen 
---
  include/net/net.h   |  9 +++--
  net/colo-compare.c  |  4 ++--
  net/filter-mirror.c |  2 +-
  net/net.c   | 36 
  net/socket.c|  2 +-
  5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 70edfc0..0763636 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -113,14 +113,19 @@ typedef struct NICState {
  } NICState;
struct SocketReadState {
-int state; /* 0 = getting length, 1 = getting data */
+/* 0 = getting length, 1 = getting vnet header length, 2 = 
getting data */

+int state;
  uint32_t index;
  uint32_t packet_len;
+uint32_t vnet_hdr_len;
  uint8_t buf[NET_BUFSIZE];
  SocketReadStateFinalize *finalize;
  };
  -int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int 
size);

+int net_fill_rstate(SocketReadState *rs,
+const uint8_t *buf,
+int size,
+bool vnet_hdr);


Why not just move vnet_hdr to SocketReadState?


Good idea.

Thanks
Zhang Chen



Thanks


  char *qemu_mac_strdup_printf(const uint8_t *macaddr);
  NetClientState *qemu_find_netdev(const char *id);
  int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 4ab80b1..332f57e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -530,7 +530,7 @@ static void compare_pri_chr_in(void *opaque, 
const uint8_t *buf, int size)

  CompareState *s = COLO_COMPARE(opaque);
  int ret;
  -ret = net_fill_rstate(&s->pri_rs, buf, size);
+ret = net_fill_rstate(&s->pri_rs, buf, size, false);
  if (ret == -1) {
  qemu_chr_fe_set_handlers(&s->chr_pri_in, NULL, NULL, NULL,
   NULL, NULL, true);
@@ -547,7 +547,7 @@ static void compare_sec_chr_in(void *opaque, 
const uint8_t *buf, int size)

  CompareState *s = COLO_COMPARE(opaque);
  int ret;
  -ret = net_fill_rstate(&s->sec_rs, buf, size);
+ret = net_fill_rstate(&s->sec_rs, buf, size, false);
  if (ret == -1) {
  qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL,
   NULL, NULL, true);
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index a65853c..4649416 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -134,7 +134,7 @@ static void redirector_chr_read(void *opaque, 
const uint8_t *buf, int size)

  MirrorState *s = FILTER_REDIRECTOR(nf);
  int ret;
  -ret = net_fill_rstate(&s->rs, buf, size);
+ret = net_fill_rstate(&s->rs, buf, size, s->vnet_hdr);
if (ret == -1) {
  qemu_chr_fe_set_handlers(&s->chr_in, NULL, NULL, NULL,
diff --git a/net/net.c b/net/net.c
index a00a0c9..a9c97cf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1618,13 +1618,20 @@ void net_socket_rs_init(SocketReadState *rs,
   * 0: success
   * -1: error occurs
   */
-int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
+int net_fill_rstate(SocketReadState *rs,
+const uint8_t *buf,
+int size,
+bool vnet_hdr)
  {
  unsigned int l;
while (size > 0) {
-/* reassemble a packet from the network */
-switch (rs->state) { /* 0 = getting length, 1 = getting data */
+/* Reassemble a packet from the network.
+ * 0 = getting length.
+ * 1 = getting vnet header length.
+ * 2 = getting data.
+ */
+switch (rs->state) {
  case 0:
  l = 4 - rs->index;
  if (l > size) {
@@ -1638,10 +1645,31 @@ int net_fill_rstate(SocketReadState *rs, 
const uint8_t *buf, int size)

  /* got length */
  rs->packet_len = ntohl(*(uint32_t *)rs->buf);
  rs->index = 0;
-rs->state = 1;
+if (vnet_hdr) {
+rs->state = 1;
+} else {
+rs->state = 2;
+rs->vnet_hdr_len = 0;
+}
  }
  break;
  case 1:
+l = 4 - rs->index;
+if (l > size) {
+l = size;
+}
+memcpy(rs->buf + rs->index, buf, l);
+buf += l;
+size -= l;
+rs->index += l;
+if (rs->index == 4) {
+/* got vnet header length */
+rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf)

Re: [Qemu-devel] [PATCH V4 06/12] net/colo-compare.c: Add new option to enable vnet support for colo-compare

2017-05-15 Thread Zhang Chen



On 05/15/2017 12:03 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

We add the vnet_hdr option for colo-compare, default is disable.
If you use virtio-net-pci net driver, please enable it.
You can use it for example:
-object 
colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr=on


Signed-off-by: Zhang Chen 
---
  net/colo-compare.c | 34 +-
  qemu-options.hx|  3 ++-
  2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 332f57e..99a6912 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -73,6 +73,7 @@ typedef struct CompareState {
  CharBackend chr_out;
  SocketReadState pri_rs;
  SocketReadState sec_rs;
+bool vnet_hdr;
/* connection list: the connections belonged to this NIC 
could be found

   * in this list.
@@ -642,6 +643,28 @@ static void compare_set_outdev(Object *obj, 
const char *value, Error **errp)

  s->outdev = g_strdup(value);
  }
  +static char *compare_get_vnet_hdr(Object *obj, Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
+static void compare_set_vnet_hdr(Object *obj,
+ const char *value,
+ Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for colo-compare vnet_hdr, "
+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
  static void compare_pri_rs_finalize(SocketReadState *pri_rs)
  {
  CompareState *s = container_of(pri_rs, CompareState, pri_rs);
@@ -667,7 +690,6 @@ static void 
compare_sec_rs_finalize(SocketReadState *sec_rs)

  }
  }
  -


Unnecessary whitespace change.


I will remove it in next version.




  /*
   * Return 0 is success.
   * Return 1 is failed.
@@ -775,6 +797,8 @@ static void colo_compare_class_init(ObjectClass 
*oc, void *data)

static void colo_compare_init(Object *obj)
  {
+CompareState *s = COLO_COMPARE(obj);
+
  object_property_add_str(obj, "primary_in",
  compare_get_pri_indev, 
compare_set_pri_indev,

  NULL);
@@ -784,6 +808,14 @@ static void colo_compare_init(Object *obj)
  object_property_add_str(obj, "outdev",
  compare_get_outdev, compare_set_outdev,
  NULL);
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", compare_get_vnet_hdr,
+compare_set_vnet_hdr, NULL);
  }
static void colo_compare_finalize(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 0f81c22..115b83f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4061,12 +4061,13 @@ The file format is libpcap, so it can be 
analyzed with tools such as tcpdump

  or Wireshark.
@item -object 
colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},

-outdev=@var{chardevid}
+outdev=@var{chardevid},vnet_hdr=@var{on|off}
Colo-compare gets packet from primary_in@var{chardevid} and 
secondary_in@var{chardevid}, than compare primary packet with

  secondary packet. If the packets are same, we will output primary
  packet to outdev@var{chardevid}, else we will notify colo-frame
  do checkpoint and send primary packet to outdev@var{chardevid}.
+if vnet_hdr = on, colo compare will send/recv packet with vnet_hdr_len.
we must use it with the help of filter-mirror and filter-redirector.


Please squash this into its function implementation.


OK.
Thanks
Zhang Chen



Thanks


.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 07/12] net/colo.c: Make vnet_hdr_len as packet property

2017-05-15 Thread Zhang Chen



On 05/15/2017 12:05 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

We can use this property flush and send packet with vnet_hdr_len.

Signed-off-by: Zhang Chen 


Then I think it's not necessary to store vnet_hdr_len in SocketReadState?


Do you means we keep the patch 05/12 in original?

Thanks
Zhang Chen



Thanks


---
  net/colo-compare.c| 8 ++--
  net/colo.c| 3 ++-
  net/colo.h| 4 +++-
  net/filter-rewriter.c | 2 +-
  4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 99a6912..87a9529 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -122,9 +122,13 @@ static int packet_enqueue(CompareState *s, int 
mode)

  Connection *conn;
if (mode == PRIMARY_IN) {
-pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+pkt = packet_new(s->pri_rs.buf,
+ s->pri_rs.packet_len,
+ s->pri_rs.vnet_hdr_len);
  } else {
-pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+pkt = packet_new(s->sec_rs.buf,
+ s->sec_rs.packet_len,
+ s->sec_rs.vnet_hdr_len);
  }
if (parse_packet_early(pkt)) {
diff --git a/net/colo.c b/net/colo.c
index 8cc166b..180eaed 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -153,13 +153,14 @@ void connection_destroy(void *opaque)
  g_slice_free(Connection, conn);
  }
  -Packet *packet_new(const void *data, int size)
+Packet *packet_new(const void *data, int size, int vnet_hdr_len)
  {
  Packet *pkt = g_slice_new(Packet);
pkt->data = g_memdup(data, size);
  pkt->size = size;
  pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+pkt->vnet_hdr_len = vnet_hdr_len;
return pkt;
  }
diff --git a/net/colo.h b/net/colo.h
index 7c524f3..caedb0d 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -43,6 +43,8 @@ typedef struct Packet {
  int size;
  /* Time of packet creation, in wall clock ms */
  int64_t creation_ms;
+/* Get vnet_hdr_len from filter */
+uint32_t vnet_hdr_len;
  } Packet;
typedef struct ConnectionKey {
@@ -82,7 +84,7 @@ Connection *connection_get(GHashTable 
*connection_track_table,

 ConnectionKey *key,
 GQueue *conn_list);
  void connection_hashtable_reset(GHashTable *connection_track_table);
-Packet *packet_new(const void *data, int size);
+Packet *packet_new(const void *data, int size, int vnet_hdr_len);
  void packet_destroy(void *opaque, void *user_data);
#endif /* QEMU_COLO_PROXY_H */
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index afa06e8..63256c7 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -158,7 +158,7 @@ static ssize_t 
colo_rewriter_receive_iov(NetFilterState *nf,

  char *buf = g_malloc0(size);
iov_to_buf(iov, iovcnt, 0, buf, size);
-pkt = packet_new(buf, size);
+pkt = packet_new(buf, size, 0);
  g_free(buf);
/*




.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 10/12] net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare

2017-05-15 Thread Zhang Chen



On 05/15/2017 12:11 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

COLO-Proxy just focus on packet payload, So we skip vnet header.

Signed-off-by: Zhang Chen 
---
  net/colo-compare.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index cb0b04e..bf565f3 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -188,6 +188,8 @@ static int packet_enqueue(CompareState *s, int mode)
   */
  static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, 
int offset)

  {
+int offset_all;
+
  if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
  char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], 
sec_ip_dst[20];
  @@ -201,9 +203,12 @@ static int colo_packet_compare_common(Packet 
*ppkt, Packet *spkt, int offset)

 sec_ip_src, sec_ip_dst);
  }
  +offset_all = ppkt->vnet_hdr_len + offset;


How about just keep using offset by increasing it with vnet_hdr_len?


OK, I will fix it in next version.

Thanks
Zhang Chen



Thanks


+
  if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset, spkt->data + offset,
-  spkt->size - offset);
+return memcmp(ppkt->data + offset_all,
+  spkt->data + offset_all,
+  spkt->size - offset_all);
  } else {
  trace_colo_compare_main("Net packet size are not the same");
  return -1;
@@ -261,8 +266,9 @@ static int colo_packet_compare_tcp(Packet *spkt, 
Packet *ppkt)

   */
  if (ptcp->th_off > 5) {
  ptrdiff_t tcp_offset;
+
  tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
- + (ptcp->th_off * 4);
+ + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
  res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
  } else if (ptcp->th_sum == stcp->th_sum) {
  res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);




.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 11/12] net/filter-rewriter.c: Add new option to enable vnet support for filter-rewriter

2017-05-15 Thread Zhang Chen



On 05/15/2017 12:12 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

We add the vnet_hdr option for filter-rewriter, default is disable.
If you use virtio-net-pci net driver, please enable it.
You can use it for example:
-object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr=on

Signed-off-by: Zhang Chen 


As has been pointed out, let's squash this into patch 12.


OK~

Thanks
Zhang Chen



Thanks






--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 01/12] net: Add vnet_hdr_len related arguments in NetClientState

2017-05-15 Thread Jason Wang



On 2017年05月15日 14:56, Zhang Chen wrote:



On 05/15/2017 11:24 AM, Jason Wang wrote:



On 2017年05月14日 05:24, Philippe Mathieu-Daudé wrote:

Hi Zhang

On 05/11/2017 10:41 PM, Zhang Chen wrote:

Add vnet_hdr_len and using_vnet_hdr arguments in NetClientState
that make othermodule get real vnet_hdr_len easily.

Signed-off-by: Zhang Chen 
---
 include/net/net.h | 2 ++
 net/net.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 99b28d5..70edfc0 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -100,6 +100,8 @@ struct NetClientState {
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
 int vring_enable;
+bool using_vnet_hdr;
+int vnet_hdr_len;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };

diff --git a/net/net.c b/net/net.c
index 0ac3b9e..a00a0c9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -472,6 +472,7 @@ void qemu_using_vnet_hdr(NetClientState *nc, 
bool enable)

 return;
 }

+nc->using_vnet_hdr = enable;
 nc->info->using_vnet_hdr(nc, enable);
 }

@@ -491,6 +492,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, 
int len)

 return;
 }

+nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }



For what it's worth, now having those fields in NetClientState it is 
possible to remove a deref to NetClientInfo in qemu_has_vnet_hdr() 
and qemu_has_vnet_hdr_len().


Anyway,
Reviewed-by: Philippe Mathieu-Daudé 




Yes and this could be done on top with removing private e.g 
vnet_hdr_len.


Do you means we will remove the qemu_has_vnet_hdr() and 
qemu_has_vnet_hdr_len() in the future?


Thanks
Zhang Chen


Yes and e.g both tap and netmap have its private vnet header field. We 
can remove them too.


Thanks





Thanks


.








Re: [Qemu-devel] [PATCH V4 01/12] net: Add vnet_hdr_len related arguments in NetClientState

2017-05-15 Thread Zhang Chen



On 05/15/2017 04:06 PM, Jason Wang wrote:



On 2017年05月15日 14:56, Zhang Chen wrote:



On 05/15/2017 11:24 AM, Jason Wang wrote:



On 2017年05月14日 05:24, Philippe Mathieu-Daudé wrote:

Hi Zhang

On 05/11/2017 10:41 PM, Zhang Chen wrote:

Add vnet_hdr_len and using_vnet_hdr arguments in NetClientState
that make othermodule get real vnet_hdr_len easily.

Signed-off-by: Zhang Chen 
---
 include/net/net.h | 2 ++
 net/net.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 99b28d5..70edfc0 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -100,6 +100,8 @@ struct NetClientState {
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
 int vring_enable;
+bool using_vnet_hdr;
+int vnet_hdr_len;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };

diff --git a/net/net.c b/net/net.c
index 0ac3b9e..a00a0c9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -472,6 +472,7 @@ void qemu_using_vnet_hdr(NetClientState *nc, 
bool enable)

 return;
 }

+nc->using_vnet_hdr = enable;
 nc->info->using_vnet_hdr(nc, enable);
 }

@@ -491,6 +492,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, 
int len)

 return;
 }

+nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }



For what it's worth, now having those fields in NetClientState it 
is possible to remove a deref to NetClientInfo in 
qemu_has_vnet_hdr() and qemu_has_vnet_hdr_len().


Anyway,
Reviewed-by: Philippe Mathieu-Daudé 




Yes and this could be done on top with removing private e.g 
vnet_hdr_len.


Do you means we will remove the qemu_has_vnet_hdr() and 
qemu_has_vnet_hdr_len() in the future?


Thanks
Zhang Chen


Yes and e.g both tap and netmap have its private vnet header field. We 
can remove them too.


Maybe I should do this job in this series next version or a independent 
series?


Thanks
Zhang Chen



Thanks





Thanks


.







.



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH V4 07/12] net/colo.c: Make vnet_hdr_len as packet property

2017-05-15 Thread Jason Wang



On 2017年05月15日 16:03, Zhang Chen wrote:



On 05/15/2017 12:05 PM, Jason Wang wrote:



On 2017年05月12日 09:41, Zhang Chen wrote:

We can use this property flush and send packet with vnet_hdr_len.

Signed-off-by: Zhang Chen 


Then I think it's not necessary to store vnet_hdr_len in 
SocketReadState?


Do you means we keep the patch 05/12 in original?

Thanks
Zhang Chen



I mean we could fetch vnet_hdr_len from the buf directly. Or is there 
any advantage to store it in SocketReadState?


Thanks




Re: [Qemu-devel] [PATCH RESEND v2 01/21] qdev: Replace cannot_instantiate_with_device_add_yet with !user_creatable

2017-05-15 Thread Markus Armbruster
Eduardo Habkost  writes:

> cannot_instantiate_with_device_add_yet was introduced by commit
> efec3dd631d94160288392721a5f9c39e50fb2bc to replace no_user. It was
> supposed to be a temporary measure.
>
> When it was introduced, we had 54
> cannot_instantiate_with_device_add_yet=true lines in the code.
> Today (3 years later) this number has not shrinked: we now have

shrunk

> 57 cannot_instantiate_with_device_add_yet=true lines. I think it
> is safe to say it is not a temporary measure, and we won't see
> the flag go away soon.
>
> Instead of a long field name that misleads people to believe it
> is temporary, replace it a shorter and less misleading field:
> user_creatable.
>
> Except for code comments, changes were generated using the
> following Coccinelle patch:
>
>   @@
>   expression DC;
>   @@
>   (
>   -DC->cannot_instantiate_with_device_add_yet = false;
>   +DC->user_creatable = true;
>   |
>   -DC->cannot_instantiate_with_device_add_yet = true;
>   +DC->user_creatable = false;
>   )
>
>   @@
>   typedef ObjectClass;
>   expression dc;
>   identifier class, data;
>   @@
>static void device_class_init(ObjectClass *class, void *data)
>{
>...
>dc->hotpluggable = true;
>   +dc->user_creatable = true;
>...
>}
>
>   @@
>   @@
>struct DeviceClass {
>...
>   -bool cannot_instantiate_with_device_add_yet;
>   +bool user_creatable;
>...
>   }
>
>   @@
>   expression DC;
>   @@
>   (
>   -!DC->cannot_instantiate_with_device_add_yet
>   +DC->user_creatable
>   |
>   -DC->cannot_instantiate_with_device_add_yet
>   +!DC->user_creatable
>   )
>
> Cc: Alistair Francis 
> Cc: Laszlo Ersek 
> Cc: Marcel Apfelbaum 
> Cc: Markus Armbruster 
> Cc: Peter Maydell 
> Cc: Thomas Huth 
> Acked-by: Alistair Francis 
> Reviewed-by: Thomas Huth 
> Reviewed-by: Marcel Apfelbaum 
> Acked-by: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes v1 -> v2:
> * (none)
>
> Changes v2 -> v3:
> * Fixed commit ID reference on commit message
> * (No code changes)
> ---
>  include/hw/qdev-core.h  | 10 +-
>  include/hw/qdev-properties.h|  4 ++--
>  hw/acpi/piix4.c |  2 +-
>  hw/arm/spitz.c  |  2 +-
>  hw/audio/marvell_88w8618.c  |  2 +-
>  hw/audio/pcspk.c|  2 +-
>  hw/core/or-irq.c|  2 +-
>  hw/core/qdev.c  |  1 +
>  hw/core/register.c  |  2 +-
>  hw/dma/i8257.c  |  2 +-
>  hw/dma/sparc32_dma.c|  2 +-
>  hw/gpio/omap_gpio.c |  4 ++--
>  hw/i2c/omap_i2c.c   |  2 +-
>  hw/i2c/smbus_eeprom.c   |  2 +-
>  hw/i2c/smbus_ich9.c |  2 +-
>  hw/i386/pc.c|  2 +-
>  hw/input/vmmouse.c  |  2 +-
>  hw/intc/apic_common.c   |  2 +-
>  hw/intc/etraxfs_pic.c   |  2 +-
>  hw/intc/grlib_irqmp.c   |  2 +-
>  hw/intc/i8259_common.c  |  2 +-
>  hw/intc/nios2_iic.c |  2 +-
>  hw/intc/omap_intc.c |  4 ++--
>  hw/isa/lpc_ich9.c   |  2 +-
>  hw/isa/piix4.c  |  2 +-
>  hw/isa/vt82c686.c   |  2 +-
>  hw/mips/gt64xxx_pci.c   |  2 +-
>  hw/misc/vmport.c|  2 +-
>  hw/net/dp8393x.c|  2 +-
>  hw/net/etraxfs_eth.c|  2 +-
>  hw/net/lance.c  |  2 +-
>  hw/pci-bridge/dec.c |  2 +-
>  hw/pci-bridge/pci_expander_bridge.c |  2 +-
>  hw/pci-host/apb.c   |  2 +-
>  hw/pci-host/bonito.c|  2 +-
>  hw/pci-host/gpex.c  |  2 +-
>  hw/pci-host/grackle.c   |  2 +-
>  hw/pci-host/piix.c  |  6 +++---
>  hw/pci-host/ppce500.c   |  2 +-
>  hw/pci-host/prep.c  |  2 +-
>  hw/pci-host/q35.c   |  4 ++--
>  hw/pci-host/uninorth.c  |  8 
>  hw/pci-host/versatile.c |  2 +-
>  hw/pci-host/xilinx-pcie.c   |  2 +-
>  hw/ppc/ppc4xx_pci.c |  2 +-
>  hw/ppc/spapr_drc.c  |  2 +-
>  hw/s390x/s390-pci-bus.c |  2 +-
>  hw/sd/milkymist-memcard.c   |  2 +-
>  hw/sd/pl181.c   |  2 +-
>  hw/sh4/sh_pci.c |  2 +-
>  hw/timer/i8254_common.c |  2 +-
>  hw/timer/mc146818rtc.c  |  2 +-
>  monitor.c   |  2 +-
>  qdev-monitor.c  |  6 +++---
>  qom/cpu.c   |  2 +-
>  target/i386/cpu.c   |  2 +-
>  56 files changed, 71 insertions(+), 70 deletions(-)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 4bf86b0ad8..6ee49fbe33 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -103,16 +103,16 @@ typedef struct DeviceClass {
>  Property *props;
>  
>  /*
> -

[Qemu-devel] [PATCH] target/ppc: reset reservation in do_rfi()

2017-05-15 Thread Nikunj A Dadhania
For transitioning back to userspace after the interrupt.

Suggested-by: Richard Henderson 
Signed-off-by: Nikunj A Dadhania 
---
 target/ppc/excp_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index a6bcb47..9cb2123 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -995,6 +995,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
nip, target_ulong msr)
  */
 cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
 
+/* Reset the reservation */
+env->reserve_addr = -1;
+
 /* Context synchronizing: check if TCG TLB needs flush */
 check_tlb_flush(env, false);
 }
-- 
2.9.3




Re: [Qemu-devel] [PATCH RESEND v2 03/21] xen-backend: Remove FIXME comment about user_creatable flag

2017-05-15 Thread Markus Armbruster
Stefano or Anthony, please review.

Eduardo Habkost  writes:

> xen-backend can be plugged/unplugged dynamically when using the
> Xen accelerator, so keep the user_creatable flag on the device
> class and remove the FIXME comment.
>
> Cc: Juergen Gross ,
> Cc: Peter Maydell ,
> Cc: Thomas Huth 
> Cc: sstabell...@kernel.org
> Cc: Markus Armbruster ,
> Cc: Marcel Apfelbaum ,
> Cc: Laszlo Ersek 
> Acked-by: Juergen Gross 
> Acked-by: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes series v1 -> v2:
> * (New patch added to series)
> ---
>  hw/xen/xen_backend.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index 67cb4cb9f0..2b91d2d458 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -619,10 +619,7 @@ static void xendev_class_init(ObjectClass *klass, void 
> *data)
>  
>  dc->props = xendev_properties;
>  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> -/*
> - * FIXME: Set only because we are not sure yet if this device
> - * will be outside the q35 sysbus whitelist.
> - */
> +/* xen-backend devices can be plugged/unplugged dynamically */
>  dc->user_creatable = true;
>  }



[Qemu-devel] [PATCH 0/4] exec: address space translation cleanups

2017-05-15 Thread Peter Xu
The problem is that, address_space_get_iotlb_entry() shares a lot with
address_space_translate(). This patch tries to abstract the
shared elements.

Originally, this work is derived from discussion from VT-d passthrough
series discussions [1]. But for sure we can just see this series as a
standalone cleanup. So I posted it separately here.

Smoke tests are done with general VM boots, IOs, especially with vhost
dmar configurations.

I believe with current series I can throw away the old patch [1],
which may be good. But before that, please kindly review. Thanks.

References:

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02335.html

Peter Xu (4):
  exec: simplify phys_page_find() params
  exec: rename resolve_subpage
  exec: further use is_mmio
  exec: abstract address_space_do_translate()

 exec.c | 122 -
 1 file changed, 76 insertions(+), 46 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH 1/4] exec: simplify phys_page_find() params

2017-05-15 Thread Peter Xu
It really only plays with the dispatchers, so the parameter list does
not need that complexity. This helps for readability at least.

Signed-off-by: Peter Xu 
---
 exec.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index eac6085..1d76c63 100644
--- a/exec.c
+++ b/exec.c
@@ -371,10 +371,11 @@ static inline bool section_covers_addr(const 
MemoryRegionSection *section,
  int128_getlo(section->size), addr);
 }
 
-static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr,
-   Node *nodes, MemoryRegionSection 
*sections)
+static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr 
addr)
 {
-PhysPageEntry *p;
+PhysPageEntry lp = d->phys_map, *p;
+Node *nodes = d->map.nodes;
+MemoryRegionSection *sections = d->map.sections;
 hwaddr index = addr >> TARGET_PAGE_BITS;
 int i;
 
@@ -412,8 +413,7 @@ static MemoryRegionSection 
*address_space_lookup_region(AddressSpaceDispatch *d,
 section_covers_addr(section, addr)) {
 update = false;
 } else {
-section = phys_page_find(d->phys_map, addr, d->map.nodes,
- d->map.sections);
+section = phys_page_find(d, addr);
 update = true;
 }
 if (resolve_subpage && section->mr->subpage) {
@@ -1246,8 +1246,7 @@ static void register_subpage(AddressSpaceDispatch *d, 
MemoryRegionSection *secti
 subpage_t *subpage;
 hwaddr base = section->offset_within_address_space
 & TARGET_PAGE_MASK;
-MemoryRegionSection *existing = phys_page_find(d->phys_map, base,
-   d->map.nodes, 
d->map.sections);
+MemoryRegionSection *existing = phys_page_find(d, base);
 MemoryRegionSection subsection = {
 .offset_within_address_space = base,
 .size = int128_make64(TARGET_PAGE_SIZE),
-- 
2.7.4




[Qemu-devel] [PATCH 3/4] exec: further use is_mmio

2017-05-15 Thread Peter Xu
This is MMIO-specific tunes on the size. Let's skip it for non-MMIO
translations.

Signed-off-by: Peter Xu 
---
 exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 0adae94..32e5394 100644
--- a/exec.c
+++ b/exec.c
@@ -455,7 +455,7 @@ address_space_translate_internal(AddressSpaceDispatch *d, 
hwaddr addr, hwaddr *x
  * everything works fine.  If the incoming length is large, however,
  * the caller really has to do the clamping through memory_access_size.
  */
-if (memory_region_is_ram(mr)) {
+if (is_mmio && memory_region_is_ram(mr)) {
 diff = int128_sub(section->size, int128_make64(addr));
 *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
 }
-- 
2.7.4




[Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage

2017-05-15 Thread Peter Xu
It is not easy for people to know "what this parameter does" before
knowing "what is subpage". Let's use "is_mmio" to make it easier to
understand.

Signed-off-by: Peter Xu 
---
 exec.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 1d76c63..0adae94 100644
--- a/exec.c
+++ b/exec.c
@@ -403,7 +403,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
 /* Called from RCU critical section */
 static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch 
*d,
 hwaddr addr,
-bool resolve_subpage)
+bool is_mmio)
 {
 MemoryRegionSection *section = atomic_read(&d->mru_section);
 subpage_t *subpage;
@@ -416,7 +416,7 @@ static MemoryRegionSection 
*address_space_lookup_region(AddressSpaceDispatch *d,
 section = phys_page_find(d, addr);
 update = true;
 }
-if (resolve_subpage && section->mr->subpage) {
+if (is_mmio && section->mr->subpage) {
 subpage = container_of(section->mr, subpage_t, iomem);
 section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]];
 }
@@ -429,13 +429,13 @@ static MemoryRegionSection 
*address_space_lookup_region(AddressSpaceDispatch *d,
 /* Called from RCU critical section */
 static MemoryRegionSection *
 address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr 
*xlat,
- hwaddr *plen, bool resolve_subpage)
+ hwaddr *plen, bool is_mmio)
 {
 MemoryRegionSection *section;
 MemoryRegion *mr;
 Int128 diff;
 
-section = address_space_lookup_region(d, addr, resolve_subpage);
+section = address_space_lookup_region(d, addr, is_mmio);
 /* Compute offset within MemoryRegionSection */
 addr -= section->offset_within_address_space;
 
-- 
2.7.4




[Qemu-devel] [PATCH 4/4] exec: abstract address_space_do_translate()

2017-05-15 Thread Peter Xu
This function is an abstraction helper for address_space_translate() and
address_space_get_iotlb_entry(). It does the lookup of address into
memory region section, then do proper IOMMU translation if necessary.
With that, refactor the two existing functions a bit to use it.

Signed-off-by: Peter Xu 
---
 exec.c | 99 +++---
 1 file changed, 65 insertions(+), 34 deletions(-)

diff --git a/exec.c b/exec.c
index 32e5394..efc80a8 100644
--- a/exec.c
+++ b/exec.c
@@ -463,18 +463,20 @@ address_space_translate_internal(AddressSpaceDispatch *d, 
hwaddr addr, hwaddr *x
 }
 
 /* Called from RCU critical section */
-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
-bool is_write)
+static MemoryRegionSection address_space_do_translate(AddressSpace *as,
+  hwaddr addr,
+  hwaddr *xlat,
+  hwaddr *plen,
+  bool is_write,
+  bool is_mmio)
 {
-IOMMUTLBEntry iotlb = {0};
+IOMMUTLBEntry iotlb;
 MemoryRegionSection *section;
 MemoryRegion *mr;
 
 for (;;) {
 AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
-section = address_space_lookup_region(d, addr, false);
-addr = addr - section->offset_within_address_space
-   + section->offset_within_region;
+section = address_space_translate_internal(d, addr, &addr, plen, 
is_mmio);
 mr = section->mr;
 
 if (!mr->iommu_ops) {
@@ -482,55 +484,84 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 }
 
 iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
+| (addr & iotlb.addr_mask));
+*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
 if (!(iotlb.perm & (1 << is_write))) {
-iotlb.target_as = NULL;
-break;
+goto translate_fail;
 }
 
-addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
-| (addr & iotlb.addr_mask));
 as = iotlb.target_as;
 }
 
-return iotlb;
+*xlat = addr;
+
+return *section;
+
+translate_fail:
+return (MemoryRegionSection) { .mr = &io_mem_unassigned };
 }
 
 /* Called from RCU critical section */
-MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
-  hwaddr *xlat, hwaddr *plen,
-  bool is_write)
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
+bool is_write)
 {
-IOMMUTLBEntry iotlb;
-MemoryRegionSection *section;
-MemoryRegion *mr;
+MemoryRegionSection section;
+hwaddr xlat, plen;
 
-for (;;) {
-AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
-section = address_space_translate_internal(d, addr, &addr, plen, true);
-mr = section->mr;
+/* Try to get maximum page mask during translation. */
+plen = (hwaddr)-1;
 
-if (!mr->iommu_ops) {
-break;
-}
+/* This can never be MMIO. */
+section = address_space_do_translate(as, addr, &xlat, &plen,
+ is_write, false);
 
-iotlb = mr->iommu_ops->translate(mr, addr, is_write);
-addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
-| (addr & iotlb.addr_mask));
-*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
-if (!(iotlb.perm & (1 << is_write))) {
-mr = &io_mem_unassigned;
-break;
-}
+/* Illegal translation */
+if (section.mr == &io_mem_unassigned) {
+goto iotlb_fail;
+}
 
-as = iotlb.target_as;
+if (plen == (hwaddr)-1) {
+/*
+ * We use default page size here. Logically it only happens
+ * for identity mappings.
+ */
+plen = TARGET_PAGE_SIZE;
 }
 
+/* Convert to address mask */
+plen -= 1;
+
+return (IOMMUTLBEntry) {
+.target_as = section.address_space,
+.iova = addr & ~plen,
+.translated_addr = xlat & ~plen,
+.addr_mask = plen,
+/* IOTLBs are for DMAs, and DMA only allows on RAMs. */
+.perm = IOMMU_RW,
+};
+
+iotlb_fail:
+return (IOMMUTLBEntry) {0};
+}
+
+/* Called from RCU critical section */
+MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
+  hwaddr *xlat, hwaddr *plen,
+  bool is_write)
+{
+MemoryRegion *mr;
+MemoryRegionSection section;
+
+/* This can be MMIO, so setup MMIO bit. */
+section = address

Re: [Qemu-devel] [PATCH v3 04/12] memory: fix address_space_get_iotlb_entry()

2017-05-15 Thread Peter Xu
On Fri, May 12, 2017 at 03:25:08PM +1000, David Gibson wrote:
> On Thu, May 11, 2017 at 05:36:03PM +0800, Peter Xu wrote:
> > On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote:
> > > On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> > > > This function has an assumption that we will definitely call translate()
> > > > once (or say, the addr will be located inside one IOMMU memory region),
> > > > otherwise an empty IOTLB will be returned. Nevertheless, this is not
> > > > what we want. When there is no IOMMU memory region, we should build up a
> > > > static mapping for the caller, instead of an invalid IOTLB.
> > > > 
> > > > We won't trigger this path before VT-d passthrough mode. When
> > > > passthrough mode for a vhost device is setup, VT-d is possible to
> > > > disable the IOMMU region for that device. Without current patch, we'll
> > > > get a vhost boot failure, and it'll be failed over to virtio userspace
> > > > mode.
> > > 
> > > This doesn't look right to me.  You're assuming the target is
> > > address_space_memory, which might not be the case - and you should be
> > > able to check from the MR you do hit.  Furthermore it doesn't look
> > > like you're accounting for the trivial translation if the section's
> > > offset in the address space is different from its offset in the MR.
> > 
> > Do you mean this line?
> > 
> > addr = addr - section->offset_within_address_space
> >+ section->offset_within_region;
> 
> Uh.. where is that line?  But.. wait, yes, I think I was mistaken.  I saw:
>   .translated_addr = iova,
>   
> and thought that meant you were assuming an identify mapping from iova
> to translated addr.  But thinking more carefull, IIRC iova and
> translated_addr are both relative to the MR, not the AS, so I think
> that is correct after all.
> 
> > I thought it was calculating the relative address against that memory
> > region. That should only be useful if we want to do further
> > translate(), right? For the path that this patch tries to handle (when
> > there is no translate() call), then this "addr" is useless here?
> > 
> > Regarding to the address space assignment - do you mean, e.g., I
> > should use section->address_space here instead of
> > &system_address_space? If so, I can do the switch.
> 
> Yes, I think you should.
> 
> > But after all, for
> > now address_space_get_iotlb_entry() is only used by vhost codes, and
> > it only check against iotlb.target_as == NULL, so the address space
> > didn't count too much here...
> 
> > Another reason I used &address_space_memory is that in
> > vfio_iommu_map_notify() we have a check against it:
> > 
> > if (iotlb->target_as != &address_space_memory) {
> > error_report("Wrong target AS \"%s\", only system memory is 
> > allowed",
> >  iotlb->target_as->name ? iotlb->target_as->name : 
> > "none");
> > return;
> > }
> > 
> > Or say, we have some assumptions (not only in this patch) that assumes
> > this iotlb.target_as should be system_address_space.
> 
> Right, the vhost code can only handle some IOMMU setups - something
> like nested IOMMUs would break it.  But this way if someone sets up a
> machine with an IOMMU configuration that vhost can't handle, you'll
> get an error message, rather than accesses to unexpected locations,
> which could cause really hard to debug corruption.
> 
> In other words we make assumptions, but we should _test_ those
> assumptions.
> 
> I also think it would make sense to use address_space_translate() if we
> can, since it's an existing interface for a very similar operation.

I did give it a shot to generalize these two functions in this series
(I just posted it):

  [PATCH 0/4] exec: address space translation cleanups

Please see whether that'll be a good replacement of this single patch.

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage

2017-05-15 Thread Paolo Bonzini


On 15/05/2017 10:50, Peter Xu wrote:
> It is not easy for people to know "what this parameter does" before
> knowing "what is subpage". Let's use "is_mmio" to make it easier to
> understand.
> 
> Signed-off-by: Peter Xu 

Maybe invert the direction and call it for_iotlb?

Paolo

> ---
>  exec.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 1d76c63..0adae94 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -403,7 +403,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
>  /* Called from RCU critical section */
>  static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch 
> *d,
>  hwaddr addr,
> -bool resolve_subpage)
> +bool is_mmio)
>  {
>  MemoryRegionSection *section = atomic_read(&d->mru_section);
>  subpage_t *subpage;
> @@ -416,7 +416,7 @@ static MemoryRegionSection 
> *address_space_lookup_region(AddressSpaceDispatch *d,
>  section = phys_page_find(d, addr);
>  update = true;
>  }
> -if (resolve_subpage && section->mr->subpage) {
> +if (is_mmio && section->mr->subpage) {
>  subpage = container_of(section->mr, subpage_t, iomem);
>  section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]];
>  }
> @@ -429,13 +429,13 @@ static MemoryRegionSection 
> *address_space_lookup_region(AddressSpaceDispatch *d,
>  /* Called from RCU critical section */
>  static MemoryRegionSection *
>  address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, 
> hwaddr *xlat,
> - hwaddr *plen, bool resolve_subpage)
> + hwaddr *plen, bool is_mmio)
>  {
>  MemoryRegionSection *section;
>  MemoryRegion *mr;
>  Int128 diff;
>  
> -section = address_space_lookup_region(d, addr, resolve_subpage);
> +section = address_space_lookup_region(d, addr, is_mmio);
>  /* Compute offset within MemoryRegionSection */
>  addr -= section->offset_within_address_space;
>  
> 



Re: [Qemu-devel] [PATCH 3/4] exec: further use is_mmio

2017-05-15 Thread Paolo Bonzini


On 15/05/2017 10:50, Peter Xu wrote:
> This is MMIO-specific tunes on the size. Let's skip it for non-MMIO
> translations.

Can you explain this better?  This is not specific to MMIO, in fact it's
for RAM...

Paolo

> Signed-off-by: Peter Xu 
> ---
>  exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index 0adae94..32e5394 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -455,7 +455,7 @@ address_space_translate_internal(AddressSpaceDispatch *d, 
> hwaddr addr, hwaddr *x
>   * everything works fine.  If the incoming length is large, however,
>   * the caller really has to do the clamping through memory_access_size.
>   */
> -if (memory_region_is_ram(mr)) {
> +if (is_mmio && memory_region_is_ram(mr)) {
>  diff = int128_sub(section->size, int128_make64(addr));
>  *plen = int128_get64(int128_min(diff, int128_make64(*plen)));



Re: [Qemu-devel] [Qemu-block] [PULL 05/58] qemu-img: Update documentation for -U

2017-05-15 Thread Fam Zheng
On Fri, 05/12 19:37, Max Reitz wrote:
> On 2017-05-11 16:32, Kevin Wolf wrote:
> > From: Fam Zheng 
> > 
> > Signed-off-by: Fam Zheng 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qemu-img-cmds.hx | 36 ++--
> >  1 file changed, 18 insertions(+), 18 deletions(-)
> > 
> > diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> > index bf4ce59..e5bc28f 100644
> > --- a/qemu-img-cmds.hx
> > +++ b/qemu-img-cmds.hx
> 
> [...]
> 
> >  DEF("convert", img_convert,
> > -"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f 
> > fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o 
> > options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m 
> > num_coroutines] [-W] filename [filename2 [...]] output_filename")
> > +"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] 
> > [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s 
> > snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m 
> > num_coroutines] [-W] filename [filename2 [...]] output_filename")
> >  STEXI
> > -@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] 
> > [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O 
> > @var{output_fmt}] [-B @var{backing_file}] [-o @var{options}] [-s 
> > @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
> > [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] 
> > @var{output_filename}
> > +@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] 
> > [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O 
> > @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l 
> > @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] 
> > @var{filename} [@var{filename2} [...]] @var{output_filename}
> 
> Soo... Who gets to add the -B documentation back?
> 

Oops, lost during a rebase, must be.

I'll send a patch! Thanks for catching this.

Fam



Re: [Qemu-devel] QEMU seg-fault with intermediate image streaming -- bdrv_reopen() in stream_start()

2017-05-15 Thread Alberto Garcia
On Sat 13 May 2017 12:45:36 AM CEST, Kashyap Chamarthy wrote:
> Try to perform intermediate streaming (pull clusters from 'disk1.qcow2'
> into 'b.qcow2':
>
> (QEMU) block-stream device=#block832 base=disk1.qcow2
>
>
> Result
> --
>
> QEMU crashes with SIGSEGV:

I see the problem, I'll send a patch now.

Thanks for reporting it!

Berto



Re: [Qemu-devel] e1000e migration

2017-05-15 Thread Dr. David Alan Gilbert
* Dmitry Fleytman (dmi...@daynix.com) wrote:
> Hello Dave,

Hi Dmitry,
  Thanks for the reply.

> We are trying to reproduce this issue on our systems but with no luck so far…

Note our QE hit this with both a Win8.1 and a win2012r2 guest - although
the 2012r2 is reported to have recoverd after a few minutes.
2016 apparently works OK.

> From what you describe it looks like some bit in ICR is not being cleared by 
> the driver.
> This usually means that this bit should never be set in that specific 
> interrupt mode.
> 
> Could you please check which bit is not cleared and who sets it?

The full set of e1000e_irq_pending_interrupts after migration is:
23004@1494519346.673905:e1000e_irq_pending_interrupts ICR PENDING: 0x10 
(ICR: 0x80100082, IMS: 0x1f4)
23004@1494519346.674787:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x80100082, IMS: 0x1e4)
23004@1494519346.674946:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x80100082, IMS: 0x1e4)
23004@1494519346.675119:e1000e_irq_pending_interrupts ICR PENDING: 0x20 
(ICR: 0x80300082, IMS: 0x1e4)
23004@1494519346.675302:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x80100082, IMS: 0x1c4)
  
23004@1494519346.716279:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x80300082, IMS: 0x1c4)
23004@1494519346.716380:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
(ICR: 0x813000c2, IMS: 0x1c4)
23004@1494519346.717040:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
(ICR: 0x813000c2, IMS: 0x144)
23004@1494519346.717276:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
(ICR: 0x813000c2, IMS: 0x104)
23004@1494519346.717443:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.717567:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.717782:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.717918:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.718319:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.718523:e1000e_irq_pending_interrupts ICR PENDING: 0x20 
(ICR: 0x813000c2, IMS: 0xa4)
23004@1494519346.718684:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0x4)
23004@1494519346.718890:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0x4)
23004@1494519346.719034:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0xa4)
23004@1494519346.719130:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0xa4)
  
23004@1494519346.722699:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0xa4)
23004@1494519346.722868:e1000e_irq_pending_interrupts ICR PENDING: 0x20 
(ICR: 0x813000c2, IMS: 0xa4)
23004@1494519346.723068:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0x84)
  
23004@1494519346.731198:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x84)
23004@1494519346.731422:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x813000c2, IMS: 0x4)
23004@1494519346.731930:e1000e_irq_pending_interrupts ICR PENDING: 0x20 
(ICR: 0x813000c2, IMS: 0xa4)
23004@1494519346.732082:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0x4)
23004@1494519346.732274:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0x4)
23004@1494519346.732404:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0xa4)
23004@1494519346.732504:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x811000c2, IMS: 0xa4)
23004@1494519346.784150:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.786506:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.786534:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.789644:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
(ICR: 0x815000c2, IMS: 0x1a4)
23004@1494519346.789864:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.789992:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.790413:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.790539:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.792593:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.792620:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x815000c2, IMS: 0xa4)
23004@1494519346.795943:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
(ICR: 0x815000c2, IMS: 0x1a4)

and then I think we get stuck in this cycle of this one always being the
one that fires repeatedly.  I think that

Re: [Qemu-devel] [PATCH] virtio-net: keep the packet layout intact

2017-05-15 Thread Wei Wang

Ping for comments, thanks.

On 05/11/2017 12:57 PM, Wei Wang wrote:

The current implementation may change the packet layout when
vnet_hdr needs an endianness swap. The layout change causes
one more iov to be added to the iov[] passed from the guest, which
is a barrier to making the TX queue size 1024 due to the possible
off-by-one issue.

This patch changes the implementation to remain the packet layout
intact. In this case, the number of iov[] passed to writev will be
equal to the number obtained from the guest.

Signed-off-by: Wei Wang 
---
  hw/net/virtio-net.c | 29 ++---
  1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7d091c9..a51e9a8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1287,8 +1287,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  for (;;) {
  ssize_t ret;
  unsigned int out_num;
-struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct iovec sg[VIRTQUEUE_MAX_SIZE], mhdr, *out_sg;
  
  elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));

  if (!elem) {
@@ -1305,7 +1304,10 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  }
  
  if (n->has_vnet_hdr) {

-if (iov_to_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len) <
+/* Buffer to copy vnet_hdr and the possible adjacent data */
+mhdr.iov_len = out_sg[0].iov_len;
+mhdr.iov_base = g_malloc0(mhdr.iov_len);
+if (iov_to_buf(out_sg, out_num, 0, mhdr.iov_base, mhdr.iov_len) <
  n->guest_hdr_len) {
  virtio_error(vdev, "virtio-net header incorrect");
  virtqueue_detach_element(q->tx_vq, elem, 0);
@@ -1313,17 +1315,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  return -EINVAL;
  }
  if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) &mhdr);
-sg2[0].iov_base = &mhdr;
-sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
-   out_sg, out_num,
-   n->guest_hdr_len, -1);
-if (out_num == VIRTQUEUE_MAX_SIZE) {
-goto drop;
-   }
-out_num += 1;
-out_sg = sg2;
+virtio_net_hdr_swap(vdev, mhdr.iov_base);
+/* Copy the first iov where the vnet_hdr resides in */
+out_num = iov_copy(sg, 1, &mhdr, 1, 0, mhdr.iov_len);
+out_num += iov_copy(sg + 1, ARRAY_SIZE(sg) - 1, out_sg + 1,
+elem->out_num - 1, 0, -1);
+out_sg = sg;
}
  }
  /*
@@ -1345,13 +1342,15 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  
  ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),

out_sg, out_num, 
virtio_net_tx_complete);
+if (n->has_vnet_hdr) {
+g_free(mhdr.iov_base);
+}
  if (ret == 0) {
  virtio_queue_set_notification(q->tx_vq, 0);
  q->async_tx.elem = elem;
  return -EBUSY;
  }
  
-drop:

  virtqueue_push(q->tx_vq, elem, 0);
  virtio_notify(vdev, q->tx_vq);
  g_free(elem);





Re: [Qemu-devel] [PATCH 3/4] exec: further use is_mmio

2017-05-15 Thread Peter Xu
On Mon, May 15, 2017 at 11:04:52AM +0200, Paolo Bonzini wrote:
> 
> 
> On 15/05/2017 10:50, Peter Xu wrote:
> > This is MMIO-specific tunes on the size. Let's skip it for non-MMIO
> > translations.
> 
> Can you explain this better?  This is not specific to MMIO, in fact it's
> for RAM...

Yes, I misread the codes and comments, sorry...

Please ignore this patch.

-- 
Peter Xu



Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-15 Thread Stefan Hajnoczi
On Fri, May 12, 2017 at 06:53:44PM +0200, Kevin Wolf wrote:
> Am 12.05.2017 um 15:42 hat Stefan Hajnoczi geschrieben:
> > On Thu, May 11, 2017 at 05:38:40PM -0400, Rik van Riel wrote:
> > > On Thu, 2017-05-11 at 14:17 -0400, Stefan Hajnoczi wrote:
> > > > On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote:
> > > > > * For live migration use case, if host side backing file is 
> > > > >   shared storage, we need to flush the page cache for the disk 
> > > > >   image at the destination (new fadvise interface,
> > > > > FADV_INVALIDATE_CACHE?) 
> > > > >   before starting execution of the guest on the destination host.
> > > > 
> > > > Good point.  QEMU currently only supports live migration with
> > > > O_DIRECT.
> > > > I think the problem was that userspace cannot guarantee consistency
> > > > in
> > > > the general case.  If you find a solution to this problem for fake
> > > > NVDIMM then maybe the QEMU block layer can also begin supporting live
> > > > migration with buffered I/O.
> > > 
> > > I'll be happy to work with you on that, independently
> > > of Pankaj's project.
> > > 
> > > It looks like the fadvise system call could be extended
> > > pretty easily with an FADV_INVALIDATE_CACHE command, the
> > > other side of which can simply hook into the existing
> > > page cache invalidation code in the kernel.
> > > 
> > > Qemu will need to know whether the invalidation succeeded,
> > > but that is something we can test for pretty easily before
> > > returning to userspace.
> > 
> > Sounds great.  I will review the long discussions that took place on
> > qemu-devel about cache invalidation for live migration - just want to
> > make sure there were no other reasons why only O_DIRECT is supported
> > :).
> 
> There are other reasons why we recommend against using non-O_DIRECT
> modes in production (including the error handling), but with respect to
> live migration, this is the only one I'm aware of.
> 
> As I already said in the private email thread, an FADV_INVALIDATE_CACHE
> should do the trick and I'd be happy to work with you guys on that.

Okay, I didn't know you and Rik had already discussed this in private.
The QEMU change is probably not difficult.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH RESEND v2 02/21] sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE

2017-05-15 Thread Markus Armbruster
Eduardo Habkost  writes:

> commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset
> cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
> sysbus devices appear on "-device help" and lack the "no-user"
> flag in "info qdm".
>
> To fix this, we can set user_creatable=false by default on
> TYPE_SYS_BUS_DEVICE, but this requires setting
> user_creatable=true explicitly on the sysbus devices that
> actually work with -device.
>
> Fortunately today we have just a few has_dynamic_sysbus=1
> machines: virt, pc-q35-*, ppce500, and spapr.
>
> virt, ppce500, and spapr have extra checks to ensure just a few
> device types can be instantiated:
>
> * virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE.
> * ppce500 supports only TYPE_ETSEC_COMMON.
> * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.
>
> This patch sets user_creatable=true explicitly on those 4 device
> classes.

Correct me if I'm wrong: the devices become user_creatable regardless of
machine type, but they're actually creatable only for specific machine
types.  Not perfect, but much better than what we have now.

> Now, the more complex cases:
>
> pc-q35-*: q35 has no sysbus device whitelist yet (which is a
> separate bug). We are in the process of fixing it and building a
> sysbus whitelist on q35, but in the meantime we can fix the
> "-device help" and "info qdm" bugs mentioned above. Also, despite
> not being strictly necessary for fixing the q35 bug, reducing the
> list of user_creatable=true devices will help us be more
> confident when building the q35 whitelist.
>
> xen: We also have a hack at xen_set_dynamic_sysbus(), that sets
> has_dynamic_sysbus=true at runtime when using the Xen
> accelerator. This hack is only used to allow xen-backend devices
> to be dynamically plugged/unplugged.
>
> This means today we can use -device with the following 22 device
> types, that are the ones compiled into the qemu-system-x86_64 and
> qemu-system-i386 binaries:
>
> * allwinner-ahci
> * amd-iommu
> * cfi.pflash01
> * esp
> * fw_cfg_io
> * fw_cfg_mem
> * generic-sdhci
> * hpet
> * intel-iommu
> * ioapic
> * isabus-bridge
> * kvmclock
> * kvm-ioapic
> * kvmvapic
> * SUNW,fdtwo
> * sysbus-ahci
> * sysbus-fdc
> * sysbus-ohci
> * unimplemented-device
> * virtio-mmio
> * xen-backend
> * xen-sysdev
>
> This patch adds user_creatable=true explicitly to those devices,
> temporarily, just to keep 100% compatibility with existing
> behavior of q35. Subsequent patches will remove
> user_creatable=true from the devices that are really not meant to
> user-creatable on any machine, and remove the FIXME comment from
> the ones that are really supposed to be user-creatable. This is
> being done in separate patches because we still don't have an
> obvious list of devices that will be whitelisted by q35, and I
> would like to get each device reviewed individually.

Later patches indeed remove all the FIXMEs.  Good.

> Cc: Alexander Graf 
> Cc: Alex Williamson 
> Cc: Alistair Francis 
> Cc: Beniamino Galvani 
> Cc: Christian Borntraeger 
> Cc: Cornelia Huck 
> Cc: David Gibson 
> Cc: "Edgar E. Iglesias" 
> Cc: Eduardo Habkost 
> Cc: Frank Blaschka 
> Cc: Gabriel L. Somlo 
> Cc: Gerd Hoffmann 
> Cc: Igor Mammedov 
> Cc: Jason Wang 
> Cc: John Snow 
> Cc: Juergen Gross 
> Cc: Kevin Wolf 
> Cc: Laszlo Ersek 
> Cc: Marcel Apfelbaum 
> Cc: Markus Armbruster 
> Cc: Max Reitz 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Peter Maydell 
> Cc: Pierre Morel 
> Cc: Prasad J Pandit 
> Cc: qemu-...@nongnu.org
> Cc: qemu-bl...@nongnu.org
> Cc: qemu-...@nongnu.org
> Cc: Richard Henderson 
> Cc: Rob Herring 
> Cc: Shannon Zhao 
> Cc: sstabell...@kernel.org
> Cc: Thomas Huth 
> Cc: Yi Min Zhao 
> Acked-by: John Snow 
> Acked-by: Juergen Gross 
> Acked-by: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes v1 -> v2:
> * Rewrite commit message: don't pretend we are actually fixing
>   the q35 issue. We're just fixing "info qdm" and "-device help".
>   Making it easier to fix q35 is just a nice side-effect.
> * Rewrite FIXME comments to make it clear that we just have
>   user_creatable=true because we don't know yet if the device
>   should be in the q35 whitelist
> ---
>  hw/block/fdc.c   | 10 ++
>  hw/block/pflash_cfi01.c  |  5 +
>  hw/core/sysbus.c | 11 +++
>  hw/i386/amd_iommu.c  |  5 +
>  hw/i386/intel_iommu.c|  5 +
>  hw/i386/kvm/clock.c  |  5 +
>  hw/i386/kvm/ioapic.c |  5 +
>  hw/i386/kvmvapic.c   |  5 +
>  hw/ide/ahci.c| 10 ++
>  hw/intc/ioapic.c |  5 +
>  hw/isa/isa-bus.c |  5 +
>  hw/misc/unimp.c  |  5 +
>  hw/net/fsl_etsec/etsec.c |  2 ++
>  hw/nvram/fw_cfg.c| 10 ++
>  hw/ppc/spapr_pci.c   |  2 ++
>  hw/scsi/esp.c|  5 +
>  hw/sd/sdhci.c|  5 +
>  hw/timer/hpet.c  |  5 +
>  hw/usb/hcd-ohci.c|  5 +
>  hw/vfio/amd-xgbe.c   |  2 ++

[Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails

2017-05-15 Thread Alberto Garcia
The code that tries to reopen a BlockDriverState in stream_start()
when the creation of a new block job fails crashes because it attempts
to dereference a pointer that is known to be NULL.

This is a regression introduced in a170a91fd3eab6155da39e740381867e,
likely because the code was copied from stream_complete().

Signed-off-by: Alberto Garcia 
---
 block/stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/stream.c b/block/stream.c
index 0113710845..52d329f5c6 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -280,6 +280,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 
 fail:
 if (orig_bs_flags != bdrv_get_flags(bs)) {
-bdrv_reopen(bs, s->bs_flags, NULL);
+bdrv_reopen(bs, orig_bs_flags, NULL);
 }
 }
-- 
2.11.0




Re: [Qemu-devel] [PATCH 1/3] migration: Create block capabilities for shared and enable

2017-05-15 Thread Juan Quintela
Eric Blake  wrote:
> On 05/11/2017 11:32 AM, Juan Quintela wrote:
>> Those two capabilities were added through the command line.  Notice that
>> we just created them.  This is just the boilerplate.
>> 
>> Signed-off-by: Juan Quintela 
>> Reviewed-by: Eric Blake 
>> 
>> --
>> 
>> Make migrate_set_block_* take a boolean argument.
>
> Question - do we support the orthogonal selection of all 4 combinations
> under HMP 'migrate' (no argument, -b alone, -i alone, -b and -i
> together), or are there only 3 actual states? If the latter, should we
> represent this as a single enum-valued property, rather than as two
> independent boolean properties?

{ 'enum': 'MigrationCapability',
  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
   'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram'] }


My understanding is that we can only have boolean capabilities here.
Or, how could we put a non-boolean capability?

There are three states as far as I can see.

Later, Juan.




Re: [Qemu-devel] [PATCH] nbd: strict nbd_wr_syncv

2017-05-15 Thread Vladimir Sementsov-Ogievskiy

12.05.2017 19:30, Paolo Bonzini wrote:


On 12/05/2017 17:57, Vladimir Sementsov-Ogievskiy wrote:

12.05.2017 18:46, Paolo Bonzini wrote:

On 12/05/2017 16:17, Vladimir Sementsov-Ogievskiy wrote:

nbd_wr_syncv is called either from coroutine or from client negotiation
code, when socket is in blocking mode. So, -EAGAIN is impossible.

Furthermore, EAGAIN is confusing, as, what to read/write again? With
EAGAIN as a return code we don't know how much data is already
read or written by the function, so in case of EAGAIN the whole
communication is broken.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

If I understand right, nbd_wr_syncv is called either from coroutine
or from client negotiation code, when socket is in blocking mode.
So, some thoughts

1. let's establish this with an assert, because returning EAGAIN is
 confusing (see above)

Yes, this seems like a good idea.


2. should we in case of non-coroutine context start this coroutine in
 nbd_wr_syncv, like in bdrv_prwv_co, and use non-blocking mode?
3. is there any problems or disadvantages of moving  client negotiation
 to coroutine too?

When you move code to coroutines you need to be aware of what code can
now run concurrently, for example the monitor.  I'm not sure that it's
possible to do this.

Hmm, can you please give some example of a problem? qcow2_open starts
coroutines to read it's header, why nbd_open can't start
coroutine/coroutines to read/write some negotiation data?

Ah, it's not a problem if you use synchronous I/O (aio_poll) within the
coroutines.  But then it's still blocking I/O in every way (except
you've fcntl-ed the descriptor to make it non-blocking); it's simply
hidden underneath coroutines and aio_poll.


I mean, make negotiation behave like normal nbd communication, 
non-blocking socket + yield.. So, some other coroutines may do their 
work, while nbd-negotiation coroutine waits for io..




Paolo



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH 1/3] migration: Create block capabilities for shared and enable

2017-05-15 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> Eric Blake  wrote:
> > On 05/11/2017 11:32 AM, Juan Quintela wrote:
> >> Those two capabilities were added through the command line.  Notice that
> >> we just created them.  This is just the boilerplate.
> >> 
> >> Signed-off-by: Juan Quintela 
> >> Reviewed-by: Eric Blake 
> >> 
> >> --
> >> 
> >> Make migrate_set_block_* take a boolean argument.
> >
> > Question - do we support the orthogonal selection of all 4 combinations
> > under HMP 'migrate' (no argument, -b alone, -i alone, -b and -i
> > together), or are there only 3 actual states? If the latter, should we
> > represent this as a single enum-valued property, rather than as two
> > independent boolean properties?
> 
> { 'enum': 'MigrationCapability',
>   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram'] }
> 
> 
> My understanding is that we can only have boolean capabilities here.
> Or, how could we put a non-boolean capability?

Lets keep this simple and stick with the booleans.

Dave

> There are three states as far as I can see.
> 
> Later, Juan.
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-05-15 Thread Juan Quintela
Eric Blake  wrote:
> On 05/12/2017 05:55 AM, Juan Quintela wrote:
 @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
 blk,
  }
  
  if (has_inc && inc) {
 +migrate_set_block_enabled(s, true);
  migrate_set_block_shared(s, true);
>>>
>>> [2]
>>>
>>> IIUC for [1] & [2] we are solving the same problem that "shared"
>>> depends on "enabled" bit. Would it be good to unitfy this dependency
>>> somewhere? E.g., by changing migrate_set_block_shared() into:
>>>
>>> void migrate_set_block_shared(MigrationState *s, bool value)
>>> {
>>> s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value;
>>> if (value) {
>>> migrate_set_block_enabled(s, true);
>>> }
>>> }
>> 
>> ok with this.
>
> Or, as I commented on 1/3, maybe having a single property that is a
> tri-state enum value, instead of 2 separate boolean properties, might be
> nicer (but certainly a bit more complex to code up).

If you teach me how to do the qapi/qmp part, I will do the other bits.
I don't really care if we do it one way or the other.

>> I will add once here that when we disable block enabled, we also disable
>> shared, or just let it that way?
>> 
>>> Another thing to mention: after switching to the capability interface,
>>> we'll cache the "enabled" and "shared" bits now while we don't cache
>>> it before, right? IIUC it'll affect behavior of such sequence:
>>>
>>> - 1st migrate with enabled=1, shared=1, then
>>> - 2nd migrate with enabled=0, shared=0
>>>
>>> Before the series, the 2nd migrate will use enabled=shared=0, but
>>> after the series it should be using enabled=shared=1. Not sure whether
>>> this would be a problem (or I missed anything?).
>> 
>> We can't be consistent with both old/new way.
>> 
>> Old way: we always setup the capabilities on command line (that should
>> have been deprecated long, long ago)
>
> Well, the easy way out is to have the HMP migrate command (I assume
> that's what you mean by "on command line") explicitly clear the
> parameters if it is called without the -b/-i flag.  So the start of each
> migration is what changes the properties, so long as you are still using
> HMP to start the migration.  Or, on the QMP side, since 'migrate' has
> optional 'blk' and 'inc' booleans, basically leave the settings alone if
> the parameters were omitted, and explicitly update the property to the
> value of those parameters if they were present.

We are going to have trouble whatever way that we do it, or we start
doing lots of strange things.

Forget about qmp, we are going to assume that it is consistent with hmp.

migrate_set_capabilities block_enabled on
migrate -b .

Should migrate disable the block_enabled capability?  Give one
warning/error?

And notice that this only matter if we do a migration, we cancel/get one
error, and then we migrate again.

What I tried to do is assume that -b/-i arguments don't exist.  And if
the user use them, we implement its behaviour with the minimal fuss
possibly.

Only way that I can think of being consistent and bug compatible will be
to store:
- old block_shared/enanbeld capability value
- if we set -b/-i on the command line

In migration cleanup do the right thing depending on this four
variables.  I think that it is adding lots of complexity for very few
improvement.


> Or is the proposal that we are also going to simplify the QMP 'migrate'
> command to get rid of crufty parameters?

I didn't read it that way, but I would not oppose O:-)

Later, Juan.



[Qemu-devel] ping Re: [PATCH v18 00/25] qcow2: persistent dirty bitmaps

2017-05-15 Thread Vladimir Sementsov-Ogievskiy

ping...

What about this?

03.05.2017 15:25, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

There is a new update of qcow2-bitmap series - v18.

web: 
https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=qcow2-bitmap-v18
git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag qcow2-bitmap-v18)

v18:

rebased on master (sorry for v17)

08: contextual: qcow2_do_open is changed instead of qcow2_open
 rename s/need_update_header/update_header/ in qcow2_do_open, to not do it 
in 10
 save r-b's by Max and John
09: new patch
10: load_bitmap_data: do not clear bitmap parameter - it should be already 
cleared
 (it actually created before single load_bitmap_data() call)
 if some bitmaps are loaded, but we can't write the image (it is readonly
 or inactive), so we can't mark bitmaps "in use" in the image, mark
 corresponding BdrvDirtyBitmap read-only.
 change error_setg to error_setg_errno for "Can't update bitmap directory"
 no needs to rename s/need_update_header/update_header/ here, as it done in 
08
13: function bdrv_has_persistent_bitmaps becomes 
bdrv_has_changed_persistent_bitmaps,
 to handle readonly field.
14: declaration moved to the bottom of .h, save r-b's
15: firstly check bdrv_has_changed_persistent_bitmaps and only then fail on 
!can_write, and then QSIMPLEQ_INIT(&drop_tables)
 skip readonly bitmaps in saving loop
18: remove '#optional', 2.9 -> 2.10, save r-b's
19: remove '#optional', 2.9 -> 2.10, save r-b's
20: 2.9 -> 2.10, save r-b's
21: add check of read-only image open, drop r-b's
24: add comment to qapi/block-core.json, that block-dirty-bitmap-add removes 
bitmap
 from storage. r-b's by Max and John saved


v17:
08: add r-b's by Max and John
09: clear unknown autoclear features from BDRVQcow2State before calling
 qcow2_load_autoloading_dirty_bitmaps(), and also do not extra update
 header if it is updated by qcow2_load_autoloading_dirty_bitmaps().
11: new patch, splitted out from 16
12: rewrite commit message
14: changing bdrv_close moved to separate new patch 11
 s/1/1ULL/ ; s/%u/%llu/ for two errors
16: s/No/Not/
 add Max's r-b
24: new patch


v16:

mostly by Kevin's comments:
07: just moved here, as preparation for merging refcounts to 08
08: move "qcow2-bitmap: refcounts" staff to this patch, to not break qcow2-img 
check
 drop r-b's
 move necessary supporting static functions to this patch too (to not break 
compilation)
 fprintf -> error_report
 other small changes with error messages and comments
 code style
 for qcow2-bitmap.c:
   'include "exec/log.h"' was dropped
   s/1/(1U << 0) for BME_FLAG_IN_USE
   add BME_TABLE_ENTRY_FLAG_ALL_ONES to replace magic 1
   don't check 'cluster_size <= 0' in check_table_entry
old "[PATCH v15 08/25] block: introduce auto-loading bitmaps" was dropped
09: was "[PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps"
 drop r-b's
 some staff was moved to 08
 update_header_sync - error on bdrv_flush fail
 rename disk_sectors_in_bitmap_cluster to sectors_covered_by_bitmap_cluster
  and adjust comment.
  so, variable for storing this function result: s/dsc/sbc/
 s/1/BME_TABLE_ENTRY_FLAG_ALL_ONES/
 also, as Qcow2BitmapTable already introduced in 08,
s/table_offset/table.offset/ and s/table_size/table.size, etc..
 update_ext_header_and_dir_in_place: add comments, add additional
   update_header_sync, to reduce indeterminacy in case of error.
 call qcow2_load_autoloading_dirty_bitmaps directly from qcow2_open
 no .bdrv_load_autoloading_dirty_bitmaps
11: drop bdrv_store_persistent_dirty_bitmaps at all.
 drop r-b's
13: rename patch, rewrite commit msg
 drop r-b's
 move bdrv_release_named_dirty_bitmaps in bdrv_close() after 
drv->bdrv_close()
 Qcow2BitmapTable is already introduced, so no
   s/table_offset/table.offset/ and s/table_size/table.size, etc.. here
 old 25 with check_constraints_on_bitmap() improvements merged here (Eric)
 like in 09, s/dsc/sbc/ and 
s/disk_sectors_in_bitmap_cluster/sectors_covered_by_bitmap_cluster/
 no .bdrv_store_persistent_dirty_bitmaps
 call qcow2_store_persistent_dirty_bitmaps directly from qcow2_inactivate
15: corresponding part of 25 merged here. Add John's r-b, as 25 was also 
reviewed by John.
16: add John's r-b


v15:
13,14: add John's r-b
15: qcow2_can_store_new_dirty_bitmap:
   - check max dirty bitmaps and bitmap directory overhead
   - switch to error_prepend
 rm Max's r-b
 not add John's r-b
17-24: add John's r-b
25: changed because 15 changed,
 not add John's r-b


v14:

07: use '|=' to update need_update_header
 add John's r-b
 add Max's r-b
09: remove unused bitmap_table_to_cpu()
 left Max's r-b, hope it's ok
 add John's r-b
10: remove extra new line
 add John's r-b
11: add John's r-b
12: add John's r-b
13: small fixes by John's review:
- remove weird g_free of NULL 

[Qemu-devel] [PATCH v3 1/7] curl: strengthen assertion in curl_clean_state

2017-05-15 Thread Paolo Bonzini
curl_clean_state should only be called after all AIOCBs have been
completed.  This is not so obvious for the call from curl_detach_aio_context,
so assert that.

Cc: qemu-sta...@nongnu.org
Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/curl.c b/block/curl.c
index aa6e8cc0e5..010b0604c5 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -532,6 +532,11 @@ static CURLState *curl_init_state(BlockDriverState *bs, 
BDRVCURLState *s)
 
 static void curl_clean_state(CURLState *s)
 {
+int j;
+for (j=0; j < CURL_NUM_ACB; j++) {
+assert(!s->acb[j]);
+}
+
 if (s->s->multi)
 curl_multi_remove_handle(s->s->multi, s->curl);
 
-- 
2.12.2





[Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine conversion, remove aio_poll

2017-05-15 Thread Paolo Bonzini
Compared to v2, this silences checkpatch and correctly destroy the mutex on
exiting from curl_open with an error.

Paolo

Paolo Bonzini (7):
  curl: strengthen assertion in curl_clean_state
  curl: never invoke callbacks with s->mutex held
  curl: avoid recursive locking of BDRVCURLState mutex
  curl: split curl_find_state/curl_init_state
  curl: convert CURLAIOCB to byte values
  curl: convert readv to coroutines
  curl: do not do aio_poll when waiting for a free CURLState

 block/curl.c | 217 +--
 1 file changed, 121 insertions(+), 96 deletions(-)

-- 
2.12.2




[Qemu-devel] [PATCH v3 3/7] curl: avoid recursive locking of BDRVCURLState mutex

2017-05-15 Thread Paolo Bonzini
The curl driver has a ugly hack where, if it cannot find an empty CURLState,
it just uses aio_poll to wait for one to be empty.  This is probably
buggy when used together with dataplane, and the simplest way to fix it
is to use coroutines instead.

A more immediate effect of the bug however is that it can cause a
recursive call to curl_readv_bh_cb and recursively taking the
BDRVCURLState mutex.  This causes a deadlock.

The fix is to unlock the mutex around aio_poll, but for cleanliness we
should also take the mutex around all calls to curl_init_state, even if
reaching the unlock/lock pair is impossible.  The same is true for
curl_clean_state.

Reported-by: Kun Wei 
Tested-by: Richard W.M. Jones 
Cc: qemu-sta...@nongnu.org
Cc: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
v2->v3: add missing qemu_mutex_destroy on failure case

 block/curl.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 924c2a2088..5e6ddf3a34 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -281,6 +281,7 @@ read_end:
 return size * nmemb;
 }
 
+/* Called with s->mutex held.  */
 static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
  CURLAIOCB *acb)
 {
@@ -453,6 +454,7 @@ static void curl_multi_timeout_do(void *arg)
 #endif
 }
 
+/* Called with s->mutex held.  */
 static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
 {
 CURLState *state = NULL;
@@ -471,7 +473,9 @@ static CURLState *curl_init_state(BlockDriverState *bs, 
BDRVCURLState *s)
 break;
 }
 if (!state) {
+qemu_mutex_unlock(&s->mutex);
 aio_poll(bdrv_get_aio_context(bs), true);
+qemu_mutex_lock(&s->mutex);
 }
 } while(!state);
 
@@ -534,6 +538,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, 
BDRVCURLState *s)
 return state;
 }
 
+/* Called with s->mutex held.  */
 static void curl_clean_state(CURLState *s)
 {
 int j;
@@ -565,6 +570,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
 BDRVCURLState *s = bs->opaque;
 int i;
 
+qemu_mutex_lock(&s->mutex);
 for (i = 0; i < CURL_NUM_STATES; i++) {
 if (s->states[i].in_use) {
 curl_clean_state(&s->states[i]);
@@ -580,6 +586,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
 curl_multi_cleanup(s->multi);
 s->multi = NULL;
 }
+qemu_mutex_unlock(&s->mutex);
 
 timer_del(&s->timer);
 }
@@ -677,6 +684,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EROFS;
 }
 
+qemu_mutex_init(&s->mutex);
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
 qemu_opts_absorb_qdict(opts, options, &local_err);
 if (local_err) {
@@ -747,7 +755,9 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 DPRINTF("CURL: Opening %s\n", file);
 s->aio_context = bdrv_get_aio_context(bs);
 s->url = g_strdup(file);
+qemu_mutex_lock(&s->mutex);
 state = curl_init_state(bs, s);
+qemu_mutex_unlock(&s->mutex);
 if (!state)
 goto out_noclean;
 
@@ -791,11 +801,12 @@ static int curl_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 DPRINTF("CURL: Size = %zd\n", s->len);
 
+qemu_mutex_lock(&s->mutex);
 curl_clean_state(state);
+qemu_mutex_unlock(&s->mutex);
 curl_easy_cleanup(state->curl);
 state->curl = NULL;
 
-qemu_mutex_init(&s->mutex);
 curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
 qemu_opts_del(opts);
@@ -806,6 +817,7 @@ out:
 curl_easy_cleanup(state->curl);
 state->curl = NULL;
 out_noclean:
+qemu_mutex_destroy(&s->mutex);
 g_free(s->cookie);
 g_free(s->url);
 qemu_opts_del(opts);
-- 
2.12.2





[Qemu-devel] [PATCH v3 4/7] curl: split curl_find_state/curl_init_state

2017-05-15 Thread Paolo Bonzini
If curl_easy_init fails, a CURLState is left with s->in_use = 1.  Split
curl_init_state in two, so that we can distinguish the two failures and
call curl_clean_state if needed.

While at it, simplify curl_find_state, removing a dummy loop.  The
aio_poll loop is moved to the sole caller that needs it.

Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 5e6ddf3a34..29132ab835 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -455,34 +455,27 @@ static void curl_multi_timeout_do(void *arg)
 }
 
 /* Called with s->mutex held.  */
-static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
+static CURLState *curl_find_state(BDRVCURLState *s)
 {
 CURLState *state = NULL;
-int i, j;
-
-do {
-for (i=0; istates[i].acb[j])
-continue;
-if (s->states[i].in_use)
-continue;
+int i;
 
+for (i=0; i < CURL_NUM_STATES; i++) {
+if (!s->states[i].in_use) {
 state = &s->states[i];
 state->in_use = 1;
 break;
 }
-if (!state) {
-qemu_mutex_unlock(&s->mutex);
-aio_poll(bdrv_get_aio_context(bs), true);
-qemu_mutex_lock(&s->mutex);
-}
-} while(!state);
+}
+return state;
+}
 
+static int curl_init_state(BDRVCURLState *s, CURLState *state)
+{
 if (!state->curl) {
 state->curl = curl_easy_init();
 if (!state->curl) {
-return NULL;
+return -EIO;
 }
 curl_easy_setopt(state->curl, CURLOPT_URL, s->url);
 curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER,
@@ -535,7 +528,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, 
BDRVCURLState *s)
 QLIST_INIT(&state->sockets);
 state->s = s;
 
-return state;
+return 0;
 }
 
 /* Called with s->mutex held.  */
@@ -756,13 +749,18 @@ static int curl_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->aio_context = bdrv_get_aio_context(bs);
 s->url = g_strdup(file);
 qemu_mutex_lock(&s->mutex);
-state = curl_init_state(bs, s);
+state = curl_find_state(s);
 qemu_mutex_unlock(&s->mutex);
-if (!state)
+if (!state) {
 goto out_noclean;
+}
 
 // Get file size
 
+if (curl_init_state(s, state) < 0) {
+goto out;
+}
+
 s->accept_range = false;
 curl_easy_setopt(state->curl, CURLOPT_NOBODY, 1);
 curl_easy_setopt(state->curl, CURLOPT_HEADERFUNCTION,
@@ -857,8 +855,18 @@ static void curl_readv_bh_cb(void *p)
 }
 
 // No cache found, so let's start a new request
-state = curl_init_state(acb->common.bs, s);
-if (!state) {
+for (;;) {
+state = curl_find_state(s);
+if (state) {
+break;
+}
+qemu_mutex_unlock(&s->mutex);
+aio_poll(bdrv_get_aio_context(bs), true);
+qemu_mutex_lock(&s->mutex);
+}
+
+if (curl_init_state(s, state) < 0) {
+curl_clean_state(state);
 ret = -EIO;
 goto out;
 }
-- 
2.12.2





[Qemu-devel] [PATCH v3 6/7] curl: convert readv to coroutines

2017-05-15 Thread Paolo Bonzini
This is pretty simple.  The bottom half goes away because, unlike
bdrv_aio_readv, coroutine-based read can return immediately without
yielding.  However, for simplicity I kept the former bottom half
handler in a separate function.

Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 94 
 1 file changed, 38 insertions(+), 56 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 401a1b1015..86a4a73834 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -76,10 +76,6 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_TIMEOUT_DEFAULT 5
 #define CURL_TIMEOUT_MAX 1
 
-#define FIND_RET_NONE   0
-#define FIND_RET_OK 1
-#define FIND_RET_WAIT   2
-
 #define CURL_BLOCK_OPT_URL   "url"
 #define CURL_BLOCK_OPT_READAHEAD "readahead"
 #define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
@@ -93,11 +89,12 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 struct BDRVCURLState;
 
 typedef struct CURLAIOCB {
-BlockAIOCB common;
+Coroutine *co;
 QEMUIOVector *qiov;
 
 uint64_t offset;
 uint64_t bytes;
+int ret;
 
 size_t start;
 size_t end;
@@ -268,11 +265,11 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
   request_length - offset);
 }
 
+acb->ret = 0;
+s->acb[i] = NULL;
 qemu_mutex_unlock(&s->s->mutex);
-acb->common.cb(acb->common.opaque, 0);
+aio_co_wake(acb->co);
 qemu_mutex_lock(&s->s->mutex);
-qemu_aio_unref(acb);
-s->acb[i] = NULL;
 }
 }
 
@@ -282,8 +279,8 @@ read_end:
 }
 
 /* Called with s->mutex held.  */
-static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
- CURLAIOCB *acb)
+static bool curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
+  CURLAIOCB *acb)
 {
 int i;
 uint64_t end = start + len;
@@ -312,7 +309,8 @@ static int curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 if (clamped_len < len) {
 qemu_iovec_memset(acb->qiov, clamped_len, 0, len - 
clamped_len);
 }
-return FIND_RET_OK;
+acb->ret = 0;
+return true;
 }
 
 // Wait for unfinished chunks
@@ -330,13 +328,13 @@ static int curl_find_buf(BDRVCURLState *s, uint64_t 
start, uint64_t len,
 for (j=0; jacb[j]) {
 state->acb[j] = acb;
-return FIND_RET_WAIT;
+return true;
 }
 }
 }
 }
 
-return FIND_RET_NONE;
+return false;
 }
 
 /* Called with s->mutex held.  */
@@ -381,11 +379,11 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 continue;
 }
 
+acb->ret = -EIO;
+state->acb[i] = NULL;
 qemu_mutex_unlock(&s->mutex);
-acb->common.cb(acb->common.opaque, -EIO);
+aio_co_wake(acb->co);
 qemu_mutex_lock(&s->mutex);
-qemu_aio_unref(acb);
-state->acb[i] = NULL;
 }
 }
 
@@ -822,19 +820,11 @@ out_noclean:
 return -EINVAL;
 }
 
-static const AIOCBInfo curl_aiocb_info = {
-.aiocb_size = sizeof(CURLAIOCB),
-};
-
-
-static void curl_readv_bh_cb(void *p)
+static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
 {
 CURLState *state;
 int running;
-int ret = -EINPROGRESS;
 
-CURLAIOCB *acb = p;
-BlockDriverState *bs = acb->common.bs;
 BDRVCURLState *s = bs->opaque;
 
 uint64_t start = acb->offset;
@@ -844,14 +834,8 @@ static void curl_readv_bh_cb(void *p)
 
 // In case we have the requested data already (e.g. read-ahead),
 // we can just call the callback and be done.
-switch (curl_find_buf(s, start, acb->bytes, acb)) {
-case FIND_RET_OK:
-ret = 0;
-goto out;
-case FIND_RET_WAIT:
-goto out;
-default:
-break;
+if (curl_find_buf(s, start, acb->bytes, acb)) {
+goto out;
 }
 
 // No cache found, so let's start a new request
@@ -867,7 +851,7 @@ static void curl_readv_bh_cb(void *p)
 
 if (curl_init_state(s, state) < 0) {
 curl_clean_state(state);
-ret = -EIO;
+acb->ret = -EIO;
 goto out;
 }
 
@@ -882,7 +866,7 @@ static void curl_readv_bh_cb(void *p)
 state->orig_buf = g_try_malloc(state->buf_len);
 if (state->buf_len && state->orig_buf == NULL) {
 curl_clean_state(state);
-ret = -ENOMEM;
+acb->ret = -ENOMEM;
 goto out;
 }
 state->acb[0] = acb;
@@ -899,26 +883,24 @@ static void curl_readv_bh_cb(void *p)
 
 out:
 qemu_mutex_unlock(&s->mutex

[Qemu-devel] [PATCH v3 2/7] curl: never invoke callbacks with s->mutex held

2017-05-15 Thread Paolo Bonzini
All curl callbacks go through curl_multi_do, and hence are called with
s->mutex held.  Note that with comments, and make curl_read_cb drop the
lock before invoking the callback.

Likewise for curl_find_buf, where the callback can be invoked by the
caller.

Cc: qemu-sta...@nongnu.org
Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 010b0604c5..924c2a2088 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -147,6 +147,7 @@ static void curl_multi_do(void *arg);
 static void curl_multi_read(void *arg);
 
 #ifdef NEED_CURL_TIMER_CALLBACK
+/* Called from curl_multi_do_locked, with s->mutex held.  */
 static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
 {
 BDRVCURLState *s = opaque;
@@ -163,6 +164,7 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, 
void *opaque)
 }
 #endif
 
+/* Called from curl_multi_do_locked, with s->mutex held.  */
 static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 void *userp, void *sp)
 {
@@ -212,6 +214,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int 
action,
 return 0;
 }
 
+/* Called from curl_multi_do_locked, with s->mutex held.  */
 static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void 
*opaque)
 {
 BDRVCURLState *s = opaque;
@@ -226,6 +229,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
 return realsize;
 }
 
+/* Called from curl_multi_do_locked, with s->mutex held.  */
 static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
 CURLState *s = ((CURLState*)opaque);
@@ -264,7 +268,9 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
   request_length - offset);
 }
 
+qemu_mutex_unlock(&s->s->mutex);
 acb->common.cb(acb->common.opaque, 0);
+qemu_mutex_lock(&s->s->mutex);
 qemu_aio_unref(acb);
 s->acb[i] = NULL;
 }
@@ -305,8 +311,6 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, 
size_t len,
 if (clamped_len < len) {
 qemu_iovec_memset(acb->qiov, clamped_len, 0, len - 
clamped_len);
 }
-acb->common.cb(acb->common.opaque, 0);
-
 return FIND_RET_OK;
 }
 
@@ -832,8 +836,8 @@ static void curl_readv_bh_cb(void *p)
 // we can just call the callback and be done.
 switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
 case FIND_RET_OK:
-qemu_aio_unref(acb);
-// fall through
+ret = 0;
+goto out;
 case FIND_RET_WAIT:
 goto out;
 default:
-- 
2.12.2





Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-05-15 Thread Peter Xu
On Fri, May 12, 2017 at 12:55:22PM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > On Thu, May 11, 2017 at 06:32:27PM +0200, Juan Quintela wrote:
> 
> >> @@ -1214,9 +1218,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> >> blk,
> >>  MigrationParams params;
> >>  const char *p;
> >>  
> >> -params.blk = has_blk && blk;
> >> -params.shared = has_inc && inc;
> >> -
> >>  if (migration_is_setup_or_active(s->state) ||
> >>  s->state == MIGRATION_STATUS_CANCELLING ||
> >>  s->state == MIGRATION_STATUS_COLO) {
> >> @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> >> blk,
> >>  }
> >>  
> >>  if (has_inc && inc) {
> >> +migrate_set_block_enabled(s, true);
> >>  migrate_set_block_shared(s, true);
> >
> > [2]
> >
> > IIUC for [1] & [2] we are solving the same problem that "shared"
> > depends on "enabled" bit. Would it be good to unitfy this dependency
> > somewhere? E.g., by changing migrate_set_block_shared() into:
> >
> > void migrate_set_block_shared(MigrationState *s, bool value)
> > {
> > s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value;
> > if (value) {
> > migrate_set_block_enabled(s, true);
> > }
> > }
> 
> ok with this.
> I will add once here that when we disable block enabled, we also disable
> shared, or just let it that way?

I should mark "nitpick" in my above comment. Any way works for me. :)

> 
> > Another thing to mention: after switching to the capability interface,
> > we'll cache the "enabled" and "shared" bits now while we don't cache
> > it before, right? IIUC it'll affect behavior of such sequence:
> >
> > - 1st migrate with enabled=1, shared=1, then
> > - 2nd migrate with enabled=0, shared=0
> >
> > Before the series, the 2nd migrate will use enabled=shared=0, but
> > after the series it should be using enabled=shared=1. Not sure whether
> > this would be a problem (or I missed anything?).
> 
> We can't be consistent with both old/new way.
> 
> Old way: we always setup the capabilities on command line (that should
> have been deprecated long, long ago)
> 
> New way: Once set, they stay set.
> 
> So, alternatives are:
> - If we are going to deprecate the old way, just let things as they are
>   on this patch (easier for me O:-)
> 
> - Always disable this features at the end of migration: Compatible with
>   old migration semantics, bet we are inconsistent with all other
>   capabilities.
> 
> - Add yet more code to only disable them when we are setting them
>   through the command line.  More code to maintain.
> 
> My idea would be to deprecate the migrate command line parameter, that
> is the reason why I did the 1st option.  I hope that in due curse, we
> would be able to remove the code.  But if anyone strongly think that any
> of the other options is better, please let me know.

I assume that sending continuous "migrate" is very rare, right (after
all, normally source VM is useless after that...)? I was just trying
to post this question up, and so... If you/Dave/others won't worry
about its compatibility, I won't either. ;)

Thanks!

-- 
Peter Xu



[Qemu-devel] [PATCH v3 5/7] curl: convert CURLAIOCB to byte values

2017-05-15 Thread Paolo Bonzini
This is in preparation for the conversion from bdrv_aio_readv to
bdrv_co_preadv, and it also requires changing some of the size_t values
to uint64_t.  This was broken before for disks > 2TB, but now it would
break at 4GB.

Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 29132ab835..401a1b1015 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -96,8 +96,8 @@ typedef struct CURLAIOCB {
 BlockAIOCB common;
 QEMUIOVector *qiov;
 
-int64_t sector_num;
-int nb_sectors;
+uint64_t offset;
+uint64_t bytes;
 
 size_t start;
 size_t end;
@@ -115,7 +115,7 @@ typedef struct CURLState
 CURL *curl;
 QLIST_HEAD(, CURLSocket) sockets;
 char *orig_buf;
-size_t buf_start;
+uint64_t buf_start;
 size_t buf_off;
 size_t buf_len;
 char range[128];
@@ -126,7 +126,7 @@ typedef struct CURLState
 typedef struct BDRVCURLState {
 CURLM *multi;
 QEMUTimer timer;
-size_t len;
+uint64_t len;
 CURLState states[CURL_NUM_STATES];
 char *url;
 size_t readahead_size;
@@ -257,7 +257,7 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
 continue;
 
 if ((s->buf_off >= acb->end)) {
-size_t request_length = acb->nb_sectors * BDRV_SECTOR_SIZE;
+size_t request_length = acb->bytes;
 
 qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
 acb->end - acb->start);
@@ -282,18 +282,18 @@ read_end:
 }
 
 /* Called with s->mutex held.  */
-static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
+static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
  CURLAIOCB *acb)
 {
 int i;
-size_t end = start + len;
-size_t clamped_end = MIN(end, s->len);
-size_t clamped_len = clamped_end - start;
+uint64_t end = start + len;
+uint64_t clamped_end = MIN(end, s->len);
+uint64_t clamped_len = clamped_end - start;
 
 for (i=0; istates[i];
-size_t buf_end = (state->buf_start + state->buf_off);
-size_t buf_fend = (state->buf_start + state->buf_len);
+uint64_t buf_end = (state->buf_start + state->buf_off);
+uint64_t buf_fend = (state->buf_start + state->buf_len);
 
 if (!state->orig_buf)
 continue;
@@ -788,7 +788,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 #endif
 
-s->len = (size_t)d;
+s->len = d;
 
 if ((!strncasecmp(s->url, "http://";, strlen("http://";))
 || !strncasecmp(s->url, "https://";, strlen("https://";)))
@@ -797,7 +797,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 "Server does not support 'range' (byte ranges).");
 goto out;
 }
-DPRINTF("CURL: Size = %zd\n", s->len);
+DPRINTF("CURL: Size = %" PRIu64 "\n", s->len);
 
 qemu_mutex_lock(&s->mutex);
 curl_clean_state(state);
@@ -837,14 +837,14 @@ static void curl_readv_bh_cb(void *p)
 BlockDriverState *bs = acb->common.bs;
 BDRVCURLState *s = bs->opaque;
 
-size_t start = acb->sector_num * BDRV_SECTOR_SIZE;
-size_t end;
+uint64_t start = acb->offset;
+uint64_t end;
 
 qemu_mutex_lock(&s->mutex);
 
 // In case we have the requested data already (e.g. read-ahead),
 // we can just call the callback and be done.
-switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
+switch (curl_find_buf(s, start, acb->bytes, acb)) {
 case FIND_RET_OK:
 ret = 0;
 goto out;
@@ -872,7 +872,7 @@ static void curl_readv_bh_cb(void *p)
 }
 
 acb->start = 0;
-acb->end = MIN(acb->nb_sectors * BDRV_SECTOR_SIZE, s->len - start);
+acb->end = MIN(acb->bytes, s->len - start);
 
 state->buf_off = 0;
 g_free(state->orig_buf);
@@ -887,9 +887,9 @@ static void curl_readv_bh_cb(void *p)
 }
 state->acb[0] = acb;
 
-snprintf(state->range, 127, "%zd-%zd", start, end);
-DPRINTF("CURL (AIO): Reading %llu at %zd (%s)\n",
-(acb->nb_sectors * BDRV_SECTOR_SIZE), start, state->range);
+snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64, start, end);
+DPRINTF("CURL (AIO): Reading %" PRIu64 " at %" PRIu64 " (%s)\n",
+acb->bytes, start, state->range);
 curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
 
 curl_multi_add_handle(s->multi, state->curl);
@@ -914,8 +914,8 @@ static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
 acb = qemu_aio_get(&curl_aiocb_info, bs, cb, opaque);
 
 acb->qiov = qiov;
-acb->sector_num = sector_num;
-acb->nb_sectors = nb_sectors;
+acb->offset = sector_num * BDRV_SECTOR_SIZE;
+acb->bytes = nb_sectors * BDRV_SECTOR_SIZE;
 
 aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), curl

[Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Peter Lieven
Hi Block developers,

I would like to add a feature to Qemu to drain all traffic from a block so that
I can take external snaphosts without the risk to that in the middle of a write
operation. Its meant for cases where where QGA freeze/thaw is not available.

For me its enough to have this through qemu-io, but Kevin asked me to check
if its not worth to have a stable API for it and present it via QMP/HMP.

What are your thoughts?

Thanks,
Peter

---
 qemu-io-cmds.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 312fc6d..49d82fe 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1565,6 +1565,82 @@ static const cmdinfo_t flush_cmd = {
 .oneline= "flush all in-core file state to disk",
 };
 
+static const cmdinfo_t drain_cmd;
+
+static int drain_f(BlockBackend *blk, int argc, char **argv)
+{
+BlockDriverState *bs = blk_bs(blk);
+bool flush = false;
+int c;
+
+while ((c = getopt(argc, argv, "f")) != -1) {
+switch (c) {
+case 'f':
+flush = true;
+break;
+default:
+return qemuio_command_usage(&drain_cmd);
+}
+}
+
+if (optind != argc) {
+return qemuio_command_usage(&drain_cmd);
+}
+
+
+if (bs->quiesce_counter) {
+printf("drain failed: device is already drained!\n");
+return 1;
+}
+
+bdrv_drained_begin(bs); /* complete I/O */
+if (flush) {
+bdrv_flush(bs);
+bdrv_drain(bs); /* in case flush left pending I/O */
+printf("flushed all pending I/O\n");
+}
+printf("drain successful\n");
+return 0;
+}
+
+static void drain_help(void)
+{
+printf(
+"\n"
+" Drains all external I/O from the device\n"
+"\n"
+" -f, -- flush all in-core file state to disk\n"
+"\n");
+}
+
+static const cmdinfo_t drain_cmd = {
+.name   = "drain",
+.cfunc  = drain_f,
+.args   = "[-f]",
+.argmin = 0,
+.argmax = -1,
+.oneline= "cease to send I/O to the device",
+.help   = drain_help
+};
+
+static int undrain_f(BlockBackend *blk, int argc, char **argv)
+{
+BlockDriverState *bs = blk_bs(blk);
+if (!bs->quiesce_counter) {
+printf("undrain failed: device is not drained!\n");
+return 1;
+}
+bdrv_drained_end(bs);
+printf("undrain successful\n");
+return 0;
+}
+
+static const cmdinfo_t undrain_cmd = {
+.name   = "undrain",
+.cfunc  = undrain_f,
+.oneline= "continue I/O to a drained device",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv)
 {
 int64_t offset;
@@ -2296,6 +2372,8 @@ static void __attribute((constructor)) 
init_qemuio_commands(void)
 qemuio_add_command(&aio_write_cmd);
 qemuio_add_command(&aio_flush_cmd);
 qemuio_add_command(&flush_cmd);
+qemuio_add_command(&drain_cmd);
+qemuio_add_command(&undrain_cmd);
 qemuio_add_command(&truncate_cmd);
 qemuio_add_command(&length_cmd);
 qemuio_add_command(&info_cmd);
-- 
1.9.1




[Qemu-devel] [PATCH v3 7/7] curl: do not do aio_poll when waiting for a free CURLState

2017-05-15 Thread Paolo Bonzini
Instead, put the CURLAIOCB on a wait list and yield; curl_clean_state will
wake the corresponding coroutine.

Because of CURL's callback-based structure, we cannot easily convert
everything to CoMutex/CoQueue; keeping the QemuMutex is simpler.  However,
CoQueue is a simple wrapper around a linked list, so we can easily
use QSIMPLEQ and open-code a CoQueue, protected by the BDRVCURLState
QemuMutex instead of a CoMutex.

Reviewed-by: Jeff Cody 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 86a4a73834..630c6e5187 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -98,6 +98,8 @@ typedef struct CURLAIOCB {
 
 size_t start;
 size_t end;
+
+QSIMPLEQ_ENTRY(CURLAIOCB) next;
 } CURLAIOCB;
 
 typedef struct CURLSocket {
@@ -133,6 +135,7 @@ typedef struct BDRVCURLState {
 bool accept_range;
 AioContext *aio_context;
 QemuMutex mutex;
+QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
 char *username;
 char *password;
 char *proxyusername;
@@ -532,6 +535,7 @@ static int curl_init_state(BDRVCURLState *s, CURLState 
*state)
 /* Called with s->mutex held.  */
 static void curl_clean_state(CURLState *s)
 {
+CURLAIOCB *next;
 int j;
 for (j=0; j < CURL_NUM_ACB; j++) {
 assert(!s->acb[j]);
@@ -548,6 +552,14 @@ static void curl_clean_state(CURLState *s)
 }
 
 s->in_use = 0;
+
+next = QSIMPLEQ_FIRST(&s->s->free_state_waitq);
+if (next) {
+QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next);
+qemu_mutex_unlock(&s->s->mutex);
+aio_co_wake(next->co);
+qemu_mutex_lock(&s->s->mutex);
+}
 }
 
 static void curl_parse_filename(const char *filename, QDict *options,
@@ -744,6 +756,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 DPRINTF("CURL: Opening %s\n", file);
+QSIMPLEQ_INIT(&s->free_state_waitq);
 s->aio_context = bdrv_get_aio_context(bs);
 s->url = g_strdup(file);
 qemu_mutex_lock(&s->mutex);
@@ -844,8 +857,9 @@ static void curl_setup_preadv(BlockDriverState *bs, 
CURLAIOCB *acb)
 if (state) {
 break;
 }
+QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
 qemu_mutex_unlock(&s->mutex);
-aio_poll(bdrv_get_aio_context(bs), true);
+qemu_coroutine_yield();
 qemu_mutex_lock(&s->mutex);
 }
 
-- 
2.12.2




Re: [Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails

2017-05-15 Thread Kevin Wolf
Am 15.05.2017 um 11:34 hat Alberto Garcia geschrieben:
> The code that tries to reopen a BlockDriverState in stream_start()
> when the creation of a new block job fails crashes because it attempts
> to dereference a pointer that is known to be NULL.
> 
> This is a regression introduced in a170a91fd3eab6155da39e740381867e,
> likely because the code was copied from stream_complete().
> 
> Signed-off-by: Alberto Garcia 

Cc: qemu-sta...@nongnu.org

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH 1/2] migration: Remove use of old MigrationParams

2017-05-15 Thread Peter Xu
On Fri, May 12, 2017 at 05:54:48PM +0200, Juan Quintela wrote:
> We have change in the previous patch to use migration capabilities for
> it.  Notice that we continue using the old command line flags from
> migrate command from the time being.  Remove the set_params method as
> now it is empty.
> 
> Signed-off-by: Juan Quintela 
> Reviewed-by: zhanghailiang 
> 
> --
> - Maintain shared/enabled dependency (Xu suggestien)
> ---
>  include/migration/migration.h |  3 +--
>  migration/block.c | 17 ++---
>  migration/colo.c  |  5 +++--
>  migration/migration.c | 14 +++---
>  migration/savevm.c|  2 --
>  5 files changed, 17 insertions(+), 24 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 258e4ff..d228f29 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -43,8 +43,7 @@
>  extern int only_migratable;
>  
>  struct MigrationParams {
> -bool blk;
> -bool shared;
> +bool unused; /* C doesn't allow empty structs */
>  };
>  
>  /* Messages sent on the return path from destination to source */
> diff --git a/migration/block.c b/migration/block.c
> index 060087f..fcfa823 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -94,9 +94,6 @@ typedef struct BlkMigBlock {
>  } BlkMigBlock;
>  
>  typedef struct BlkMigState {
> -/* Written during setup phase.  Can be read without a lock.  */
> -int blk_enable;
> -int shared_base;
>  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>  int64_t total_sector_sum;
>  bool zero_blocks;
> @@ -425,7 +422,7 @@ static int init_blk_migration(QEMUFile *f)
>  bmds->bulk_completed = 0;
>  bmds->total_sectors = sectors;
>  bmds->completed_sectors = 0;
> -bmds->shared_base = block_mig_state.shared_base;
> +bmds->shared_base = migrate_use_block_shared();
>  
>  assert(i < num_bs);
>  bmds_bs[i].bmds = bmds;
> @@ -994,22 +991,12 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  return 0;
>  }
>  
> -static void block_set_params(const MigrationParams *params, void *opaque)
> -{
> -block_mig_state.blk_enable = params->blk;
> -block_mig_state.shared_base = params->shared;
> -
> -/* shared base means that blk_enable = 1 */
> -block_mig_state.blk_enable |= params->shared;
> -}
> -
>  static bool block_is_active(void *opaque)
>  {
> -return block_mig_state.blk_enable == 1;
> +return migrate_use_block_enabled();
>  }
>  
>  static SaveVMHandlers savevm_block_handlers = {
> -.set_params = block_set_params,
>  .save_live_setup = block_save_setup,
>  .save_live_iterate = block_save_iterate,
>  .save_live_complete_precopy = block_save_complete,
> diff --git a/migration/colo.c b/migration/colo.c
> index 963c802..e772384 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -14,6 +14,7 @@
>  #include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
> +#include "migration/block.h"
>  #include "io/channel-buffer.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> @@ -345,8 +346,8 @@ static int colo_do_checkpoint_transaction(MigrationState 
> *s,
>  }
>  
>  /* Disable block migration */
> -s->params.blk = 0;
> -s->params.shared = 0;
> +migrate_set_block_enabled(s, false);
> +migrate_set_block_shared(s, false);
>  qemu_savevm_state_header(fb);
>  qemu_savevm_state_begin(fb, &s->params);
>  qemu_mutex_lock_iothread();
> diff --git a/migration/migration.c b/migration/migration.c
> index 78102eb..0c32609 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -788,6 +788,10 @@ void 
> qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>  s->enabled_capabilities[cap->value->capability] = cap->value->state;
>  }
>  
> +if (s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED]) {
> +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
> +}
> +
>  if (migrate_postcopy_ram()) {
>  if (migrate_use_compression()) {
>  /* The decompression threads asynchronously write into RAM
> @@ -1214,11 +1218,17 @@ bool migration_is_blocked(Error **errp)
>  void migrate_set_block_shared(MigrationState *s, bool value)
>  {
>  s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value;
> +if (value) {
> +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
> +}
>  }
>  
>  void migrate_set_block_enabled(MigrationState *s, bool value)
>  {
>  s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = value;
> +if (!value) {
> +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = false;
> +}
>  }
>  
>  void qmp_migrate(const char *uri, bool has_blk, bool blk,
> @@ -1230,9 +1240,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> blk,
>  MigrationP

Re: [Qemu-devel] [PATCH 2/2] migration: Remove old MigrationParams

2017-05-15 Thread Peter Xu
On Fri, May 12, 2017 at 05:54:49PM +0200, Juan Quintela wrote:
> Not used anymore after moving block migration to use capabilities.
> 
> Signed-off-by: Juan Quintela 
> Reviewed-by: Dr. David Alan Gilbert 
> Reviewed-by: zhanghailiang 

Reviewed-by: Peter Xu 



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +   __u32   argsz;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> 
> We initially discussed something a little more generic than this, with
> most info explicitly described and only pIOMMU-specific quirks and hints
> in an opaque structure. Out of curiosity, why the change? I'm not against
> a fully opaque structure, but there seem to be a large overlap between TLB
> invalidations across architectures.

Hi Jean,

As my cover letter mentioned, it is an open on the iommu tlb invalidate
propagation. Paste it here since it's in the cover letter for Qemu part
changes. Pls refer to the [Open] session in the following link.

http://www.spinics.net/lists/kvm/msg148798.html

I want to see if community wants to have opaque structure or not
on iommu tlb invalidate propagation. Personally, I incline to use
opaque structure. But it's better to gather the comments before
deciding it. To assist the discussion, I put the full opaque structure
here. Once community gets consensus on using opaque structure for
iommu tlb invalidate propagation, I'm glad to work with you on a
structure with partial opaque since there seems to be overlap across
arch.

> 
> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> with the following.
> 
> (From the paravirtualized POV, the SMMU also has to swizzle endianess
> after unpacking an opaque structure, since userspace doesn't know what's
> in it and guest might use a different endianess. So we need to force all
> opaque data to be e.g. little-endian.)
> 
> struct vfio_iommu_tlb_invalidate {
>   __u32   argsz;
>   __u32   scope;
>   __u32   flags;
>   __u32   pasid;
>   __u64   vaddr;
>   __u64   size;
>   __u8data[];
> };
>
> Scope is a bitfields restricting the invalidation scope. By default
> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> and @size are unused.
> 
> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> scope to the pasid described by @pasid.
> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> scope to the address range described by (@vaddr, @size).
> 
> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> range for *all* pasids (as well as no_pasid). Setting scope =
> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> the VA range only for @pasid.
> 

Besides VA range flusing, there is PASID Cache flushing on VT-d. How about
SMMU? So I think besides the two scope you defined, may need one more to
indicate if it's PASID Cache flushing.

> Flags depend on the selected scope:
> 
> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> exclusively (some architectures, e.g. SMMU, allow this)
> 
> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> only the last-level entry (pte). This is a hint.
> 
> I guess what's missing for Intel IOMMU and would go in @data is the
> "global" hint (which we don't have in SMMU invalidations). Do you see
> anything else, that the pIOMMU cannot deduce from this structure?
> 

For Intel platform, Drain read/write would be needed in the opaque.

Thanks,
Yi L



[Qemu-devel] [PATCH] qemu-img: Fix documentation of convert

2017-05-15 Thread Fam Zheng
It got lost in commit a8d16f9ca "qemu-img: Update documentation for -U".

Reported-by: Max Reitz 
Signed-off-by: Fam Zheng 
---
 qemu-img-cmds.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index e5bc28f..469ec12 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f 
fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s 
snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] 
[-W] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f 
fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] 
[-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m 
num_coroutines] [-W] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] [-q] 
[-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] 
[-o @var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] [-q] 
[-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] 
[-B @var{backing_file}][-o @var{options}] [-s @var{snapshot_id_or_name}] [-l 
@var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("dd", img_dd,
-- 
2.9.3




Re: [Qemu-devel] [PATCH v8 0/4] Improve convert and dd commands

2017-05-15 Thread Fam Zheng
On Fri, 05/12 19:41, Max Reitz wrote:
> The fun is increased by the fact that the locking series has
> (inadvertently) removed the -B documentation from convert, so there is
> another conflict looming in the future...

Sorry about the mistake there..

I've posted a patch for that:

[Qemu-devel] [PATCH] qemu-img: Fix documentation of convert

Fam



Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-05-15 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> Eric Blake  wrote:
> > On 05/12/2017 05:55 AM, Juan Quintela wrote:
>  @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, 
>  bool blk,
>   }
>   
>   if (has_inc && inc) {
>  +migrate_set_block_enabled(s, true);
>   migrate_set_block_shared(s, true);
> >>>
> >>> [2]
> >>>
> >>> IIUC for [1] & [2] we are solving the same problem that "shared"
> >>> depends on "enabled" bit. Would it be good to unitfy this dependency
> >>> somewhere? E.g., by changing migrate_set_block_shared() into:
> >>>
> >>> void migrate_set_block_shared(MigrationState *s, bool value)
> >>> {
> >>> s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value;
> >>> if (value) {
> >>> migrate_set_block_enabled(s, true);
> >>> }
> >>> }
> >> 
> >> ok with this.
> >
> > Or, as I commented on 1/3, maybe having a single property that is a
> > tri-state enum value, instead of 2 separate boolean properties, might be
> > nicer (but certainly a bit more complex to code up).
> 
> If you teach me how to do the qapi/qmp part, I will do the other bits.
> I don't really care if we do it one way or the other.
> 
> >> I will add once here that when we disable block enabled, we also disable
> >> shared, or just let it that way?
> >> 
> >>> Another thing to mention: after switching to the capability interface,
> >>> we'll cache the "enabled" and "shared" bits now while we don't cache
> >>> it before, right? IIUC it'll affect behavior of such sequence:
> >>>
> >>> - 1st migrate with enabled=1, shared=1, then
> >>> - 2nd migrate with enabled=0, shared=0
> >>>
> >>> Before the series, the 2nd migrate will use enabled=shared=0, but
> >>> after the series it should be using enabled=shared=1. Not sure whether
> >>> this would be a problem (or I missed anything?).
> >> 
> >> We can't be consistent with both old/new way.
> >> 
> >> Old way: we always setup the capabilities on command line (that should
> >> have been deprecated long, long ago)
> >
> > Well, the easy way out is to have the HMP migrate command (I assume
> > that's what you mean by "on command line") explicitly clear the
> > parameters if it is called without the -b/-i flag.  So the start of each
> > migration is what changes the properties, so long as you are still using
> > HMP to start the migration.  Or, on the QMP side, since 'migrate' has
> > optional 'blk' and 'inc' booleans, basically leave the settings alone if
> > the parameters were omitted, and explicitly update the property to the
> > value of those parameters if they were present.
> 
> We are going to have trouble whatever way that we do it, or we start
> doing lots of strange things.
> 
> Forget about qmp, we are going to assume that it is consistent with hmp.
> 
> migrate_set_capabilities block_enabled on
> migrate -b .
> 
> Should migrate disable the block_enabled capability?  Give one
> warning/error?
> 
> And notice that this only matter if we do a migration, we cancel/get one
> error, and then we migrate again.
> 
> What I tried to do is assume that -b/-i arguments don't exist.  And if
> the user use them, we implement its behaviour with the minimal fuss
> possibly.
> 
> Only way that I can think of being consistent and bug compatible will be
> to store:
> - old block_shared/enanbeld capability value
> - if we set -b/-i on the command line
> 
> In migration cleanup do the right thing depending on this four
> variables.  I think that it is adding lots of complexity for very few
> improvement.
> 
> 
> > Or is the proposal that we are also going to simplify the QMP 'migrate'
> > command to get rid of crufty parameters?
> 
> I didn't read it that way, but I would not oppose O:-)
> 

Ewww this is messy; you end up with almost as much code as the old
flags you're trying to remove.
For HMP you could gently deprecate the flags over time and give
a warning telling people to use the capabilities; but doing that
in one go is a bit nasty, and you still have to do something
cleverer for the QMP which is similar.
 
I think I agree though that migrate, cancel, migrate should work
sensibly and it's hard to get it right.

Dave

> Later, Juan.
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH] ehci: fix overflow in frame timer code

2017-05-15 Thread Gerd Hoffmann
In case the frame timer doesn't run for a while due to the host being
busy skipped_uframes can become big enough that UFRAME_TIMER_NS *
skipped_uframes overflows.  Which in turn throws off all subsequent
ehci frame timer calculations.

Reported-by: 李林 <8610...@163.com>
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 3703a8dddc..17c572c55f 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2238,7 +2238,7 @@ static void ehci_work_bh(void *opaque)
 int need_timer = 0;
 int64_t expire_time, t_now;
 uint64_t ns_elapsed;
-int uframes, skipped_uframes;
+uint64_t uframes, skipped_uframes;
 int i;
 
 t_now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-- 
2.9.3




Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Fam Zheng
On Mon, 05/15 12:02, Peter Lieven wrote:
> Hi Block developers,
> 
> I would like to add a feature to Qemu to drain all traffic from a block so 
> that
> I can take external snaphosts without the risk to that in the middle of a 
> write
> operation. Its meant for cases where where QGA freeze/thaw is not available.
> 
> For me its enough to have this through qemu-io, but Kevin asked me to check
> if its not worth to have a stable API for it and present it via QMP/HMP.
> 
> What are your thoughts?

For debugging purpose or a "hacky" usage where you know what you are doing, it
may be fine to have this. The only issue is it should be a separate flag, like
BlockJob.user_paused.

What happens from guest perspective? In the case of virtio, the request queue is
not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled,
the command is not effective (or rather the implementation is not complete).

Fam

> 
> Thanks,
> Peter
> 
> ---
>  qemu-io-cmds.c | 78 
> ++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 312fc6d..49d82fe 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1565,6 +1565,82 @@ static const cmdinfo_t flush_cmd = {
>  .oneline= "flush all in-core file state to disk",
>  };
>  
> +static const cmdinfo_t drain_cmd;
> +
> +static int drain_f(BlockBackend *blk, int argc, char **argv)
> +{
> +BlockDriverState *bs = blk_bs(blk);
> +bool flush = false;
> +int c;
> +
> +while ((c = getopt(argc, argv, "f")) != -1) {
> +switch (c) {
> +case 'f':
> +flush = true;
> +break;
> +default:
> +return qemuio_command_usage(&drain_cmd);
> +}
> +}
> +
> +if (optind != argc) {
> +return qemuio_command_usage(&drain_cmd);
> +}
> +
> +
> +if (bs->quiesce_counter) {
> +printf("drain failed: device is already drained!\n");
> +return 1;
> +}
> +
> +bdrv_drained_begin(bs); /* complete I/O */
> +if (flush) {
> +bdrv_flush(bs);
> +bdrv_drain(bs); /* in case flush left pending I/O */
> +printf("flushed all pending I/O\n");
> +}
> +printf("drain successful\n");
> +return 0;
> +}
> +
> +static void drain_help(void)
> +{
> +printf(
> +"\n"
> +" Drains all external I/O from the device\n"
> +"\n"
> +" -f, -- flush all in-core file state to disk\n"
> +"\n");
> +}
> +
> +static const cmdinfo_t drain_cmd = {
> +.name   = "drain",
> +.cfunc  = drain_f,
> +.args   = "[-f]",
> +.argmin = 0,
> +.argmax = -1,
> +.oneline= "cease to send I/O to the device",
> +.help   = drain_help
> +};
> +
> +static int undrain_f(BlockBackend *blk, int argc, char **argv)
> +{
> +BlockDriverState *bs = blk_bs(blk);
> +if (!bs->quiesce_counter) {
> +printf("undrain failed: device is not drained!\n");
> +return 1;
> +}
> +bdrv_drained_end(bs);
> +printf("undrain successful\n");
> +return 0;
> +}
> +
> +static const cmdinfo_t undrain_cmd = {
> +.name   = "undrain",
> +.cfunc  = undrain_f,
> +.oneline= "continue I/O to a drained device",
> +};
> +
>  static int truncate_f(BlockBackend *blk, int argc, char **argv)
>  {
>  int64_t offset;
> @@ -2296,6 +2372,8 @@ static void __attribute((constructor)) 
> init_qemuio_commands(void)
>  qemuio_add_command(&aio_write_cmd);
>  qemuio_add_command(&aio_flush_cmd);
>  qemuio_add_command(&flush_cmd);
> +qemuio_add_command(&drain_cmd);
> +qemuio_add_command(&undrain_cmd);
>  qemuio_add_command(&truncate_cmd);
>  qemuio_add_command(&length_cmd);
>  qemuio_add_command(&info_cmd);
> -- 
> 1.9.1
> 
> 



Re: [Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine conversion, remove aio_poll

2017-05-15 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine 
conversion, remove aio_poll
Message-id: 20170515100059.15795-1-pbonz...@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20170515104543.32044-1-kra...@redhat.com -> 
patchew/20170515104543.32044-1-kra...@redhat.com
Switched to a new branch 'test'
e926adb curl: do not do aio_poll when waiting for a free CURLState
4eebc68 curl: convert readv to coroutines
f4b49bc curl: convert CURLAIOCB to byte values
610980d curl: split curl_find_state/curl_init_state
c5ca4dc curl: avoid recursive locking of BDRVCURLState mutex
3ace9ce curl: never invoke callbacks with s->mutex held
b31fcbf curl: strengthen assertion in curl_clean_state

=== OUTPUT BEGIN ===
Checking PATCH 1/7: curl: strengthen assertion in curl_clean_state...
ERROR: spaces required around that '=' (ctx:VxV)
#24: FILE: block/curl.c:536:
+for (j=0; j < CURL_NUM_ACB; j++) {
   ^

total: 1 errors, 0 warnings, 11 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/7: curl: never invoke callbacks with s->mutex held...
Checking PATCH 3/7: curl: avoid recursive locking of BDRVCURLState mutex...
Checking PATCH 4/7: curl: split curl_find_state/curl_init_state...
ERROR: spaces required around that '=' (ctx:VxV)
#40: FILE: block/curl.c:463:
+for (i=0; i < CURL_NUM_STATES; i++) {
   ^

total: 1 errors, 0 warnings, 92 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/7: curl: convert CURLAIOCB to byte values...
Checking PATCH 6/7: curl: convert readv to coroutines...
Checking PATCH 7/7: curl: do not do aio_poll when waiting for a free 
CURLState...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/

2017-05-15 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
D> * Juan Quintela (quint...@redhat.com) wrote:
>> There are functions only used by migration code.
>
> That's only mostly true; see the current 'integrate colo frame with
> block replication and net compare' series (posted 22nd April).
> That adds colo_handle_shutdown to this header and calls it from vl.c
> ( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html )
> where should that go?

Dropped.

It compiled, but who knows O:-)

> There's also a net/colo.h as well, so using the
>   #include "colo.h" in migration is correct but that's
> really scary when there are two files of the same name.

Yeap, it is scary ...



Re: [Qemu-devel] [PATCH 09/12] migration: Split vmstate-types.c from vmstate.c

2017-05-15 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> Now one just has the interperter, and the other has the basic types.
>> Once there, add copyright boilerplate.
>> 
>> Signed-off-by: Juan Quintela 
>
> I think this is generally OK, but as discussed on IRC, I think
> you need to check the licenses.

Changed to GPLv2 or later.

I just copied that from savevm.c, I should have known better and read it
O:-)

Later, Juan.



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:04 +0800
> "Liu, Yi L"  wrote:
> 
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +   __u32   argsz;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> > +
> > +#define VFIO_IOMMU_TLB_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
> 
> I'm kind of wondering why this isn't just a new flag bit on
> vfio_device_svm, the data structure is so similar.  Of course data
> needs to be fully specified in uapi.

Hi Alex,

For this part, it depends on using opaque structure or not. The following
link mentioned it in [Open] session.

http://www.spinics.net/lists/kvm/msg148798.html

If we pick the full opaque solution for iommu tlb invalidate propagation.
Then I may add a flag bit on vfio_device_svm and also add definition in
uapi as you suggested.

Thanks,
Yi L

> > +
> >  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
> >  
> >  /*
> 



Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:11:58 +0800
> "Liu, Yi L"  wrote:
> 
> > From: Jacob Pan 
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h | 31 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> > struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo)
> > +{
> > +   if (unlikely(!domain->ops->bind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device 
> > *dev)
> > +{
> > +   if (unlikely(!domain->ops->unbind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >   struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> > int prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +   __u64   ptr;/* PASID table ptr */
> > +   __u64   size;   /* PASID table size*/
> > +   __u32   model;  /* magic number */
> > +#define INTEL_IOMMU(1 << 0)
> > +#define ARM_SMMU   (1 << 1)
> > +   __u8opaque[];/* IOMMU-specific details */
> > +};
> 
> This needs to be in uapi since you're expecting a user to pass it 

yes, it is. Thx for the correction.

Thanks,
Yi L
> > +
> >  #ifdef CONFIG_IOMMU_API
> >  
> >  /**
> > @@ -159,6 +168,8 @@ struct iommu_dm_region {
> >   * @domain_get_windows: Return the number of windows for a domain
> >   * @of_xlate: add OF master IDs to iommu grouping
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @bind_pasid_table: bind pasid table pointer for guest SVM
> > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
> >   */
> >  struct iommu_ops {
> > bool (*capable)(enum iommu_cap);
> > @@ -200,6 +211,10 @@ struct iommu_ops {
> > u32 (*domain_get_windows)(struct iommu_domain *domain);
> >  
> > int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > +   int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo);
> > +   int (*unbind_pasid_table)(struct iommu_domain *domain,
> > +   struct device *dev);
> >  
> > unsigned long pgsize_bitmap;
> >  };
> > @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain 
> > *domain,
> >struct device *dev);
> >  extern void iommu_detach_device(struct iommu_domain *domain,
> > struct device *dev);
> > +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> > +   struct device *dev, struct pasid_table_info *pasidt_binfo);
> > +extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> > +   struct device *dev);
> >  extern st

Re: [Qemu-devel] [PATCH 12/12] migration: migration.h was not needed

2017-05-15 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> This files don't use any function from migration.h, so drop it.
>> 
>> Signed-off-by: Juan Quintela 
>> ---
>>  block/qed.c | 1 -
>>  hw/i386/pc_q35.c| 1 -
>>  hw/virtio/vhost-user.c  | 1 -
>>  hw/virtio/vhost-vsock.c | 1 -
>>  hw/virtio/virtio.c  | 1 -
>>  monitor.c   | 1 -
>>  6 files changed, 6 deletions(-)
>> 
>> diff --git a/block/qed.c b/block/qed.c
>> index fd76817..8d899fd 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -19,7 +19,6 @@
>>  #include "trace.h"
>>  #include "qed.h"
>>  #include "qapi/qmp/qerror.h"
>> -#include "migration/migration.h"
>>  #include "sysemu/block-backend.h"
>>  
>>  static const AIOCBInfo qed_aiocb_info = {
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index dd792a8..76b08f8 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -46,7 +46,6 @@
>>  #include "hw/ide/ahci.h"
>>  #include "hw/usb.h"
>>  #include "qemu/error-report.h"
>> -#include "migration/migration.h"
>>  
>>  /* ICH9 AHCI has 6 ports */
>>  #define MAX_SATA_PORTS 6
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 9334a8a..ebc8ccf 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -17,7 +17,6 @@
>>  #include "sysemu/kvm.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/sockets.h"
>> -#include "migration/migration.h"
>>  
>>  #include 
>>  #include 
>> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
>> index b481562..49e0022 100644
>> --- a/hw/virtio/vhost-vsock.c
>> +++ b/hw/virtio/vhost-vsock.c
>> @@ -17,7 +17,6 @@
>>  #include "qapi/error.h"
>>  #include "hw/virtio/virtio-bus.h"
>>  #include "hw/virtio/virtio-access.h"
>> -#include "migration/migration.h"
>>  #include "qemu/error-report.h"
>>  #include "hw/virtio/vhost-vsock.h"
>>  #include "qemu/iov.h"
>
> Aren't these including it to get vmstate macros?
> but have they picked up that instead somewhere?

hw/hw.h

hw/virtio/vhost-vsock.h -> hw/virtio/virtio.h -> hw/hw.h

Dropping vmstate.h from that file would have mean have to add one
hundred new includes or so.  And I didn't see any good reason for that.

My understanding is that all devices end including hw/hw.h one way or
another, so the only reason to add migration include files right now is:
- you use old registartion functions
- you use blockers
- you use migration notifiers

Otherwise, if you just use vmstate macros, they are there by hw/hw.h.

Later, Juan.



Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Peter Lieven

Am 15.05.2017 um 12:50 schrieb Fam Zheng:

On Mon, 05/15 12:02, Peter Lieven wrote:

Hi Block developers,

I would like to add a feature to Qemu to drain all traffic from a block so that
I can take external snaphosts without the risk to that in the middle of a write
operation. Its meant for cases where where QGA freeze/thaw is not available.

For me its enough to have this through qemu-io, but Kevin asked me to check
if its not worth to have a stable API for it and present it via QMP/HMP.

What are your thoughts?

For debugging purpose or a "hacky" usage where you know what you are doing, it
may be fine to have this. The only issue is it should be a separate flag, like
BlockJob.user_paused.


How can I add, remove such a flag?



What happens from guest perspective? In the case of virtio, the request queue is
not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled,
the command is not effective (or rather the implementation is not complete).


That it only works with virtio is fine. However, the thing it does not work 
correctly
apply then also to all other users of the drained_begin/end functions, right?
As for the timeout I only plan to drain the device for about 1 second.

Thanks,
Peter




[Qemu-devel] [PATCH 2/6] spapr: fix error path of required kernel-irqchip

2017-05-15 Thread Greg Kurz
QEMU should exit if the user explicitely asked for kernel-irqchip support
and "xics-kvm" initialization fails.

The changelog of commit 34f2af3d3edf ("spapr: Clean up misuse of qdev_init()
in xics-kvm creation") reads:

While there, improve the error message when we can't satisfy an
explicit user request for "xics-kvm", and exit(1) instead of abort().
Simplify the abort when we can't create "xics".

This patch adds the missing call to exit().

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index abfb99b71b7d..f477d7b8a210 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -133,6 +133,7 @@ static void xics_system_init(MachineState *machine, int 
nr_irqs, Error **errp)
 if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
 error_reportf_err(err,
   "kernel_irqchip requested but unavailable: ");
+exit(EXIT_FAILURE);
 } else {
 error_free(err);
 }




[Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()

2017-05-15 Thread Greg Kurz
The spapr_ics_create() function handles errors in a rather convoluted
way, with two local Error * variables. Moreover, failing to parent the
ICS object to the machine should be considered as a bug but it is
currently ignored.

This patch addresses both issues.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c |   19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 44f7dc7f40e9..c53989bb10b1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState 
*spapr,
   const char *type_ics,
   int nr_irqs, Error **errp)
 {
-Error *err = NULL, *local_err = NULL;
+Error *local_err = NULL;
 Object *obj;
 
 obj = object_new(type_ics);
-object_property_add_child(OBJECT(spapr), "ics", obj, NULL);
+object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
 object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
-object_property_set_int(obj, nr_irqs, "nr-irqs", &err);
+object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err);
+if (local_err) {
+goto error;
+}
 object_property_set_bool(obj, true, "realized", &local_err);
-error_propagate(&err, local_err);
-if (err) {
-error_propagate(errp, err);
-return NULL;
+if (local_err) {
+goto error;
 }
 
 return ICS_SIMPLE(obj);
+
+error:
+error_propagate(errp, local_err);
+return NULL;
 }
 
 static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)




[Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types

2017-05-15 Thread Greg Kurz
Recent work on XICS broke migration of older machine types. The target
fails with:

qemu-system-ppc64: Unknown savevm section or instance 'icp/server' 1
qemu-system-ppc64: load of migration failed: Invalid argument

This happens because the current code no longer pre-allocates ICP objects
at machine init time.

This series restore the previous behavior for older machine types (patch 6).
Since the fix adds an error path to cover the case when realization of the
ICP objects fails, the series first does some cleanup on the way errors are
handled (patch 1-5).

I could successfully migrate older machines back and forth with QEMU 2.9:
- pseries-2.9
- pseries-2.9 with hotplugged CPUs
- pseries-2.6 (pre CPU hotplug support)

This series is based on:

https://github.com/dgibson/qemu.git ppc-for-2.10

--
Greg

---

Greg Kurz (6):
  ppc/xics: simplify prototype of xics_spapr_init()
  spapr: fix error path of required kernel-irqchip
  spapr: fix error reporting in xics_system_init()
  spapr: sanitize error handling in spapr_ics_create()
  spapr-cpu-core: release ICP object when realization fails
  spapr: fix migration of ICP objects from/to older QEMU


 hw/intc/xics_spapr.c|3 +-
 hw/ppc/spapr.c  |   75 ++-
 hw/ppc/spapr_cpu_core.c |   40 -
 include/hw/ppc/spapr.h  |2 +
 include/hw/ppc/xics.h   |2 +
 5 files changed, 90 insertions(+), 32 deletions(-)




[Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-15 Thread Greg Kurz
Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
is an improvement since we no longer allocate ICP objects that will
never be used. But it has the side-effect of breaking migration of
older machine types from older QEMU versions.

This patch introduces a compat flag in the sPAPR machine class so
that all pseries machine up to 2.9 go on with the previous behavior
of pre-allocating ICP objects.

While here, we also ensure that object_property_add_child() errors cause
QEMU to abort for newer machines.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c  |   36 
 hw/ppc/spapr_cpu_core.c |   28 
 include/hw/ppc/spapr.h  |2 ++
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c53989bb10b1..ab3683bcd677 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -126,6 +126,7 @@ error:
 static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 Error *local_err = NULL;
 
 if (kvm_enabled()) {
@@ -151,6 +152,38 @@ static void xics_system_init(MachineState *machine, int 
nr_irqs, Error **errp)
   &local_err);
 }
 
+if (!spapr->ics) {
+goto out;
+}
+
+if (smc->must_pre_allocate_icps) {
+int smt = kvmppc_smt_threads();
+int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads);
+int i;
+
+spapr->legacy_icps = g_malloc0(nr_servers * sizeof(ICPState));
+
+for (i = 0; i < nr_servers; i++) {
+void* obj = &spapr->legacy_icps[i];
+
+object_initialize(obj, sizeof(ICPState), spapr->icp_type);
+object_property_add_child(OBJECT(spapr), "icp[*]", obj,
+  &error_abort);
+object_unref(obj);
+object_property_add_const_link(obj, "xics", OBJECT(spapr),
+   &error_abort);
+object_property_set_bool(obj, true, "realized", &local_err);
+if (local_err) {
+while (i--) {
+object_unparent(obj);
+}
+g_free(spapr->legacy_icps);
+break;
+}
+}
+}
+
+out:
 error_propagate(errp, local_err);
 }
 
@@ -3256,8 +3289,11 @@ static void 
spapr_machine_2_9_instance_options(MachineState *machine)
 
 static void spapr_machine_2_9_class_options(MachineClass *mc)
 {
+sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_2_10_class_options(mc);
 SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
+smc->must_pre_allocate_icps = true;
 }
 
 DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 63d160f7e010..5476647efa06 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
Error **errp)
 size_t size = object_type_get_instance_size(typename);
 CPUCore *cc = CPU_CORE(dev);
 int i;
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
 for (i = 0; i < cc->nr_threads; i++) {
 void *obj = sc->threads + i * size;
@@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
Error **errp)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 
 spapr_cpu_destroy(cpu);
-object_unparent(cpu->intc);
+if (!spapr->legacy_icps) {
+object_unparent(cpu->intc);
+}
 cpu_remove_sync(cs);
 object_unparent(obj);
 }
@@ -142,12 +145,19 @@ static void spapr_cpu_core_realize_child(Object *child, 
Error **errp)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 Object *obj;
 
-obj = object_new(spapr->icp_type);
-object_property_add_child(OBJECT(cpu), "icp", obj, NULL);
-object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
-object_property_set_bool(obj, true, "realized", &local_err);
-if (local_err) {
-goto error;
+if (spapr->legacy_icps) {
+int index = cpu->parent_obj.cpu_index;
+
+obj = OBJECT(&spapr->legacy_icps[index]);
+} else {
+obj = object_new(spapr->icp_type);
+object_property_add_child(OBJECT(cpu), "icp", obj, &error_abort);
+object_property_add_const_link(obj, "xics", OBJECT(spapr),
+   &error_abort);
+object_property_set_bool(obj, true, "realized", &local_err);
+if (local_err) {
+goto error;
+}
 }
 
 object_property_set_bool(child, true, "realized", &local_err);
@@ -164,7 +174,9 @@ static void spapr_cpu_core_realize_child(Object *child, 
Error **errp)
 return;
 
 error:
-object_unparent(obj);
+if (!spapr->legacy

[Qemu-devel] [PATCH 1/6] ppc/xics: simplify prototype of xics_spapr_init()

2017-05-15 Thread Greg Kurz
This function only does hypercall and RTAS-call registration, and thus
never returns an error. This patch adapt the prototype to reflect that.

Signed-off-by: Greg Kurz 
---
 hw/intc/xics_spapr.c  |3 +--
 hw/ppc/spapr.c|2 +-
 include/hw/ppc/xics.h |2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index f05308b897f2..d98ea8b13068 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -229,7 +229,7 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPRMachineState 
*spapr,
 rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
-int xics_spapr_init(sPAPRMachineState *spapr, Error **errp)
+void xics_spapr_init(sPAPRMachineState *spapr)
 {
 /* Registration of global state belongs into realize */
 spapr_rtas_register(RTAS_IBM_SET_XIVE, "ibm,set-xive", rtas_set_xive);
@@ -243,7 +243,6 @@ int xics_spapr_init(sPAPRMachineState *spapr, Error **errp)
 spapr_register_hypercall(H_XIRR_X, h_xirr_x);
 spapr_register_hypercall(H_EOI, h_eoi);
 spapr_register_hypercall(H_IPOLL, h_ipoll);
-return 0;
 }
 
 #define ICS_IRQ_FREE(ics, srcno)   \
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1b7cadab0cdf..abfb99b71b7d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -139,7 +139,7 @@ static void xics_system_init(MachineState *machine, int 
nr_irqs, Error **errp)
 }
 
 if (!spapr->ics) {
-xics_spapr_init(spapr, errp);
+xics_spapr_init(spapr);
 spapr->icp_type = TYPE_ICP;
 spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 05e6acbb3533..d6cb51f3ad5d 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -206,6 +206,6 @@ void icp_resend(ICPState *ss);
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
-int xics_spapr_init(sPAPRMachineState *spapr, Error **errp);
+void xics_spapr_init(sPAPRMachineState *spapr);
 
 #endif /* XICS_H */




[Qemu-devel] [PATCH 3/6] spapr: fix error reporting in xics_system_init()

2017-05-15 Thread Greg Kurz
The xics_system_init() function passes its errp argument to xics_kvm_init().
If the call fails and the user requested in-kernel irqchip, it then ends up
passing a NULL Error * to error_reportf_err() and the error message is
silently dropped.

Passing an errp argument is generally wrong, unless you really just need
to convey an error to the caller.

This patch converts xics_system_init() to *only* pass a pointer to its own
Error * and then to propagate it with error_propagate(), as recommended
in error.h.

The local_err name is used for consistency with the rest of the code.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f477d7b8a210..44f7dc7f40e9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -121,29 +121,32 @@ static ICSState *spapr_ics_create(sPAPRMachineState 
*spapr,
 static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+Error *local_err = NULL;
 
 if (kvm_enabled()) {
-Error *err = NULL;
-
 if (machine_kernel_irqchip_allowed(machine) &&
-!xics_kvm_init(spapr, errp)) {
+!xics_kvm_init(spapr, &local_err)) {
 spapr->icp_type = TYPE_KVM_ICP;
-spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, &err);
+spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
+  &local_err);
 }
 if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
-error_reportf_err(err,
+error_reportf_err(local_err,
   "kernel_irqchip requested but unavailable: ");
 exit(EXIT_FAILURE);
 } else {
-error_free(err);
+error_free(local_err);
 }
 }
 
 if (!spapr->ics) {
 xics_spapr_init(spapr);
 spapr->icp_type = TYPE_ICP;
-spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
+spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs,
+  &local_err);
 }
+
+error_propagate(errp, local_err);
 }
 
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,




[Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails

2017-05-15 Thread Greg Kurz
While here we introduce a single error path to avoid code duplication.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr_cpu_core.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 4389ef4c2aef..63d160f7e010 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object *child, 
Error **errp)
 object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
 object_property_set_bool(obj, true, "realized", &local_err);
 if (local_err) {
-error_propagate(errp, local_err);
-return;
+goto error;
 }
 
 object_property_set_bool(child, true, "realized", &local_err);
 if (local_err) {
-object_unparent(obj);
-error_propagate(errp, local_err);
-return;
+goto error;
 }
 
 spapr_cpu_init(spapr, cpu, &local_err);
 if (local_err) {
-object_unparent(obj);
-error_propagate(errp, local_err);
-return;
+goto error;
 }
 
 xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
+return;
+
+error:
+object_unparent(obj);
+error_propagate(errp, local_err);
 }
 
 static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)




Re: [Qemu-devel] [PATCH 2/6] spapr: fix error path of required kernel-irqchip

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:39 PM, Greg Kurz wrote:
> QEMU should exit if the user explicitely asked for kernel-irqchip support
> and "xics-kvm" initialization fails.
> 
> The changelog of commit 34f2af3d3edf ("spapr: Clean up misuse of qdev_init()
> in xics-kvm creation") reads:
> 
> While there, improve the error message when we can't satisfy an
> explicit user request for "xics-kvm", and exit(1) instead of abort().
> Simplify the abort when we can't create "xics".
> 
> This patch adds the missing call to exit().
> 
> Signed-off-by: Greg Kurz 


Reviewed-by: Cédric Le Goater 

> ---
>  hw/ppc/spapr.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index abfb99b71b7d..f477d7b8a210 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -133,6 +133,7 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
>  error_reportf_err(err,
>"kernel_irqchip requested but unavailable: ");
> +exit(EXIT_FAILURE);
>  } else {
>  error_free(err);
>  }
> 




Re: [Qemu-devel] [PATCH 3/6] spapr: fix error reporting in xics_system_init()

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:39 PM, Greg Kurz wrote:
> The xics_system_init() function passes its errp argument to xics_kvm_init().
> If the call fails and the user requested in-kernel irqchip, it then ends up
> passing a NULL Error * to error_reportf_err() and the error message is
> silently dropped.
> 
> Passing an errp argument is generally wrong, unless you really just need
> to convey an error to the caller.
> 
> This patch converts xics_system_init() to *only* pass a pointer to its own
> Error * and then to propagate it with error_propagate(), as recommended
> in error.h.
> 
> The local_err name is used for consistency with the rest of the code.
> 
> Signed-off-by: Greg Kurz 

Thanks for the fix.

Reviewed-by: Cédric Le Goater 

C.

> ---
>  hw/ppc/spapr.c |   17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f477d7b8a210..44f7dc7f40e9 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -121,29 +121,32 @@ static ICSState *spapr_ics_create(sPAPRMachineState 
> *spapr,
>  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +Error *local_err = NULL;
>  
>  if (kvm_enabled()) {
> -Error *err = NULL;
> -
>  if (machine_kernel_irqchip_allowed(machine) &&
> -!xics_kvm_init(spapr, errp)) {
> +!xics_kvm_init(spapr, &local_err)) {
>  spapr->icp_type = TYPE_KVM_ICP;
> -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> &err);
> +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
> +  &local_err);
>  }
>  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> -error_reportf_err(err,
> +error_reportf_err(local_err,
>"kernel_irqchip requested but unavailable: ");
>  exit(EXIT_FAILURE);
>  } else {
> -error_free(err);
> +error_free(local_err);
>  }
>  }
>  
>  if (!spapr->ics) {
>  xics_spapr_init(spapr);
>  spapr->icp_type = TYPE_ICP;
> -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
> +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs,
> +  &local_err);
>  }
> +
> +error_propagate(errp, local_err);
>  }
>  
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
> 




Re: [Qemu-devel] [PATCH] Disable the not fully implemented warning for e1000

2017-05-15 Thread Philippe Mathieu-Daudé

Hi Julien,


On 05/05/2017 09:57 AM, Julien Duponchelle wrote:

Otherwise for image like CISCO IOSv you got a lot
of warning on the console preventing you to use
the VM because it's slow down the machine.

This fix:
https://bugs.launchpad.net/qemu/+bug/1673722

Signed-off-by: Julien Duponchelle 
---
 hw/net/e1000.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f2e5072d27..095fd0133f 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -35,6 +35,7 @@
 #include "sysemu/dma.h"
 #include "qemu/iov.h"
 #include "qemu/range.h"
+#include "qemu/log.h"

 #include "e1000x_common.h"

@@ -1264,8 +1265,9 @@ e1000_mmio_write(void *opaque, hwaddr addr,
uint64_t val,
 if (!(mac_reg_access[index] & MAC_ACCESS_FLAG_NEEDED)
 || (s->compat_flags & (mac_reg_access[index] >> 2))) {
 if (mac_reg_access[index] & MAC_ACCESS_PARTIAL) {
-DBGOUT(GENERAL, "Writing to register at offset:
0x%08x. "
-   "It is not fully implemented.\n", index<<2);
+qemu_log_mask(LOG_UNIMP, "e1000: "
+"Writing to register at offset: 0x%08x. "
+"It is not fully implemented.\n", index << 2);


trailing newline is no more necessary.


I mistaken with error_report(), qemu_log_mask() do use trailing newline, 
sorry!




what about:
qemu_log_mask(LOG_UNIMP, "%s: write to unimplemented "
  "register addr=0x%05x val=0x%08" PRIx64,
   __func__, index << 2, val);


 }
 macreg_writeops[index](s, index, val);
 } else {/* "flag needed" bit is set, but the flag is not
active */
@@ -1291,8 +1293,9 @@ e1000_mmio_read(void *opaque, hwaddr addr,
unsigned size)
 if (!(mac_reg_access[index] & MAC_ACCESS_FLAG_NEEDED)
 || (s->compat_flags & (mac_reg_access[index] >> 2))) {
 if (mac_reg_access[index] & MAC_ACCESS_PARTIAL) {
-DBGOUT(GENERAL, "Reading register at offset: 0x%08x. "
-   "It is not fully implemented.\n", index<<2);
+qemu_log_mask(LOG_UNIMP, "e1000: "
+"Reading register at offset: 0x%08x. "
+"It is not fully implemented.\n", index << 2);


qemu_log_mask(LOG_UNIMP, "%s: read  to unimplemented "
  "register addr=0x%05x" PRIx64, __func__,
  index << 2);


 }
 return macreg_readops[index](s, index);
 } else {/* "flag needed" bit is set, but the flag is not
active */



Regard,

Phil.




Re: [Qemu-devel] [Qemu-arm] [Qemu-devel PATCH 1/5] msf2: Add Smartfusion2 System timer

2017-05-15 Thread Philippe Mathieu-Daudé

Hi Subbaraya,


+if (value & TIMER_MODE) {
+qemu_log_mask(LOG_UNIMP, "64-bit mode not supported\n");


No need of trailing '\n', be more specific, something like:

qemu_log_mask(LOG_UNIMP, TYPE_MSF2_TIMER ": 64-bit mode not
supported");


I mistaken with error_report(), qemu_log_mask() does use trailing 
newline, sorry!


Phil.



Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Fam Zheng
On Mon, 05/15 13:26, Peter Lieven wrote:
> Am 15.05.2017 um 12:50 schrieb Fam Zheng:
> > On Mon, 05/15 12:02, Peter Lieven wrote:
> > > Hi Block developers,
> > > 
> > > I would like to add a feature to Qemu to drain all traffic from a block 
> > > so that
> > > I can take external snaphosts without the risk to that in the middle of a 
> > > write
> > > operation. Its meant for cases where where QGA freeze/thaw is not 
> > > available.
> > > 
> > > For me its enough to have this through qemu-io, but Kevin asked me to 
> > > check
> > > if its not worth to have a stable API for it and present it via QMP/HMP.
> > > 
> > > What are your thoughts?
> > For debugging purpose or a "hacky" usage where you know what you are doing, 
> > it
> > may be fine to have this. The only issue is it should be a separate flag, 
> > like
> > BlockJob.user_paused.
> 
> How can I add, remove such a flag?

Like bs->user_drained. Set it in "drain" command, then increment
bs->quiesce_counter if toggled; vise versa.

> 
> > 
> > What happens from guest perspective? In the case of virtio, the request 
> > queue is
> > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still 
> > handled,
> > the command is not effective (or rather the implementation is not complete).
> 
> That it only works with virtio is fine. However, the thing it does not work 
> correctly
> apply then also to all other users of the drained_begin/end functions, right?
> As for the timeout I only plan to drain the device for about 1 second.

It didn't matter because for IDE, the invariant (staying quiesced as long as
necessary) is already ensured by BQL.  Virtio is different because it supports
ioeventfd and data plane.

Fam



Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Peter Lieven

Am 15.05.2017 um 13:53 schrieb Fam Zheng:

On Mon, 05/15 13:26, Peter Lieven wrote:

Am 15.05.2017 um 12:50 schrieb Fam Zheng:

On Mon, 05/15 12:02, Peter Lieven wrote:

Hi Block developers,

I would like to add a feature to Qemu to drain all traffic from a block so that
I can take external snaphosts without the risk to that in the middle of a write
operation. Its meant for cases where where QGA freeze/thaw is not available.

For me its enough to have this through qemu-io, but Kevin asked me to check
if its not worth to have a stable API for it and present it via QMP/HMP.

What are your thoughts?

For debugging purpose or a "hacky" usage where you know what you are doing, it
may be fine to have this. The only issue is it should be a separate flag, like
BlockJob.user_paused.

How can I add, remove such a flag?

Like bs->user_drained. Set it in "drain" command, then increment
bs->quiesce_counter if toggled; vise versa.


Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions
the counter is incremented already.






What happens from guest perspective? In the case of virtio, the request queue is
not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled,
the command is not effective (or rather the implementation is not complete).

That it only works with virtio is fine. However, the thing it does not work 
correctly
apply then also to all other users of the drained_begin/end functions, right?
As for the timeout I only plan to drain the device for about 1 second.

It didn't matter because for IDE, the invariant (staying quiesced as long as
necessary) is already ensured by BQL.  Virtio is different because it supports
ioeventfd and data plane.


Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
these functions?

Do you have another idea how to achieve what I want? I was thinking of throttle
the I/O to zero. It would be enough to do this for writes, reading doesn't hurt 
in
my case.

Peter




Re: [Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:39 PM, Greg Kurz wrote:
> The spapr_ics_create() function handles errors in a rather convoluted
> way, with two local Error * variables. Moreover, failing to parent the
> ICS object to the machine should be considered as a bug but it is
> currently ignored.

I am not sure what should be done for object_property_add_child()
errors but QEMU generally uses NULL for 'Error **'. It might be 
wrong though.

As for the local error handling, it is following what is described in 
qapi/error.h. Isn't it ?

Cheers,

C. 

 
> This patch addresses both issues.
> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/ppc/spapr.c |   19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 44f7dc7f40e9..c53989bb10b1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState 
> *spapr,
>const char *type_ics,
>int nr_irqs, Error **errp)
>  {
> -Error *err = NULL, *local_err = NULL;
> +Error *local_err = NULL;
>  Object *obj;
>  
>  obj = object_new(type_ics);
> -object_property_add_child(OBJECT(spapr), "ics", obj, NULL);
> +object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
>  object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
> -object_property_set_int(obj, nr_irqs, "nr-irqs", &err);
> +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err);
> +if (local_err) {
> +goto error;
> +}
>  object_property_set_bool(obj, true, "realized", &local_err);
> -error_propagate(&err, local_err);
> -if (err) {
> -error_propagate(errp, err);
> -return NULL;
> +if (local_err) {
> +goto error;
>  }
>  
>  return ICS_SIMPLE(obj);
> +
> +error:
> +error_propagate(errp, local_err);
> +return NULL;
>  }
>  
>  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> **errp)
> 




Re: [Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:39 PM, Greg Kurz wrote:
> While here we introduce a single error path to avoid code duplication.
> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/ppc/spapr_cpu_core.c |   16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 4389ef4c2aef..63d160f7e010 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object *child, 
> Error **errp)
>  object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
>  object_property_set_bool(obj, true, "realized", &local_err);
>  if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> +goto error;

the error: statement below adds an object_unparent(). is that ok ? 

C. 

>  }
>  
>  object_property_set_bool(child, true, "realized", &local_err);
>  if (local_err) {
> -object_unparent(obj);
> -error_propagate(errp, local_err);
> -return;
> +goto error;
>  }
>  
>  spapr_cpu_init(spapr, cpu, &local_err);
>  if (local_err) {
> -object_unparent(obj);
> -error_propagate(errp, local_err);
> -return;
> +goto error;
>  }
>  
>  xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
> +return;
> +
> +error:
> +object_unparent(obj);
> +error_propagate(errp, local_err);
>  }
>  
>  static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
> 




Re: [Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()

2017-05-15 Thread Greg Kurz
On Mon, 15 May 2017 13:59:33 +0200
Cédric Le Goater  wrote:

> On 05/15/2017 01:39 PM, Greg Kurz wrote:
> > The spapr_ics_create() function handles errors in a rather convoluted
> > way, with two local Error * variables. Moreover, failing to parent the
> > ICS object to the machine should be considered as a bug but it is
> > currently ignored.  
> 
> I am not sure what should be done for object_property_add_child()
> errors but QEMU generally uses NULL for 'Error **'. It might be 
> wrong though.
> 
> As for the local error handling, it is following what is described in 
> qapi/error.h. Isn't it ?
> 

Yes, it does follow the "Receive and accumulate multiple errors" recommandation,
but does it make sense to realize the ICS object if we failed to set nr-irqs ?

> Cheers,
> 
> C. 
> 
>  
> > This patch addresses both issues.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/ppc/spapr.c |   19 ---
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 44f7dc7f40e9..c53989bb10b1 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState 
> > *spapr,
> >const char *type_ics,
> >int nr_irqs, Error **errp)
> >  {
> > -Error *err = NULL, *local_err = NULL;
> > +Error *local_err = NULL;
> >  Object *obj;
> >  
> >  obj = object_new(type_ics);
> > -object_property_add_child(OBJECT(spapr), "ics", obj, NULL);
> > +object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
> >  object_property_add_const_link(obj, "xics", OBJECT(spapr), 
> > &error_abort);
> > -object_property_set_int(obj, nr_irqs, "nr-irqs", &err);
> > +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err);
> > +if (local_err) {
> > +goto error;
> > +}
> >  object_property_set_bool(obj, true, "realized", &local_err);
> > -error_propagate(&err, local_err);
> > -if (err) {
> > -error_propagate(errp, err);
> > -return NULL;
> > +if (local_err) {
> > +goto error;
> >  }
> >  
> >  return ICS_SIMPLE(obj);
> > +
> > +error:
> > +error_propagate(errp, local_err);
> > +return NULL;
> >  }
> >  
> >  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> > **errp)
> >   
> 



pgpArqsYEeP80.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/6] ppc/xics: simplify prototype of xics_spapr_init()

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:39 PM, Greg Kurz wrote:
> This function only does hypercall and RTAS-call registration, and thus
> never returns an error. This patch adapt the prototype to reflect that.
> 
> Signed-off-by: Greg Kurz 

Reviewed-by: Cédric Le Goater 

> ---
>  hw/intc/xics_spapr.c  |3 +--
>  hw/ppc/spapr.c|2 +-
>  include/hw/ppc/xics.h |2 +-
>  3 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index f05308b897f2..d98ea8b13068 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -229,7 +229,7 @@ static void rtas_int_on(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
> -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp)
> +void xics_spapr_init(sPAPRMachineState *spapr)
>  {
>  /* Registration of global state belongs into realize */
>  spapr_rtas_register(RTAS_IBM_SET_XIVE, "ibm,set-xive", rtas_set_xive);
> @@ -243,7 +243,6 @@ int xics_spapr_init(sPAPRMachineState *spapr, Error 
> **errp)
>  spapr_register_hypercall(H_XIRR_X, h_xirr_x);
>  spapr_register_hypercall(H_EOI, h_eoi);
>  spapr_register_hypercall(H_IPOLL, h_ipoll);
> -return 0;
>  }
>  
>  #define ICS_IRQ_FREE(ics, srcno)   \
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1b7cadab0cdf..abfb99b71b7d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -139,7 +139,7 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  }
>  
>  if (!spapr->ics) {
> -xics_spapr_init(spapr, errp);
> +xics_spapr_init(spapr);
>  spapr->icp_type = TYPE_ICP;
>  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
>  }
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 05e6acbb3533..d6cb51f3ad5d 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -206,6 +206,6 @@ void icp_resend(ICPState *ss);
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp);
> +void xics_spapr_init(sPAPRMachineState *spapr);
>  
>  #endif /* XICS_H */
> 




Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-05-15 Thread Jean-Philippe Brucker
On 14/05/17 11:12, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi,
>>
>> On 26/04/17 11:12, Liu, Yi L wrote:
>>> From: "Liu, Yi L" 
>>>
>>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
>>> invalidate request from guest to host.
>>>
>>> In the case of SVM virtualization on VT-d, host IOMMU driver has
>>> no knowledge of caching structure updates unless the guest
>>> invalidation activities are passed down to the host. So a new
>>> IOCTL is needed to propagate the guest cache invalidation through
>>> VFIO.
>>>
>>> Signed-off-by: Liu, Yi L 
>>> ---
>>>  include/uapi/linux/vfio.h | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 6b97987..50c51f8 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>>>  
>>>  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
>>>  
>>> +/* For IOMMU TLB Invalidation Propagation */
>>> +struct vfio_iommu_tlb_invalidate {
>>> +   __u32   argsz;
>>> +   __u32   length;
>>> +   __u8data[];
>>> +};
>>
>> We initially discussed something a little more generic than this, with
>> most info explicitly described and only pIOMMU-specific quirks and hints
>> in an opaque structure. Out of curiosity, why the change? I'm not against
>> a fully opaque structure, but there seem to be a large overlap between TLB
>> invalidations across architectures.
> 
> Hi Jean,
> 
> As my cover letter mentioned, it is an open on the iommu tlb invalidate
> propagation. Paste it here since it's in the cover letter for Qemu part
> changes. Pls refer to the [Open] session in the following link.
> 
> http://www.spinics.net/lists/kvm/msg148798.html
> 
> I want to see if community wants to have opaque structure or not
> on iommu tlb invalidate propagation. Personally, I incline to use
> opaque structure. But it's better to gather the comments before
> deciding it. To assist the discussion, I put the full opaque structure
> here. Once community gets consensus on using opaque structure for
> iommu tlb invalidate propagation, I'm glad to work with you on a
> structure with partial opaque since there seems to be overlap across
> arch.

I see, thanks for the pointer. I'm not fan of using the pIOMMU format in
invalidations, but I understand the appeal in your case, where you can
shave off the overhead of parsing/re-assembling the pIOMMU format.

It's not suitable for generic io-pgtable, where different pIOMMUs may
offer the same page table format but different invalidation methods. I
guess I could make it work by negotiating invalidation method at bind
time, falling back to the generic format for virtio-iommu. We already have
to do some related negotiation for SVM on SMMU, where in embedded
implementations the guest doesn't need to send invalidation requests via
IOMMU queue, but can do it using CPU instructions instead.

>> For what it's worth, when prototyping the paravirtualized IOMMU I came up
>> with the following.
>>
>> (From the paravirtualized POV, the SMMU also has to swizzle endianess
>> after unpacking an opaque structure, since userspace doesn't know what's
>> in it and guest might use a different endianess. So we need to force all
>> opaque data to be e.g. little-endian.)
>>
>> struct vfio_iommu_tlb_invalidate {
>>  __u32   argsz;
>>  __u32   scope;
>>  __u32   flags;
>>  __u32   pasid;
>>  __u64   vaddr;
>>  __u64   size;
>>  __u8data[];
>> };
>>
>> Scope is a bitfields restricting the invalidation scope. By default
>> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
>> and @size are unused.
>>
>> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>> scope to the pasid described by @pasid.
>> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
>> scope to the address range described by (@vaddr, @size).
>>
>> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
>> range for *all* pasids (as well as no_pasid). Setting scope =
>> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
>> the VA range only for @pasid.
>>
> 
> Besides VA range flusing, there is PASID Cache flushing on VT-d. How about
> SMMU? So I think besides the two scope you defined, may need one more to
> indicate if it's PASID Cache flushing.

Yes, invalidating all TLB entries associated to a PASID would be done by
setting scope = VFIO_IOMMU_INVALIDATE_PASID. In which case field @pasid is
valid, and fields (@vaddr, @size) aren't used.

Thanks,
Jean

>> Flags depend on the selected scope:
>>
>> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
>> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
>> exclusively (some architectures, e.g. SMMU, allow this)
>>
>> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't nee

Re: [Qemu-devel] [PATCH] block migration: Allow compile time disable

2017-05-15 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote:
> On 05/03/2017 05:42 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Many users now prefer to use drive_mirror over NBD as an
> > alternative to the older migrate -b option; drive_mirror is
> > more complex to setup but gives you more options (e.g. only
> > migrating some of the disks if some of them are shared).
> > 
> > Allow the large chunk of block migration code to be compiled
> > out for those who don't use it.
> > 
> > Based on a downstream-patch we've had for a while by Jeff Cody.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> > @@ -1222,6 +1226,14 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> > blk,
> >  params.blk = has_blk && blk;
> >  params.shared = has_inc && inc;
> >  
> > +#ifndef CONFIG_LIVE_BLOCK_MIGRATION
> > +if (params.blk || params.shared) {
> > +error_setg(errp, "QEMU compiled without old-style block migration. 
> > "
> > + "Use drive_mirror+NBD.");
> 
> error_setg() should not be used with '.' (it should be a single phrase,
> here you are trying to stuff in two sentences).  error_append_hint() can
> be used to supply the advice about using drive_mirror+NBD as the
> alternative.
> 
> Otherwise this looks reasonable to me.

Done as:
error_setg(errp, "QEMU compiled without old-style block migration");
error_append_hint(errp, "Use drive_mirror+NBD.\n");

(All the places I can see seem to have . and \n in append_hint)

Dave

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types

2017-05-15 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine 
types
Message-id: 149484833874.20089.4164801378197848306.st...@bahia.lan
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  
patchew/149484833874.20089.4164801378197848306.st...@bahia.lan -> 
patchew/149484833874.20089.4164801378197848306.st...@bahia.lan
Switched to a new branch 'test'
3c3824a spapr: fix migration of ICP objects from/to older QEMU
280164f spapr-cpu-core: release ICP object when realization fails
5bc72eb spapr: sanitize error handling in spapr_ics_create()
fa15d00 spapr: fix error reporting in xics_system_init()
ecbe234 spapr: fix error path of required kernel-irqchip
b7cf73d ppc/xics: simplify prototype of xics_spapr_init()

=== OUTPUT BEGIN ===
Checking PATCH 1/6: ppc/xics: simplify prototype of xics_spapr_init()...
Checking PATCH 2/6: spapr: fix error path of required kernel-irqchip...
Checking PATCH 3/6: spapr: fix error reporting in xics_system_init()...
Checking PATCH 4/6: spapr: sanitize error handling in spapr_ics_create()...
Checking PATCH 5/6: spapr-cpu-core: release ICP object when realization fails...
Checking PATCH 6/6: spapr: fix migration of ICP objects from/to older QEMU...
ERROR: "foo* bar" should be "foo *bar"
#50: FILE: hw/ppc/spapr.c:167:
+void* obj = &spapr->legacy_icps[i];

total: 1 errors, 0 warnings, 122 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage

2017-05-15 Thread Peter Xu
On Mon, May 15, 2017 at 11:03:49AM +0200, Paolo Bonzini wrote:
> 
> 
> On 15/05/2017 10:50, Peter Xu wrote:
> > It is not easy for people to know "what this parameter does" before
> > knowing "what is subpage". Let's use "is_mmio" to make it easier to
> > understand.
> > 
> > Signed-off-by: Peter Xu 
> 
> Maybe invert the direction and call it for_iotlb?

I wasn't aware that tcg is using resolve_subpage==false always. Then
at least is_mmio does not suite here. I am not sure whether for-iotlb
suites here either... So maybe I will just drop this patch as well,
just like patch 3.

Thanks for reviewing it!

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails

2017-05-15 Thread Greg Kurz
On Mon, 15 May 2017 14:02:34 +0200
Cédric Le Goater  wrote:

> On 05/15/2017 01:39 PM, Greg Kurz wrote:
> > While here we introduce a single error path to avoid code duplication.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/ppc/spapr_cpu_core.c |   16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index 4389ef4c2aef..63d160f7e010 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object 
> > *child, Error **errp)
> >  object_property_add_const_link(obj, "xics", OBJECT(spapr), 
> > &error_abort);
> >  object_property_set_bool(obj, true, "realized", &local_err);
> >  if (local_err) {
> > -error_propagate(errp, local_err);
> > -return;
> > +goto error;  
> 
> the error: statement below adds an object_unparent(). is that ok ? 
> 

Yes, this is the purpose of the patch actually.

The ICP object gets parented to the CPU a few lines above, so I guess we
should unparent in this error path as well.

Maybe I should rename the patch to "unparent ICP object when realization fails" 
?

> C. 
> 
> >  }
> >  
> >  object_property_set_bool(child, true, "realized", &local_err);
> >  if (local_err) {
> > -object_unparent(obj);
> > -error_propagate(errp, local_err);
> > -return;
> > +goto error;
> >  }
> >  
> >  spapr_cpu_init(spapr, cpu, &local_err);
> >  if (local_err) {
> > -object_unparent(obj);
> > -error_propagate(errp, local_err);
> > -return;
> > +goto error;
> >  }
> >  
> >  xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
> > +return;
> > +
> > +error:
> > +object_unparent(obj);
> > +error_propagate(errp, local_err);
> >  }
> >  
> >  static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
> >   
> 



pgpeRLCOV4XNH.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types

2017-05-15 Thread Greg Kurz
On Mon, 15 May 2017 05:13:34 -0700 (PDT)
no-re...@patchew.org wrote:

> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Subject: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine 
> types
> Message-id: 149484833874.20089.4164801378197848306.st...@bahia.lan
> Type: series
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> 
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
> 
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> 
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; 
> then
> failed=1
> echo
> fi
> n=$((n+1))
> done
> 
> exit $failed
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  - [tag update]  
> patchew/149484833874.20089.4164801378197848306.st...@bahia.lan -> 
> patchew/149484833874.20089.4164801378197848306.st...@bahia.lan
> Switched to a new branch 'test'
> 3c3824a spapr: fix migration of ICP objects from/to older QEMU
> 280164f spapr-cpu-core: release ICP object when realization fails
> 5bc72eb spapr: sanitize error handling in spapr_ics_create()
> fa15d00 spapr: fix error reporting in xics_system_init()
> ecbe234 spapr: fix error path of required kernel-irqchip
> b7cf73d ppc/xics: simplify prototype of xics_spapr_init()
> 
> === OUTPUT BEGIN ===
> Checking PATCH 1/6: ppc/xics: simplify prototype of xics_spapr_init()...
> Checking PATCH 2/6: spapr: fix error path of required kernel-irqchip...
> Checking PATCH 3/6: spapr: fix error reporting in xics_system_init()...
> Checking PATCH 4/6: spapr: sanitize error handling in spapr_ics_create()...
> Checking PATCH 5/6: spapr-cpu-core: release ICP object when realization 
> fails...
> Checking PATCH 6/6: spapr: fix migration of ICP objects from/to older QEMU...
> ERROR: "foo* bar" should be "foo *bar"
> #50: FILE: hw/ppc/spapr.c:167:
> +void* obj = &spapr->legacy_icps[i];
> 

Oops, I'll fix that.

> total: 1 errors, 0 warnings, 122 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org


pgpRuKMbEsumW.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-15 Thread Cédric Le Goater
On 05/15/2017 01:40 PM, Greg Kurz wrote:
> Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
> is an improvement since we no longer allocate ICP objects that will
> never be used. But it has the side-effect of breaking migration of
> older machine types from older QEMU versions.
> 
> This patch introduces a compat flag in the sPAPR machine class so
> that all pseries machine up to 2.9 go on with the previous behavior
> of pre-allocating ICP objects.

I think this is a quite elegant way to a handle the migration 
regression. Thanks for taking care of it.

Have you tried to simply reparent the ICPs objects to OBJECT(spapr) 
instead of the OBJECT(cpu)  ? 

See some minor comments below.

> While here, we also ensure that object_property_add_child() errors cause
> QEMU to abort for newer machines.
> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/ppc/spapr.c  |   36 
>  hw/ppc/spapr_cpu_core.c |   28 
>  include/hw/ppc/spapr.h  |2 ++
>  3 files changed, 58 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c53989bb10b1..ab3683bcd677 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -126,6 +126,7 @@ error:
>  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  Error *local_err = NULL;
>  
>  if (kvm_enabled()) {
> @@ -151,6 +152,38 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>&local_err);
>  }
>  
> +if (!spapr->ics) {
> +goto out;
> +}
> +
> +if (smc->must_pre_allocate_icps) {

I am not sure I like 'must', I think 'pre_allocate_icps' should be enough ? 
or simply 'allocate_legacy_icps' ?

> +int smt = kvmppc_smt_threads();
> +int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads);

may be we should reintroduce nr_servers at the machine level ? 

> +int i;
> +
> +spapr->legacy_icps = g_malloc0(nr_servers * sizeof(ICPState));
> +
> +for (i = 0; i < nr_servers; i++) {
> +void* obj = &spapr->legacy_icps[i];

'void *'

> +
> +object_initialize(obj, sizeof(ICPState), spapr->icp_type);
> +object_property_add_child(OBJECT(spapr), "icp[*]", obj,
> +  &error_abort);

David does not like the "icp[*]" syntax.

> +object_unref(obj);
> +object_property_add_const_link(obj, "xics", OBJECT(spapr),
> +   &error_abort);
> +object_property_set_bool(obj, true, "realized", &local_err);
> +if (local_err) {
> +while (i--) {
> +object_unparent(obj);
> +}
> +g_free(spapr->legacy_icps);
> +break;
> +}
> +}
> +}
> +
> +out:
>  error_propagate(errp, local_err);
>  }
>  
> @@ -3256,8 +3289,11 @@ static void 
> spapr_machine_2_9_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_9_class_options(MachineClass *mc)
>  {
> +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
>  spapr_machine_2_10_class_options(mc);
>  SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
> +smc->must_pre_allocate_icps = true;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 63d160f7e010..5476647efa06 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
> Error **errp)
>  size_t size = object_type_get_instance_size(typename);
>  CPUCore *cc = CPU_CORE(dev);
>  int i;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
>  for (i = 0; i < cc->nr_threads; i++) {
>  void *obj = sc->threads + i * size;
> @@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
> Error **errp)
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  
>  spapr_cpu_destroy(cpu);
> -object_unparent(cpu->intc);
> +if (!spapr->legacy_icps) {
> +object_unparent(cpu->intc);
> +}
>  cpu_remove_sync(cs);
>  object_unparent(obj);
>  }
> @@ -142,12 +145,19 @@ static void spapr_cpu_core_realize_child(Object *child, 
> Error **errp)
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  Object *obj;
>  
> -obj = object_new(spapr->icp_type);
> -object_property_add_child(OBJECT(cpu), "icp", obj, NULL);
> -object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
> -object_property_set_bool(obj, true, "realized", &local_err);
> -if (local_err) {
> -goto error;
> + 

Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Fam Zheng
On Mon, 05/15 13:58, Peter Lieven wrote:
> Am 15.05.2017 um 13:53 schrieb Fam Zheng:
> > On Mon, 05/15 13:26, Peter Lieven wrote:
> > > Am 15.05.2017 um 12:50 schrieb Fam Zheng:
> > > > On Mon, 05/15 12:02, Peter Lieven wrote:
> > > > > Hi Block developers,
> > > > > 
> > > > > I would like to add a feature to Qemu to drain all traffic from a 
> > > > > block so that
> > > > > I can take external snaphosts without the risk to that in the middle 
> > > > > of a write
> > > > > operation. Its meant for cases where where QGA freeze/thaw is not 
> > > > > available.
> > > > > 
> > > > > For me its enough to have this through qemu-io, but Kevin asked me to 
> > > > > check
> > > > > if its not worth to have a stable API for it and present it via 
> > > > > QMP/HMP.
> > > > > 
> > > > > What are your thoughts?
> > > > For debugging purpose or a "hacky" usage where you know what you are 
> > > > doing, it
> > > > may be fine to have this. The only issue is it should be a separate 
> > > > flag, like
> > > > BlockJob.user_paused.
> > > How can I add, remove such a flag?
> > Like bs->user_drained. Set it in "drain" command, then increment
> > bs->quiesce_counter if toggled; vise versa.
> 
> Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions
> the counter is incremented already.

You're right, calling bdrv_drained_begin() is better.

> 
> 
> 
> > 
> > > > What happens from guest perspective? In the case of virtio, the request 
> > > > queue is
> > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still 
> > > > handled,
> > > > the command is not effective (or rather the implementation is not 
> > > > complete).
> > > That it only works with virtio is fine. However, the thing it does not 
> > > work correctly
> > > apply then also to all other users of the drained_begin/end functions, 
> > > right?
> > > As for the timeout I only plan to drain the device for about 1 second.
> > It didn't matter because for IDE, the invariant (staying quiesced as long as
> > necessary) is already ensured by BQL.  Virtio is different because it 
> > supports
> > ioeventfd and data plane.
> 
> Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
> these functions?

Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover
IDE, I just haven't thought about "how".

> 
> Do you have another idea how to achieve what I want? I was thinking of 
> throttle
> the I/O to zero. It would be enough to do this for writes, reading doesn't 
> hurt in
> my case.

Maybe add a block filter on top of the drained node, drain it when doing so,
then queue all further requests with a CoQueue until "undrain".  (It is then not
quite to "drain" but to "halt" or "pause", though.)

Fam



Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Peter Lieven

Am 15.05.2017 um 14:28 schrieb Fam Zheng:

On Mon, 05/15 13:58, Peter Lieven wrote:

Am 15.05.2017 um 13:53 schrieb Fam Zheng:

On Mon, 05/15 13:26, Peter Lieven wrote:

Am 15.05.2017 um 12:50 schrieb Fam Zheng:

On Mon, 05/15 12:02, Peter Lieven wrote:

Hi Block developers,

I would like to add a feature to Qemu to drain all traffic from a block so that
I can take external snaphosts without the risk to that in the middle of a write
operation. Its meant for cases where where QGA freeze/thaw is not available.

For me its enough to have this through qemu-io, but Kevin asked me to check
if its not worth to have a stable API for it and present it via QMP/HMP.

What are your thoughts?

For debugging purpose or a "hacky" usage where you know what you are doing, it
may be fine to have this. The only issue is it should be a separate flag, like
BlockJob.user_paused.

How can I add, remove such a flag?

Like bs->user_drained. Set it in "drain" command, then increment
bs->quiesce_counter if toggled; vise versa.

Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions
the counter is incremented already.

You're right, calling bdrv_drained_begin() is better.





What happens from guest perspective? In the case of virtio, the request queue is
not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled,
the command is not effective (or rather the implementation is not complete).

That it only works with virtio is fine. However, the thing it does not work 
correctly
apply then also to all other users of the drained_begin/end functions, right?
As for the timeout I only plan to drain the device for about 1 second.

It didn't matter because for IDE, the invariant (staying quiesced as long as
necessary) is already ensured by BQL.  Virtio is different because it supports
ioeventfd and data plane.

Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
these functions?

Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover
IDE, I just haven't thought about "how".


Do you have another idea how to achieve what I want? I was thinking of throttle
the I/O to zero. It would be enough to do this for writes, reading doesn't hurt 
in
my case.

Maybe add a block filter on top of the drained node, drain it when doing so,
then queue all further requests with a CoQueue until "undrain".  (It is then not
quite to "drain" but to "halt" or "pause", though.)


To get the drain for free was why I was looking at this approach. If I read 
correctly
if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP?
If yes, would support adding it to qemu-io?

Thanks,
Peter



Re: [Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails

2017-05-15 Thread Kashyap Chamarthy
On Mon, May 15, 2017 at 12:34:24PM +0300, Alberto Garcia wrote:
> The code that tries to reopen a BlockDriverState in stream_start()
> when the creation of a new block job fails crashes because it attempts
> to dereference a pointer that is known to be NULL.
> 
> This is a regression introduced in a170a91fd3eab6155da39e740381867e,
> likely because the code was copied from stream_complete().
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/stream.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 0113710845..52d329f5c6 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -280,6 +280,6 @@ void stream_start(const char *job_id, BlockDriverState 
> *bs,
>  
>  fail:
>  if (orig_bs_flags != bdrv_get_flags(bs)) {
> -bdrv_reopen(bs, s->bs_flags, NULL);
> +bdrv_reopen(bs, orig_bs_flags, NULL);

Tested-by: Kashyap Chamarthy 

Yep, this fixes it.

It gracefully throws an error when a 'job-id' was not specified:

---
(QEMU) block-stream device=#block890 base=disk1.qcow2
{
"execute": "block-stream", 
"arguments": {
"device": "#block890", 
"base": "disk1.qcow2"
}
}
{
"error": {
"class": "GenericError", 
"desc": "An explicit job ID is required for this node"
}
}
(QEMU) block-stream device=#block890 base=disk1.qcow2 job-id=job0
{
"execute": "block-stream", 
"arguments": {
"device": "#block890", 
"job-id": "job0", 
"base": "disk1.qcow2"
}
}
{
"return": {}
}
(QEMU) 
---

[...]

-- 
/kashyap



[Qemu-devel] [PATCH] qemu-iotests: Test streaming with missing job ID

2017-05-15 Thread Kevin Wolf
This adds a small test for the image streaming error path for failing
block_job_create(), which would have found the null pointer dereference
in commit a170a91f.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/030 | 4 
 tests/qemu-iotests/030.out | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index e00c11b..feee861 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -147,6 +147,10 @@ class TestSingleDrive(iotests.QMPTestCase):
 result = self.vm.qmp('block-stream', device='nonexistent')
 self.assert_qmp(result, 'error/class', 'GenericError')
 
+def test_job_id_missing(self):
+result = self.vm.qmp('block-stream', device='mid')
+self.assert_qmp(result, 'error/class', 'GenericError')
+
 
 class TestParallelOps(iotests.QMPTestCase):
 num_ops = 4 # Number of parallel block-stream operations
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 84bfd63..391c857 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 22 tests
+Ran 23 tests
 
 OK
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/

2017-05-15 Thread Hailiang Zhang

On 2017/5/15 19:04, Juan Quintela wrote:

"Dr. David Alan Gilbert"  wrote:
D> * Juan Quintela (quint...@redhat.com) wrote:

There are functions only used by migration code.

That's only mostly true; see the current 'integrate colo frame with
block replication and net compare' series (posted 22nd April).
That adds colo_handle_shutdown to this header and calls it from vl.c
( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html )
where should that go?

Dropped.

It compiled, but who knows O:-)


There's also a net/colo.h as well, so using the
   #include "colo.h" in migration is correct but that's
really scary when there are two files of the same name.


Ha, I'd like to suggest it be renamed as colo-proxy.h (colo-net.h ? ) for 
net/colo.h,
which is saner,  hmm, together with colo.c, colo-proxy.c ? colo-net.c ? Or any 
other proper name ?
( I didn't notice that until it went into upstream.  O:-) )

Cc:  Zhang Chen 


Yeap, it is scary ...









Re: [Qemu-devel] [PATCH] qemu-iotests: Test streaming with missing job ID

2017-05-15 Thread Alberto Garcia
On Mon 15 May 2017 02:39:40 PM CEST, Kevin Wolf wrote:
> This adds a small test for the image streaming error path for failing
> block_job_create(), which would have found the null pointer dereference
> in commit a170a91f.
>
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/030 | 4 
>  tests/qemu-iotests/030.out | 4 ++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index e00c11b..feee861 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -147,6 +147,10 @@ class TestSingleDrive(iotests.QMPTestCase):
>  result = self.vm.qmp('block-stream', device='nonexistent')
>  self.assert_qmp(result, 'error/class', 'GenericError')
>  
> +def test_job_id_missing(self):
> +result = self.vm.qmp('block-stream', device='mid')
> +self.assert_qmp(result, 'error/class', 'GenericError')

Mmm... but does that trigger the bug?

The bug happens if the user tries to create a stream job on a BDS that:

a) exists
b) is not the active node (i.e. needs reopening in read-write mode)
c) its node name is not valid for a block job (e.g. it contains a '#')

Berto



Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Fam Zheng
On Mon, 05/15 14:32, Peter Lieven wrote:
> Am 15.05.2017 um 14:28 schrieb Fam Zheng:
> > On Mon, 05/15 13:58, Peter Lieven wrote:
> > > Am 15.05.2017 um 13:53 schrieb Fam Zheng:
> > > > On Mon, 05/15 13:26, Peter Lieven wrote:
> > > > > Am 15.05.2017 um 12:50 schrieb Fam Zheng:
> > > > > > On Mon, 05/15 12:02, Peter Lieven wrote:
> > > > > > > Hi Block developers,
> > > > > > > 
> > > > > > > I would like to add a feature to Qemu to drain all traffic from a 
> > > > > > > block so that
> > > > > > > I can take external snaphosts without the risk to that in the 
> > > > > > > middle of a write
> > > > > > > operation. Its meant for cases where where QGA freeze/thaw is not 
> > > > > > > available.
> > > > > > > 
> > > > > > > For me its enough to have this through qemu-io, but Kevin asked 
> > > > > > > me to check
> > > > > > > if its not worth to have a stable API for it and present it via 
> > > > > > > QMP/HMP.
> > > > > > > 
> > > > > > > What are your thoughts?
> > > > > > For debugging purpose or a "hacky" usage where you know what you 
> > > > > > are doing, it
> > > > > > may be fine to have this. The only issue is it should be a separate 
> > > > > > flag, like
> > > > > > BlockJob.user_paused.
> > > > > How can I add, remove such a flag?
> > > > Like bs->user_drained. Set it in "drain" command, then increment
> > > > bs->quiesce_counter if toggled; vise versa.
> > > Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these 
> > > functions
> > > the counter is incremented already.
> > You're right, calling bdrv_drained_begin() is better.
> > 
> > > 
> > > 
> > > > > > What happens from guest perspective? In the case of virtio, the 
> > > > > > request queue is
> > > > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are 
> > > > > > still handled,
> > > > > > the command is not effective (or rather the implementation is not 
> > > > > > complete).
> > > > > That it only works with virtio is fine. However, the thing it does 
> > > > > not work correctly
> > > > > apply then also to all other users of the drained_begin/end 
> > > > > functions, right?
> > > > > As for the timeout I only plan to drain the device for about 1 second.
> > > > It didn't matter because for IDE, the invariant (staying quiesced as 
> > > > long as
> > > > necessary) is already ensured by BQL.  Virtio is different because it 
> > > > supports
> > > > ioeventfd and data plane.
> > > Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
> > > these functions?
> > Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to 
> > cover
> > IDE, I just haven't thought about "how".
> > 
> > > Do you have another idea how to achieve what I want? I was thinking of 
> > > throttle
> > > the I/O to zero. It would be enough to do this for writes, reading 
> > > doesn't hurt in
> > > my case.
> > Maybe add a block filter on top of the drained node, drain it when doing so,
> > then queue all further requests with a CoQueue until "undrain".  (It is 
> > then not
> > quite to "drain" but to "halt" or "pause", though.)
> 
> To get the drain for free was why I was looking at this approach. If I read 
> correctly
> if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP?

I think so.

> If yes, would support adding it to qemu-io?

I'm under the impression that you are looking to a real use case, I don't think
I like the idea. Also, accessing the image from other processes while QEMU is
using it is strongly discouraged, and there is the coming image locking
mechanism to prevent this from happening. Why is the blockdev-snapshot command
not enough?

Fam



Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/2] Enhance block status when crossing EOF

2017-05-15 Thread Stefan Hajnoczi
On Thu, May 04, 2017 at 09:14:58PM -0500, Eric Blake wrote:
> As mentioned in my 'qcow2 zero-cluster tweaks' cover letter
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00722.html
> I hit a case where we were not fully optimizing a write-zeroes
> operation when we have a backing file that is shorter than the
> active layer, and where the request crosses that boundary.  I don't
> know that the situation will be all that common (most qcow2 files
> happen to be cluster-aligned in size; and most backing chains use
> the same length at each layer of the chain), but for RFC purposes,
> I wanted to explore how easy it would be to optimize the case
> anyways, and to see if we can think of any other cases where
> knowing that a block status crossed an end-of-file boundary can
> be exploited for less work.
> 
> Thus, this is a followup to that series, but I'm also okay if we
> think it is too much maintenance compared to the potential gains,
> and decide to drop it after all.
> 
> Another potential use that I thought of: we may someday want to
> track BDS length by bytes.  Right now, bdrv_getlength() returns a
> value in bytes, but which is sector-aligned, by adding padding;
> but raw-format.c is able to see the transition between data up to
> EOF followed by a hole after EOF.  Because of that, my series to
> convert bdrv_get_block_status() into a byte-based interface has to
> fudge things - any time we see a mid-sector hole being reported,
> we know it is because we hit EOF, and can round the result up to
> the next sector boundary.  Having the BDRV_BLOCK_EOF flag set in
> that case may alert clients that the rounding took place, or help
> us when we quit doing the rounding and actually start tracking BDS
> lengths by bytes even where it is not sector-aligned.
> 
> Eric Blake (2):
>   block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()
>   block: Exploit BDRV_BLOCK_EOF for larger zero blocks
> 
>  include/block/block.h  |  2 ++
>  block/io.c | 42 +-
>  tests/qemu-iotests/154 |  4 
>  tests/qemu-iotests/154.out | 12 ++--
>  4 files changed, 41 insertions(+), 19 deletions(-)
> 
> -- 
> 2.9.3
> 
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-block] [RFC PATCH 2/2] block: Exploit BDRV_BLOCK_EOF for larger zero blocks

2017-05-15 Thread Stefan Hajnoczi
On Thu, May 04, 2017 at 09:15:00PM -0500, Eric Blake wrote:
> When we have a BSD with unallocated clusters, but asking the status

s/BSD/BDS/


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PULL 0/9] pci, virtio, vhost: fixes

2017-05-15 Thread Stefan Hajnoczi
On Wed, May 10, 2017 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> The following changes since commit 76d20ea0f1b26ebd5da2f5fb2fdf3250cde887bb:
> 
>   Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-04-v3' into 
> staging (2017-05-09 15:49:14 -0400)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> 
> for you to fetch changes up to 8b12e48950a3d59188489b2ff6c5ad9cc09e9866:
> 
>   acpi-defs: clean up open brace usage (2017-05-10 22:04:23 +0300)
> 
> 
> pci, virtio, vhost: fixes
> 
> A bunch of fixes that missed the release.
> 
> Signed-off-by: Michael S. Tsirkin 

Please resend with the checkpatch.pl issue fixed.


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Peter Lieven

Am 15.05.2017 um 14:52 schrieb Fam Zheng:

On Mon, 05/15 14:32, Peter Lieven wrote:

Am 15.05.2017 um 14:28 schrieb Fam Zheng:

On Mon, 05/15 13:58, Peter Lieven wrote:

Am 15.05.2017 um 13:53 schrieb Fam Zheng:

On Mon, 05/15 13:26, Peter Lieven wrote:

Am 15.05.2017 um 12:50 schrieb Fam Zheng:

On Mon, 05/15 12:02, Peter Lieven wrote:

Hi Block developers,

I would like to add a feature to Qemu to drain all traffic from a block so that
I can take external snaphosts without the risk to that in the middle of a write
operation. Its meant for cases where where QGA freeze/thaw is not available.

For me its enough to have this through qemu-io, but Kevin asked me to check
if its not worth to have a stable API for it and present it via QMP/HMP.

What are your thoughts?

For debugging purpose or a "hacky" usage where you know what you are doing, it
may be fine to have this. The only issue is it should be a separate flag, like
BlockJob.user_paused.

How can I add, remove such a flag?

Like bs->user_drained. Set it in "drain" command, then increment
bs->quiesce_counter if toggled; vise versa.

Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions
the counter is incremented already.

You're right, calling bdrv_drained_begin() is better.




What happens from guest perspective? In the case of virtio, the request queue is
not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled,
the command is not effective (or rather the implementation is not complete).

That it only works with virtio is fine. However, the thing it does not work 
correctly
apply then also to all other users of the drained_begin/end functions, right?
As for the timeout I only plan to drain the device for about 1 second.

It didn't matter because for IDE, the invariant (staying quiesced as long as
necessary) is already ensured by BQL.  Virtio is different because it supports
ioeventfd and data plane.

Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
these functions?

Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover
IDE, I just haven't thought about "how".


Do you have another idea how to achieve what I want? I was thinking of throttle
the I/O to zero. It would be enough to do this for writes, reading doesn't hurt 
in
my case.

Maybe add a block filter on top of the drained node, drain it when doing so,
then queue all further requests with a CoQueue until "undrain".  (It is then not
quite to "drain" but to "halt" or "pause", though.)

To get the drain for free was why I was looking at this approach. If I read 
correctly
if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP?

I think so.


If yes, would support adding it to qemu-io?

I'm under the impression that you are looking to a real use case, I don't think
I like the idea. Also, accessing the image from other processes while QEMU is
using it is strongly discouraged, and there is the coming image locking
mechanism to prevent this from happening. Why is the blockdev-snapshot command
not enough?


blockdev-snapshot is enough, but I still fear the case there is suddenly too 
much I/O
for the live-commit. And that the whole snapshot / commit code is more senstive 
than
just stopping I/O for a second or two.

do you have a pointer to the image locking mechanism?

Peter




Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd

2017-05-15 Thread Kevin Wolf
Am 15.05.2017 um 14:52 hat Fam Zheng geschrieben:
> On Mon, 05/15 14:32, Peter Lieven wrote:
> > Am 15.05.2017 um 14:28 schrieb Fam Zheng:
> > > On Mon, 05/15 13:58, Peter Lieven wrote:
> > > > Am 15.05.2017 um 13:53 schrieb Fam Zheng:
> > > > > On Mon, 05/15 13:26, Peter Lieven wrote:
> > > > > > Am 15.05.2017 um 12:50 schrieb Fam Zheng:
> > > > > > > On Mon, 05/15 12:02, Peter Lieven wrote:
> > > > > > > > Hi Block developers,
> > > > > > > > 
> > > > > > > > I would like to add a feature to Qemu to drain all traffic from 
> > > > > > > > a block so that
> > > > > > > > I can take external snaphosts without the risk to that in the 
> > > > > > > > middle of a write
> > > > > > > > operation. Its meant for cases where where QGA freeze/thaw is 
> > > > > > > > not available.
> > > > > > > > 
> > > > > > > > For me its enough to have this through qemu-io, but Kevin asked 
> > > > > > > > me to check
> > > > > > > > if its not worth to have a stable API for it and present it via 
> > > > > > > > QMP/HMP.
> > > > > > > > 
> > > > > > > > What are your thoughts?
> > > > > > > For debugging purpose or a "hacky" usage where you know what you 
> > > > > > > are doing, it
> > > > > > > may be fine to have this. The only issue is it should be a 
> > > > > > > separate flag, like
> > > > > > > BlockJob.user_paused.
> > > > > > How can I add, remove such a flag?
> > > > > Like bs->user_drained. Set it in "drain" command, then increment
> > > > > bs->quiesce_counter if toggled; vise versa.
> > > > Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these 
> > > > functions
> > > > the counter is incremented already.
> > > You're right, calling bdrv_drained_begin() is better.
> > > 
> > > > 
> > > > 
> > > > > > > What happens from guest perspective? In the case of virtio, the 
> > > > > > > request queue is
> > > > > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are 
> > > > > > > still handled,
> > > > > > > the command is not effective (or rather the implementation is not 
> > > > > > > complete).
> > > > > > That it only works with virtio is fine. However, the thing it does 
> > > > > > not work correctly
> > > > > > apply then also to all other users of the drained_begin/end 
> > > > > > functions, right?
> > > > > > As for the timeout I only plan to drain the device for about 1 
> > > > > > second.
> > > > > It didn't matter because for IDE, the invariant (staying quiesced as 
> > > > > long as
> > > > > necessary) is already ensured by BQL.  Virtio is different because it 
> > > > > supports
> > > > > ioeventfd and data plane.
> > > > Okay understood. So my use of bdrv_drained_begin/end is more an abuse of
> > > > these functions?
> > > Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to 
> > > cover
> > > IDE, I just haven't thought about "how".
> > > 
> > > > Do you have another idea how to achieve what I want? I was thinking of 
> > > > throttle
> > > > the I/O to zero. It would be enough to do this for writes, reading 
> > > > doesn't hurt in
> > > > my case.
> > > Maybe add a block filter on top of the drained node, drain it when doing 
> > > so,
> > > then queue all further requests with a CoQueue until "undrain".  (It is 
> > > then not
> > > quite to "drain" but to "halt" or "pause", though.)
> > 
> > To get the drain for free was why I was looking at this approach. If I read 
> > correctly
> > if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP?
> 
> I think so.
> 
> > If yes, would support adding it to qemu-io?
> 
> I'm under the impression that you are looking to a real use case, I don't 
> think
> I like the idea. Also, accessing the image from other processes while QEMU is
> using it is strongly discouraged, and there is the coming image locking
> mechanism to prevent this from happening.

Thinking a bit more about this, it looks to me as if what we really want
is inactivating the image. Should we add an option to the 'stop' command
(or introduce another command to be used on an already stopped VM) that
inactivates all/some images? And then 'cont' regains control over the
images, just like after migration.

Automatically stopping the VM while the snapshot is taken also makes it
work with IDE and prevents guests running into timeouts, which makes it
look much more like a proper solution to me.

But then, stop/cont would already achieve something very similar today
(as 'stop' calls bdrv_drain_all(); just the locking part wouldn't be
covered), so maybe there is a reason not to use it in your case, Peter?

Kevin



  1   2   3   4   >