Re: [Qemu-devel] [PATCH 04/17] qapi: merge QInt and QFloat in QNum
Luiz Capitulino writes: > On Fri, 12 May 2017 08:30:36 +0200 > Markus Armbruster wrote: > >> Question for Luiz... >> >> Marc-André Lureau writes: >> >> [...] >> > diff --git a/tests/check-qnum.c b/tests/check-qnum.c >> > new file mode 100644 >> > index 00..d08d35e85a >> > --- /dev/null >> > +++ b/tests/check-qnum.c >> > @@ -0,0 +1,131 @@ >> > +/* >> > + * QNum unit-tests. >> > + * >> > + * Copyright (C) 2009 Red Hat Inc. >> > + * >> > + * Authors: >> > + * Luiz Capitulino >> > + * >> > + * This work is licensed under the terms of the GNU LGPL, version 2.1 or >> > later. >> > + * See the COPYING.LIB file in the top-level directory. >> > + */ >> > +#include "qemu/osdep.h" >> > + >> > +#include "qapi/qmp/qnum.h" >> > +#include "qapi/error.h" >> > +#include "qemu-common.h" >> > + >> > +/* >> > + * Public Interface test-cases >> > + * >> > + * (with some violations to access 'private' data) >> > + */ >> > + >> > +static void qnum_from_int_test(void) >> > +{ >> > +QNum *qi; >> > +const int value = -42; >> > + >> > +qi = qnum_from_int(value); >> > +g_assert(qi != NULL); >> > +g_assert_cmpint(qi->u.i64, ==, value); >> > +g_assert_cmpint(qi->base.refcnt, ==, 1); >> > +g_assert_cmpint(qobject_type(QOBJECT(qi)), ==, QTYPE_QNUM); >> > + >> > +// destroy doesn't exit yet >> > +g_free(qi); >> > +} >> >> The comment is enigmatic. > > It was meant for future generations to figure it out :) Hah! >> It was first written in commit 33837ba >> "Introduce QInt unit-tests", and got copied around since. In >> check-qlist.c, it's spelled "exist yet". > > Yes, "exit" is a typo it should be "exist". > >> What is "destroy", why doesn't it exit / exist now, but will exit / >> exist later? It can't be qnum_destroy_obj(), because that certainly >> exists already, exits already in the sense of returning, and shouldn't >> ever exit in the sense of terminating the program. >> >> The comment applies to a g_free(). Why do we free directly instead >> decrementing the reference count? Perhaps the comment tries to explain >> that (if it does, it fails). > > In my personal style of writing unit-tests, I never use a method > in a test before testing it. So, as QDECREF() wasn't tested yet, > I wasn't allowed to use it. It's a good principle for organizing tests. > While I keep this principle when writing unit-tests today, this > particular case is very extreme and not useful at all. Today I'd just > go ahead and use QDECREF(). Makes sense. > The qint_destroy_test() in the original > commit is also very bogus, it's not really doing an useful test. It can demonstrate leaks under valgrind. But pretty much every other test can just as well, so... Marc-André, care to stick a cleanup patch into your series?
Re: [Qemu-devel] [RFC PATCH v2] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
Philippe Mathieu-Daudé writes: > Hi Markus, > > On 05/11/2017 06:03 AM, Markus Armbruster wrote: >> Philippe Mathieu-Daudé writes: >> >>> Ok I just understood Richard explanation, so this patch is WRONG and I >>> need to get some real rest :( >> >> Ha! Get some sleep; we'll still be around in the morning ;) >> >>> On 05/10/2017 08:52 PM, Philippe Mathieu-Daudé wrote: Apply this script using: $ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \ --sp-file scripts/coccinelle/tcg_gen_extract.cocci \ --macro-file scripts/cocci-macro-file.h \ --dir target \ --in-place Signed-off-by: Philippe Mathieu-Daudé --- This is a new version of the coccinelle script addressing Richard comments and trying to do it correctly. Also changed license to GPLv2+. The first rule matches, it calls a python2 script that basically checks the target_ulong is not overflowed: (msk << ofs) >> sizeof(target_ulong) == 0 >>> >>> WRONG >> [...] >> >> Is this script likely to be rerun in the future? If yes, keeping it in >> scripts/coccinelle/ is a good idea. If no, I recommend to store it in >> the commit message instead. > > It is unlikely to be rerun in the future, at least for this specific > pattern. But it can be easily adapted for another TCG optimization. > > I could not find much documentation about how to do a such script > using Python, except on a thread [1]. If it is documented enough I > think it is worth to keep it. > > About putting it in each commit message, it is now 3 times bigger than > the patch it generates! Consider putting it into the first commit message, and have the others reference back.
Re: [Qemu-devel] [PATCH 1/2] tests: check-qom-proplist: add checks for cmdline-created objects
"Dr. David Alan Gilbert" writes: > I notice this pair of patches doesn't seem to have gone anywhere. > WHile it's labelled as a monitor fix, it's all QOM stuff, so I don't > think it should be going via me. It's somewhere between QOM and QemuOpts. Andreas, please have a look. Feel free to ask me to take the patch through my tree. > Dave > > * Michael Roth (mdr...@linux.vnet.ibm.com) wrote: >> check-qom-proplist originally added tests for verifying that >> object-creation helpers object_new_with_{props,propv} behaved in >> similar fashion to the "traditional" method involving setting each >> individual property separately after object creation rather than >> via a single call. >> >> Another similar "helper" for creating Objects exists in the form of >> objects specified via -object command-line parameters. By that >> rationale, we extend check-qom-proplist to include similar checks >> for command-line-created objects by employing the same >> qemu_opts_parse()-based parsing the vl.c employs. >> >> This parser has a side-effect of parsing the object's options into >> a QemuOpt structure and registering this in the global QemuOptsList >> using the Object's ID. This can conflict with future Object instances >> that attempt to use the same ID if we don't ensure this is cleaned >> up as part of Object finalization, so we include a FIXME stub to test >> for this case, which will then be resolved in a subsequent patch. >> >> Suggested-by: Daniel Berrange >> Cc: "Dr. David Alan Gilbert" >> Cc: Markus Armbruster >> Cc: Eric Blake >> Cc: Daniel Berrange >> Cc: qemu-sta...@nongnu.org >> Signed-off-by: Michael Roth >> Reviewed-by: Markus Armbruster >> --- >> tests/check-qom-proplist.c | 55 >> ++ >> 1 file changed, 55 insertions(+) >> >> diff --git a/tests/check-qom-proplist.c b/tests/check-qom-proplist.c >> index a16cefc..e3f56ca 100644 >> --- a/tests/check-qom-proplist.c >> +++ b/tests/check-qom-proplist.c >> @@ -23,6 +23,9 @@ >> #include "qapi/error.h" >> #include "qom/object.h" >> #include "qemu/module.h" >> +#include "qemu/option.h" >> +#include "qemu/config-file.h" >> +#include "qom/object_interfaces.h" >> >> >> #define TYPE_DUMMY "qemu-dummy" >> @@ -162,6 +165,10 @@ static const TypeInfo dummy_info = { >> .instance_finalize = dummy_finalize, >> .class_size = sizeof(DummyObjectClass), >> .class_init = dummy_class_init, >> +.interfaces = (InterfaceInfo[]) { >> +{ TYPE_USER_CREATABLE }, >> +{ } >> +} >> }; >> >> >> @@ -320,6 +327,14 @@ static const TypeInfo dummy_backend_info = { >> .class_size = sizeof(DummyBackendClass), >> }; >> >> +static QemuOptsList qemu_object_opts = { >> +.name = "object", >> +.implied_opt_name = "qom-type", >> +.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head), >> +.desc = { >> +{ } >> +}, >> +}; >> >> >> static void test_dummy_createv(void) >> @@ -388,6 +403,45 @@ static void test_dummy_createlist(void) >> object_unparent(OBJECT(dobj)); >> } >> >> +static void test_dummy_createcmdl(void) >> +{ >> +QemuOpts *opts; >> +DummyObject *dobj; >> +Error *err = NULL; >> +const char *params = TYPE_DUMMY \ >> + ",id=dev0," \ >> + "bv=yes,sv=Hiss hiss hiss,av=platypus"; >> + >> +qemu_add_opts(&qemu_object_opts); >> +opts = qemu_opts_parse(&qemu_object_opts, params, true, &err); >> +g_assert(err == NULL); >> +g_assert(opts); >> + >> +dobj = DUMMY_OBJECT(user_creatable_add_opts(opts, &err)); >> +g_assert(err == NULL); >> +g_assert(dobj); >> +g_assert_cmpstr(dobj->sv, ==, "Hiss hiss hiss"); >> +g_assert(dobj->bv == true); >> +g_assert(dobj->av == DUMMY_PLATYPUS); >> + >> +user_creatable_del("dev0", &err); >> +g_assert(err == NULL); >> +error_free(err); >> + >> +/* cmdline-parsing via qemu_opts_parse() results in a QemuOpts entry >> + * corresponding to the Object's ID to be added to the QemuOptsList >> + * for objects. To avoid having this entry conflict with future >> + * Objects using the same ID (which can happen in cases where >> + * qemu_opts_parse() is used to parse the object params, such as >> + * with hmp_object_add() at the time of this comment), we need to >> + * check for this in user_creatable_del() and remove the QemuOpts if >> + * it is present. >> + * >> + * FIXME: add an assert to verify that the QemuOpts is cleaned up >> + * once the corresponding cleanup code is added. >> + */ >> +} >> + >> static void test_dummy_badenum(void) >> { >> Error *err = NULL; >> @@ -525,6 +579,7 @@ int main(int argc, char **argv) >> >> g_test_add_func("/qom/proplist/createlist", test_dummy_createlist); >> g_test_add_func("/qom/proplist/createv", test_dummy_createv); >> +g_test_add_func("/qom/proplist/createcmdline", test_dummy_createcmdl); >> g_test_add_f
Re: [Qemu-devel] [PATCH 06/17] qnum: add uint type
Marc-André Lureau writes: > In order to store integer values superior to INT64_MAX, add a u64 > internal representation. > > Signed-off-by: Marc-André Lureau > --- > include/qapi/qmp/qnum.h | 4 > qobject/qnum.c | 49 +++ > tests/check-qnum.c | 51 > + > 3 files changed, 104 insertions(+) > > diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h > index 0e51427821..89c14e040f 100644 > --- a/include/qapi/qmp/qnum.h > +++ b/include/qapi/qmp/qnum.h > @@ -17,6 +17,7 @@ > > typedef enum { > QNUM_I64, > +QNUM_U64, > QNUM_DOUBLE > } QNumType; > > @@ -25,14 +26,17 @@ typedef struct QNum { > QNumType type; > union { > int64_t i64; > +uint64_t u64; > double dbl; > } u; > } QNum; > > QNum *qnum_from_int(int64_t value); > +QNum *qnum_from_uint(uint64_t value); > QNum *qnum_from_double(double value); > > int64_t qnum_get_int(const QNum *qi, Error **errp); > +uint64_t qnum_get_uint(const QNum *qi, Error **errp); > double qnum_get_double(QNum *qn); > > char *qnum_to_string(QNum *qn); > diff --git a/qobject/qnum.c b/qobject/qnum.c > index 8e9dd38350..be6307accf 100644 > --- a/qobject/qnum.c > +++ b/qobject/qnum.c > @@ -13,6 +13,7 @@ > > #include "qemu/osdep.h" > #include "qapi/error.h" > +#include "qemu/error-report.h" Superfluous, please drop. > #include "qapi/qmp/qnum.h" > #include "qapi/qmp/qobject.h" > #include "qemu-common.h" > @@ -34,6 +35,22 @@ QNum *qnum_from_int(int64_t value) > } > > /** > + * qnum_from_uint(): Create a new QNum from an uint64_t > + * > + * Return strong reference. > + */ > +QNum *qnum_from_uint(uint64_t value) > +{ > +QNum *qn = g_new(QNum, 1); > + > +qobject_init(QOBJECT(qn), QTYPE_QNUM); > +qn->type = QNUM_U64; > +qn->u.u64 = value; > + > +return qn; > +} > + > +/** > * qnum_from_double(): Create a new QNum from a double > * > * Return strong reference. > @@ -57,6 +74,34 @@ int64_t qnum_get_int(const QNum *qn, Error **errp) > switch (qn->type) { > case QNUM_I64: > return qn->u.i64; > +case QNUM_U64: > +if (qn->u.u64 > INT64_MAX) { > +error_setg(errp, "The number is too large, use qnum_get_uint()"); The user has no way to "use qnum_get_uint()". > +return 0; > +} > +return qn->u.u64; > +case QNUM_DOUBLE: > +error_setg(errp, "The number is a float"); > +return 0; > +} > + > +g_assert_not_reached(); > +} > + > +/** > + * qnum_get_uint(): Get an unsigned integer from the number > + */ > +uint64_t qnum_get_uint(const QNum *qn, Error **errp) > +{ > +switch (qn->type) { > +case QNUM_I64: > +if (qn->u.i64 < 0) { > +error_setg(errp, "The number is negative"); The new error messages are inadequate in the same way as the existing one. > +return 0; > +} > +return qn->u.i64; > +case QNUM_U64: > +return qn->u.u64; > case QNUM_DOUBLE: > error_setg(errp, "The number is a float"); > return 0; > @@ -73,6 +118,8 @@ double qnum_get_double(QNum *qn) > switch (qn->type) { > case QNUM_I64: > return qn->u.i64; > +case QNUM_U64: > +return qn->u.u64; > case QNUM_DOUBLE: > return qn->u.dbl; > } > @@ -88,6 +135,8 @@ char *qnum_to_string(QNum *qn) > switch (qn->type) { > case QNUM_I64: > return g_strdup_printf("%" PRId64, qn->u.i64); > +case QNUM_U64: > +return g_strdup_printf("%" PRIu64, qn->u.u64); > case QNUM_DOUBLE: > /* FIXME: snprintf() is locale dependent; but JSON requires > * numbers to be formatted as if in the C locale. Dependence > diff --git a/tests/check-qnum.c b/tests/check-qnum.c > index d08d35e85a..8199546f99 100644 > --- a/tests/check-qnum.c > +++ b/tests/check-qnum.c > @@ -36,6 +36,21 @@ static void qnum_from_int_test(void) > g_free(qi); > } > > +static void qnum_from_uint_test(void) > +{ > +QNum *qu; > +const int value = UINT_MAX; > + > +qu = qnum_from_int(value); > +g_assert(qu != NULL); > +g_assert(qu->u.u64 == value); > +g_assert(qu->base.refcnt == 1); > +g_assert(qobject_type(QOBJECT(qu)) == QTYPE_QNUM); > + > +// destroy doesn't exit yet > +g_free(qu); > +} > + Matches the other qnum_from_TYPE_test(). Let's clean them up before this patch: use QDECREF() instead of g_free(), and drop the comment (see Luiz's reply to PATCH 04). > static void qnum_from_double_test(void) > { > QNum *qf; > @@ -73,6 +88,37 @@ static void qnum_get_int_test(void) > QDECREF(qi); > } > > +static void qnum_get_uint_test(void) > +{ > +QNum *qn; > +const int value = 123456; > +Error *err = NULL; > + > +qn = qnum_from_uint(value); > +g_assert(qnum_get_uint(qn, &error_abort) == value); > +QDECREF(qn); > + > +qn =
Re: [Qemu-devel] [PATCH V4 03/12] net/filter-mirror.c: Make filter_mirror_send support vnet support.
On 05/15/2017 11:28 AM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: In this patch, if vnet_hdr=on we change the send packet format from struct {int size; const uint8_t buf[];} to {int size; int vnet_hdr_len; const uint8_t buf[];}. make other module(like colo-compare) know how to parse net packet correctly. Signed-off-by: Zhang Chen --- net/filter-mirror.c | 35 ++- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index 3766414..64323fc 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -44,10 +44,11 @@ typedef struct MirrorState { SocketReadState rs; } MirrorState; -static int filter_mirror_send(CharBackend *chr_out, +static int filter_mirror_send(MirrorState *s, const struct iovec *iov, int iovcnt) { +NetFilterState *nf = NETFILTER(s); int ret = 0; ssize_t size = 0; uint32_t len = 0; @@ -59,14 +60,38 @@ static int filter_mirror_send(CharBackend *chr_out, } len = htonl(size); -ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len)); +ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); if (ret != sizeof(len)) { goto err; } +if (s->vnet_hdr) { +/* + * If vnet_hdr = on, we send vnet header len to make other + * module(like colo-compare) know how to parse net + * packet correctly. + */ +ssize_t vnet_hdr_len; + +if (nf->netdev->using_vnet_hdr) { +vnet_hdr_len = nf->netdev->vnet_hdr_len; +} else if (nf->netdev->peer->using_vnet_hdr) { +vnet_hdr_len = nf->netdev->peer->vnet_hdr_len; Still questionable here, may need a comment to explain. OK, I will add comments in next version. +} else { +error_report("filter get vnet_hdr_len failed"); Why need error here, could we just use zero? Yes,we can. Thanks Zhang Chen Thanks +goto err; +} + +len = htonl(vnet_hdr_len); +ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); +if (ret != sizeof(len)) { +goto err; +} +} + buf = g_malloc(size); iov_to_buf(iov, iovcnt, 0, buf, size); -ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size); +ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size); g_free(buf); if (ret != size) { goto err; @@ -142,7 +167,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf, MirrorState *s = FILTER_MIRROR(nf); int ret; -ret = filter_mirror_send(&s->chr_out, iov, iovcnt); +ret = filter_mirror_send(s, iov, iovcnt); if (ret) { error_report("filter_mirror_send failed(%s)", strerror(-ret)); } @@ -165,7 +190,7 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf, int ret; if (qemu_chr_fe_get_driver(&s->chr_out)) { -ret = filter_mirror_send(&s->chr_out, iov, iovcnt); +ret = filter_mirror_send(s, iov, iovcnt); if (ret) { error_report("filter_mirror_send failed(%s)", strerror(-ret)); } . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 04/12] net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
On 05/15/2017 11:31 AM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: We add the vnet_hdr option for filter-redirector, default is disable. If you use virtio-net-pci net driver, please enable it. We need a better commit log here to explain e.g why we need this option. OK. I will more comments in next version. Thanks Zhang Chen Thanks You can use it for example: -object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr=on Signed-off-by: Zhang Chen --- net/filter-mirror.c | 33 + qemu-options.hx | 5 +++-- 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index 64323fc..a65853c 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -377,6 +377,13 @@ static char *filter_redirector_get_outdev(Object *obj, Error **errp) return g_strdup(s->outdev); } +static char *filter_redirector_get_vnet_hdr(Object *obj, Error **errp) +{ +MirrorState *s = FILTER_REDIRECTOR(obj); + +return s->vnet_hdr ? g_strdup("on") : g_strdup("off"); +} + static void filter_redirector_set_outdev(Object *obj, const char *value, Error **errp) { @@ -386,6 +393,21 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp) s->outdev = g_strdup(value); } +static void filter_redirector_set_vnet_hdr(Object *obj, + const char *value, + Error **errp) +{ +MirrorState *s = FILTER_REDIRECTOR(obj); + +if (strcmp(value, "on") && strcmp(value, "off")) { +error_setg(errp, "Invalid value for filter-redirector vnet_hdr, " + "should be 'on' or 'off'"); +return; +} + +s->vnet_hdr = !strcmp(value, "on"); +} + static void filter_mirror_init(Object *obj) { MirrorState *s = FILTER_MIRROR(obj); @@ -405,10 +427,21 @@ static void filter_mirror_init(Object *obj) static void filter_redirector_init(Object *obj) { +MirrorState *s = FILTER_REDIRECTOR(obj); + object_property_add_str(obj, "indev", filter_redirector_get_indev, filter_redirector_set_indev, NULL); object_property_add_str(obj, "outdev", filter_redirector_get_outdev, filter_redirector_set_outdev, NULL); + +/* + * The vnet_hdr is disabled by default, if you want to enable + * this option, you must enable all the option on related modules + * (like other filter or colo-compare). + */ +s->vnet_hdr = false; +object_property_add_str(obj, "vnet_hdr", filter_redirector_get_vnet_hdr, +filter_redirector_set_vnet_hdr, NULL); } static void filter_mirror_fini(Object *obj) diff --git a/qemu-options.hx b/qemu-options.hx index 1e08481..0f81c22 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -4031,10 +4031,11 @@ filter-mirror on netdev @var{netdevid},mirror net packet to chardev with vnet_hdr_len. @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid}, -outdev=@var{chardevid}[,queue=@var{all|rx|tx}] +outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}] filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev -@var{chardevid},and redirect indev's packet to filter. +@var{chardevid},and redirect indev's packet to filter.if vnet_hdr = on, +filter-redirector will redirect packet with vnet_hdr_len. Create a filter-redirector we need to differ outdev id from indev id, id can not be the same. we can just use indev or outdev, but at least one of indev or outdev need to be specified. . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 05/12] net/net.c: Add vnet_hdr support in SocketReadState
On 05/15/2017 12:02 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: Address Jason Wang's comments add vnet header length to SocketReadState. Better to use "Suggested-by" instead of this :). I got it~ We add a flag to dicide whether net_fill_rstate() to read struct {int size; int vnet_hdr_len; const uint8_t buf[];} or not. Few words to explain the format is better than e.g a struct. OK, will fix it in next version. Signed-off-by: Zhang Chen --- include/net/net.h | 9 +++-- net/colo-compare.c | 4 ++-- net/filter-mirror.c | 2 +- net/net.c | 36 net/socket.c| 2 +- 5 files changed, 43 insertions(+), 10 deletions(-) diff --git a/include/net/net.h b/include/net/net.h index 70edfc0..0763636 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -113,14 +113,19 @@ typedef struct NICState { } NICState; struct SocketReadState { -int state; /* 0 = getting length, 1 = getting data */ +/* 0 = getting length, 1 = getting vnet header length, 2 = getting data */ +int state; uint32_t index; uint32_t packet_len; +uint32_t vnet_hdr_len; uint8_t buf[NET_BUFSIZE]; SocketReadStateFinalize *finalize; }; -int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size); +int net_fill_rstate(SocketReadState *rs, +const uint8_t *buf, +int size, +bool vnet_hdr); Why not just move vnet_hdr to SocketReadState? Good idea. Thanks Zhang Chen Thanks char *qemu_mac_strdup_printf(const uint8_t *macaddr); NetClientState *qemu_find_netdev(const char *id); int qemu_find_net_clients_except(const char *id, NetClientState **ncs, diff --git a/net/colo-compare.c b/net/colo-compare.c index 4ab80b1..332f57e 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -530,7 +530,7 @@ static void compare_pri_chr_in(void *opaque, const uint8_t *buf, int size) CompareState *s = COLO_COMPARE(opaque); int ret; -ret = net_fill_rstate(&s->pri_rs, buf, size); +ret = net_fill_rstate(&s->pri_rs, buf, size, false); if (ret == -1) { qemu_chr_fe_set_handlers(&s->chr_pri_in, NULL, NULL, NULL, NULL, NULL, true); @@ -547,7 +547,7 @@ static void compare_sec_chr_in(void *opaque, const uint8_t *buf, int size) CompareState *s = COLO_COMPARE(opaque); int ret; -ret = net_fill_rstate(&s->sec_rs, buf, size); +ret = net_fill_rstate(&s->sec_rs, buf, size, false); if (ret == -1) { qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL, NULL, NULL, true); diff --git a/net/filter-mirror.c b/net/filter-mirror.c index a65853c..4649416 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -134,7 +134,7 @@ static void redirector_chr_read(void *opaque, const uint8_t *buf, int size) MirrorState *s = FILTER_REDIRECTOR(nf); int ret; -ret = net_fill_rstate(&s->rs, buf, size); +ret = net_fill_rstate(&s->rs, buf, size, s->vnet_hdr); if (ret == -1) { qemu_chr_fe_set_handlers(&s->chr_in, NULL, NULL, NULL, diff --git a/net/net.c b/net/net.c index a00a0c9..a9c97cf 100644 --- a/net/net.c +++ b/net/net.c @@ -1618,13 +1618,20 @@ void net_socket_rs_init(SocketReadState *rs, * 0: success * -1: error occurs */ -int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size) +int net_fill_rstate(SocketReadState *rs, +const uint8_t *buf, +int size, +bool vnet_hdr) { unsigned int l; while (size > 0) { -/* reassemble a packet from the network */ -switch (rs->state) { /* 0 = getting length, 1 = getting data */ +/* Reassemble a packet from the network. + * 0 = getting length. + * 1 = getting vnet header length. + * 2 = getting data. + */ +switch (rs->state) { case 0: l = 4 - rs->index; if (l > size) { @@ -1638,10 +1645,31 @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size) /* got length */ rs->packet_len = ntohl(*(uint32_t *)rs->buf); rs->index = 0; -rs->state = 1; +if (vnet_hdr) { +rs->state = 1; +} else { +rs->state = 2; +rs->vnet_hdr_len = 0; +} } break; case 1: +l = 4 - rs->index; +if (l > size) { +l = size; +} +memcpy(rs->buf + rs->index, buf, l); +buf += l; +size -= l; +rs->index += l; +if (rs->index == 4) { +/* got vnet header length */ +rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf)
Re: [Qemu-devel] [PATCH V4 06/12] net/colo-compare.c: Add new option to enable vnet support for colo-compare
On 05/15/2017 12:03 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: We add the vnet_hdr option for colo-compare, default is disable. If you use virtio-net-pci net driver, please enable it. You can use it for example: -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr=on Signed-off-by: Zhang Chen --- net/colo-compare.c | 34 +- qemu-options.hx| 3 ++- 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 332f57e..99a6912 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -73,6 +73,7 @@ typedef struct CompareState { CharBackend chr_out; SocketReadState pri_rs; SocketReadState sec_rs; +bool vnet_hdr; /* connection list: the connections belonged to this NIC could be found * in this list. @@ -642,6 +643,28 @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp) s->outdev = g_strdup(value); } +static char *compare_get_vnet_hdr(Object *obj, Error **errp) +{ +CompareState *s = COLO_COMPARE(obj); + +return s->vnet_hdr ? g_strdup("on") : g_strdup("off"); +} + +static void compare_set_vnet_hdr(Object *obj, + const char *value, + Error **errp) +{ +CompareState *s = COLO_COMPARE(obj); + +if (strcmp(value, "on") && strcmp(value, "off")) { +error_setg(errp, "Invalid value for colo-compare vnet_hdr, " + "should be 'on' or 'off'"); +return; +} + +s->vnet_hdr = !strcmp(value, "on"); +} + static void compare_pri_rs_finalize(SocketReadState *pri_rs) { CompareState *s = container_of(pri_rs, CompareState, pri_rs); @@ -667,7 +690,6 @@ static void compare_sec_rs_finalize(SocketReadState *sec_rs) } } - Unnecessary whitespace change. I will remove it in next version. /* * Return 0 is success. * Return 1 is failed. @@ -775,6 +797,8 @@ static void colo_compare_class_init(ObjectClass *oc, void *data) static void colo_compare_init(Object *obj) { +CompareState *s = COLO_COMPARE(obj); + object_property_add_str(obj, "primary_in", compare_get_pri_indev, compare_set_pri_indev, NULL); @@ -784,6 +808,14 @@ static void colo_compare_init(Object *obj) object_property_add_str(obj, "outdev", compare_get_outdev, compare_set_outdev, NULL); +/* + * The vnet_hdr is disabled by default, if you want to enable + * this option, you must enable all the option on related modules + * (like other filter or colo-compare). + */ +s->vnet_hdr = false; +object_property_add_str(obj, "vnet_hdr", compare_get_vnet_hdr, +compare_set_vnet_hdr, NULL); } static void colo_compare_finalize(Object *obj) diff --git a/qemu-options.hx b/qemu-options.hx index 0f81c22..115b83f 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -4061,12 +4061,13 @@ The file format is libpcap, so it can be analyzed with tools such as tcpdump or Wireshark. @item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid}, -outdev=@var{chardevid} +outdev=@var{chardevid},vnet_hdr=@var{on|off} Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with secondary packet. If the packets are same, we will output primary packet to outdev@var{chardevid}, else we will notify colo-frame do checkpoint and send primary packet to outdev@var{chardevid}. +if vnet_hdr = on, colo compare will send/recv packet with vnet_hdr_len. we must use it with the help of filter-mirror and filter-redirector. Please squash this into its function implementation. OK. Thanks Zhang Chen Thanks . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 07/12] net/colo.c: Make vnet_hdr_len as packet property
On 05/15/2017 12:05 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: We can use this property flush and send packet with vnet_hdr_len. Signed-off-by: Zhang Chen Then I think it's not necessary to store vnet_hdr_len in SocketReadState? Do you means we keep the patch 05/12 in original? Thanks Zhang Chen Thanks --- net/colo-compare.c| 8 ++-- net/colo.c| 3 ++- net/colo.h| 4 +++- net/filter-rewriter.c | 2 +- 4 files changed, 12 insertions(+), 5 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 99a6912..87a9529 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -122,9 +122,13 @@ static int packet_enqueue(CompareState *s, int mode) Connection *conn; if (mode == PRIMARY_IN) { -pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len); +pkt = packet_new(s->pri_rs.buf, + s->pri_rs.packet_len, + s->pri_rs.vnet_hdr_len); } else { -pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len); +pkt = packet_new(s->sec_rs.buf, + s->sec_rs.packet_len, + s->sec_rs.vnet_hdr_len); } if (parse_packet_early(pkt)) { diff --git a/net/colo.c b/net/colo.c index 8cc166b..180eaed 100644 --- a/net/colo.c +++ b/net/colo.c @@ -153,13 +153,14 @@ void connection_destroy(void *opaque) g_slice_free(Connection, conn); } -Packet *packet_new(const void *data, int size) +Packet *packet_new(const void *data, int size, int vnet_hdr_len) { Packet *pkt = g_slice_new(Packet); pkt->data = g_memdup(data, size); pkt->size = size; pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST); +pkt->vnet_hdr_len = vnet_hdr_len; return pkt; } diff --git a/net/colo.h b/net/colo.h index 7c524f3..caedb0d 100644 --- a/net/colo.h +++ b/net/colo.h @@ -43,6 +43,8 @@ typedef struct Packet { int size; /* Time of packet creation, in wall clock ms */ int64_t creation_ms; +/* Get vnet_hdr_len from filter */ +uint32_t vnet_hdr_len; } Packet; typedef struct ConnectionKey { @@ -82,7 +84,7 @@ Connection *connection_get(GHashTable *connection_track_table, ConnectionKey *key, GQueue *conn_list); void connection_hashtable_reset(GHashTable *connection_track_table); -Packet *packet_new(const void *data, int size); +Packet *packet_new(const void *data, int size, int vnet_hdr_len); void packet_destroy(void *opaque, void *user_data); #endif /* QEMU_COLO_PROXY_H */ diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index afa06e8..63256c7 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -158,7 +158,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, char *buf = g_malloc0(size); iov_to_buf(iov, iovcnt, 0, buf, size); -pkt = packet_new(buf, size); +pkt = packet_new(buf, size, 0); g_free(buf); /* . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 10/12] net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
On 05/15/2017 12:11 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: COLO-Proxy just focus on packet payload, So we skip vnet header. Signed-off-by: Zhang Chen --- net/colo-compare.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index cb0b04e..bf565f3 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -188,6 +188,8 @@ static int packet_enqueue(CompareState *s, int mode) */ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) { +int offset_all; + if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20]; @@ -201,9 +203,12 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) sec_ip_src, sec_ip_dst); } +offset_all = ppkt->vnet_hdr_len + offset; How about just keep using offset by increasing it with vnet_hdr_len? OK, I will fix it in next version. Thanks Zhang Chen Thanks + if (ppkt->size == spkt->size) { -return memcmp(ppkt->data + offset, spkt->data + offset, - spkt->size - offset); +return memcmp(ppkt->data + offset_all, + spkt->data + offset_all, + spkt->size - offset_all); } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -261,8 +266,9 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) */ if (ptcp->th_off > 5) { ptrdiff_t tcp_offset; + tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data - + (ptcp->th_off * 4); + + (ptcp->th_off * 4) - ppkt->vnet_hdr_len; res = colo_packet_compare_common(ppkt, spkt, tcp_offset); } else if (ptcp->th_sum == stcp->th_sum) { res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 11/12] net/filter-rewriter.c: Add new option to enable vnet support for filter-rewriter
On 05/15/2017 12:12 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: We add the vnet_hdr option for filter-rewriter, default is disable. If you use virtio-net-pci net driver, please enable it. You can use it for example: -object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr=on Signed-off-by: Zhang Chen As has been pointed out, let's squash this into patch 12. OK~ Thanks Zhang Chen Thanks -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 01/12] net: Add vnet_hdr_len related arguments in NetClientState
On 2017年05月15日 14:56, Zhang Chen wrote: On 05/15/2017 11:24 AM, Jason Wang wrote: On 2017年05月14日 05:24, Philippe Mathieu-Daudé wrote: Hi Zhang On 05/11/2017 10:41 PM, Zhang Chen wrote: Add vnet_hdr_len and using_vnet_hdr arguments in NetClientState that make othermodule get real vnet_hdr_len easily. Signed-off-by: Zhang Chen --- include/net/net.h | 2 ++ net/net.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/net/net.h b/include/net/net.h index 99b28d5..70edfc0 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -100,6 +100,8 @@ struct NetClientState { unsigned int queue_index; unsigned rxfilter_notify_enabled:1; int vring_enable; +bool using_vnet_hdr; +int vnet_hdr_len; QTAILQ_HEAD(NetFilterHead, NetFilterState) filters; }; diff --git a/net/net.c b/net/net.c index 0ac3b9e..a00a0c9 100644 --- a/net/net.c +++ b/net/net.c @@ -472,6 +472,7 @@ void qemu_using_vnet_hdr(NetClientState *nc, bool enable) return; } +nc->using_vnet_hdr = enable; nc->info->using_vnet_hdr(nc, enable); } @@ -491,6 +492,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len) return; } +nc->vnet_hdr_len = len; nc->info->set_vnet_hdr_len(nc, len); } For what it's worth, now having those fields in NetClientState it is possible to remove a deref to NetClientInfo in qemu_has_vnet_hdr() and qemu_has_vnet_hdr_len(). Anyway, Reviewed-by: Philippe Mathieu-Daudé Yes and this could be done on top with removing private e.g vnet_hdr_len. Do you means we will remove the qemu_has_vnet_hdr() and qemu_has_vnet_hdr_len() in the future? Thanks Zhang Chen Yes and e.g both tap and netmap have its private vnet header field. We can remove them too. Thanks Thanks .
Re: [Qemu-devel] [PATCH V4 01/12] net: Add vnet_hdr_len related arguments in NetClientState
On 05/15/2017 04:06 PM, Jason Wang wrote: On 2017年05月15日 14:56, Zhang Chen wrote: On 05/15/2017 11:24 AM, Jason Wang wrote: On 2017年05月14日 05:24, Philippe Mathieu-Daudé wrote: Hi Zhang On 05/11/2017 10:41 PM, Zhang Chen wrote: Add vnet_hdr_len and using_vnet_hdr arguments in NetClientState that make othermodule get real vnet_hdr_len easily. Signed-off-by: Zhang Chen --- include/net/net.h | 2 ++ net/net.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/net/net.h b/include/net/net.h index 99b28d5..70edfc0 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -100,6 +100,8 @@ struct NetClientState { unsigned int queue_index; unsigned rxfilter_notify_enabled:1; int vring_enable; +bool using_vnet_hdr; +int vnet_hdr_len; QTAILQ_HEAD(NetFilterHead, NetFilterState) filters; }; diff --git a/net/net.c b/net/net.c index 0ac3b9e..a00a0c9 100644 --- a/net/net.c +++ b/net/net.c @@ -472,6 +472,7 @@ void qemu_using_vnet_hdr(NetClientState *nc, bool enable) return; } +nc->using_vnet_hdr = enable; nc->info->using_vnet_hdr(nc, enable); } @@ -491,6 +492,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len) return; } +nc->vnet_hdr_len = len; nc->info->set_vnet_hdr_len(nc, len); } For what it's worth, now having those fields in NetClientState it is possible to remove a deref to NetClientInfo in qemu_has_vnet_hdr() and qemu_has_vnet_hdr_len(). Anyway, Reviewed-by: Philippe Mathieu-Daudé Yes and this could be done on top with removing private e.g vnet_hdr_len. Do you means we will remove the qemu_has_vnet_hdr() and qemu_has_vnet_hdr_len() in the future? Thanks Zhang Chen Yes and e.g both tap and netmap have its private vnet header field. We can remove them too. Maybe I should do this job in this series next version or a independent series? Thanks Zhang Chen Thanks Thanks . . -- Thanks Zhang Chen
Re: [Qemu-devel] [PATCH V4 07/12] net/colo.c: Make vnet_hdr_len as packet property
On 2017年05月15日 16:03, Zhang Chen wrote: On 05/15/2017 12:05 PM, Jason Wang wrote: On 2017年05月12日 09:41, Zhang Chen wrote: We can use this property flush and send packet with vnet_hdr_len. Signed-off-by: Zhang Chen Then I think it's not necessary to store vnet_hdr_len in SocketReadState? Do you means we keep the patch 05/12 in original? Thanks Zhang Chen I mean we could fetch vnet_hdr_len from the buf directly. Or is there any advantage to store it in SocketReadState? Thanks
Re: [Qemu-devel] [PATCH RESEND v2 01/21] qdev: Replace cannot_instantiate_with_device_add_yet with !user_creatable
Eduardo Habkost writes: > cannot_instantiate_with_device_add_yet was introduced by commit > efec3dd631d94160288392721a5f9c39e50fb2bc to replace no_user. It was > supposed to be a temporary measure. > > When it was introduced, we had 54 > cannot_instantiate_with_device_add_yet=true lines in the code. > Today (3 years later) this number has not shrinked: we now have shrunk > 57 cannot_instantiate_with_device_add_yet=true lines. I think it > is safe to say it is not a temporary measure, and we won't see > the flag go away soon. > > Instead of a long field name that misleads people to believe it > is temporary, replace it a shorter and less misleading field: > user_creatable. > > Except for code comments, changes were generated using the > following Coccinelle patch: > > @@ > expression DC; > @@ > ( > -DC->cannot_instantiate_with_device_add_yet = false; > +DC->user_creatable = true; > | > -DC->cannot_instantiate_with_device_add_yet = true; > +DC->user_creatable = false; > ) > > @@ > typedef ObjectClass; > expression dc; > identifier class, data; > @@ >static void device_class_init(ObjectClass *class, void *data) >{ >... >dc->hotpluggable = true; > +dc->user_creatable = true; >... >} > > @@ > @@ >struct DeviceClass { >... > -bool cannot_instantiate_with_device_add_yet; > +bool user_creatable; >... > } > > @@ > expression DC; > @@ > ( > -!DC->cannot_instantiate_with_device_add_yet > +DC->user_creatable > | > -DC->cannot_instantiate_with_device_add_yet > +!DC->user_creatable > ) > > Cc: Alistair Francis > Cc: Laszlo Ersek > Cc: Marcel Apfelbaum > Cc: Markus Armbruster > Cc: Peter Maydell > Cc: Thomas Huth > Acked-by: Alistair Francis > Reviewed-by: Thomas Huth > Reviewed-by: Marcel Apfelbaum > Acked-by: Marcel Apfelbaum > Signed-off-by: Eduardo Habkost > --- > Changes v1 -> v2: > * (none) > > Changes v2 -> v3: > * Fixed commit ID reference on commit message > * (No code changes) > --- > include/hw/qdev-core.h | 10 +- > include/hw/qdev-properties.h| 4 ++-- > hw/acpi/piix4.c | 2 +- > hw/arm/spitz.c | 2 +- > hw/audio/marvell_88w8618.c | 2 +- > hw/audio/pcspk.c| 2 +- > hw/core/or-irq.c| 2 +- > hw/core/qdev.c | 1 + > hw/core/register.c | 2 +- > hw/dma/i8257.c | 2 +- > hw/dma/sparc32_dma.c| 2 +- > hw/gpio/omap_gpio.c | 4 ++-- > hw/i2c/omap_i2c.c | 2 +- > hw/i2c/smbus_eeprom.c | 2 +- > hw/i2c/smbus_ich9.c | 2 +- > hw/i386/pc.c| 2 +- > hw/input/vmmouse.c | 2 +- > hw/intc/apic_common.c | 2 +- > hw/intc/etraxfs_pic.c | 2 +- > hw/intc/grlib_irqmp.c | 2 +- > hw/intc/i8259_common.c | 2 +- > hw/intc/nios2_iic.c | 2 +- > hw/intc/omap_intc.c | 4 ++-- > hw/isa/lpc_ich9.c | 2 +- > hw/isa/piix4.c | 2 +- > hw/isa/vt82c686.c | 2 +- > hw/mips/gt64xxx_pci.c | 2 +- > hw/misc/vmport.c| 2 +- > hw/net/dp8393x.c| 2 +- > hw/net/etraxfs_eth.c| 2 +- > hw/net/lance.c | 2 +- > hw/pci-bridge/dec.c | 2 +- > hw/pci-bridge/pci_expander_bridge.c | 2 +- > hw/pci-host/apb.c | 2 +- > hw/pci-host/bonito.c| 2 +- > hw/pci-host/gpex.c | 2 +- > hw/pci-host/grackle.c | 2 +- > hw/pci-host/piix.c | 6 +++--- > hw/pci-host/ppce500.c | 2 +- > hw/pci-host/prep.c | 2 +- > hw/pci-host/q35.c | 4 ++-- > hw/pci-host/uninorth.c | 8 > hw/pci-host/versatile.c | 2 +- > hw/pci-host/xilinx-pcie.c | 2 +- > hw/ppc/ppc4xx_pci.c | 2 +- > hw/ppc/spapr_drc.c | 2 +- > hw/s390x/s390-pci-bus.c | 2 +- > hw/sd/milkymist-memcard.c | 2 +- > hw/sd/pl181.c | 2 +- > hw/sh4/sh_pci.c | 2 +- > hw/timer/i8254_common.c | 2 +- > hw/timer/mc146818rtc.c | 2 +- > monitor.c | 2 +- > qdev-monitor.c | 6 +++--- > qom/cpu.c | 2 +- > target/i386/cpu.c | 2 +- > 56 files changed, 71 insertions(+), 70 deletions(-) > > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h > index 4bf86b0ad8..6ee49fbe33 100644 > --- a/include/hw/qdev-core.h > +++ b/include/hw/qdev-core.h > @@ -103,16 +103,16 @@ typedef struct DeviceClass { > Property *props; > > /* > -
[Qemu-devel] [PATCH] target/ppc: reset reservation in do_rfi()
For transitioning back to userspace after the interrupt. Suggested-by: Richard Henderson Signed-off-by: Nikunj A Dadhania --- target/ppc/excp_helper.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c index a6bcb47..9cb2123 100644 --- a/target/ppc/excp_helper.c +++ b/target/ppc/excp_helper.c @@ -995,6 +995,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr) */ cs->interrupt_request |= CPU_INTERRUPT_EXITTB; +/* Reset the reservation */ +env->reserve_addr = -1; + /* Context synchronizing: check if TCG TLB needs flush */ check_tlb_flush(env, false); } -- 2.9.3
Re: [Qemu-devel] [PATCH RESEND v2 03/21] xen-backend: Remove FIXME comment about user_creatable flag
Stefano or Anthony, please review. Eduardo Habkost writes: > xen-backend can be plugged/unplugged dynamically when using the > Xen accelerator, so keep the user_creatable flag on the device > class and remove the FIXME comment. > > Cc: Juergen Gross , > Cc: Peter Maydell , > Cc: Thomas Huth > Cc: sstabell...@kernel.org > Cc: Markus Armbruster , > Cc: Marcel Apfelbaum , > Cc: Laszlo Ersek > Acked-by: Juergen Gross > Acked-by: Marcel Apfelbaum > Signed-off-by: Eduardo Habkost > --- > Changes series v1 -> v2: > * (New patch added to series) > --- > hw/xen/xen_backend.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c > index 67cb4cb9f0..2b91d2d458 100644 > --- a/hw/xen/xen_backend.c > +++ b/hw/xen/xen_backend.c > @@ -619,10 +619,7 @@ static void xendev_class_init(ObjectClass *klass, void > *data) > > dc->props = xendev_properties; > set_bit(DEVICE_CATEGORY_MISC, dc->categories); > -/* > - * FIXME: Set only because we are not sure yet if this device > - * will be outside the q35 sysbus whitelist. > - */ > +/* xen-backend devices can be plugged/unplugged dynamically */ > dc->user_creatable = true; > }
[Qemu-devel] [PATCH 0/4] exec: address space translation cleanups
The problem is that, address_space_get_iotlb_entry() shares a lot with address_space_translate(). This patch tries to abstract the shared elements. Originally, this work is derived from discussion from VT-d passthrough series discussions [1]. But for sure we can just see this series as a standalone cleanup. So I posted it separately here. Smoke tests are done with general VM boots, IOs, especially with vhost dmar configurations. I believe with current series I can throw away the old patch [1], which may be good. But before that, please kindly review. Thanks. References: [1] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02335.html Peter Xu (4): exec: simplify phys_page_find() params exec: rename resolve_subpage exec: further use is_mmio exec: abstract address_space_do_translate() exec.c | 122 - 1 file changed, 76 insertions(+), 46 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH 1/4] exec: simplify phys_page_find() params
It really only plays with the dispatchers, so the parameter list does not need that complexity. This helps for readability at least. Signed-off-by: Peter Xu --- exec.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/exec.c b/exec.c index eac6085..1d76c63 100644 --- a/exec.c +++ b/exec.c @@ -371,10 +371,11 @@ static inline bool section_covers_addr(const MemoryRegionSection *section, int128_getlo(section->size), addr); } -static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr, - Node *nodes, MemoryRegionSection *sections) +static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr addr) { -PhysPageEntry *p; +PhysPageEntry lp = d->phys_map, *p; +Node *nodes = d->map.nodes; +MemoryRegionSection *sections = d->map.sections; hwaddr index = addr >> TARGET_PAGE_BITS; int i; @@ -412,8 +413,7 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d, section_covers_addr(section, addr)) { update = false; } else { -section = phys_page_find(d->phys_map, addr, d->map.nodes, - d->map.sections); +section = phys_page_find(d, addr); update = true; } if (resolve_subpage && section->mr->subpage) { @@ -1246,8 +1246,7 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti subpage_t *subpage; hwaddr base = section->offset_within_address_space & TARGET_PAGE_MASK; -MemoryRegionSection *existing = phys_page_find(d->phys_map, base, - d->map.nodes, d->map.sections); +MemoryRegionSection *existing = phys_page_find(d, base); MemoryRegionSection subsection = { .offset_within_address_space = base, .size = int128_make64(TARGET_PAGE_SIZE), -- 2.7.4
[Qemu-devel] [PATCH 3/4] exec: further use is_mmio
This is MMIO-specific tunes on the size. Let's skip it for non-MMIO translations. Signed-off-by: Peter Xu --- exec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exec.c b/exec.c index 0adae94..32e5394 100644 --- a/exec.c +++ b/exec.c @@ -455,7 +455,7 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x * everything works fine. If the incoming length is large, however, * the caller really has to do the clamping through memory_access_size. */ -if (memory_region_is_ram(mr)) { +if (is_mmio && memory_region_is_ram(mr)) { diff = int128_sub(section->size, int128_make64(addr)); *plen = int128_get64(int128_min(diff, int128_make64(*plen))); } -- 2.7.4
[Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage
It is not easy for people to know "what this parameter does" before knowing "what is subpage". Let's use "is_mmio" to make it easier to understand. Signed-off-by: Peter Xu --- exec.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/exec.c b/exec.c index 1d76c63..0adae94 100644 --- a/exec.c +++ b/exec.c @@ -403,7 +403,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr) /* Called from RCU critical section */ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d, hwaddr addr, -bool resolve_subpage) +bool is_mmio) { MemoryRegionSection *section = atomic_read(&d->mru_section); subpage_t *subpage; @@ -416,7 +416,7 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d, section = phys_page_find(d, addr); update = true; } -if (resolve_subpage && section->mr->subpage) { +if (is_mmio && section->mr->subpage) { subpage = container_of(section->mr, subpage_t, iomem); section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]]; } @@ -429,13 +429,13 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d, /* Called from RCU critical section */ static MemoryRegionSection * address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *xlat, - hwaddr *plen, bool resolve_subpage) + hwaddr *plen, bool is_mmio) { MemoryRegionSection *section; MemoryRegion *mr; Int128 diff; -section = address_space_lookup_region(d, addr, resolve_subpage); +section = address_space_lookup_region(d, addr, is_mmio); /* Compute offset within MemoryRegionSection */ addr -= section->offset_within_address_space; -- 2.7.4
[Qemu-devel] [PATCH 4/4] exec: abstract address_space_do_translate()
This function is an abstraction helper for address_space_translate() and address_space_get_iotlb_entry(). It does the lookup of address into memory region section, then do proper IOMMU translation if necessary. With that, refactor the two existing functions a bit to use it. Signed-off-by: Peter Xu --- exec.c | 99 +++--- 1 file changed, 65 insertions(+), 34 deletions(-) diff --git a/exec.c b/exec.c index 32e5394..efc80a8 100644 --- a/exec.c +++ b/exec.c @@ -463,18 +463,20 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x } /* Called from RCU critical section */ -IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, -bool is_write) +static MemoryRegionSection address_space_do_translate(AddressSpace *as, + hwaddr addr, + hwaddr *xlat, + hwaddr *plen, + bool is_write, + bool is_mmio) { -IOMMUTLBEntry iotlb = {0}; +IOMMUTLBEntry iotlb; MemoryRegionSection *section; MemoryRegion *mr; for (;;) { AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch); -section = address_space_lookup_region(d, addr, false); -addr = addr - section->offset_within_address_space - + section->offset_within_region; +section = address_space_translate_internal(d, addr, &addr, plen, is_mmio); mr = section->mr; if (!mr->iommu_ops) { @@ -482,55 +484,84 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, } iotlb = mr->iommu_ops->translate(mr, addr, is_write); +addr = ((iotlb.translated_addr & ~iotlb.addr_mask) +| (addr & iotlb.addr_mask)); +*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); if (!(iotlb.perm & (1 << is_write))) { -iotlb.target_as = NULL; -break; +goto translate_fail; } -addr = ((iotlb.translated_addr & ~iotlb.addr_mask) -| (addr & iotlb.addr_mask)); as = iotlb.target_as; } -return iotlb; +*xlat = addr; + +return *section; + +translate_fail: +return (MemoryRegionSection) { .mr = &io_mem_unassigned }; } /* Called from RCU critical section */ -MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, - hwaddr *xlat, hwaddr *plen, - bool is_write) +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, +bool is_write) { -IOMMUTLBEntry iotlb; -MemoryRegionSection *section; -MemoryRegion *mr; +MemoryRegionSection section; +hwaddr xlat, plen; -for (;;) { -AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch); -section = address_space_translate_internal(d, addr, &addr, plen, true); -mr = section->mr; +/* Try to get maximum page mask during translation. */ +plen = (hwaddr)-1; -if (!mr->iommu_ops) { -break; -} +/* This can never be MMIO. */ +section = address_space_do_translate(as, addr, &xlat, &plen, + is_write, false); -iotlb = mr->iommu_ops->translate(mr, addr, is_write); -addr = ((iotlb.translated_addr & ~iotlb.addr_mask) -| (addr & iotlb.addr_mask)); -*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); -if (!(iotlb.perm & (1 << is_write))) { -mr = &io_mem_unassigned; -break; -} +/* Illegal translation */ +if (section.mr == &io_mem_unassigned) { +goto iotlb_fail; +} -as = iotlb.target_as; +if (plen == (hwaddr)-1) { +/* + * We use default page size here. Logically it only happens + * for identity mappings. + */ +plen = TARGET_PAGE_SIZE; } +/* Convert to address mask */ +plen -= 1; + +return (IOMMUTLBEntry) { +.target_as = section.address_space, +.iova = addr & ~plen, +.translated_addr = xlat & ~plen, +.addr_mask = plen, +/* IOTLBs are for DMAs, and DMA only allows on RAMs. */ +.perm = IOMMU_RW, +}; + +iotlb_fail: +return (IOMMUTLBEntry) {0}; +} + +/* Called from RCU critical section */ +MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, + hwaddr *xlat, hwaddr *plen, + bool is_write) +{ +MemoryRegion *mr; +MemoryRegionSection section; + +/* This can be MMIO, so setup MMIO bit. */ +section = address
Re: [Qemu-devel] [PATCH v3 04/12] memory: fix address_space_get_iotlb_entry()
On Fri, May 12, 2017 at 03:25:08PM +1000, David Gibson wrote: > On Thu, May 11, 2017 at 05:36:03PM +0800, Peter Xu wrote: > > On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote: > > > On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote: > > > > This function has an assumption that we will definitely call translate() > > > > once (or say, the addr will be located inside one IOMMU memory region), > > > > otherwise an empty IOTLB will be returned. Nevertheless, this is not > > > > what we want. When there is no IOMMU memory region, we should build up a > > > > static mapping for the caller, instead of an invalid IOTLB. > > > > > > > > We won't trigger this path before VT-d passthrough mode. When > > > > passthrough mode for a vhost device is setup, VT-d is possible to > > > > disable the IOMMU region for that device. Without current patch, we'll > > > > get a vhost boot failure, and it'll be failed over to virtio userspace > > > > mode. > > > > > > This doesn't look right to me. You're assuming the target is > > > address_space_memory, which might not be the case - and you should be > > > able to check from the MR you do hit. Furthermore it doesn't look > > > like you're accounting for the trivial translation if the section's > > > offset in the address space is different from its offset in the MR. > > > > Do you mean this line? > > > > addr = addr - section->offset_within_address_space > >+ section->offset_within_region; > > Uh.. where is that line? But.. wait, yes, I think I was mistaken. I saw: > .translated_addr = iova, > > and thought that meant you were assuming an identify mapping from iova > to translated addr. But thinking more carefull, IIRC iova and > translated_addr are both relative to the MR, not the AS, so I think > that is correct after all. > > > I thought it was calculating the relative address against that memory > > region. That should only be useful if we want to do further > > translate(), right? For the path that this patch tries to handle (when > > there is no translate() call), then this "addr" is useless here? > > > > Regarding to the address space assignment - do you mean, e.g., I > > should use section->address_space here instead of > > &system_address_space? If so, I can do the switch. > > Yes, I think you should. > > > But after all, for > > now address_space_get_iotlb_entry() is only used by vhost codes, and > > it only check against iotlb.target_as == NULL, so the address space > > didn't count too much here... > > > Another reason I used &address_space_memory is that in > > vfio_iommu_map_notify() we have a check against it: > > > > if (iotlb->target_as != &address_space_memory) { > > error_report("Wrong target AS \"%s\", only system memory is > > allowed", > > iotlb->target_as->name ? iotlb->target_as->name : > > "none"); > > return; > > } > > > > Or say, we have some assumptions (not only in this patch) that assumes > > this iotlb.target_as should be system_address_space. > > Right, the vhost code can only handle some IOMMU setups - something > like nested IOMMUs would break it. But this way if someone sets up a > machine with an IOMMU configuration that vhost can't handle, you'll > get an error message, rather than accesses to unexpected locations, > which could cause really hard to debug corruption. > > In other words we make assumptions, but we should _test_ those > assumptions. > > I also think it would make sense to use address_space_translate() if we > can, since it's an existing interface for a very similar operation. I did give it a shot to generalize these two functions in this series (I just posted it): [PATCH 0/4] exec: address space translation cleanups Please see whether that'll be a good replacement of this single patch. Thanks, -- Peter Xu
Re: [Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage
On 15/05/2017 10:50, Peter Xu wrote: > It is not easy for people to know "what this parameter does" before > knowing "what is subpage". Let's use "is_mmio" to make it easier to > understand. > > Signed-off-by: Peter Xu Maybe invert the direction and call it for_iotlb? Paolo > --- > exec.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/exec.c b/exec.c > index 1d76c63..0adae94 100644 > --- a/exec.c > +++ b/exec.c > @@ -403,7 +403,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr) > /* Called from RCU critical section */ > static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch > *d, > hwaddr addr, > -bool resolve_subpage) > +bool is_mmio) > { > MemoryRegionSection *section = atomic_read(&d->mru_section); > subpage_t *subpage; > @@ -416,7 +416,7 @@ static MemoryRegionSection > *address_space_lookup_region(AddressSpaceDispatch *d, > section = phys_page_find(d, addr); > update = true; > } > -if (resolve_subpage && section->mr->subpage) { > +if (is_mmio && section->mr->subpage) { > subpage = container_of(section->mr, subpage_t, iomem); > section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]]; > } > @@ -429,13 +429,13 @@ static MemoryRegionSection > *address_space_lookup_region(AddressSpaceDispatch *d, > /* Called from RCU critical section */ > static MemoryRegionSection * > address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, > hwaddr *xlat, > - hwaddr *plen, bool resolve_subpage) > + hwaddr *plen, bool is_mmio) > { > MemoryRegionSection *section; > MemoryRegion *mr; > Int128 diff; > > -section = address_space_lookup_region(d, addr, resolve_subpage); > +section = address_space_lookup_region(d, addr, is_mmio); > /* Compute offset within MemoryRegionSection */ > addr -= section->offset_within_address_space; > >
Re: [Qemu-devel] [PATCH 3/4] exec: further use is_mmio
On 15/05/2017 10:50, Peter Xu wrote: > This is MMIO-specific tunes on the size. Let's skip it for non-MMIO > translations. Can you explain this better? This is not specific to MMIO, in fact it's for RAM... Paolo > Signed-off-by: Peter Xu > --- > exec.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/exec.c b/exec.c > index 0adae94..32e5394 100644 > --- a/exec.c > +++ b/exec.c > @@ -455,7 +455,7 @@ address_space_translate_internal(AddressSpaceDispatch *d, > hwaddr addr, hwaddr *x > * everything works fine. If the incoming length is large, however, > * the caller really has to do the clamping through memory_access_size. > */ > -if (memory_region_is_ram(mr)) { > +if (is_mmio && memory_region_is_ram(mr)) { > diff = int128_sub(section->size, int128_make64(addr)); > *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
Re: [Qemu-devel] [Qemu-block] [PULL 05/58] qemu-img: Update documentation for -U
On Fri, 05/12 19:37, Max Reitz wrote: > On 2017-05-11 16:32, Kevin Wolf wrote: > > From: Fam Zheng > > > > Signed-off-by: Fam Zheng > > Signed-off-by: Kevin Wolf > > --- > > qemu-img-cmds.hx | 36 ++-- > > 1 file changed, 18 insertions(+), 18 deletions(-) > > > > diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx > > index bf4ce59..e5bc28f 100644 > > --- a/qemu-img-cmds.hx > > +++ b/qemu-img-cmds.hx > > [...] > > > DEF("convert", img_convert, > > -"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f > > fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o > > options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m > > num_coroutines] [-W] filename [filename2 [...]] output_filename") > > +"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] > > [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s > > snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m > > num_coroutines] [-W] filename [filename2 [...]] output_filename") > > STEXI > > -@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] > > [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O > > @var{output_fmt}] [-B @var{backing_file}] [-o @var{options}] [-s > > @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] > > [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] > > @var{output_filename} > > +@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] > > [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O > > @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l > > @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] > > @var{filename} [@var{filename2} [...]] @var{output_filename} > > Soo... Who gets to add the -B documentation back? > Oops, lost during a rebase, must be. I'll send a patch! Thanks for catching this. Fam
Re: [Qemu-devel] QEMU seg-fault with intermediate image streaming -- bdrv_reopen() in stream_start()
On Sat 13 May 2017 12:45:36 AM CEST, Kashyap Chamarthy wrote: > Try to perform intermediate streaming (pull clusters from 'disk1.qcow2' > into 'b.qcow2': > > (QEMU) block-stream device=#block832 base=disk1.qcow2 > > > Result > -- > > QEMU crashes with SIGSEGV: I see the problem, I'll send a patch now. Thanks for reporting it! Berto
Re: [Qemu-devel] e1000e migration
* Dmitry Fleytman (dmi...@daynix.com) wrote: > Hello Dave, Hi Dmitry, Thanks for the reply. > We are trying to reproduce this issue on our systems but with no luck so far… Note our QE hit this with both a Win8.1 and a win2012r2 guest - although the 2012r2 is reported to have recoverd after a few minutes. 2016 apparently works OK. > From what you describe it looks like some bit in ICR is not being cleared by > the driver. > This usually means that this bit should never be set in that specific > interrupt mode. > > Could you please check which bit is not cleared and who sets it? The full set of e1000e_irq_pending_interrupts after migration is: 23004@1494519346.673905:e1000e_irq_pending_interrupts ICR PENDING: 0x10 (ICR: 0x80100082, IMS: 0x1f4) 23004@1494519346.674787:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1e4) 23004@1494519346.674946:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1e4) 23004@1494519346.675119:e1000e_irq_pending_interrupts ICR PENDING: 0x20 (ICR: 0x80300082, IMS: 0x1e4) 23004@1494519346.675302:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1c4) 23004@1494519346.716279:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80300082, IMS: 0x1c4) 23004@1494519346.716380:e1000e_irq_pending_interrupts ICR PENDING: 0x100 (ICR: 0x813000c2, IMS: 0x1c4) 23004@1494519346.717040:e1000e_irq_pending_interrupts ICR PENDING: 0x100 (ICR: 0x813000c2, IMS: 0x144) 23004@1494519346.717276:e1000e_irq_pending_interrupts ICR PENDING: 0x100 (ICR: 0x813000c2, IMS: 0x104) 23004@1494519346.717443:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.717567:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.717782:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.717918:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.718319:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.718523:e1000e_irq_pending_interrupts ICR PENDING: 0x20 (ICR: 0x813000c2, IMS: 0xa4) 23004@1494519346.718684:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4) 23004@1494519346.718890:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4) 23004@1494519346.719034:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa4) 23004@1494519346.719130:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa4) 23004@1494519346.722699:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa4) 23004@1494519346.722868:e1000e_irq_pending_interrupts ICR PENDING: 0x20 (ICR: 0x813000c2, IMS: 0xa4) 23004@1494519346.723068:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x84) 23004@1494519346.731198:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x84) 23004@1494519346.731422:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4) 23004@1494519346.731930:e1000e_irq_pending_interrupts ICR PENDING: 0x20 (ICR: 0x813000c2, IMS: 0xa4) 23004@1494519346.732082:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4) 23004@1494519346.732274:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4) 23004@1494519346.732404:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa4) 23004@1494519346.732504:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa4) 23004@1494519346.784150:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.786506:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.786534:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.789644:e1000e_irq_pending_interrupts ICR PENDING: 0x100 (ICR: 0x815000c2, IMS: 0x1a4) 23004@1494519346.789864:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.789992:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.790413:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.790539:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.792593:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.792620:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa4) 23004@1494519346.795943:e1000e_irq_pending_interrupts ICR PENDING: 0x100 (ICR: 0x815000c2, IMS: 0x1a4) and then I think we get stuck in this cycle of this one always being the one that fires repeatedly. I think that
Re: [Qemu-devel] [PATCH] virtio-net: keep the packet layout intact
Ping for comments, thanks. On 05/11/2017 12:57 PM, Wei Wang wrote: The current implementation may change the packet layout when vnet_hdr needs an endianness swap. The layout change causes one more iov to be added to the iov[] passed from the guest, which is a barrier to making the TX queue size 1024 due to the possible off-by-one issue. This patch changes the implementation to remain the packet layout intact. In this case, the number of iov[] passed to writev will be equal to the number obtained from the guest. Signed-off-by: Wei Wang --- hw/net/virtio-net.c | 29 ++--- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7d091c9..a51e9a8 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1287,8 +1287,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) for (;;) { ssize_t ret; unsigned int out_num; -struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], *out_sg; -struct virtio_net_hdr_mrg_rxbuf mhdr; +struct iovec sg[VIRTQUEUE_MAX_SIZE], mhdr, *out_sg; elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement)); if (!elem) { @@ -1305,7 +1304,10 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n->has_vnet_hdr) { -if (iov_to_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len) < +/* Buffer to copy vnet_hdr and the possible adjacent data */ +mhdr.iov_len = out_sg[0].iov_len; +mhdr.iov_base = g_malloc0(mhdr.iov_len); +if (iov_to_buf(out_sg, out_num, 0, mhdr.iov_base, mhdr.iov_len) < n->guest_hdr_len) { virtio_error(vdev, "virtio-net header incorrect"); virtqueue_detach_element(q->tx_vq, elem, 0); @@ -1313,17 +1315,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) return -EINVAL; } if (n->needs_vnet_hdr_swap) { -virtio_net_hdr_swap(vdev, (void *) &mhdr); -sg2[0].iov_base = &mhdr; -sg2[0].iov_len = n->guest_hdr_len; -out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, - out_sg, out_num, - n->guest_hdr_len, -1); -if (out_num == VIRTQUEUE_MAX_SIZE) { -goto drop; - } -out_num += 1; -out_sg = sg2; +virtio_net_hdr_swap(vdev, mhdr.iov_base); +/* Copy the first iov where the vnet_hdr resides in */ +out_num = iov_copy(sg, 1, &mhdr, 1, 0, mhdr.iov_len); +out_num += iov_copy(sg + 1, ARRAY_SIZE(sg) - 1, out_sg + 1, +elem->out_num - 1, 0, -1); +out_sg = sg; } } /* @@ -1345,13 +1342,15 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index), out_sg, out_num, virtio_net_tx_complete); +if (n->has_vnet_hdr) { +g_free(mhdr.iov_base); +} if (ret == 0) { virtio_queue_set_notification(q->tx_vq, 0); q->async_tx.elem = elem; return -EBUSY; } -drop: virtqueue_push(q->tx_vq, elem, 0); virtio_notify(vdev, q->tx_vq); g_free(elem);
Re: [Qemu-devel] [PATCH 3/4] exec: further use is_mmio
On Mon, May 15, 2017 at 11:04:52AM +0200, Paolo Bonzini wrote: > > > On 15/05/2017 10:50, Peter Xu wrote: > > This is MMIO-specific tunes on the size. Let's skip it for non-MMIO > > translations. > > Can you explain this better? This is not specific to MMIO, in fact it's > for RAM... Yes, I misread the codes and comments, sorry... Please ignore this patch. -- Peter Xu
Re: [Qemu-devel] KVM "fake DAX" device flushing
On Fri, May 12, 2017 at 06:53:44PM +0200, Kevin Wolf wrote: > Am 12.05.2017 um 15:42 hat Stefan Hajnoczi geschrieben: > > On Thu, May 11, 2017 at 05:38:40PM -0400, Rik van Riel wrote: > > > On Thu, 2017-05-11 at 14:17 -0400, Stefan Hajnoczi wrote: > > > > On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote: > > > > > * For live migration use case, if host side backing file is > > > > > shared storage, we need to flush the page cache for the disk > > > > > image at the destination (new fadvise interface, > > > > > FADV_INVALIDATE_CACHE?) > > > > > before starting execution of the guest on the destination host. > > > > > > > > Good point. QEMU currently only supports live migration with > > > > O_DIRECT. > > > > I think the problem was that userspace cannot guarantee consistency > > > > in > > > > the general case. If you find a solution to this problem for fake > > > > NVDIMM then maybe the QEMU block layer can also begin supporting live > > > > migration with buffered I/O. > > > > > > I'll be happy to work with you on that, independently > > > of Pankaj's project. > > > > > > It looks like the fadvise system call could be extended > > > pretty easily with an FADV_INVALIDATE_CACHE command, the > > > other side of which can simply hook into the existing > > > page cache invalidation code in the kernel. > > > > > > Qemu will need to know whether the invalidation succeeded, > > > but that is something we can test for pretty easily before > > > returning to userspace. > > > > Sounds great. I will review the long discussions that took place on > > qemu-devel about cache invalidation for live migration - just want to > > make sure there were no other reasons why only O_DIRECT is supported > > :). > > There are other reasons why we recommend against using non-O_DIRECT > modes in production (including the error handling), but with respect to > live migration, this is the only one I'm aware of. > > As I already said in the private email thread, an FADV_INVALIDATE_CACHE > should do the trick and I'd be happy to work with you guys on that. Okay, I didn't know you and Rik had already discussed this in private. The QEMU change is probably not difficult. Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH RESEND v2 02/21] sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE
Eduardo Habkost writes: > commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset > cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all > sysbus devices appear on "-device help" and lack the "no-user" > flag in "info qdm". > > To fix this, we can set user_creatable=false by default on > TYPE_SYS_BUS_DEVICE, but this requires setting > user_creatable=true explicitly on the sysbus devices that > actually work with -device. > > Fortunately today we have just a few has_dynamic_sysbus=1 > machines: virt, pc-q35-*, ppce500, and spapr. > > virt, ppce500, and spapr have extra checks to ensure just a few > device types can be instantiated: > > * virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. > * ppce500 supports only TYPE_ETSEC_COMMON. > * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE. > > This patch sets user_creatable=true explicitly on those 4 device > classes. Correct me if I'm wrong: the devices become user_creatable regardless of machine type, but they're actually creatable only for specific machine types. Not perfect, but much better than what we have now. > Now, the more complex cases: > > pc-q35-*: q35 has no sysbus device whitelist yet (which is a > separate bug). We are in the process of fixing it and building a > sysbus whitelist on q35, but in the meantime we can fix the > "-device help" and "info qdm" bugs mentioned above. Also, despite > not being strictly necessary for fixing the q35 bug, reducing the > list of user_creatable=true devices will help us be more > confident when building the q35 whitelist. > > xen: We also have a hack at xen_set_dynamic_sysbus(), that sets > has_dynamic_sysbus=true at runtime when using the Xen > accelerator. This hack is only used to allow xen-backend devices > to be dynamically plugged/unplugged. > > This means today we can use -device with the following 22 device > types, that are the ones compiled into the qemu-system-x86_64 and > qemu-system-i386 binaries: > > * allwinner-ahci > * amd-iommu > * cfi.pflash01 > * esp > * fw_cfg_io > * fw_cfg_mem > * generic-sdhci > * hpet > * intel-iommu > * ioapic > * isabus-bridge > * kvmclock > * kvm-ioapic > * kvmvapic > * SUNW,fdtwo > * sysbus-ahci > * sysbus-fdc > * sysbus-ohci > * unimplemented-device > * virtio-mmio > * xen-backend > * xen-sysdev > > This patch adds user_creatable=true explicitly to those devices, > temporarily, just to keep 100% compatibility with existing > behavior of q35. Subsequent patches will remove > user_creatable=true from the devices that are really not meant to > user-creatable on any machine, and remove the FIXME comment from > the ones that are really supposed to be user-creatable. This is > being done in separate patches because we still don't have an > obvious list of devices that will be whitelisted by q35, and I > would like to get each device reviewed individually. Later patches indeed remove all the FIXMEs. Good. > Cc: Alexander Graf > Cc: Alex Williamson > Cc: Alistair Francis > Cc: Beniamino Galvani > Cc: Christian Borntraeger > Cc: Cornelia Huck > Cc: David Gibson > Cc: "Edgar E. Iglesias" > Cc: Eduardo Habkost > Cc: Frank Blaschka > Cc: Gabriel L. Somlo > Cc: Gerd Hoffmann > Cc: Igor Mammedov > Cc: Jason Wang > Cc: John Snow > Cc: Juergen Gross > Cc: Kevin Wolf > Cc: Laszlo Ersek > Cc: Marcel Apfelbaum > Cc: Markus Armbruster > Cc: Max Reitz > Cc: "Michael S. Tsirkin" > Cc: Paolo Bonzini > Cc: Peter Maydell > Cc: Pierre Morel > Cc: Prasad J Pandit > Cc: qemu-...@nongnu.org > Cc: qemu-bl...@nongnu.org > Cc: qemu-...@nongnu.org > Cc: Richard Henderson > Cc: Rob Herring > Cc: Shannon Zhao > Cc: sstabell...@kernel.org > Cc: Thomas Huth > Cc: Yi Min Zhao > Acked-by: John Snow > Acked-by: Juergen Gross > Acked-by: Marcel Apfelbaum > Signed-off-by: Eduardo Habkost > --- > Changes v1 -> v2: > * Rewrite commit message: don't pretend we are actually fixing > the q35 issue. We're just fixing "info qdm" and "-device help". > Making it easier to fix q35 is just a nice side-effect. > * Rewrite FIXME comments to make it clear that we just have > user_creatable=true because we don't know yet if the device > should be in the q35 whitelist > --- > hw/block/fdc.c | 10 ++ > hw/block/pflash_cfi01.c | 5 + > hw/core/sysbus.c | 11 +++ > hw/i386/amd_iommu.c | 5 + > hw/i386/intel_iommu.c| 5 + > hw/i386/kvm/clock.c | 5 + > hw/i386/kvm/ioapic.c | 5 + > hw/i386/kvmvapic.c | 5 + > hw/ide/ahci.c| 10 ++ > hw/intc/ioapic.c | 5 + > hw/isa/isa-bus.c | 5 + > hw/misc/unimp.c | 5 + > hw/net/fsl_etsec/etsec.c | 2 ++ > hw/nvram/fw_cfg.c| 10 ++ > hw/ppc/spapr_pci.c | 2 ++ > hw/scsi/esp.c| 5 + > hw/sd/sdhci.c| 5 + > hw/timer/hpet.c | 5 + > hw/usb/hcd-ohci.c| 5 + > hw/vfio/amd-xgbe.c | 2 ++
[Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails
The code that tries to reopen a BlockDriverState in stream_start() when the creation of a new block job fails crashes because it attempts to dereference a pointer that is known to be NULL. This is a regression introduced in a170a91fd3eab6155da39e740381867e, likely because the code was copied from stream_complete(). Signed-off-by: Alberto Garcia --- block/stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/stream.c b/block/stream.c index 0113710845..52d329f5c6 100644 --- a/block/stream.c +++ b/block/stream.c @@ -280,6 +280,6 @@ void stream_start(const char *job_id, BlockDriverState *bs, fail: if (orig_bs_flags != bdrv_get_flags(bs)) { -bdrv_reopen(bs, s->bs_flags, NULL); +bdrv_reopen(bs, orig_bs_flags, NULL); } } -- 2.11.0
Re: [Qemu-devel] [PATCH 1/3] migration: Create block capabilities for shared and enable
Eric Blake wrote: > On 05/11/2017 11:32 AM, Juan Quintela wrote: >> Those two capabilities were added through the command line. Notice that >> we just created them. This is just the boilerplate. >> >> Signed-off-by: Juan Quintela >> Reviewed-by: Eric Blake >> >> -- >> >> Make migrate_set_block_* take a boolean argument. > > Question - do we support the orthogonal selection of all 4 combinations > under HMP 'migrate' (no argument, -b alone, -i alone, -b and -i > together), or are there only 3 actual states? If the latter, should we > represent this as a single enum-valued property, rather than as two > independent boolean properties? { 'enum': 'MigrationCapability', 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram'] } My understanding is that we can only have boolean capabilities here. Or, how could we put a non-boolean capability? There are three states as far as I can see. Later, Juan.
Re: [Qemu-devel] [PATCH] nbd: strict nbd_wr_syncv
12.05.2017 19:30, Paolo Bonzini wrote: On 12/05/2017 17:57, Vladimir Sementsov-Ogievskiy wrote: 12.05.2017 18:46, Paolo Bonzini wrote: On 12/05/2017 16:17, Vladimir Sementsov-Ogievskiy wrote: nbd_wr_syncv is called either from coroutine or from client negotiation code, when socket is in blocking mode. So, -EAGAIN is impossible. Furthermore, EAGAIN is confusing, as, what to read/write again? With EAGAIN as a return code we don't know how much data is already read or written by the function, so in case of EAGAIN the whole communication is broken. Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! If I understand right, nbd_wr_syncv is called either from coroutine or from client negotiation code, when socket is in blocking mode. So, some thoughts 1. let's establish this with an assert, because returning EAGAIN is confusing (see above) Yes, this seems like a good idea. 2. should we in case of non-coroutine context start this coroutine in nbd_wr_syncv, like in bdrv_prwv_co, and use non-blocking mode? 3. is there any problems or disadvantages of moving client negotiation to coroutine too? When you move code to coroutines you need to be aware of what code can now run concurrently, for example the monitor. I'm not sure that it's possible to do this. Hmm, can you please give some example of a problem? qcow2_open starts coroutines to read it's header, why nbd_open can't start coroutine/coroutines to read/write some negotiation data? Ah, it's not a problem if you use synchronous I/O (aio_poll) within the coroutines. But then it's still blocking I/O in every way (except you've fcntl-ed the descriptor to make it non-blocking); it's simply hidden underneath coroutines and aio_poll. I mean, make negotiation behave like normal nbd communication, non-blocking socket + yield.. So, some other coroutines may do their work, while nbd-negotiation coroutine waits for io.. Paolo -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH 1/3] migration: Create block capabilities for shared and enable
* Juan Quintela (quint...@redhat.com) wrote: > Eric Blake wrote: > > On 05/11/2017 11:32 AM, Juan Quintela wrote: > >> Those two capabilities were added through the command line. Notice that > >> we just created them. This is just the boilerplate. > >> > >> Signed-off-by: Juan Quintela > >> Reviewed-by: Eric Blake > >> > >> -- > >> > >> Make migrate_set_block_* take a boolean argument. > > > > Question - do we support the orthogonal selection of all 4 combinations > > under HMP 'migrate' (no argument, -b alone, -i alone, -b and -i > > together), or are there only 3 actual states? If the latter, should we > > represent this as a single enum-valued property, rather than as two > > independent boolean properties? > > { 'enum': 'MigrationCapability', > 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', >'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram'] } > > > My understanding is that we can only have boolean capabilities here. > Or, how could we put a non-boolean capability? Lets keep this simple and stick with the booleans. Dave > There are three states as far as I can see. > > Later, Juan. > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
Eric Blake wrote: > On 05/12/2017 05:55 AM, Juan Quintela wrote: @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } if (has_inc && inc) { +migrate_set_block_enabled(s, true); migrate_set_block_shared(s, true); >>> >>> [2] >>> >>> IIUC for [1] & [2] we are solving the same problem that "shared" >>> depends on "enabled" bit. Would it be good to unitfy this dependency >>> somewhere? E.g., by changing migrate_set_block_shared() into: >>> >>> void migrate_set_block_shared(MigrationState *s, bool value) >>> { >>> s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value; >>> if (value) { >>> migrate_set_block_enabled(s, true); >>> } >>> } >> >> ok with this. > > Or, as I commented on 1/3, maybe having a single property that is a > tri-state enum value, instead of 2 separate boolean properties, might be > nicer (but certainly a bit more complex to code up). If you teach me how to do the qapi/qmp part, I will do the other bits. I don't really care if we do it one way or the other. >> I will add once here that when we disable block enabled, we also disable >> shared, or just let it that way? >> >>> Another thing to mention: after switching to the capability interface, >>> we'll cache the "enabled" and "shared" bits now while we don't cache >>> it before, right? IIUC it'll affect behavior of such sequence: >>> >>> - 1st migrate with enabled=1, shared=1, then >>> - 2nd migrate with enabled=0, shared=0 >>> >>> Before the series, the 2nd migrate will use enabled=shared=0, but >>> after the series it should be using enabled=shared=1. Not sure whether >>> this would be a problem (or I missed anything?). >> >> We can't be consistent with both old/new way. >> >> Old way: we always setup the capabilities on command line (that should >> have been deprecated long, long ago) > > Well, the easy way out is to have the HMP migrate command (I assume > that's what you mean by "on command line") explicitly clear the > parameters if it is called without the -b/-i flag. So the start of each > migration is what changes the properties, so long as you are still using > HMP to start the migration. Or, on the QMP side, since 'migrate' has > optional 'blk' and 'inc' booleans, basically leave the settings alone if > the parameters were omitted, and explicitly update the property to the > value of those parameters if they were present. We are going to have trouble whatever way that we do it, or we start doing lots of strange things. Forget about qmp, we are going to assume that it is consistent with hmp. migrate_set_capabilities block_enabled on migrate -b . Should migrate disable the block_enabled capability? Give one warning/error? And notice that this only matter if we do a migration, we cancel/get one error, and then we migrate again. What I tried to do is assume that -b/-i arguments don't exist. And if the user use them, we implement its behaviour with the minimal fuss possibly. Only way that I can think of being consistent and bug compatible will be to store: - old block_shared/enanbeld capability value - if we set -b/-i on the command line In migration cleanup do the right thing depending on this four variables. I think that it is adding lots of complexity for very few improvement. > Or is the proposal that we are also going to simplify the QMP 'migrate' > command to get rid of crufty parameters? I didn't read it that way, but I would not oppose O:-) Later, Juan.
[Qemu-devel] ping Re: [PATCH v18 00/25] qcow2: persistent dirty bitmaps
ping... What about this? 03.05.2017 15:25, Vladimir Sementsov-Ogievskiy wrote: Hi all! There is a new update of qcow2-bitmap series - v18. web: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=qcow2-bitmap-v18 git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag qcow2-bitmap-v18) v18: rebased on master (sorry for v17) 08: contextual: qcow2_do_open is changed instead of qcow2_open rename s/need_update_header/update_header/ in qcow2_do_open, to not do it in 10 save r-b's by Max and John 09: new patch 10: load_bitmap_data: do not clear bitmap parameter - it should be already cleared (it actually created before single load_bitmap_data() call) if some bitmaps are loaded, but we can't write the image (it is readonly or inactive), so we can't mark bitmaps "in use" in the image, mark corresponding BdrvDirtyBitmap read-only. change error_setg to error_setg_errno for "Can't update bitmap directory" no needs to rename s/need_update_header/update_header/ here, as it done in 08 13: function bdrv_has_persistent_bitmaps becomes bdrv_has_changed_persistent_bitmaps, to handle readonly field. 14: declaration moved to the bottom of .h, save r-b's 15: firstly check bdrv_has_changed_persistent_bitmaps and only then fail on !can_write, and then QSIMPLEQ_INIT(&drop_tables) skip readonly bitmaps in saving loop 18: remove '#optional', 2.9 -> 2.10, save r-b's 19: remove '#optional', 2.9 -> 2.10, save r-b's 20: 2.9 -> 2.10, save r-b's 21: add check of read-only image open, drop r-b's 24: add comment to qapi/block-core.json, that block-dirty-bitmap-add removes bitmap from storage. r-b's by Max and John saved v17: 08: add r-b's by Max and John 09: clear unknown autoclear features from BDRVQcow2State before calling qcow2_load_autoloading_dirty_bitmaps(), and also do not extra update header if it is updated by qcow2_load_autoloading_dirty_bitmaps(). 11: new patch, splitted out from 16 12: rewrite commit message 14: changing bdrv_close moved to separate new patch 11 s/1/1ULL/ ; s/%u/%llu/ for two errors 16: s/No/Not/ add Max's r-b 24: new patch v16: mostly by Kevin's comments: 07: just moved here, as preparation for merging refcounts to 08 08: move "qcow2-bitmap: refcounts" staff to this patch, to not break qcow2-img check drop r-b's move necessary supporting static functions to this patch too (to not break compilation) fprintf -> error_report other small changes with error messages and comments code style for qcow2-bitmap.c: 'include "exec/log.h"' was dropped s/1/(1U << 0) for BME_FLAG_IN_USE add BME_TABLE_ENTRY_FLAG_ALL_ONES to replace magic 1 don't check 'cluster_size <= 0' in check_table_entry old "[PATCH v15 08/25] block: introduce auto-loading bitmaps" was dropped 09: was "[PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps" drop r-b's some staff was moved to 08 update_header_sync - error on bdrv_flush fail rename disk_sectors_in_bitmap_cluster to sectors_covered_by_bitmap_cluster and adjust comment. so, variable for storing this function result: s/dsc/sbc/ s/1/BME_TABLE_ENTRY_FLAG_ALL_ONES/ also, as Qcow2BitmapTable already introduced in 08, s/table_offset/table.offset/ and s/table_size/table.size, etc.. update_ext_header_and_dir_in_place: add comments, add additional update_header_sync, to reduce indeterminacy in case of error. call qcow2_load_autoloading_dirty_bitmaps directly from qcow2_open no .bdrv_load_autoloading_dirty_bitmaps 11: drop bdrv_store_persistent_dirty_bitmaps at all. drop r-b's 13: rename patch, rewrite commit msg drop r-b's move bdrv_release_named_dirty_bitmaps in bdrv_close() after drv->bdrv_close() Qcow2BitmapTable is already introduced, so no s/table_offset/table.offset/ and s/table_size/table.size, etc.. here old 25 with check_constraints_on_bitmap() improvements merged here (Eric) like in 09, s/dsc/sbc/ and s/disk_sectors_in_bitmap_cluster/sectors_covered_by_bitmap_cluster/ no .bdrv_store_persistent_dirty_bitmaps call qcow2_store_persistent_dirty_bitmaps directly from qcow2_inactivate 15: corresponding part of 25 merged here. Add John's r-b, as 25 was also reviewed by John. 16: add John's r-b v15: 13,14: add John's r-b 15: qcow2_can_store_new_dirty_bitmap: - check max dirty bitmaps and bitmap directory overhead - switch to error_prepend rm Max's r-b not add John's r-b 17-24: add John's r-b 25: changed because 15 changed, not add John's r-b v14: 07: use '|=' to update need_update_header add John's r-b add Max's r-b 09: remove unused bitmap_table_to_cpu() left Max's r-b, hope it's ok add John's r-b 10: remove extra new line add John's r-b 11: add John's r-b 12: add John's r-b 13: small fixes by John's review: - remove weird g_free of NULL
[Qemu-devel] [PATCH v3 1/7] curl: strengthen assertion in curl_clean_state
curl_clean_state should only be called after all AIOCBs have been completed. This is not so obvious for the call from curl_detach_aio_context, so assert that. Cc: qemu-sta...@nongnu.org Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 5 + 1 file changed, 5 insertions(+) diff --git a/block/curl.c b/block/curl.c index aa6e8cc0e5..010b0604c5 100644 --- a/block/curl.c +++ b/block/curl.c @@ -532,6 +532,11 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) static void curl_clean_state(CURLState *s) { +int j; +for (j=0; j < CURL_NUM_ACB; j++) { +assert(!s->acb[j]); +} + if (s->s->multi) curl_multi_remove_handle(s->s->multi, s->curl); -- 2.12.2
[Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine conversion, remove aio_poll
Compared to v2, this silences checkpatch and correctly destroy the mutex on exiting from curl_open with an error. Paolo Paolo Bonzini (7): curl: strengthen assertion in curl_clean_state curl: never invoke callbacks with s->mutex held curl: avoid recursive locking of BDRVCURLState mutex curl: split curl_find_state/curl_init_state curl: convert CURLAIOCB to byte values curl: convert readv to coroutines curl: do not do aio_poll when waiting for a free CURLState block/curl.c | 217 +-- 1 file changed, 121 insertions(+), 96 deletions(-) -- 2.12.2
[Qemu-devel] [PATCH v3 3/7] curl: avoid recursive locking of BDRVCURLState mutex
The curl driver has a ugly hack where, if it cannot find an empty CURLState, it just uses aio_poll to wait for one to be empty. This is probably buggy when used together with dataplane, and the simplest way to fix it is to use coroutines instead. A more immediate effect of the bug however is that it can cause a recursive call to curl_readv_bh_cb and recursively taking the BDRVCURLState mutex. This causes a deadlock. The fix is to unlock the mutex around aio_poll, but for cleanliness we should also take the mutex around all calls to curl_init_state, even if reaching the unlock/lock pair is impossible. The same is true for curl_clean_state. Reported-by: Kun Wei Tested-by: Richard W.M. Jones Cc: qemu-sta...@nongnu.org Cc: Jeff Cody Signed-off-by: Paolo Bonzini --- v2->v3: add missing qemu_mutex_destroy on failure case block/curl.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 924c2a2088..5e6ddf3a34 100644 --- a/block/curl.c +++ b/block/curl.c @@ -281,6 +281,7 @@ read_end: return size * nmemb; } +/* Called with s->mutex held. */ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len, CURLAIOCB *acb) { @@ -453,6 +454,7 @@ static void curl_multi_timeout_do(void *arg) #endif } +/* Called with s->mutex held. */ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) { CURLState *state = NULL; @@ -471,7 +473,9 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) break; } if (!state) { +qemu_mutex_unlock(&s->mutex); aio_poll(bdrv_get_aio_context(bs), true); +qemu_mutex_lock(&s->mutex); } } while(!state); @@ -534,6 +538,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) return state; } +/* Called with s->mutex held. */ static void curl_clean_state(CURLState *s) { int j; @@ -565,6 +570,7 @@ static void curl_detach_aio_context(BlockDriverState *bs) BDRVCURLState *s = bs->opaque; int i; +qemu_mutex_lock(&s->mutex); for (i = 0; i < CURL_NUM_STATES; i++) { if (s->states[i].in_use) { curl_clean_state(&s->states[i]); @@ -580,6 +586,7 @@ static void curl_detach_aio_context(BlockDriverState *bs) curl_multi_cleanup(s->multi); s->multi = NULL; } +qemu_mutex_unlock(&s->mutex); timer_del(&s->timer); } @@ -677,6 +684,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, return -EROFS; } +qemu_mutex_init(&s->mutex); opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort); qemu_opts_absorb_qdict(opts, options, &local_err); if (local_err) { @@ -747,7 +755,9 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, DPRINTF("CURL: Opening %s\n", file); s->aio_context = bdrv_get_aio_context(bs); s->url = g_strdup(file); +qemu_mutex_lock(&s->mutex); state = curl_init_state(bs, s); +qemu_mutex_unlock(&s->mutex); if (!state) goto out_noclean; @@ -791,11 +801,12 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, } DPRINTF("CURL: Size = %zd\n", s->len); +qemu_mutex_lock(&s->mutex); curl_clean_state(state); +qemu_mutex_unlock(&s->mutex); curl_easy_cleanup(state->curl); state->curl = NULL; -qemu_mutex_init(&s->mutex); curl_attach_aio_context(bs, bdrv_get_aio_context(bs)); qemu_opts_del(opts); @@ -806,6 +817,7 @@ out: curl_easy_cleanup(state->curl); state->curl = NULL; out_noclean: +qemu_mutex_destroy(&s->mutex); g_free(s->cookie); g_free(s->url); qemu_opts_del(opts); -- 2.12.2
[Qemu-devel] [PATCH v3 4/7] curl: split curl_find_state/curl_init_state
If curl_easy_init fails, a CURLState is left with s->in_use = 1. Split curl_init_state in two, so that we can distinguish the two failures and call curl_clean_state if needed. While at it, simplify curl_find_state, removing a dummy loop. The aio_poll loop is moved to the sole caller that needs it. Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 52 ++-- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/block/curl.c b/block/curl.c index 5e6ddf3a34..29132ab835 100644 --- a/block/curl.c +++ b/block/curl.c @@ -455,34 +455,27 @@ static void curl_multi_timeout_do(void *arg) } /* Called with s->mutex held. */ -static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) +static CURLState *curl_find_state(BDRVCURLState *s) { CURLState *state = NULL; -int i, j; - -do { -for (i=0; istates[i].acb[j]) -continue; -if (s->states[i].in_use) -continue; +int i; +for (i=0; i < CURL_NUM_STATES; i++) { +if (!s->states[i].in_use) { state = &s->states[i]; state->in_use = 1; break; } -if (!state) { -qemu_mutex_unlock(&s->mutex); -aio_poll(bdrv_get_aio_context(bs), true); -qemu_mutex_lock(&s->mutex); -} -} while(!state); +} +return state; +} +static int curl_init_state(BDRVCURLState *s, CURLState *state) +{ if (!state->curl) { state->curl = curl_easy_init(); if (!state->curl) { -return NULL; +return -EIO; } curl_easy_setopt(state->curl, CURLOPT_URL, s->url); curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER, @@ -535,7 +528,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s) QLIST_INIT(&state->sockets); state->s = s; -return state; +return 0; } /* Called with s->mutex held. */ @@ -756,13 +749,18 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, s->aio_context = bdrv_get_aio_context(bs); s->url = g_strdup(file); qemu_mutex_lock(&s->mutex); -state = curl_init_state(bs, s); +state = curl_find_state(s); qemu_mutex_unlock(&s->mutex); -if (!state) +if (!state) { goto out_noclean; +} // Get file size +if (curl_init_state(s, state) < 0) { +goto out; +} + s->accept_range = false; curl_easy_setopt(state->curl, CURLOPT_NOBODY, 1); curl_easy_setopt(state->curl, CURLOPT_HEADERFUNCTION, @@ -857,8 +855,18 @@ static void curl_readv_bh_cb(void *p) } // No cache found, so let's start a new request -state = curl_init_state(acb->common.bs, s); -if (!state) { +for (;;) { +state = curl_find_state(s); +if (state) { +break; +} +qemu_mutex_unlock(&s->mutex); +aio_poll(bdrv_get_aio_context(bs), true); +qemu_mutex_lock(&s->mutex); +} + +if (curl_init_state(s, state) < 0) { +curl_clean_state(state); ret = -EIO; goto out; } -- 2.12.2
[Qemu-devel] [PATCH v3 6/7] curl: convert readv to coroutines
This is pretty simple. The bottom half goes away because, unlike bdrv_aio_readv, coroutine-based read can return immediately without yielding. However, for simplicity I kept the former bottom half handler in a separate function. Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 94 1 file changed, 38 insertions(+), 56 deletions(-) diff --git a/block/curl.c b/block/curl.c index 401a1b1015..86a4a73834 100644 --- a/block/curl.c +++ b/block/curl.c @@ -76,10 +76,6 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle, #define CURL_TIMEOUT_DEFAULT 5 #define CURL_TIMEOUT_MAX 1 -#define FIND_RET_NONE 0 -#define FIND_RET_OK 1 -#define FIND_RET_WAIT 2 - #define CURL_BLOCK_OPT_URL "url" #define CURL_BLOCK_OPT_READAHEAD "readahead" #define CURL_BLOCK_OPT_SSLVERIFY "sslverify" @@ -93,11 +89,12 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle, struct BDRVCURLState; typedef struct CURLAIOCB { -BlockAIOCB common; +Coroutine *co; QEMUIOVector *qiov; uint64_t offset; uint64_t bytes; +int ret; size_t start; size_t end; @@ -268,11 +265,11 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque) request_length - offset); } +acb->ret = 0; +s->acb[i] = NULL; qemu_mutex_unlock(&s->s->mutex); -acb->common.cb(acb->common.opaque, 0); +aio_co_wake(acb->co); qemu_mutex_lock(&s->s->mutex); -qemu_aio_unref(acb); -s->acb[i] = NULL; } } @@ -282,8 +279,8 @@ read_end: } /* Called with s->mutex held. */ -static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len, - CURLAIOCB *acb) +static bool curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len, + CURLAIOCB *acb) { int i; uint64_t end = start + len; @@ -312,7 +309,8 @@ static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len, if (clamped_len < len) { qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len); } -return FIND_RET_OK; +acb->ret = 0; +return true; } // Wait for unfinished chunks @@ -330,13 +328,13 @@ static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len, for (j=0; jacb[j]) { state->acb[j] = acb; -return FIND_RET_WAIT; +return true; } } } } -return FIND_RET_NONE; +return false; } /* Called with s->mutex held. */ @@ -381,11 +379,11 @@ static void curl_multi_check_completion(BDRVCURLState *s) continue; } +acb->ret = -EIO; +state->acb[i] = NULL; qemu_mutex_unlock(&s->mutex); -acb->common.cb(acb->common.opaque, -EIO); +aio_co_wake(acb->co); qemu_mutex_lock(&s->mutex); -qemu_aio_unref(acb); -state->acb[i] = NULL; } } @@ -822,19 +820,11 @@ out_noclean: return -EINVAL; } -static const AIOCBInfo curl_aiocb_info = { -.aiocb_size = sizeof(CURLAIOCB), -}; - - -static void curl_readv_bh_cb(void *p) +static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb) { CURLState *state; int running; -int ret = -EINPROGRESS; -CURLAIOCB *acb = p; -BlockDriverState *bs = acb->common.bs; BDRVCURLState *s = bs->opaque; uint64_t start = acb->offset; @@ -844,14 +834,8 @@ static void curl_readv_bh_cb(void *p) // In case we have the requested data already (e.g. read-ahead), // we can just call the callback and be done. -switch (curl_find_buf(s, start, acb->bytes, acb)) { -case FIND_RET_OK: -ret = 0; -goto out; -case FIND_RET_WAIT: -goto out; -default: -break; +if (curl_find_buf(s, start, acb->bytes, acb)) { +goto out; } // No cache found, so let's start a new request @@ -867,7 +851,7 @@ static void curl_readv_bh_cb(void *p) if (curl_init_state(s, state) < 0) { curl_clean_state(state); -ret = -EIO; +acb->ret = -EIO; goto out; } @@ -882,7 +866,7 @@ static void curl_readv_bh_cb(void *p) state->orig_buf = g_try_malloc(state->buf_len); if (state->buf_len && state->orig_buf == NULL) { curl_clean_state(state); -ret = -ENOMEM; +acb->ret = -ENOMEM; goto out; } state->acb[0] = acb; @@ -899,26 +883,24 @@ static void curl_readv_bh_cb(void *p) out: qemu_mutex_unlock(&s->mutex
[Qemu-devel] [PATCH v3 2/7] curl: never invoke callbacks with s->mutex held
All curl callbacks go through curl_multi_do, and hence are called with s->mutex held. Note that with comments, and make curl_read_cb drop the lock before invoking the callback. Likewise for curl_find_buf, where the callback can be invoked by the caller. Cc: qemu-sta...@nongnu.org Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/block/curl.c b/block/curl.c index 010b0604c5..924c2a2088 100644 --- a/block/curl.c +++ b/block/curl.c @@ -147,6 +147,7 @@ static void curl_multi_do(void *arg); static void curl_multi_read(void *arg); #ifdef NEED_CURL_TIMER_CALLBACK +/* Called from curl_multi_do_locked, with s->mutex held. */ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque) { BDRVCURLState *s = opaque; @@ -163,6 +164,7 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque) } #endif +/* Called from curl_multi_do_locked, with s->mutex held. */ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action, void *userp, void *sp) { @@ -212,6 +214,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action, return 0; } +/* Called from curl_multi_do_locked, with s->mutex held. */ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) { BDRVCURLState *s = opaque; @@ -226,6 +229,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) return realsize; } +/* Called from curl_multi_do_locked, with s->mutex held. */ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque) { CURLState *s = ((CURLState*)opaque); @@ -264,7 +268,9 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque) request_length - offset); } +qemu_mutex_unlock(&s->s->mutex); acb->common.cb(acb->common.opaque, 0); +qemu_mutex_lock(&s->s->mutex); qemu_aio_unref(acb); s->acb[i] = NULL; } @@ -305,8 +311,6 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len, if (clamped_len < len) { qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len); } -acb->common.cb(acb->common.opaque, 0); - return FIND_RET_OK; } @@ -832,8 +836,8 @@ static void curl_readv_bh_cb(void *p) // we can just call the callback and be done. switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) { case FIND_RET_OK: -qemu_aio_unref(acb); -// fall through +ret = 0; +goto out; case FIND_RET_WAIT: goto out; default: -- 2.12.2
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
On Fri, May 12, 2017 at 12:55:22PM +0200, Juan Quintela wrote: > Peter Xu wrote: > > On Thu, May 11, 2017 at 06:32:27PM +0200, Juan Quintela wrote: > > >> @@ -1214,9 +1218,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool > >> blk, > >> MigrationParams params; > >> const char *p; > >> > >> -params.blk = has_blk && blk; > >> -params.shared = has_inc && inc; > >> - > >> if (migration_is_setup_or_active(s->state) || > >> s->state == MIGRATION_STATUS_CANCELLING || > >> s->state == MIGRATION_STATUS_COLO) { > >> @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool > >> blk, > >> } > >> > >> if (has_inc && inc) { > >> +migrate_set_block_enabled(s, true); > >> migrate_set_block_shared(s, true); > > > > [2] > > > > IIUC for [1] & [2] we are solving the same problem that "shared" > > depends on "enabled" bit. Would it be good to unitfy this dependency > > somewhere? E.g., by changing migrate_set_block_shared() into: > > > > void migrate_set_block_shared(MigrationState *s, bool value) > > { > > s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value; > > if (value) { > > migrate_set_block_enabled(s, true); > > } > > } > > ok with this. > I will add once here that when we disable block enabled, we also disable > shared, or just let it that way? I should mark "nitpick" in my above comment. Any way works for me. :) > > > Another thing to mention: after switching to the capability interface, > > we'll cache the "enabled" and "shared" bits now while we don't cache > > it before, right? IIUC it'll affect behavior of such sequence: > > > > - 1st migrate with enabled=1, shared=1, then > > - 2nd migrate with enabled=0, shared=0 > > > > Before the series, the 2nd migrate will use enabled=shared=0, but > > after the series it should be using enabled=shared=1. Not sure whether > > this would be a problem (or I missed anything?). > > We can't be consistent with both old/new way. > > Old way: we always setup the capabilities on command line (that should > have been deprecated long, long ago) > > New way: Once set, they stay set. > > So, alternatives are: > - If we are going to deprecate the old way, just let things as they are > on this patch (easier for me O:-) > > - Always disable this features at the end of migration: Compatible with > old migration semantics, bet we are inconsistent with all other > capabilities. > > - Add yet more code to only disable them when we are setting them > through the command line. More code to maintain. > > My idea would be to deprecate the migrate command line parameter, that > is the reason why I did the 1st option. I hope that in due curse, we > would be able to remove the code. But if anyone strongly think that any > of the other options is better, please let me know. I assume that sending continuous "migrate" is very rare, right (after all, normally source VM is useless after that...)? I was just trying to post this question up, and so... If you/Dave/others won't worry about its compatibility, I won't either. ;) Thanks! -- Peter Xu
[Qemu-devel] [PATCH v3 5/7] curl: convert CURLAIOCB to byte values
This is in preparation for the conversion from bdrv_aio_readv to bdrv_co_preadv, and it also requires changing some of the size_t values to uint64_t. This was broken before for disks > 2TB, but now it would break at 4GB. Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 44 ++-- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/block/curl.c b/block/curl.c index 29132ab835..401a1b1015 100644 --- a/block/curl.c +++ b/block/curl.c @@ -96,8 +96,8 @@ typedef struct CURLAIOCB { BlockAIOCB common; QEMUIOVector *qiov; -int64_t sector_num; -int nb_sectors; +uint64_t offset; +uint64_t bytes; size_t start; size_t end; @@ -115,7 +115,7 @@ typedef struct CURLState CURL *curl; QLIST_HEAD(, CURLSocket) sockets; char *orig_buf; -size_t buf_start; +uint64_t buf_start; size_t buf_off; size_t buf_len; char range[128]; @@ -126,7 +126,7 @@ typedef struct CURLState typedef struct BDRVCURLState { CURLM *multi; QEMUTimer timer; -size_t len; +uint64_t len; CURLState states[CURL_NUM_STATES]; char *url; size_t readahead_size; @@ -257,7 +257,7 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque) continue; if ((s->buf_off >= acb->end)) { -size_t request_length = acb->nb_sectors * BDRV_SECTOR_SIZE; +size_t request_length = acb->bytes; qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start, acb->end - acb->start); @@ -282,18 +282,18 @@ read_end: } /* Called with s->mutex held. */ -static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len, +static int curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len, CURLAIOCB *acb) { int i; -size_t end = start + len; -size_t clamped_end = MIN(end, s->len); -size_t clamped_len = clamped_end - start; +uint64_t end = start + len; +uint64_t clamped_end = MIN(end, s->len); +uint64_t clamped_len = clamped_end - start; for (i=0; istates[i]; -size_t buf_end = (state->buf_start + state->buf_off); -size_t buf_fend = (state->buf_start + state->buf_len); +uint64_t buf_end = (state->buf_start + state->buf_off); +uint64_t buf_fend = (state->buf_start + state->buf_len); if (!state->orig_buf) continue; @@ -788,7 +788,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, } #endif -s->len = (size_t)d; +s->len = d; if ((!strncasecmp(s->url, "http://";, strlen("http://";)) || !strncasecmp(s->url, "https://";, strlen("https://";))) @@ -797,7 +797,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, "Server does not support 'range' (byte ranges)."); goto out; } -DPRINTF("CURL: Size = %zd\n", s->len); +DPRINTF("CURL: Size = %" PRIu64 "\n", s->len); qemu_mutex_lock(&s->mutex); curl_clean_state(state); @@ -837,14 +837,14 @@ static void curl_readv_bh_cb(void *p) BlockDriverState *bs = acb->common.bs; BDRVCURLState *s = bs->opaque; -size_t start = acb->sector_num * BDRV_SECTOR_SIZE; -size_t end; +uint64_t start = acb->offset; +uint64_t end; qemu_mutex_lock(&s->mutex); // In case we have the requested data already (e.g. read-ahead), // we can just call the callback and be done. -switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) { +switch (curl_find_buf(s, start, acb->bytes, acb)) { case FIND_RET_OK: ret = 0; goto out; @@ -872,7 +872,7 @@ static void curl_readv_bh_cb(void *p) } acb->start = 0; -acb->end = MIN(acb->nb_sectors * BDRV_SECTOR_SIZE, s->len - start); +acb->end = MIN(acb->bytes, s->len - start); state->buf_off = 0; g_free(state->orig_buf); @@ -887,9 +887,9 @@ static void curl_readv_bh_cb(void *p) } state->acb[0] = acb; -snprintf(state->range, 127, "%zd-%zd", start, end); -DPRINTF("CURL (AIO): Reading %llu at %zd (%s)\n", -(acb->nb_sectors * BDRV_SECTOR_SIZE), start, state->range); +snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64, start, end); +DPRINTF("CURL (AIO): Reading %" PRIu64 " at %" PRIu64 " (%s)\n", +acb->bytes, start, state->range); curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range); curl_multi_add_handle(s->multi, state->curl); @@ -914,8 +914,8 @@ static BlockAIOCB *curl_aio_readv(BlockDriverState *bs, acb = qemu_aio_get(&curl_aiocb_info, bs, cb, opaque); acb->qiov = qiov; -acb->sector_num = sector_num; -acb->nb_sectors = nb_sectors; +acb->offset = sector_num * BDRV_SECTOR_SIZE; +acb->bytes = nb_sectors * BDRV_SECTOR_SIZE; aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), curl
[Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Hi Block developers, I would like to add a feature to Qemu to drain all traffic from a block so that I can take external snaphosts without the risk to that in the middle of a write operation. Its meant for cases where where QGA freeze/thaw is not available. For me its enough to have this through qemu-io, but Kevin asked me to check if its not worth to have a stable API for it and present it via QMP/HMP. What are your thoughts? Thanks, Peter --- qemu-io-cmds.c | 78 ++ 1 file changed, 78 insertions(+) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index 312fc6d..49d82fe 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1565,6 +1565,82 @@ static const cmdinfo_t flush_cmd = { .oneline= "flush all in-core file state to disk", }; +static const cmdinfo_t drain_cmd; + +static int drain_f(BlockBackend *blk, int argc, char **argv) +{ +BlockDriverState *bs = blk_bs(blk); +bool flush = false; +int c; + +while ((c = getopt(argc, argv, "f")) != -1) { +switch (c) { +case 'f': +flush = true; +break; +default: +return qemuio_command_usage(&drain_cmd); +} +} + +if (optind != argc) { +return qemuio_command_usage(&drain_cmd); +} + + +if (bs->quiesce_counter) { +printf("drain failed: device is already drained!\n"); +return 1; +} + +bdrv_drained_begin(bs); /* complete I/O */ +if (flush) { +bdrv_flush(bs); +bdrv_drain(bs); /* in case flush left pending I/O */ +printf("flushed all pending I/O\n"); +} +printf("drain successful\n"); +return 0; +} + +static void drain_help(void) +{ +printf( +"\n" +" Drains all external I/O from the device\n" +"\n" +" -f, -- flush all in-core file state to disk\n" +"\n"); +} + +static const cmdinfo_t drain_cmd = { +.name = "drain", +.cfunc = drain_f, +.args = "[-f]", +.argmin = 0, +.argmax = -1, +.oneline= "cease to send I/O to the device", +.help = drain_help +}; + +static int undrain_f(BlockBackend *blk, int argc, char **argv) +{ +BlockDriverState *bs = blk_bs(blk); +if (!bs->quiesce_counter) { +printf("undrain failed: device is not drained!\n"); +return 1; +} +bdrv_drained_end(bs); +printf("undrain successful\n"); +return 0; +} + +static const cmdinfo_t undrain_cmd = { +.name = "undrain", +.cfunc = undrain_f, +.oneline= "continue I/O to a drained device", +}; + static int truncate_f(BlockBackend *blk, int argc, char **argv) { int64_t offset; @@ -2296,6 +2372,8 @@ static void __attribute((constructor)) init_qemuio_commands(void) qemuio_add_command(&aio_write_cmd); qemuio_add_command(&aio_flush_cmd); qemuio_add_command(&flush_cmd); +qemuio_add_command(&drain_cmd); +qemuio_add_command(&undrain_cmd); qemuio_add_command(&truncate_cmd); qemuio_add_command(&length_cmd); qemuio_add_command(&info_cmd); -- 1.9.1
[Qemu-devel] [PATCH v3 7/7] curl: do not do aio_poll when waiting for a free CURLState
Instead, put the CURLAIOCB on a wait list and yield; curl_clean_state will wake the corresponding coroutine. Because of CURL's callback-based structure, we cannot easily convert everything to CoMutex/CoQueue; keeping the QemuMutex is simpler. However, CoQueue is a simple wrapper around a linked list, so we can easily use QSIMPLEQ and open-code a CoQueue, protected by the BDRVCURLState QemuMutex instead of a CoMutex. Reviewed-by: Jeff Cody Signed-off-by: Paolo Bonzini --- block/curl.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 86a4a73834..630c6e5187 100644 --- a/block/curl.c +++ b/block/curl.c @@ -98,6 +98,8 @@ typedef struct CURLAIOCB { size_t start; size_t end; + +QSIMPLEQ_ENTRY(CURLAIOCB) next; } CURLAIOCB; typedef struct CURLSocket { @@ -133,6 +135,7 @@ typedef struct BDRVCURLState { bool accept_range; AioContext *aio_context; QemuMutex mutex; +QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq; char *username; char *password; char *proxyusername; @@ -532,6 +535,7 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state) /* Called with s->mutex held. */ static void curl_clean_state(CURLState *s) { +CURLAIOCB *next; int j; for (j=0; j < CURL_NUM_ACB; j++) { assert(!s->acb[j]); @@ -548,6 +552,14 @@ static void curl_clean_state(CURLState *s) } s->in_use = 0; + +next = QSIMPLEQ_FIRST(&s->s->free_state_waitq); +if (next) { +QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next); +qemu_mutex_unlock(&s->s->mutex); +aio_co_wake(next->co); +qemu_mutex_lock(&s->s->mutex); +} } static void curl_parse_filename(const char *filename, QDict *options, @@ -744,6 +756,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, } DPRINTF("CURL: Opening %s\n", file); +QSIMPLEQ_INIT(&s->free_state_waitq); s->aio_context = bdrv_get_aio_context(bs); s->url = g_strdup(file); qemu_mutex_lock(&s->mutex); @@ -844,8 +857,9 @@ static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb) if (state) { break; } +QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next); qemu_mutex_unlock(&s->mutex); -aio_poll(bdrv_get_aio_context(bs), true); +qemu_coroutine_yield(); qemu_mutex_lock(&s->mutex); } -- 2.12.2
Re: [Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails
Am 15.05.2017 um 11:34 hat Alberto Garcia geschrieben: > The code that tries to reopen a BlockDriverState in stream_start() > when the creation of a new block job fails crashes because it attempts > to dereference a pointer that is known to be NULL. > > This is a regression introduced in a170a91fd3eab6155da39e740381867e, > likely because the code was copied from stream_complete(). > > Signed-off-by: Alberto Garcia Cc: qemu-sta...@nongnu.org Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH 1/2] migration: Remove use of old MigrationParams
On Fri, May 12, 2017 at 05:54:48PM +0200, Juan Quintela wrote: > We have change in the previous patch to use migration capabilities for > it. Notice that we continue using the old command line flags from > migrate command from the time being. Remove the set_params method as > now it is empty. > > Signed-off-by: Juan Quintela > Reviewed-by: zhanghailiang > > -- > - Maintain shared/enabled dependency (Xu suggestien) > --- > include/migration/migration.h | 3 +-- > migration/block.c | 17 ++--- > migration/colo.c | 5 +++-- > migration/migration.c | 14 +++--- > migration/savevm.c| 2 -- > 5 files changed, 17 insertions(+), 24 deletions(-) > > diff --git a/include/migration/migration.h b/include/migration/migration.h > index 258e4ff..d228f29 100644 > --- a/include/migration/migration.h > +++ b/include/migration/migration.h > @@ -43,8 +43,7 @@ > extern int only_migratable; > > struct MigrationParams { > -bool blk; > -bool shared; > +bool unused; /* C doesn't allow empty structs */ > }; > > /* Messages sent on the return path from destination to source */ > diff --git a/migration/block.c b/migration/block.c > index 060087f..fcfa823 100644 > --- a/migration/block.c > +++ b/migration/block.c > @@ -94,9 +94,6 @@ typedef struct BlkMigBlock { > } BlkMigBlock; > > typedef struct BlkMigState { > -/* Written during setup phase. Can be read without a lock. */ > -int blk_enable; > -int shared_base; > QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; > int64_t total_sector_sum; > bool zero_blocks; > @@ -425,7 +422,7 @@ static int init_blk_migration(QEMUFile *f) > bmds->bulk_completed = 0; > bmds->total_sectors = sectors; > bmds->completed_sectors = 0; > -bmds->shared_base = block_mig_state.shared_base; > +bmds->shared_base = migrate_use_block_shared(); > > assert(i < num_bs); > bmds_bs[i].bmds = bmds; > @@ -994,22 +991,12 @@ static int block_load(QEMUFile *f, void *opaque, int > version_id) > return 0; > } > > -static void block_set_params(const MigrationParams *params, void *opaque) > -{ > -block_mig_state.blk_enable = params->blk; > -block_mig_state.shared_base = params->shared; > - > -/* shared base means that blk_enable = 1 */ > -block_mig_state.blk_enable |= params->shared; > -} > - > static bool block_is_active(void *opaque) > { > -return block_mig_state.blk_enable == 1; > +return migrate_use_block_enabled(); > } > > static SaveVMHandlers savevm_block_handlers = { > -.set_params = block_set_params, > .save_live_setup = block_save_setup, > .save_live_iterate = block_save_iterate, > .save_live_complete_precopy = block_save_complete, > diff --git a/migration/colo.c b/migration/colo.c > index 963c802..e772384 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -14,6 +14,7 @@ > #include "qemu/timer.h" > #include "sysemu/sysemu.h" > #include "migration/colo.h" > +#include "migration/block.h" > #include "io/channel-buffer.h" > #include "trace.h" > #include "qemu/error-report.h" > @@ -345,8 +346,8 @@ static int colo_do_checkpoint_transaction(MigrationState > *s, > } > > /* Disable block migration */ > -s->params.blk = 0; > -s->params.shared = 0; > +migrate_set_block_enabled(s, false); > +migrate_set_block_shared(s, false); > qemu_savevm_state_header(fb); > qemu_savevm_state_begin(fb, &s->params); > qemu_mutex_lock_iothread(); > diff --git a/migration/migration.c b/migration/migration.c > index 78102eb..0c32609 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -788,6 +788,10 @@ void > qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, > s->enabled_capabilities[cap->value->capability] = cap->value->state; > } > > +if (s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED]) { > +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true; > +} > + > if (migrate_postcopy_ram()) { > if (migrate_use_compression()) { > /* The decompression threads asynchronously write into RAM > @@ -1214,11 +1218,17 @@ bool migration_is_blocked(Error **errp) > void migrate_set_block_shared(MigrationState *s, bool value) > { > s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value; > +if (value) { > +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true; > +} > } > > void migrate_set_block_enabled(MigrationState *s, bool value) > { > s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = value; > +if (!value) { > +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = false; > +} > } > > void qmp_migrate(const char *uri, bool has_blk, bool blk, > @@ -1230,9 +1240,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool > blk, > MigrationP
Re: [Qemu-devel] [PATCH 2/2] migration: Remove old MigrationParams
On Fri, May 12, 2017 at 05:54:49PM +0200, Juan Quintela wrote: > Not used anymore after moving block migration to use capabilities. > > Signed-off-by: Juan Quintela > Reviewed-by: Dr. David Alan Gilbert > Reviewed-by: zhanghailiang Reviewed-by: Peter Xu
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > Hi Yi, > > On 26/04/17 11:12, Liu, Yi L wrote: > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8data[]; > > +}; > > We initially discussed something a little more generic than this, with > most info explicitly described and only pIOMMU-specific quirks and hints > in an opaque structure. Out of curiosity, why the change? I'm not against > a fully opaque structure, but there seem to be a large overlap between TLB > invalidations across architectures. Hi Jean, As my cover letter mentioned, it is an open on the iommu tlb invalidate propagation. Paste it here since it's in the cover letter for Qemu part changes. Pls refer to the [Open] session in the following link. http://www.spinics.net/lists/kvm/msg148798.html I want to see if community wants to have opaque structure or not on iommu tlb invalidate propagation. Personally, I incline to use opaque structure. But it's better to gather the comments before deciding it. To assist the discussion, I put the full opaque structure here. Once community gets consensus on using opaque structure for iommu tlb invalidate propagation, I'm glad to work with you on a structure with partial opaque since there seems to be overlap across arch. > > For what it's worth, when prototyping the paravirtualized IOMMU I came up > with the following. > > (From the paravirtualized POV, the SMMU also has to swizzle endianess > after unpacking an opaque structure, since userspace doesn't know what's > in it and guest might use a different endianess. So we need to force all > opaque data to be e.g. little-endian.) > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 scope; > __u32 flags; > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8data[]; > }; > > Scope is a bitfields restricting the invalidation scope. By default > invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > and @size are unused. > > Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation > scope to the pasid described by @pasid. > Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation > scope to the address range described by (@vaddr, @size). > > So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA > range for *all* pasids (as well as no_pasid). Setting scope = > (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate > the VA range only for @pasid. > Besides VA range flusing, there is PASID Cache flushing on VT-d. How about SMMU? So I think besides the two scope you defined, may need one more to indicate if it's PASID Cache flushing. > Flags depend on the selected scope: > > VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either > without scope or with INVALIDATE_VADDR) targets non-pasid mappings > exclusively (some architectures, e.g. SMMU, allow this) > > VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need > to invalidate all intermediate tables cached as part of the PTW for vaddr, > only the last-level entry (pte). This is a hint. > > I guess what's missing for Intel IOMMU and would go in @data is the > "global" hint (which we don't have in SMMU invalidations). Do you see > anything else, that the pIOMMU cannot deduce from this structure? > For Intel platform, Drain read/write would be needed in the opaque. Thanks, Yi L
[Qemu-devel] [PATCH] qemu-img: Fix documentation of convert
It got lost in commit a8d16f9ca "qemu-img: Update documentation for -U". Reported-by: Max Reitz Signed-off-by: Fam Zheng --- qemu-img-cmds.hx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index e5bc28f..469ec12 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -40,9 +40,9 @@ STEXI ETEXI DEF("convert", img_convert, -"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename") +"convert [--object objectdef] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename") STEXI -@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] @var{output_filename} +@item convert [--object @var{objectdef}] [--image-opts] [-U] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-B @var{backing_file}][-o @var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] @var{output_filename} ETEXI DEF("dd", img_dd, -- 2.9.3
Re: [Qemu-devel] [PATCH v8 0/4] Improve convert and dd commands
On Fri, 05/12 19:41, Max Reitz wrote: > The fun is increased by the fact that the locking series has > (inadvertently) removed the -B documentation from convert, so there is > another conflict looming in the future... Sorry about the mistake there.. I've posted a patch for that: [Qemu-devel] [PATCH] qemu-img: Fix documentation of convert Fam
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
* Juan Quintela (quint...@redhat.com) wrote: > Eric Blake wrote: > > On 05/12/2017 05:55 AM, Juan Quintela wrote: > @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, > bool blk, > } > > if (has_inc && inc) { > +migrate_set_block_enabled(s, true); > migrate_set_block_shared(s, true); > >>> > >>> [2] > >>> > >>> IIUC for [1] & [2] we are solving the same problem that "shared" > >>> depends on "enabled" bit. Would it be good to unitfy this dependency > >>> somewhere? E.g., by changing migrate_set_block_shared() into: > >>> > >>> void migrate_set_block_shared(MigrationState *s, bool value) > >>> { > >>> s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = value; > >>> if (value) { > >>> migrate_set_block_enabled(s, true); > >>> } > >>> } > >> > >> ok with this. > > > > Or, as I commented on 1/3, maybe having a single property that is a > > tri-state enum value, instead of 2 separate boolean properties, might be > > nicer (but certainly a bit more complex to code up). > > If you teach me how to do the qapi/qmp part, I will do the other bits. > I don't really care if we do it one way or the other. > > >> I will add once here that when we disable block enabled, we also disable > >> shared, or just let it that way? > >> > >>> Another thing to mention: after switching to the capability interface, > >>> we'll cache the "enabled" and "shared" bits now while we don't cache > >>> it before, right? IIUC it'll affect behavior of such sequence: > >>> > >>> - 1st migrate with enabled=1, shared=1, then > >>> - 2nd migrate with enabled=0, shared=0 > >>> > >>> Before the series, the 2nd migrate will use enabled=shared=0, but > >>> after the series it should be using enabled=shared=1. Not sure whether > >>> this would be a problem (or I missed anything?). > >> > >> We can't be consistent with both old/new way. > >> > >> Old way: we always setup the capabilities on command line (that should > >> have been deprecated long, long ago) > > > > Well, the easy way out is to have the HMP migrate command (I assume > > that's what you mean by "on command line") explicitly clear the > > parameters if it is called without the -b/-i flag. So the start of each > > migration is what changes the properties, so long as you are still using > > HMP to start the migration. Or, on the QMP side, since 'migrate' has > > optional 'blk' and 'inc' booleans, basically leave the settings alone if > > the parameters were omitted, and explicitly update the property to the > > value of those parameters if they were present. > > We are going to have trouble whatever way that we do it, or we start > doing lots of strange things. > > Forget about qmp, we are going to assume that it is consistent with hmp. > > migrate_set_capabilities block_enabled on > migrate -b . > > Should migrate disable the block_enabled capability? Give one > warning/error? > > And notice that this only matter if we do a migration, we cancel/get one > error, and then we migrate again. > > What I tried to do is assume that -b/-i arguments don't exist. And if > the user use them, we implement its behaviour with the minimal fuss > possibly. > > Only way that I can think of being consistent and bug compatible will be > to store: > - old block_shared/enanbeld capability value > - if we set -b/-i on the command line > > In migration cleanup do the right thing depending on this four > variables. I think that it is adding lots of complexity for very few > improvement. > > > > Or is the proposal that we are also going to simplify the QMP 'migrate' > > command to get rid of crufty parameters? > > I didn't read it that way, but I would not oppose O:-) > Ewww this is messy; you end up with almost as much code as the old flags you're trying to remove. For HMP you could gently deprecate the flags over time and give a warning telling people to use the capabilities; but doing that in one go is a bit nasty, and you still have to do something cleverer for the QMP which is similar. I think I agree though that migrate, cancel, migrate should work sensibly and it's hard to get it right. Dave > Later, Juan. -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH] ehci: fix overflow in frame timer code
In case the frame timer doesn't run for a while due to the host being busy skipped_uframes can become big enough that UFRAME_TIMER_NS * skipped_uframes overflows. Which in turn throws off all subsequent ehci frame timer calculations. Reported-by: 李林 <8610...@163.com> Signed-off-by: Gerd Hoffmann --- hw/usb/hcd-ehci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 3703a8dddc..17c572c55f 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2238,7 +2238,7 @@ static void ehci_work_bh(void *opaque) int need_timer = 0; int64_t expire_time, t_now; uint64_t ns_elapsed; -int uframes, skipped_uframes; +uint64_t uframes, skipped_uframes; int i; t_now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); -- 2.9.3
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
On Mon, 05/15 12:02, Peter Lieven wrote: > Hi Block developers, > > I would like to add a feature to Qemu to drain all traffic from a block so > that > I can take external snaphosts without the risk to that in the middle of a > write > operation. Its meant for cases where where QGA freeze/thaw is not available. > > For me its enough to have this through qemu-io, but Kevin asked me to check > if its not worth to have a stable API for it and present it via QMP/HMP. > > What are your thoughts? For debugging purpose or a "hacky" usage where you know what you are doing, it may be fine to have this. The only issue is it should be a separate flag, like BlockJob.user_paused. What happens from guest perspective? In the case of virtio, the request queue is not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled, the command is not effective (or rather the implementation is not complete). Fam > > Thanks, > Peter > > --- > qemu-io-cmds.c | 78 > ++ > 1 file changed, 78 insertions(+) > > diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c > index 312fc6d..49d82fe 100644 > --- a/qemu-io-cmds.c > +++ b/qemu-io-cmds.c > @@ -1565,6 +1565,82 @@ static const cmdinfo_t flush_cmd = { > .oneline= "flush all in-core file state to disk", > }; > > +static const cmdinfo_t drain_cmd; > + > +static int drain_f(BlockBackend *blk, int argc, char **argv) > +{ > +BlockDriverState *bs = blk_bs(blk); > +bool flush = false; > +int c; > + > +while ((c = getopt(argc, argv, "f")) != -1) { > +switch (c) { > +case 'f': > +flush = true; > +break; > +default: > +return qemuio_command_usage(&drain_cmd); > +} > +} > + > +if (optind != argc) { > +return qemuio_command_usage(&drain_cmd); > +} > + > + > +if (bs->quiesce_counter) { > +printf("drain failed: device is already drained!\n"); > +return 1; > +} > + > +bdrv_drained_begin(bs); /* complete I/O */ > +if (flush) { > +bdrv_flush(bs); > +bdrv_drain(bs); /* in case flush left pending I/O */ > +printf("flushed all pending I/O\n"); > +} > +printf("drain successful\n"); > +return 0; > +} > + > +static void drain_help(void) > +{ > +printf( > +"\n" > +" Drains all external I/O from the device\n" > +"\n" > +" -f, -- flush all in-core file state to disk\n" > +"\n"); > +} > + > +static const cmdinfo_t drain_cmd = { > +.name = "drain", > +.cfunc = drain_f, > +.args = "[-f]", > +.argmin = 0, > +.argmax = -1, > +.oneline= "cease to send I/O to the device", > +.help = drain_help > +}; > + > +static int undrain_f(BlockBackend *blk, int argc, char **argv) > +{ > +BlockDriverState *bs = blk_bs(blk); > +if (!bs->quiesce_counter) { > +printf("undrain failed: device is not drained!\n"); > +return 1; > +} > +bdrv_drained_end(bs); > +printf("undrain successful\n"); > +return 0; > +} > + > +static const cmdinfo_t undrain_cmd = { > +.name = "undrain", > +.cfunc = undrain_f, > +.oneline= "continue I/O to a drained device", > +}; > + > static int truncate_f(BlockBackend *blk, int argc, char **argv) > { > int64_t offset; > @@ -2296,6 +2372,8 @@ static void __attribute((constructor)) > init_qemuio_commands(void) > qemuio_add_command(&aio_write_cmd); > qemuio_add_command(&aio_flush_cmd); > qemuio_add_command(&flush_cmd); > +qemuio_add_command(&drain_cmd); > +qemuio_add_command(&undrain_cmd); > qemuio_add_command(&truncate_cmd); > qemuio_add_command(&length_cmd); > qemuio_add_command(&info_cmd); > -- > 1.9.1 > >
Re: [Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine conversion, remove aio_poll
Hi, This series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [PATCH v3 0/7] curl: locking cleanups/fixes, coroutine conversion, remove aio_poll Message-id: 20170515100059.15795-1-pbonz...@redhat.com Type: series === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20170515104543.32044-1-kra...@redhat.com -> patchew/20170515104543.32044-1-kra...@redhat.com Switched to a new branch 'test' e926adb curl: do not do aio_poll when waiting for a free CURLState 4eebc68 curl: convert readv to coroutines f4b49bc curl: convert CURLAIOCB to byte values 610980d curl: split curl_find_state/curl_init_state c5ca4dc curl: avoid recursive locking of BDRVCURLState mutex 3ace9ce curl: never invoke callbacks with s->mutex held b31fcbf curl: strengthen assertion in curl_clean_state === OUTPUT BEGIN === Checking PATCH 1/7: curl: strengthen assertion in curl_clean_state... ERROR: spaces required around that '=' (ctx:VxV) #24: FILE: block/curl.c:536: +for (j=0; j < CURL_NUM_ACB; j++) { ^ total: 1 errors, 0 warnings, 11 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 2/7: curl: never invoke callbacks with s->mutex held... Checking PATCH 3/7: curl: avoid recursive locking of BDRVCURLState mutex... Checking PATCH 4/7: curl: split curl_find_state/curl_init_state... ERROR: spaces required around that '=' (ctx:VxV) #40: FILE: block/curl.c:463: +for (i=0; i < CURL_NUM_STATES; i++) { ^ total: 1 errors, 0 warnings, 92 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 5/7: curl: convert CURLAIOCB to byte values... Checking PATCH 6/7: curl: convert readv to coroutines... Checking PATCH 7/7: curl: do not do aio_poll when waiting for a free CURLState... === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/
"Dr. David Alan Gilbert" wrote: D> * Juan Quintela (quint...@redhat.com) wrote: >> There are functions only used by migration code. > > That's only mostly true; see the current 'integrate colo frame with > block replication and net compare' series (posted 22nd April). > That adds colo_handle_shutdown to this header and calls it from vl.c > ( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html ) > where should that go? Dropped. It compiled, but who knows O:-) > There's also a net/colo.h as well, so using the > #include "colo.h" in migration is correct but that's > really scary when there are two files of the same name. Yeap, it is scary ...
Re: [Qemu-devel] [PATCH 09/12] migration: Split vmstate-types.c from vmstate.c
"Dr. David Alan Gilbert" wrote: > * Juan Quintela (quint...@redhat.com) wrote: >> Now one just has the interperter, and the other has the basic types. >> Once there, add copyright boilerplate. >> >> Signed-off-by: Juan Quintela > > I think this is generally OK, but as discussed on IRC, I think > you need to check the licenses. Changed to GPLv2 or later. I just copied that from savevm.c, I should have known better and read it O:-) Later, Juan.
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:04 +0800 > "Liu, Yi L" wrote: > > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8data[]; > > +}; > > + > > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > > I'm kind of wondering why this isn't just a new flag bit on > vfio_device_svm, the data structure is so similar. Of course data > needs to be fully specified in uapi. Hi Alex, For this part, it depends on using opaque structure or not. The following link mentioned it in [Open] session. http://www.spinics.net/lists/kvm/msg148798.html If we pick the full opaque solution for iommu tlb invalidate propagation. Then I may add a flag bit on vfio_device_svm and also add definition in uapi as you suggested. Thanks, Yi L > > + > > /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */ > > > > /* >
Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:11:58 +0800 > "Liu, Yi L" wrote: > > > From: Jacob Pan > > > > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use > > case in the guest: > > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html > > > > As part of the proposed architecture, when a SVM capable PCI > > device is assigned to a guest, nested mode is turned on. Guest owns the > > first level page tables (request with PASID) and performs GVA->GPA > > translation. Second level page tables are owned by the host for GPA->HPA > > translation for both request with and without PASID. > > > > A new IOMMU driver interface is therefore needed to perform tasks as > > follows: > > * Enable nested translation and appropriate translation type > > * Assign guest PASID table pointer (in GPA) and size to host IOMMU > > > > This patch introduces new functions called iommu_(un)bind_pasid_table() > > to IOMMU APIs. Architecture specific IOMMU function can be added later > > to perform the specific steps for binding pasid table of assigned devices. > > > > This patch also adds model definition in iommu.h. It would be used to > > check if the bind request is from a compatible entity. e.g. a bind > > request from an intel_iommu emulator may not be supported by an ARM SMMU > > driver. > > > > Signed-off-by: Jacob Pan > > Signed-off-by: Liu, Yi L > > --- > > drivers/iommu/iommu.c | 19 +++ > > include/linux/iommu.h | 31 +++ > > 2 files changed, 50 insertions(+) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index dbe7f65..f2da636 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, > > struct device *dev) > > } > > EXPORT_SYMBOL_GPL(iommu_attach_device); > > > > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, > > + struct pasid_table_info *pasidt_binfo) > > +{ > > + if (unlikely(!domain->ops->bind_pasid_table)) > > + return -EINVAL; > > + > > + return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo); > > +} > > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table); > > + > > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device > > *dev) > > +{ > > + if (unlikely(!domain->ops->unbind_pasid_table)) > > + return -EINVAL; > > + > > + return domain->ops->unbind_pasid_table(domain, dev); > > +} > > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); > > + > > static void __iommu_detach_device(struct iommu_domain *domain, > > struct device *dev) > > { > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > > index 0ff5111..491a011 100644 > > --- a/include/linux/iommu.h > > +++ b/include/linux/iommu.h > > @@ -131,6 +131,15 @@ struct iommu_dm_region { > > int prot; > > }; > > > > +struct pasid_table_info { > > + __u64 ptr;/* PASID table ptr */ > > + __u64 size; /* PASID table size*/ > > + __u32 model; /* magic number */ > > +#define INTEL_IOMMU(1 << 0) > > +#define ARM_SMMU (1 << 1) > > + __u8opaque[];/* IOMMU-specific details */ > > +}; > > This needs to be in uapi since you're expecting a user to pass it yes, it is. Thx for the correction. Thanks, Yi L > > + > > #ifdef CONFIG_IOMMU_API > > > > /** > > @@ -159,6 +168,8 @@ struct iommu_dm_region { > > * @domain_get_windows: Return the number of windows for a domain > > * @of_xlate: add OF master IDs to iommu grouping > > * @pgsize_bitmap: bitmap of all possible supported page sizes > > + * @bind_pasid_table: bind pasid table pointer for guest SVM > > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults > > */ > > struct iommu_ops { > > bool (*capable)(enum iommu_cap); > > @@ -200,6 +211,10 @@ struct iommu_ops { > > u32 (*domain_get_windows)(struct iommu_domain *domain); > > > > int (*of_xlate)(struct device *dev, struct of_phandle_args *args); > > + int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev, > > + struct pasid_table_info *pasidt_binfo); > > + int (*unbind_pasid_table)(struct iommu_domain *domain, > > + struct device *dev); > > > > unsigned long pgsize_bitmap; > > }; > > @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain > > *domain, > >struct device *dev); > > extern void iommu_detach_device(struct iommu_domain *domain, > > struct device *dev); > > +extern int iommu_bind_pasid_table(struct iommu_domain *domain, > > + struct device *dev, struct pasid_table_info *pasidt_binfo); > > +extern int iommu_unbind_pasid_table(struct iommu_domain *domain, > > + struct device *dev); > > extern st
Re: [Qemu-devel] [PATCH 12/12] migration: migration.h was not needed
"Dr. David Alan Gilbert" wrote: > * Juan Quintela (quint...@redhat.com) wrote: >> This files don't use any function from migration.h, so drop it. >> >> Signed-off-by: Juan Quintela >> --- >> block/qed.c | 1 - >> hw/i386/pc_q35.c| 1 - >> hw/virtio/vhost-user.c | 1 - >> hw/virtio/vhost-vsock.c | 1 - >> hw/virtio/virtio.c | 1 - >> monitor.c | 1 - >> 6 files changed, 6 deletions(-) >> >> diff --git a/block/qed.c b/block/qed.c >> index fd76817..8d899fd 100644 >> --- a/block/qed.c >> +++ b/block/qed.c >> @@ -19,7 +19,6 @@ >> #include "trace.h" >> #include "qed.h" >> #include "qapi/qmp/qerror.h" >> -#include "migration/migration.h" >> #include "sysemu/block-backend.h" >> >> static const AIOCBInfo qed_aiocb_info = { >> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c >> index dd792a8..76b08f8 100644 >> --- a/hw/i386/pc_q35.c >> +++ b/hw/i386/pc_q35.c >> @@ -46,7 +46,6 @@ >> #include "hw/ide/ahci.h" >> #include "hw/usb.h" >> #include "qemu/error-report.h" >> -#include "migration/migration.h" >> >> /* ICH9 AHCI has 6 ports */ >> #define MAX_SATA_PORTS 6 >> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >> index 9334a8a..ebc8ccf 100644 >> --- a/hw/virtio/vhost-user.c >> +++ b/hw/virtio/vhost-user.c >> @@ -17,7 +17,6 @@ >> #include "sysemu/kvm.h" >> #include "qemu/error-report.h" >> #include "qemu/sockets.h" >> -#include "migration/migration.h" >> >> #include >> #include >> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c >> index b481562..49e0022 100644 >> --- a/hw/virtio/vhost-vsock.c >> +++ b/hw/virtio/vhost-vsock.c >> @@ -17,7 +17,6 @@ >> #include "qapi/error.h" >> #include "hw/virtio/virtio-bus.h" >> #include "hw/virtio/virtio-access.h" >> -#include "migration/migration.h" >> #include "qemu/error-report.h" >> #include "hw/virtio/vhost-vsock.h" >> #include "qemu/iov.h" > > Aren't these including it to get vmstate macros? > but have they picked up that instead somewhere? hw/hw.h hw/virtio/vhost-vsock.h -> hw/virtio/virtio.h -> hw/hw.h Dropping vmstate.h from that file would have mean have to add one hundred new includes or so. And I didn't see any good reason for that. My understanding is that all devices end including hw/hw.h one way or another, so the only reason to add migration include files right now is: - you use old registartion functions - you use blockers - you use migration notifiers Otherwise, if you just use vmstate macros, they are there by hw/hw.h. Later, Juan.
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Am 15.05.2017 um 12:50 schrieb Fam Zheng: On Mon, 05/15 12:02, Peter Lieven wrote: Hi Block developers, I would like to add a feature to Qemu to drain all traffic from a block so that I can take external snaphosts without the risk to that in the middle of a write operation. Its meant for cases where where QGA freeze/thaw is not available. For me its enough to have this through qemu-io, but Kevin asked me to check if its not worth to have a stable API for it and present it via QMP/HMP. What are your thoughts? For debugging purpose or a "hacky" usage where you know what you are doing, it may be fine to have this. The only issue is it should be a separate flag, like BlockJob.user_paused. How can I add, remove such a flag? What happens from guest perspective? In the case of virtio, the request queue is not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled, the command is not effective (or rather the implementation is not complete). That it only works with virtio is fine. However, the thing it does not work correctly apply then also to all other users of the drained_begin/end functions, right? As for the timeout I only plan to drain the device for about 1 second. Thanks, Peter
[Qemu-devel] [PATCH 2/6] spapr: fix error path of required kernel-irqchip
QEMU should exit if the user explicitely asked for kernel-irqchip support and "xics-kvm" initialization fails. The changelog of commit 34f2af3d3edf ("spapr: Clean up misuse of qdev_init() in xics-kvm creation") reads: While there, improve the error message when we can't satisfy an explicit user request for "xics-kvm", and exit(1) instead of abort(). Simplify the abort when we can't create "xics". This patch adds the missing call to exit(). Signed-off-by: Greg Kurz --- hw/ppc/spapr.c |1 + 1 file changed, 1 insertion(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index abfb99b71b7d..f477d7b8a210 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -133,6 +133,7 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp) if (machine_kernel_irqchip_required(machine) && !spapr->ics) { error_reportf_err(err, "kernel_irqchip requested but unavailable: "); +exit(EXIT_FAILURE); } else { error_free(err); }
[Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()
The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz --- hw/ppc/spapr.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 44f7dc7f40e9..c53989bb10b1 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState *spapr, const char *type_ics, int nr_irqs, Error **errp) { -Error *err = NULL, *local_err = NULL; +Error *local_err = NULL; Object *obj; obj = object_new(type_ics); -object_property_add_child(OBJECT(spapr), "ics", obj, NULL); +object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort); object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); -object_property_set_int(obj, nr_irqs, "nr-irqs", &err); +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err); +if (local_err) { +goto error; +} object_property_set_bool(obj, true, "realized", &local_err); -error_propagate(&err, local_err); -if (err) { -error_propagate(errp, err); -return NULL; +if (local_err) { +goto error; } return ICS_SIMPLE(obj); + +error: +error_propagate(errp, local_err); +return NULL; } static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
[Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types
Recent work on XICS broke migration of older machine types. The target fails with: qemu-system-ppc64: Unknown savevm section or instance 'icp/server' 1 qemu-system-ppc64: load of migration failed: Invalid argument This happens because the current code no longer pre-allocates ICP objects at machine init time. This series restore the previous behavior for older machine types (patch 6). Since the fix adds an error path to cover the case when realization of the ICP objects fails, the series first does some cleanup on the way errors are handled (patch 1-5). I could successfully migrate older machines back and forth with QEMU 2.9: - pseries-2.9 - pseries-2.9 with hotplugged CPUs - pseries-2.6 (pre CPU hotplug support) This series is based on: https://github.com/dgibson/qemu.git ppc-for-2.10 -- Greg --- Greg Kurz (6): ppc/xics: simplify prototype of xics_spapr_init() spapr: fix error path of required kernel-irqchip spapr: fix error reporting in xics_system_init() spapr: sanitize error handling in spapr_ics_create() spapr-cpu-core: release ICP object when realization fails spapr: fix migration of ICP objects from/to older QEMU hw/intc/xics_spapr.c|3 +- hw/ppc/spapr.c | 75 ++- hw/ppc/spapr_cpu_core.c | 40 - include/hw/ppc/spapr.h |2 + include/hw/ppc/xics.h |2 + 5 files changed, 90 insertions(+), 32 deletions(-)
[Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU
Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This is an improvement since we no longer allocate ICP objects that will never be used. But it has the side-effect of breaking migration of older machine types from older QEMU versions. This patch introduces a compat flag in the sPAPR machine class so that all pseries machine up to 2.9 go on with the previous behavior of pre-allocating ICP objects. While here, we also ensure that object_property_add_child() errors cause QEMU to abort for newer machines. Signed-off-by: Greg Kurz --- hw/ppc/spapr.c | 36 hw/ppc/spapr_cpu_core.c | 28 include/hw/ppc/spapr.h |2 ++ 3 files changed, 58 insertions(+), 8 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c53989bb10b1..ab3683bcd677 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -126,6 +126,7 @@ error: static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp) { sPAPRMachineState *spapr = SPAPR_MACHINE(machine); +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); Error *local_err = NULL; if (kvm_enabled()) { @@ -151,6 +152,38 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp) &local_err); } +if (!spapr->ics) { +goto out; +} + +if (smc->must_pre_allocate_icps) { +int smt = kvmppc_smt_threads(); +int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads); +int i; + +spapr->legacy_icps = g_malloc0(nr_servers * sizeof(ICPState)); + +for (i = 0; i < nr_servers; i++) { +void* obj = &spapr->legacy_icps[i]; + +object_initialize(obj, sizeof(ICPState), spapr->icp_type); +object_property_add_child(OBJECT(spapr), "icp[*]", obj, + &error_abort); +object_unref(obj); +object_property_add_const_link(obj, "xics", OBJECT(spapr), + &error_abort); +object_property_set_bool(obj, true, "realized", &local_err); +if (local_err) { +while (i--) { +object_unparent(obj); +} +g_free(spapr->legacy_icps); +break; +} +} +} + +out: error_propagate(errp, local_err); } @@ -3256,8 +3289,11 @@ static void spapr_machine_2_9_instance_options(MachineState *machine) static void spapr_machine_2_9_class_options(MachineClass *mc) { +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc); + spapr_machine_2_10_class_options(mc); SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9); +smc->must_pre_allocate_icps = true; } DEFINE_SPAPR_MACHINE(2_9, "2.9", false); diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index 63d160f7e010..5476647efa06 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, Error **errp) size_t size = object_type_get_instance_size(typename); CPUCore *cc = CPU_CORE(dev); int i; +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); for (i = 0; i < cc->nr_threads; i++) { void *obj = sc->threads + i * size; @@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, Error **errp) PowerPCCPU *cpu = POWERPC_CPU(cs); spapr_cpu_destroy(cpu); -object_unparent(cpu->intc); +if (!spapr->legacy_icps) { +object_unparent(cpu->intc); +} cpu_remove_sync(cs); object_unparent(obj); } @@ -142,12 +145,19 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) PowerPCCPU *cpu = POWERPC_CPU(cs); Object *obj; -obj = object_new(spapr->icp_type); -object_property_add_child(OBJECT(cpu), "icp", obj, NULL); -object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); -object_property_set_bool(obj, true, "realized", &local_err); -if (local_err) { -goto error; +if (spapr->legacy_icps) { +int index = cpu->parent_obj.cpu_index; + +obj = OBJECT(&spapr->legacy_icps[index]); +} else { +obj = object_new(spapr->icp_type); +object_property_add_child(OBJECT(cpu), "icp", obj, &error_abort); +object_property_add_const_link(obj, "xics", OBJECT(spapr), + &error_abort); +object_property_set_bool(obj, true, "realized", &local_err); +if (local_err) { +goto error; +} } object_property_set_bool(child, true, "realized", &local_err); @@ -164,7 +174,9 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) return; error: -object_unparent(obj); +if (!spapr->legacy
[Qemu-devel] [PATCH 1/6] ppc/xics: simplify prototype of xics_spapr_init()
This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz --- hw/intc/xics_spapr.c |3 +-- hw/ppc/spapr.c|2 +- include/hw/ppc/xics.h |2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c index f05308b897f2..d98ea8b13068 100644 --- a/hw/intc/xics_spapr.c +++ b/hw/intc/xics_spapr.c @@ -229,7 +229,7 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPRMachineState *spapr, rtas_st(rets, 0, RTAS_OUT_SUCCESS); } -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp) +void xics_spapr_init(sPAPRMachineState *spapr) { /* Registration of global state belongs into realize */ spapr_rtas_register(RTAS_IBM_SET_XIVE, "ibm,set-xive", rtas_set_xive); @@ -243,7 +243,6 @@ int xics_spapr_init(sPAPRMachineState *spapr, Error **errp) spapr_register_hypercall(H_XIRR_X, h_xirr_x); spapr_register_hypercall(H_EOI, h_eoi); spapr_register_hypercall(H_IPOLL, h_ipoll); -return 0; } #define ICS_IRQ_FREE(ics, srcno) \ diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1b7cadab0cdf..abfb99b71b7d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -139,7 +139,7 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp) } if (!spapr->ics) { -xics_spapr_init(spapr, errp); +xics_spapr_init(spapr); spapr->icp_type = TYPE_ICP; spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp); } diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h index 05e6acbb3533..d6cb51f3ad5d 100644 --- a/include/hw/ppc/xics.h +++ b/include/hw/ppc/xics.h @@ -206,6 +206,6 @@ void icp_resend(ICPState *ss); typedef struct sPAPRMachineState sPAPRMachineState; int xics_kvm_init(sPAPRMachineState *spapr, Error **errp); -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp); +void xics_spapr_init(sPAPRMachineState *spapr); #endif /* XICS_H */
[Qemu-devel] [PATCH 3/6] spapr: fix error reporting in xics_system_init()
The xics_system_init() function passes its errp argument to xics_kvm_init(). If the call fails and the user requested in-kernel irqchip, it then ends up passing a NULL Error * to error_reportf_err() and the error message is silently dropped. Passing an errp argument is generally wrong, unless you really just need to convey an error to the caller. This patch converts xics_system_init() to *only* pass a pointer to its own Error * and then to propagate it with error_propagate(), as recommended in error.h. The local_err name is used for consistency with the rest of the code. Signed-off-by: Greg Kurz --- hw/ppc/spapr.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f477d7b8a210..44f7dc7f40e9 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -121,29 +121,32 @@ static ICSState *spapr_ics_create(sPAPRMachineState *spapr, static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp) { sPAPRMachineState *spapr = SPAPR_MACHINE(machine); +Error *local_err = NULL; if (kvm_enabled()) { -Error *err = NULL; - if (machine_kernel_irqchip_allowed(machine) && -!xics_kvm_init(spapr, errp)) { +!xics_kvm_init(spapr, &local_err)) { spapr->icp_type = TYPE_KVM_ICP; -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, &err); +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, + &local_err); } if (machine_kernel_irqchip_required(machine) && !spapr->ics) { -error_reportf_err(err, +error_reportf_err(local_err, "kernel_irqchip requested but unavailable: "); exit(EXIT_FAILURE); } else { -error_free(err); +error_free(local_err); } } if (!spapr->ics) { xics_spapr_init(spapr); spapr->icp_type = TYPE_ICP; -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp); +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, + &local_err); } + +error_propagate(errp, local_err); } static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
[Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails
While here we introduce a single error path to avoid code duplication. Signed-off-by: Greg Kurz --- hw/ppc/spapr_cpu_core.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index 4389ef4c2aef..63d160f7e010 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); object_property_set_bool(obj, true, "realized", &local_err); if (local_err) { -error_propagate(errp, local_err); -return; +goto error; } object_property_set_bool(child, true, "realized", &local_err); if (local_err) { -object_unparent(obj); -error_propagate(errp, local_err); -return; +goto error; } spapr_cpu_init(spapr, cpu, &local_err); if (local_err) { -object_unparent(obj); -error_propagate(errp, local_err); -return; +goto error; } xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj)); +return; + +error: +object_unparent(obj); +error_propagate(errp, local_err); } static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
Re: [Qemu-devel] [PATCH 2/6] spapr: fix error path of required kernel-irqchip
On 05/15/2017 01:39 PM, Greg Kurz wrote: > QEMU should exit if the user explicitely asked for kernel-irqchip support > and "xics-kvm" initialization fails. > > The changelog of commit 34f2af3d3edf ("spapr: Clean up misuse of qdev_init() > in xics-kvm creation") reads: > > While there, improve the error message when we can't satisfy an > explicit user request for "xics-kvm", and exit(1) instead of abort(). > Simplify the abort when we can't create "xics". > > This patch adds the missing call to exit(). > > Signed-off-by: Greg Kurz Reviewed-by: Cédric Le Goater > --- > hw/ppc/spapr.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index abfb99b71b7d..f477d7b8a210 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -133,6 +133,7 @@ static void xics_system_init(MachineState *machine, int > nr_irqs, Error **errp) > if (machine_kernel_irqchip_required(machine) && !spapr->ics) { > error_reportf_err(err, >"kernel_irqchip requested but unavailable: "); > +exit(EXIT_FAILURE); > } else { > error_free(err); > } >
Re: [Qemu-devel] [PATCH 3/6] spapr: fix error reporting in xics_system_init()
On 05/15/2017 01:39 PM, Greg Kurz wrote: > The xics_system_init() function passes its errp argument to xics_kvm_init(). > If the call fails and the user requested in-kernel irqchip, it then ends up > passing a NULL Error * to error_reportf_err() and the error message is > silently dropped. > > Passing an errp argument is generally wrong, unless you really just need > to convey an error to the caller. > > This patch converts xics_system_init() to *only* pass a pointer to its own > Error * and then to propagate it with error_propagate(), as recommended > in error.h. > > The local_err name is used for consistency with the rest of the code. > > Signed-off-by: Greg Kurz Thanks for the fix. Reviewed-by: Cédric Le Goater C. > --- > hw/ppc/spapr.c | 17 ++--- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index f477d7b8a210..44f7dc7f40e9 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -121,29 +121,32 @@ static ICSState *spapr_ics_create(sPAPRMachineState > *spapr, > static void xics_system_init(MachineState *machine, int nr_irqs, Error > **errp) > { > sPAPRMachineState *spapr = SPAPR_MACHINE(machine); > +Error *local_err = NULL; > > if (kvm_enabled()) { > -Error *err = NULL; > - > if (machine_kernel_irqchip_allowed(machine) && > -!xics_kvm_init(spapr, errp)) { > +!xics_kvm_init(spapr, &local_err)) { > spapr->icp_type = TYPE_KVM_ICP; > -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, > &err); > +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, > + &local_err); > } > if (machine_kernel_irqchip_required(machine) && !spapr->ics) { > -error_reportf_err(err, > +error_reportf_err(local_err, >"kernel_irqchip requested but unavailable: "); > exit(EXIT_FAILURE); > } else { > -error_free(err); > +error_free(local_err); > } > } > > if (!spapr->ics) { > xics_spapr_init(spapr); > spapr->icp_type = TYPE_ICP; > -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp); > +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, > + &local_err); > } > + > +error_propagate(errp, local_err); > } > > static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu, >
Re: [Qemu-devel] [PATCH] Disable the not fully implemented warning for e1000
Hi Julien, On 05/05/2017 09:57 AM, Julien Duponchelle wrote: Otherwise for image like CISCO IOSv you got a lot of warning on the console preventing you to use the VM because it's slow down the machine. This fix: https://bugs.launchpad.net/qemu/+bug/1673722 Signed-off-by: Julien Duponchelle --- hw/net/e1000.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index f2e5072d27..095fd0133f 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -35,6 +35,7 @@ #include "sysemu/dma.h" #include "qemu/iov.h" #include "qemu/range.h" +#include "qemu/log.h" #include "e1000x_common.h" @@ -1264,8 +1265,9 @@ e1000_mmio_write(void *opaque, hwaddr addr, uint64_t val, if (!(mac_reg_access[index] & MAC_ACCESS_FLAG_NEEDED) || (s->compat_flags & (mac_reg_access[index] >> 2))) { if (mac_reg_access[index] & MAC_ACCESS_PARTIAL) { -DBGOUT(GENERAL, "Writing to register at offset: 0x%08x. " - "It is not fully implemented.\n", index<<2); +qemu_log_mask(LOG_UNIMP, "e1000: " +"Writing to register at offset: 0x%08x. " +"It is not fully implemented.\n", index << 2); trailing newline is no more necessary. I mistaken with error_report(), qemu_log_mask() do use trailing newline, sorry! what about: qemu_log_mask(LOG_UNIMP, "%s: write to unimplemented " "register addr=0x%05x val=0x%08" PRIx64, __func__, index << 2, val); } macreg_writeops[index](s, index, val); } else {/* "flag needed" bit is set, but the flag is not active */ @@ -1291,8 +1293,9 @@ e1000_mmio_read(void *opaque, hwaddr addr, unsigned size) if (!(mac_reg_access[index] & MAC_ACCESS_FLAG_NEEDED) || (s->compat_flags & (mac_reg_access[index] >> 2))) { if (mac_reg_access[index] & MAC_ACCESS_PARTIAL) { -DBGOUT(GENERAL, "Reading register at offset: 0x%08x. " - "It is not fully implemented.\n", index<<2); +qemu_log_mask(LOG_UNIMP, "e1000: " +"Reading register at offset: 0x%08x. " +"It is not fully implemented.\n", index << 2); qemu_log_mask(LOG_UNIMP, "%s: read to unimplemented " "register addr=0x%05x" PRIx64, __func__, index << 2); } return macreg_readops[index](s, index); } else {/* "flag needed" bit is set, but the flag is not active */ Regard, Phil.
Re: [Qemu-devel] [Qemu-arm] [Qemu-devel PATCH 1/5] msf2: Add Smartfusion2 System timer
Hi Subbaraya, +if (value & TIMER_MODE) { +qemu_log_mask(LOG_UNIMP, "64-bit mode not supported\n"); No need of trailing '\n', be more specific, something like: qemu_log_mask(LOG_UNIMP, TYPE_MSF2_TIMER ": 64-bit mode not supported"); I mistaken with error_report(), qemu_log_mask() does use trailing newline, sorry! Phil.
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
On Mon, 05/15 13:26, Peter Lieven wrote: > Am 15.05.2017 um 12:50 schrieb Fam Zheng: > > On Mon, 05/15 12:02, Peter Lieven wrote: > > > Hi Block developers, > > > > > > I would like to add a feature to Qemu to drain all traffic from a block > > > so that > > > I can take external snaphosts without the risk to that in the middle of a > > > write > > > operation. Its meant for cases where where QGA freeze/thaw is not > > > available. > > > > > > For me its enough to have this through qemu-io, but Kevin asked me to > > > check > > > if its not worth to have a stable API for it and present it via QMP/HMP. > > > > > > What are your thoughts? > > For debugging purpose or a "hacky" usage where you know what you are doing, > > it > > may be fine to have this. The only issue is it should be a separate flag, > > like > > BlockJob.user_paused. > > How can I add, remove such a flag? Like bs->user_drained. Set it in "drain" command, then increment bs->quiesce_counter if toggled; vise versa. > > > > > What happens from guest perspective? In the case of virtio, the request > > queue is > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still > > handled, > > the command is not effective (or rather the implementation is not complete). > > That it only works with virtio is fine. However, the thing it does not work > correctly > apply then also to all other users of the drained_begin/end functions, right? > As for the timeout I only plan to drain the device for about 1 second. It didn't matter because for IDE, the invariant (staying quiesced as long as necessary) is already ensured by BQL. Virtio is different because it supports ioeventfd and data plane. Fam
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Am 15.05.2017 um 13:53 schrieb Fam Zheng: On Mon, 05/15 13:26, Peter Lieven wrote: Am 15.05.2017 um 12:50 schrieb Fam Zheng: On Mon, 05/15 12:02, Peter Lieven wrote: Hi Block developers, I would like to add a feature to Qemu to drain all traffic from a block so that I can take external snaphosts without the risk to that in the middle of a write operation. Its meant for cases where where QGA freeze/thaw is not available. For me its enough to have this through qemu-io, but Kevin asked me to check if its not worth to have a stable API for it and present it via QMP/HMP. What are your thoughts? For debugging purpose or a "hacky" usage where you know what you are doing, it may be fine to have this. The only issue is it should be a separate flag, like BlockJob.user_paused. How can I add, remove such a flag? Like bs->user_drained. Set it in "drain" command, then increment bs->quiesce_counter if toggled; vise versa. Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions the counter is incremented already. What happens from guest perspective? In the case of virtio, the request queue is not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled, the command is not effective (or rather the implementation is not complete). That it only works with virtio is fine. However, the thing it does not work correctly apply then also to all other users of the drained_begin/end functions, right? As for the timeout I only plan to drain the device for about 1 second. It didn't matter because for IDE, the invariant (staying quiesced as long as necessary) is already ensured by BQL. Virtio is different because it supports ioeventfd and data plane. Okay understood. So my use of bdrv_drained_begin/end is more an abuse of these functions? Do you have another idea how to achieve what I want? I was thinking of throttle the I/O to zero. It would be enough to do this for writes, reading doesn't hurt in my case. Peter
Re: [Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()
On 05/15/2017 01:39 PM, Greg Kurz wrote: > The spapr_ics_create() function handles errors in a rather convoluted > way, with two local Error * variables. Moreover, failing to parent the > ICS object to the machine should be considered as a bug but it is > currently ignored. I am not sure what should be done for object_property_add_child() errors but QEMU generally uses NULL for 'Error **'. It might be wrong though. As for the local error handling, it is following what is described in qapi/error.h. Isn't it ? Cheers, C. > This patch addresses both issues. > > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr.c | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 44f7dc7f40e9..c53989bb10b1 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState > *spapr, >const char *type_ics, >int nr_irqs, Error **errp) > { > -Error *err = NULL, *local_err = NULL; > +Error *local_err = NULL; > Object *obj; > > obj = object_new(type_ics); > -object_property_add_child(OBJECT(spapr), "ics", obj, NULL); > +object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort); > object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); > -object_property_set_int(obj, nr_irqs, "nr-irqs", &err); > +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err); > +if (local_err) { > +goto error; > +} > object_property_set_bool(obj, true, "realized", &local_err); > -error_propagate(&err, local_err); > -if (err) { > -error_propagate(errp, err); > -return NULL; > +if (local_err) { > +goto error; > } > > return ICS_SIMPLE(obj); > + > +error: > +error_propagate(errp, local_err); > +return NULL; > } > > static void xics_system_init(MachineState *machine, int nr_irqs, Error > **errp) >
Re: [Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails
On 05/15/2017 01:39 PM, Greg Kurz wrote: > While here we introduce a single error path to avoid code duplication. > > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr_cpu_core.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c > index 4389ef4c2aef..63d160f7e010 100644 > --- a/hw/ppc/spapr_cpu_core.c > +++ b/hw/ppc/spapr_cpu_core.c > @@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object *child, > Error **errp) > object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); > object_property_set_bool(obj, true, "realized", &local_err); > if (local_err) { > -error_propagate(errp, local_err); > -return; > +goto error; the error: statement below adds an object_unparent(). is that ok ? C. > } > > object_property_set_bool(child, true, "realized", &local_err); > if (local_err) { > -object_unparent(obj); > -error_propagate(errp, local_err); > -return; > +goto error; > } > > spapr_cpu_init(spapr, cpu, &local_err); > if (local_err) { > -object_unparent(obj); > -error_propagate(errp, local_err); > -return; > +goto error; > } > > xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj)); > +return; > + > +error: > +object_unparent(obj); > +error_propagate(errp, local_err); > } > > static void spapr_cpu_core_realize(DeviceState *dev, Error **errp) >
Re: [Qemu-devel] [PATCH 4/6] spapr: sanitize error handling in spapr_ics_create()
On Mon, 15 May 2017 13:59:33 +0200 Cédric Le Goater wrote: > On 05/15/2017 01:39 PM, Greg Kurz wrote: > > The spapr_ics_create() function handles errors in a rather convoluted > > way, with two local Error * variables. Moreover, failing to parent the > > ICS object to the machine should be considered as a bug but it is > > currently ignored. > > I am not sure what should be done for object_property_add_child() > errors but QEMU generally uses NULL for 'Error **'. It might be > wrong though. > > As for the local error handling, it is following what is described in > qapi/error.h. Isn't it ? > Yes, it does follow the "Receive and accumulate multiple errors" recommandation, but does it make sense to realize the ICS object if we failed to set nr-irqs ? > Cheers, > > C. > > > > This patch addresses both issues. > > > > Signed-off-by: Greg Kurz > > --- > > hw/ppc/spapr.c | 19 --- > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 44f7dc7f40e9..c53989bb10b1 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -101,21 +101,26 @@ static ICSState *spapr_ics_create(sPAPRMachineState > > *spapr, > >const char *type_ics, > >int nr_irqs, Error **errp) > > { > > -Error *err = NULL, *local_err = NULL; > > +Error *local_err = NULL; > > Object *obj; > > > > obj = object_new(type_ics); > > -object_property_add_child(OBJECT(spapr), "ics", obj, NULL); > > +object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort); > > object_property_add_const_link(obj, "xics", OBJECT(spapr), > > &error_abort); > > -object_property_set_int(obj, nr_irqs, "nr-irqs", &err); > > +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err); > > +if (local_err) { > > +goto error; > > +} > > object_property_set_bool(obj, true, "realized", &local_err); > > -error_propagate(&err, local_err); > > -if (err) { > > -error_propagate(errp, err); > > -return NULL; > > +if (local_err) { > > +goto error; > > } > > > > return ICS_SIMPLE(obj); > > + > > +error: > > +error_propagate(errp, local_err); > > +return NULL; > > } > > > > static void xics_system_init(MachineState *machine, int nr_irqs, Error > > **errp) > > > pgpArqsYEeP80.pgp Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/6] ppc/xics: simplify prototype of xics_spapr_init()
On 05/15/2017 01:39 PM, Greg Kurz wrote: > This function only does hypercall and RTAS-call registration, and thus > never returns an error. This patch adapt the prototype to reflect that. > > Signed-off-by: Greg Kurz Reviewed-by: Cédric Le Goater > --- > hw/intc/xics_spapr.c |3 +-- > hw/ppc/spapr.c|2 +- > include/hw/ppc/xics.h |2 +- > 3 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c > index f05308b897f2..d98ea8b13068 100644 > --- a/hw/intc/xics_spapr.c > +++ b/hw/intc/xics_spapr.c > @@ -229,7 +229,7 @@ static void rtas_int_on(PowerPCCPU *cpu, > sPAPRMachineState *spapr, > rtas_st(rets, 0, RTAS_OUT_SUCCESS); > } > > -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp) > +void xics_spapr_init(sPAPRMachineState *spapr) > { > /* Registration of global state belongs into realize */ > spapr_rtas_register(RTAS_IBM_SET_XIVE, "ibm,set-xive", rtas_set_xive); > @@ -243,7 +243,6 @@ int xics_spapr_init(sPAPRMachineState *spapr, Error > **errp) > spapr_register_hypercall(H_XIRR_X, h_xirr_x); > spapr_register_hypercall(H_EOI, h_eoi); > spapr_register_hypercall(H_IPOLL, h_ipoll); > -return 0; > } > > #define ICS_IRQ_FREE(ics, srcno) \ > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 1b7cadab0cdf..abfb99b71b7d 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -139,7 +139,7 @@ static void xics_system_init(MachineState *machine, int > nr_irqs, Error **errp) > } > > if (!spapr->ics) { > -xics_spapr_init(spapr, errp); > +xics_spapr_init(spapr); > spapr->icp_type = TYPE_ICP; > spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp); > } > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h > index 05e6acbb3533..d6cb51f3ad5d 100644 > --- a/include/hw/ppc/xics.h > +++ b/include/hw/ppc/xics.h > @@ -206,6 +206,6 @@ void icp_resend(ICPState *ss); > typedef struct sPAPRMachineState sPAPRMachineState; > > int xics_kvm_init(sPAPRMachineState *spapr, Error **errp); > -int xics_spapr_init(sPAPRMachineState *spapr, Error **errp); > +void xics_spapr_init(sPAPRMachineState *spapr); > > #endif /* XICS_H */ >
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On 14/05/17 11:12, Liu, Yi L wrote: > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: >> Hi Yi, >> >> On 26/04/17 11:12, Liu, Yi L wrote: >>> From: "Liu, Yi L" >>> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB >>> invalidate request from guest to host. >>> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has >>> no knowledge of caching structure updates unless the guest >>> invalidation activities are passed down to the host. So a new >>> IOCTL is needed to propagate the guest cache invalidation through >>> VFIO. >>> >>> Signed-off-by: Liu, Yi L >>> --- >>> include/uapi/linux/vfio.h | 9 + >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>> index 6b97987..50c51f8 100644 >>> --- a/include/uapi/linux/vfio.h >>> +++ b/include/uapi/linux/vfio.h >>> @@ -564,6 +564,15 @@ struct vfio_device_svm { >>> >>> #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) >>> >>> +/* For IOMMU TLB Invalidation Propagation */ >>> +struct vfio_iommu_tlb_invalidate { >>> + __u32 argsz; >>> + __u32 length; >>> + __u8data[]; >>> +}; >> >> We initially discussed something a little more generic than this, with >> most info explicitly described and only pIOMMU-specific quirks and hints >> in an opaque structure. Out of curiosity, why the change? I'm not against >> a fully opaque structure, but there seem to be a large overlap between TLB >> invalidations across architectures. > > Hi Jean, > > As my cover letter mentioned, it is an open on the iommu tlb invalidate > propagation. Paste it here since it's in the cover letter for Qemu part > changes. Pls refer to the [Open] session in the following link. > > http://www.spinics.net/lists/kvm/msg148798.html > > I want to see if community wants to have opaque structure or not > on iommu tlb invalidate propagation. Personally, I incline to use > opaque structure. But it's better to gather the comments before > deciding it. To assist the discussion, I put the full opaque structure > here. Once community gets consensus on using opaque structure for > iommu tlb invalidate propagation, I'm glad to work with you on a > structure with partial opaque since there seems to be overlap across > arch. I see, thanks for the pointer. I'm not fan of using the pIOMMU format in invalidations, but I understand the appeal in your case, where you can shave off the overhead of parsing/re-assembling the pIOMMU format. It's not suitable for generic io-pgtable, where different pIOMMUs may offer the same page table format but different invalidation methods. I guess I could make it work by negotiating invalidation method at bind time, falling back to the generic format for virtio-iommu. We already have to do some related negotiation for SVM on SMMU, where in embedded implementations the guest doesn't need to send invalidation requests via IOMMU queue, but can do it using CPU instructions instead. >> For what it's worth, when prototyping the paravirtualized IOMMU I came up >> with the following. >> >> (From the paravirtualized POV, the SMMU also has to swizzle endianess >> after unpacking an opaque structure, since userspace doesn't know what's >> in it and guest might use a different endianess. So we need to force all >> opaque data to be e.g. little-endian.) >> >> struct vfio_iommu_tlb_invalidate { >> __u32 argsz; >> __u32 scope; >> __u32 flags; >> __u32 pasid; >> __u64 vaddr; >> __u64 size; >> __u8data[]; >> }; >> >> Scope is a bitfields restricting the invalidation scope. By default >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr >> and @size are unused. >> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation >> scope to the pasid described by @pasid. >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation >> scope to the address range described by (@vaddr, @size). >> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA >> range for *all* pasids (as well as no_pasid). Setting scope = >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate >> the VA range only for @pasid. >> > > Besides VA range flusing, there is PASID Cache flushing on VT-d. How about > SMMU? So I think besides the two scope you defined, may need one more to > indicate if it's PASID Cache flushing. Yes, invalidating all TLB entries associated to a PASID would be done by setting scope = VFIO_IOMMU_INVALIDATE_PASID. In which case field @pasid is valid, and fields (@vaddr, @size) aren't used. Thanks, Jean >> Flags depend on the selected scope: >> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings >> exclusively (some architectures, e.g. SMMU, allow this) >> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't nee
Re: [Qemu-devel] [PATCH] block migration: Allow compile time disable
* Eric Blake (ebl...@redhat.com) wrote: > On 05/03/2017 05:42 AM, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > Many users now prefer to use drive_mirror over NBD as an > > alternative to the older migrate -b option; drive_mirror is > > more complex to setup but gives you more options (e.g. only > > migrating some of the disks if some of them are shared). > > > > Allow the large chunk of block migration code to be compiled > > out for those who don't use it. > > > > Based on a downstream-patch we've had for a while by Jeff Cody. > > > > Signed-off-by: Dr. David Alan Gilbert > > --- > > @@ -1222,6 +1226,14 @@ void qmp_migrate(const char *uri, bool has_blk, bool > > blk, > > params.blk = has_blk && blk; > > params.shared = has_inc && inc; > > > > +#ifndef CONFIG_LIVE_BLOCK_MIGRATION > > +if (params.blk || params.shared) { > > +error_setg(errp, "QEMU compiled without old-style block migration. > > " > > + "Use drive_mirror+NBD."); > > error_setg() should not be used with '.' (it should be a single phrase, > here you are trying to stuff in two sentences). error_append_hint() can > be used to supply the advice about using drive_mirror+NBD as the > alternative. > > Otherwise this looks reasonable to me. Done as: error_setg(errp, "QEMU compiled without old-style block migration"); error_append_hint(errp, "Use drive_mirror+NBD.\n"); (All the places I can see seem to have . and \n in append_hint) Dave > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3266 > Virtualization: qemu.org | libvirt.org > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types
Hi, This series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types Message-id: 149484833874.20089.4164801378197848306.st...@bahia.lan Type: series === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/149484833874.20089.4164801378197848306.st...@bahia.lan -> patchew/149484833874.20089.4164801378197848306.st...@bahia.lan Switched to a new branch 'test' 3c3824a spapr: fix migration of ICP objects from/to older QEMU 280164f spapr-cpu-core: release ICP object when realization fails 5bc72eb spapr: sanitize error handling in spapr_ics_create() fa15d00 spapr: fix error reporting in xics_system_init() ecbe234 spapr: fix error path of required kernel-irqchip b7cf73d ppc/xics: simplify prototype of xics_spapr_init() === OUTPUT BEGIN === Checking PATCH 1/6: ppc/xics: simplify prototype of xics_spapr_init()... Checking PATCH 2/6: spapr: fix error path of required kernel-irqchip... Checking PATCH 3/6: spapr: fix error reporting in xics_system_init()... Checking PATCH 4/6: spapr: sanitize error handling in spapr_ics_create()... Checking PATCH 5/6: spapr-cpu-core: release ICP object when realization fails... Checking PATCH 6/6: spapr: fix migration of ICP objects from/to older QEMU... ERROR: "foo* bar" should be "foo *bar" #50: FILE: hw/ppc/spapr.c:167: +void* obj = &spapr->legacy_icps[i]; total: 1 errors, 0 warnings, 122 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
Re: [Qemu-devel] [PATCH 2/4] exec: rename resolve_subpage
On Mon, May 15, 2017 at 11:03:49AM +0200, Paolo Bonzini wrote: > > > On 15/05/2017 10:50, Peter Xu wrote: > > It is not easy for people to know "what this parameter does" before > > knowing "what is subpage". Let's use "is_mmio" to make it easier to > > understand. > > > > Signed-off-by: Peter Xu > > Maybe invert the direction and call it for_iotlb? I wasn't aware that tcg is using resolve_subpage==false always. Then at least is_mmio does not suite here. I am not sure whether for-iotlb suites here either... So maybe I will just drop this patch as well, just like patch 3. Thanks for reviewing it! -- Peter Xu
Re: [Qemu-devel] [PATCH 5/6] spapr-cpu-core: release ICP object when realization fails
On Mon, 15 May 2017 14:02:34 +0200 Cédric Le Goater wrote: > On 05/15/2017 01:39 PM, Greg Kurz wrote: > > While here we introduce a single error path to avoid code duplication. > > > > Signed-off-by: Greg Kurz > > --- > > hw/ppc/spapr_cpu_core.c | 16 > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c > > index 4389ef4c2aef..63d160f7e010 100644 > > --- a/hw/ppc/spapr_cpu_core.c > > +++ b/hw/ppc/spapr_cpu_core.c > > @@ -147,25 +147,25 @@ static void spapr_cpu_core_realize_child(Object > > *child, Error **errp) > > object_property_add_const_link(obj, "xics", OBJECT(spapr), > > &error_abort); > > object_property_set_bool(obj, true, "realized", &local_err); > > if (local_err) { > > -error_propagate(errp, local_err); > > -return; > > +goto error; > > the error: statement below adds an object_unparent(). is that ok ? > Yes, this is the purpose of the patch actually. The ICP object gets parented to the CPU a few lines above, so I guess we should unparent in this error path as well. Maybe I should rename the patch to "unparent ICP object when realization fails" ? > C. > > > } > > > > object_property_set_bool(child, true, "realized", &local_err); > > if (local_err) { > > -object_unparent(obj); > > -error_propagate(errp, local_err); > > -return; > > +goto error; > > } > > > > spapr_cpu_init(spapr, cpu, &local_err); > > if (local_err) { > > -object_unparent(obj); > > -error_propagate(errp, local_err); > > -return; > > +goto error; > > } > > > > xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj)); > > +return; > > + > > +error: > > +object_unparent(obj); > > +error_propagate(errp, local_err); > > } > > > > static void spapr_cpu_core_realize(DeviceState *dev, Error **errp) > > > pgpeRLCOV4XNH.pgp Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine types
On Mon, 15 May 2017 05:13:34 -0700 (PDT) no-re...@patchew.org wrote: > Hi, > > This series seems to have some coding style problems. See output below for > more information: > > Subject: [Qemu-devel] [PATCH 0/6] spapr/xics: fix migration of older machine > types > Message-id: 149484833874.20089.4164801378197848306.st...@bahia.lan > Type: series > > === TEST SCRIPT BEGIN === > #!/bin/bash > > BASE=base > n=1 > total=$(git log --oneline $BASE.. | wc -l) > failed=0 > > git config --local diff.renamelimit 0 > git config --local diff.renames True > > commits="$(git log --format=%H --reverse $BASE..)" > for c in $commits; do > echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." > if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; > then > failed=1 > echo > fi > n=$((n+1)) > done > > exit $failed > === TEST SCRIPT END === > > Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 > From https://github.com/patchew-project/qemu > - [tag update] > patchew/149484833874.20089.4164801378197848306.st...@bahia.lan -> > patchew/149484833874.20089.4164801378197848306.st...@bahia.lan > Switched to a new branch 'test' > 3c3824a spapr: fix migration of ICP objects from/to older QEMU > 280164f spapr-cpu-core: release ICP object when realization fails > 5bc72eb spapr: sanitize error handling in spapr_ics_create() > fa15d00 spapr: fix error reporting in xics_system_init() > ecbe234 spapr: fix error path of required kernel-irqchip > b7cf73d ppc/xics: simplify prototype of xics_spapr_init() > > === OUTPUT BEGIN === > Checking PATCH 1/6: ppc/xics: simplify prototype of xics_spapr_init()... > Checking PATCH 2/6: spapr: fix error path of required kernel-irqchip... > Checking PATCH 3/6: spapr: fix error reporting in xics_system_init()... > Checking PATCH 4/6: spapr: sanitize error handling in spapr_ics_create()... > Checking PATCH 5/6: spapr-cpu-core: release ICP object when realization > fails... > Checking PATCH 6/6: spapr: fix migration of ICP objects from/to older QEMU... > ERROR: "foo* bar" should be "foo *bar" > #50: FILE: hw/ppc/spapr.c:167: > +void* obj = &spapr->legacy_icps[i]; > Oops, I'll fix that. > total: 1 errors, 0 warnings, 122 lines checked > > Your patch has style problems, please review. If any of these errors > are false positives report them to the maintainer, see > CHECKPATCH in MAINTAINERS. > > === OUTPUT END === > > Test command exited with code: 1 > > > --- > Email generated automatically by Patchew [http://patchew.org/]. > Please send your feedback to patchew-de...@freelists.org pgpRuKMbEsumW.pgp Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU
On 05/15/2017 01:40 PM, Greg Kurz wrote: > Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under > sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This > is an improvement since we no longer allocate ICP objects that will > never be used. But it has the side-effect of breaking migration of > older machine types from older QEMU versions. > > This patch introduces a compat flag in the sPAPR machine class so > that all pseries machine up to 2.9 go on with the previous behavior > of pre-allocating ICP objects. I think this is a quite elegant way to a handle the migration regression. Thanks for taking care of it. Have you tried to simply reparent the ICPs objects to OBJECT(spapr) instead of the OBJECT(cpu) ? See some minor comments below. > While here, we also ensure that object_property_add_child() errors cause > QEMU to abort for newer machines. > > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr.c | 36 > hw/ppc/spapr_cpu_core.c | 28 > include/hw/ppc/spapr.h |2 ++ > 3 files changed, 58 insertions(+), 8 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index c53989bb10b1..ab3683bcd677 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -126,6 +126,7 @@ error: > static void xics_system_init(MachineState *machine, int nr_irqs, Error > **errp) > { > sPAPRMachineState *spapr = SPAPR_MACHINE(machine); > +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); > Error *local_err = NULL; > > if (kvm_enabled()) { > @@ -151,6 +152,38 @@ static void xics_system_init(MachineState *machine, int > nr_irqs, Error **errp) >&local_err); > } > > +if (!spapr->ics) { > +goto out; > +} > + > +if (smc->must_pre_allocate_icps) { I am not sure I like 'must', I think 'pre_allocate_icps' should be enough ? or simply 'allocate_legacy_icps' ? > +int smt = kvmppc_smt_threads(); > +int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads); may be we should reintroduce nr_servers at the machine level ? > +int i; > + > +spapr->legacy_icps = g_malloc0(nr_servers * sizeof(ICPState)); > + > +for (i = 0; i < nr_servers; i++) { > +void* obj = &spapr->legacy_icps[i]; 'void *' > + > +object_initialize(obj, sizeof(ICPState), spapr->icp_type); > +object_property_add_child(OBJECT(spapr), "icp[*]", obj, > + &error_abort); David does not like the "icp[*]" syntax. > +object_unref(obj); > +object_property_add_const_link(obj, "xics", OBJECT(spapr), > + &error_abort); > +object_property_set_bool(obj, true, "realized", &local_err); > +if (local_err) { > +while (i--) { > +object_unparent(obj); > +} > +g_free(spapr->legacy_icps); > +break; > +} > +} > +} > + > +out: > error_propagate(errp, local_err); > } > > @@ -3256,8 +3289,11 @@ static void > spapr_machine_2_9_instance_options(MachineState *machine) > > static void spapr_machine_2_9_class_options(MachineClass *mc) > { > +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc); > + > spapr_machine_2_10_class_options(mc); > SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9); > +smc->must_pre_allocate_icps = true; > } > > DEFINE_SPAPR_MACHINE(2_9, "2.9", false); > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c > index 63d160f7e010..5476647efa06 100644 > --- a/hw/ppc/spapr_cpu_core.c > +++ b/hw/ppc/spapr_cpu_core.c > @@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, > Error **errp) > size_t size = object_type_get_instance_size(typename); > CPUCore *cc = CPU_CORE(dev); > int i; > +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); > > for (i = 0; i < cc->nr_threads; i++) { > void *obj = sc->threads + i * size; > @@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, > Error **errp) > PowerPCCPU *cpu = POWERPC_CPU(cs); > > spapr_cpu_destroy(cpu); > -object_unparent(cpu->intc); > +if (!spapr->legacy_icps) { > +object_unparent(cpu->intc); > +} > cpu_remove_sync(cs); > object_unparent(obj); > } > @@ -142,12 +145,19 @@ static void spapr_cpu_core_realize_child(Object *child, > Error **errp) > PowerPCCPU *cpu = POWERPC_CPU(cs); > Object *obj; > > -obj = object_new(spapr->icp_type); > -object_property_add_child(OBJECT(cpu), "icp", obj, NULL); > -object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort); > -object_property_set_bool(obj, true, "realized", &local_err); > -if (local_err) { > -goto error; > +
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
On Mon, 05/15 13:58, Peter Lieven wrote: > Am 15.05.2017 um 13:53 schrieb Fam Zheng: > > On Mon, 05/15 13:26, Peter Lieven wrote: > > > Am 15.05.2017 um 12:50 schrieb Fam Zheng: > > > > On Mon, 05/15 12:02, Peter Lieven wrote: > > > > > Hi Block developers, > > > > > > > > > > I would like to add a feature to Qemu to drain all traffic from a > > > > > block so that > > > > > I can take external snaphosts without the risk to that in the middle > > > > > of a write > > > > > operation. Its meant for cases where where QGA freeze/thaw is not > > > > > available. > > > > > > > > > > For me its enough to have this through qemu-io, but Kevin asked me to > > > > > check > > > > > if its not worth to have a stable API for it and present it via > > > > > QMP/HMP. > > > > > > > > > > What are your thoughts? > > > > For debugging purpose or a "hacky" usage where you know what you are > > > > doing, it > > > > may be fine to have this. The only issue is it should be a separate > > > > flag, like > > > > BlockJob.user_paused. > > > How can I add, remove such a flag? > > Like bs->user_drained. Set it in "drain" command, then increment > > bs->quiesce_counter if toggled; vise versa. > > Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions > the counter is incremented already. You're right, calling bdrv_drained_begin() is better. > > > > > > > > > What happens from guest perspective? In the case of virtio, the request > > > > queue is > > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still > > > > handled, > > > > the command is not effective (or rather the implementation is not > > > > complete). > > > That it only works with virtio is fine. However, the thing it does not > > > work correctly > > > apply then also to all other users of the drained_begin/end functions, > > > right? > > > As for the timeout I only plan to drain the device for about 1 second. > > It didn't matter because for IDE, the invariant (staying quiesced as long as > > necessary) is already ensured by BQL. Virtio is different because it > > supports > > ioeventfd and data plane. > > Okay understood. So my use of bdrv_drained_begin/end is more an abuse of > these functions? Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover IDE, I just haven't thought about "how". > > Do you have another idea how to achieve what I want? I was thinking of > throttle > the I/O to zero. It would be enough to do this for writes, reading doesn't > hurt in > my case. Maybe add a block filter on top of the drained node, drain it when doing so, then queue all further requests with a CoQueue until "undrain". (It is then not quite to "drain" but to "halt" or "pause", though.) Fam
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Am 15.05.2017 um 14:28 schrieb Fam Zheng: On Mon, 05/15 13:58, Peter Lieven wrote: Am 15.05.2017 um 13:53 schrieb Fam Zheng: On Mon, 05/15 13:26, Peter Lieven wrote: Am 15.05.2017 um 12:50 schrieb Fam Zheng: On Mon, 05/15 12:02, Peter Lieven wrote: Hi Block developers, I would like to add a feature to Qemu to drain all traffic from a block so that I can take external snaphosts without the risk to that in the middle of a write operation. Its meant for cases where where QGA freeze/thaw is not available. For me its enough to have this through qemu-io, but Kevin asked me to check if its not worth to have a stable API for it and present it via QMP/HMP. What are your thoughts? For debugging purpose or a "hacky" usage where you know what you are doing, it may be fine to have this. The only issue is it should be a separate flag, like BlockJob.user_paused. How can I add, remove such a flag? Like bs->user_drained. Set it in "drain" command, then increment bs->quiesce_counter if toggled; vise versa. Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions the counter is incremented already. You're right, calling bdrv_drained_begin() is better. What happens from guest perspective? In the case of virtio, the request queue is not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled, the command is not effective (or rather the implementation is not complete). That it only works with virtio is fine. However, the thing it does not work correctly apply then also to all other users of the drained_begin/end functions, right? As for the timeout I only plan to drain the device for about 1 second. It didn't matter because for IDE, the invariant (staying quiesced as long as necessary) is already ensured by BQL. Virtio is different because it supports ioeventfd and data plane. Okay understood. So my use of bdrv_drained_begin/end is more an abuse of these functions? Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover IDE, I just haven't thought about "how". Do you have another idea how to achieve what I want? I was thinking of throttle the I/O to zero. It would be enough to do this for writes, reading doesn't hurt in my case. Maybe add a block filter on top of the drained node, drain it when doing so, then queue all further requests with a CoQueue until "undrain". (It is then not quite to "drain" but to "halt" or "pause", though.) To get the drain for free was why I was looking at this approach. If I read correctly if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP? If yes, would support adding it to qemu-io? Thanks, Peter
Re: [Qemu-devel] [PATCH] stream: fix crash in stream_start() when block_job_create() fails
On Mon, May 15, 2017 at 12:34:24PM +0300, Alberto Garcia wrote: > The code that tries to reopen a BlockDriverState in stream_start() > when the creation of a new block job fails crashes because it attempts > to dereference a pointer that is known to be NULL. > > This is a regression introduced in a170a91fd3eab6155da39e740381867e, > likely because the code was copied from stream_complete(). > > Signed-off-by: Alberto Garcia > --- > block/stream.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/stream.c b/block/stream.c > index 0113710845..52d329f5c6 100644 > --- a/block/stream.c > +++ b/block/stream.c > @@ -280,6 +280,6 @@ void stream_start(const char *job_id, BlockDriverState > *bs, > > fail: > if (orig_bs_flags != bdrv_get_flags(bs)) { > -bdrv_reopen(bs, s->bs_flags, NULL); > +bdrv_reopen(bs, orig_bs_flags, NULL); Tested-by: Kashyap Chamarthy Yep, this fixes it. It gracefully throws an error when a 'job-id' was not specified: --- (QEMU) block-stream device=#block890 base=disk1.qcow2 { "execute": "block-stream", "arguments": { "device": "#block890", "base": "disk1.qcow2" } } { "error": { "class": "GenericError", "desc": "An explicit job ID is required for this node" } } (QEMU) block-stream device=#block890 base=disk1.qcow2 job-id=job0 { "execute": "block-stream", "arguments": { "device": "#block890", "job-id": "job0", "base": "disk1.qcow2" } } { "return": {} } (QEMU) --- [...] -- /kashyap
[Qemu-devel] [PATCH] qemu-iotests: Test streaming with missing job ID
This adds a small test for the image streaming error path for failing block_job_create(), which would have found the null pointer dereference in commit a170a91f. Signed-off-by: Kevin Wolf --- tests/qemu-iotests/030 | 4 tests/qemu-iotests/030.out | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 index e00c11b..feee861 100755 --- a/tests/qemu-iotests/030 +++ b/tests/qemu-iotests/030 @@ -147,6 +147,10 @@ class TestSingleDrive(iotests.QMPTestCase): result = self.vm.qmp('block-stream', device='nonexistent') self.assert_qmp(result, 'error/class', 'GenericError') +def test_job_id_missing(self): +result = self.vm.qmp('block-stream', device='mid') +self.assert_qmp(result, 'error/class', 'GenericError') + class TestParallelOps(iotests.QMPTestCase): num_ops = 4 # Number of parallel block-stream operations diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out index 84bfd63..391c857 100644 --- a/tests/qemu-iotests/030.out +++ b/tests/qemu-iotests/030.out @@ -1,5 +1,5 @@ -.. +... -- -Ran 22 tests +Ran 23 tests OK -- 1.8.3.1
Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/
On 2017/5/15 19:04, Juan Quintela wrote: "Dr. David Alan Gilbert" wrote: D> * Juan Quintela (quint...@redhat.com) wrote: There are functions only used by migration code. That's only mostly true; see the current 'integrate colo frame with block replication and net compare' series (posted 22nd April). That adds colo_handle_shutdown to this header and calls it from vl.c ( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html ) where should that go? Dropped. It compiled, but who knows O:-) There's also a net/colo.h as well, so using the #include "colo.h" in migration is correct but that's really scary when there are two files of the same name. Ha, I'd like to suggest it be renamed as colo-proxy.h (colo-net.h ? ) for net/colo.h, which is saner, hmm, together with colo.c, colo-proxy.c ? colo-net.c ? Or any other proper name ? ( I didn't notice that until it went into upstream. O:-) ) Cc: Zhang Chen Yeap, it is scary ...
Re: [Qemu-devel] [PATCH] qemu-iotests: Test streaming with missing job ID
On Mon 15 May 2017 02:39:40 PM CEST, Kevin Wolf wrote: > This adds a small test for the image streaming error path for failing > block_job_create(), which would have found the null pointer dereference > in commit a170a91f. > > Signed-off-by: Kevin Wolf > --- > tests/qemu-iotests/030 | 4 > tests/qemu-iotests/030.out | 4 ++-- > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 > index e00c11b..feee861 100755 > --- a/tests/qemu-iotests/030 > +++ b/tests/qemu-iotests/030 > @@ -147,6 +147,10 @@ class TestSingleDrive(iotests.QMPTestCase): > result = self.vm.qmp('block-stream', device='nonexistent') > self.assert_qmp(result, 'error/class', 'GenericError') > > +def test_job_id_missing(self): > +result = self.vm.qmp('block-stream', device='mid') > +self.assert_qmp(result, 'error/class', 'GenericError') Mmm... but does that trigger the bug? The bug happens if the user tries to create a stream job on a BDS that: a) exists b) is not the active node (i.e. needs reopening in read-write mode) c) its node name is not valid for a block job (e.g. it contains a '#') Berto
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
On Mon, 05/15 14:32, Peter Lieven wrote: > Am 15.05.2017 um 14:28 schrieb Fam Zheng: > > On Mon, 05/15 13:58, Peter Lieven wrote: > > > Am 15.05.2017 um 13:53 schrieb Fam Zheng: > > > > On Mon, 05/15 13:26, Peter Lieven wrote: > > > > > Am 15.05.2017 um 12:50 schrieb Fam Zheng: > > > > > > On Mon, 05/15 12:02, Peter Lieven wrote: > > > > > > > Hi Block developers, > > > > > > > > > > > > > > I would like to add a feature to Qemu to drain all traffic from a > > > > > > > block so that > > > > > > > I can take external snaphosts without the risk to that in the > > > > > > > middle of a write > > > > > > > operation. Its meant for cases where where QGA freeze/thaw is not > > > > > > > available. > > > > > > > > > > > > > > For me its enough to have this through qemu-io, but Kevin asked > > > > > > > me to check > > > > > > > if its not worth to have a stable API for it and present it via > > > > > > > QMP/HMP. > > > > > > > > > > > > > > What are your thoughts? > > > > > > For debugging purpose or a "hacky" usage where you know what you > > > > > > are doing, it > > > > > > may be fine to have this. The only issue is it should be a separate > > > > > > flag, like > > > > > > BlockJob.user_paused. > > > > > How can I add, remove such a flag? > > > > Like bs->user_drained. Set it in "drain" command, then increment > > > > bs->quiesce_counter if toggled; vise versa. > > > Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these > > > functions > > > the counter is incremented already. > > You're right, calling bdrv_drained_begin() is better. > > > > > > > > > > > > > > What happens from guest perspective? In the case of virtio, the > > > > > > request queue is > > > > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are > > > > > > still handled, > > > > > > the command is not effective (or rather the implementation is not > > > > > > complete). > > > > > That it only works with virtio is fine. However, the thing it does > > > > > not work correctly > > > > > apply then also to all other users of the drained_begin/end > > > > > functions, right? > > > > > As for the timeout I only plan to drain the device for about 1 second. > > > > It didn't matter because for IDE, the invariant (staying quiesced as > > > > long as > > > > necessary) is already ensured by BQL. Virtio is different because it > > > > supports > > > > ioeventfd and data plane. > > > Okay understood. So my use of bdrv_drained_begin/end is more an abuse of > > > these functions? > > Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to > > cover > > IDE, I just haven't thought about "how". > > > > > Do you have another idea how to achieve what I want? I was thinking of > > > throttle > > > the I/O to zero. It would be enough to do this for writes, reading > > > doesn't hurt in > > > my case. > > Maybe add a block filter on top of the drained node, drain it when doing so, > > then queue all further requests with a CoQueue until "undrain". (It is > > then not > > quite to "drain" but to "halt" or "pause", though.) > > To get the drain for free was why I was looking at this approach. If I read > correctly > if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP? I think so. > If yes, would support adding it to qemu-io? I'm under the impression that you are looking to a real use case, I don't think I like the idea. Also, accessing the image from other processes while QEMU is using it is strongly discouraged, and there is the coming image locking mechanism to prevent this from happening. Why is the blockdev-snapshot command not enough? Fam
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/2] Enhance block status when crossing EOF
On Thu, May 04, 2017 at 09:14:58PM -0500, Eric Blake wrote: > As mentioned in my 'qcow2 zero-cluster tweaks' cover letter > https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00722.html > I hit a case where we were not fully optimizing a write-zeroes > operation when we have a backing file that is shorter than the > active layer, and where the request crosses that boundary. I don't > know that the situation will be all that common (most qcow2 files > happen to be cluster-aligned in size; and most backing chains use > the same length at each layer of the chain), but for RFC purposes, > I wanted to explore how easy it would be to optimize the case > anyways, and to see if we can think of any other cases where > knowing that a block status crossed an end-of-file boundary can > be exploited for less work. > > Thus, this is a followup to that series, but I'm also okay if we > think it is too much maintenance compared to the potential gains, > and decide to drop it after all. > > Another potential use that I thought of: we may someday want to > track BDS length by bytes. Right now, bdrv_getlength() returns a > value in bytes, but which is sector-aligned, by adding padding; > but raw-format.c is able to see the transition between data up to > EOF followed by a hole after EOF. Because of that, my series to > convert bdrv_get_block_status() into a byte-based interface has to > fudge things - any time we see a mid-sector hole being reported, > we know it is because we hit EOF, and can round the result up to > the next sector boundary. Having the BDRV_BLOCK_EOF flag set in > that case may alert clients that the rounding took place, or help > us when we quit doing the rounding and actually start tracking BDS > lengths by bytes even where it is not sector-aligned. > > Eric Blake (2): > block: Add BDRV_BLOCK_EOF to bdrv_get_block_status() > block: Exploit BDRV_BLOCK_EOF for larger zero blocks > > include/block/block.h | 2 ++ > block/io.c | 42 +- > tests/qemu-iotests/154 | 4 > tests/qemu-iotests/154.out | 12 ++-- > 4 files changed, 41 insertions(+), 19 deletions(-) > > -- > 2.9.3 > > Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 2/2] block: Exploit BDRV_BLOCK_EOF for larger zero blocks
On Thu, May 04, 2017 at 09:15:00PM -0500, Eric Blake wrote: > When we have a BSD with unallocated clusters, but asking the status s/BSD/BDS/ signature.asc Description: PGP signature
Re: [Qemu-devel] [PULL 0/9] pci, virtio, vhost: fixes
On Wed, May 10, 2017 at 10:07:58PM +0300, Michael S. Tsirkin wrote: > The following changes since commit 76d20ea0f1b26ebd5da2f5fb2fdf3250cde887bb: > > Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-04-v3' into > staging (2017-05-09 15:49:14 -0400) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > for you to fetch changes up to 8b12e48950a3d59188489b2ff6c5ad9cc09e9866: > > acpi-defs: clean up open brace usage (2017-05-10 22:04:23 +0300) > > > pci, virtio, vhost: fixes > > A bunch of fixes that missed the release. > > Signed-off-by: Michael S. Tsirkin Please resend with the checkpatch.pl issue fixed. signature.asc Description: PGP signature
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Am 15.05.2017 um 14:52 schrieb Fam Zheng: On Mon, 05/15 14:32, Peter Lieven wrote: Am 15.05.2017 um 14:28 schrieb Fam Zheng: On Mon, 05/15 13:58, Peter Lieven wrote: Am 15.05.2017 um 13:53 schrieb Fam Zheng: On Mon, 05/15 13:26, Peter Lieven wrote: Am 15.05.2017 um 12:50 schrieb Fam Zheng: On Mon, 05/15 12:02, Peter Lieven wrote: Hi Block developers, I would like to add a feature to Qemu to drain all traffic from a block so that I can take external snaphosts without the risk to that in the middle of a write operation. Its meant for cases where where QGA freeze/thaw is not available. For me its enough to have this through qemu-io, but Kevin asked me to check if its not worth to have a stable API for it and present it via QMP/HMP. What are your thoughts? For debugging purpose or a "hacky" usage where you know what you are doing, it may be fine to have this. The only issue is it should be a separate flag, like BlockJob.user_paused. How can I add, remove such a flag? Like bs->user_drained. Set it in "drain" command, then increment bs->quiesce_counter if toggled; vise versa. Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these functions the counter is incremented already. You're right, calling bdrv_drained_begin() is better. What happens from guest perspective? In the case of virtio, the request queue is not handled and -ETIMEDOUT may happen. With IDE, I/O commands are still handled, the command is not effective (or rather the implementation is not complete). That it only works with virtio is fine. However, the thing it does not work correctly apply then also to all other users of the drained_begin/end functions, right? As for the timeout I only plan to drain the device for about 1 second. It didn't matter because for IDE, the invariant (staying quiesced as long as necessary) is already ensured by BQL. Virtio is different because it supports ioeventfd and data plane. Okay understood. So my use of bdrv_drained_begin/end is more an abuse of these functions? Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to cover IDE, I just haven't thought about "how". Do you have another idea how to achieve what I want? I was thinking of throttle the I/O to zero. It would be enough to do this for writes, reading doesn't hurt in my case. Maybe add a block filter on top of the drained node, drain it when doing so, then queue all further requests with a CoQueue until "undrain". (It is then not quite to "drain" but to "halt" or "pause", though.) To get the drain for free was why I was looking at this approach. If I read correctly if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP? I think so. If yes, would support adding it to qemu-io? I'm under the impression that you are looking to a real use case, I don't think I like the idea. Also, accessing the image from other processes while QEMU is using it is strongly discouraged, and there is the coming image locking mechanism to prevent this from happening. Why is the blockdev-snapshot command not enough? blockdev-snapshot is enough, but I still fear the case there is suddenly too much I/O for the live-commit. And that the whole snapshot / commit code is more senstive than just stopping I/O for a second or two. do you have a pointer to the image locking mechanism? Peter
Re: [Qemu-devel] [RFC PATCH] qemu-io: add drain/undrain cmd
Am 15.05.2017 um 14:52 hat Fam Zheng geschrieben: > On Mon, 05/15 14:32, Peter Lieven wrote: > > Am 15.05.2017 um 14:28 schrieb Fam Zheng: > > > On Mon, 05/15 13:58, Peter Lieven wrote: > > > > Am 15.05.2017 um 13:53 schrieb Fam Zheng: > > > > > On Mon, 05/15 13:26, Peter Lieven wrote: > > > > > > Am 15.05.2017 um 12:50 schrieb Fam Zheng: > > > > > > > On Mon, 05/15 12:02, Peter Lieven wrote: > > > > > > > > Hi Block developers, > > > > > > > > > > > > > > > > I would like to add a feature to Qemu to drain all traffic from > > > > > > > > a block so that > > > > > > > > I can take external snaphosts without the risk to that in the > > > > > > > > middle of a write > > > > > > > > operation. Its meant for cases where where QGA freeze/thaw is > > > > > > > > not available. > > > > > > > > > > > > > > > > For me its enough to have this through qemu-io, but Kevin asked > > > > > > > > me to check > > > > > > > > if its not worth to have a stable API for it and present it via > > > > > > > > QMP/HMP. > > > > > > > > > > > > > > > > What are your thoughts? > > > > > > > For debugging purpose or a "hacky" usage where you know what you > > > > > > > are doing, it > > > > > > > may be fine to have this. The only issue is it should be a > > > > > > > separate flag, like > > > > > > > BlockJob.user_paused. > > > > > > How can I add, remove such a flag? > > > > > Like bs->user_drained. Set it in "drain" command, then increment > > > > > bs->quiesce_counter if toggled; vise versa. > > > > Ah okay. You wouldn't use bdrv_drained_begin/end? Because in these > > > > functions > > > > the counter is incremented already. > > > You're right, calling bdrv_drained_begin() is better. > > > > > > > > > > > > > > > > > > What happens from guest perspective? In the case of virtio, the > > > > > > > request queue is > > > > > > > not handled and -ETIMEDOUT may happen. With IDE, I/O commands are > > > > > > > still handled, > > > > > > > the command is not effective (or rather the implementation is not > > > > > > > complete). > > > > > > That it only works with virtio is fine. However, the thing it does > > > > > > not work correctly > > > > > > apply then also to all other users of the drained_begin/end > > > > > > functions, right? > > > > > > As for the timeout I only plan to drain the device for about 1 > > > > > > second. > > > > > It didn't matter because for IDE, the invariant (staying quiesced as > > > > > long as > > > > > necessary) is already ensured by BQL. Virtio is different because it > > > > > supports > > > > > ioeventfd and data plane. > > > > Okay understood. So my use of bdrv_drained_begin/end is more an abuse of > > > > these functions? > > > Sort of. But it's not unreasonable to "extend" bdrv_drained_begin/end to > > > cover > > > IDE, I just haven't thought about "how". > > > > > > > Do you have another idea how to achieve what I want? I was thinking of > > > > throttle > > > > the I/O to zero. It would be enough to do this for writes, reading > > > > doesn't hurt in > > > > my case. > > > Maybe add a block filter on top of the drained node, drain it when doing > > > so, > > > then queue all further requests with a CoQueue until "undrain". (It is > > > then not > > > quite to "drain" but to "halt" or "pause", though.) > > > > To get the drain for free was why I was looking at this approach. If I read > > correctly > > if I keep using bdrv_drained_begin/end its too hacky to implement it in QMP? > > I think so. > > > If yes, would support adding it to qemu-io? > > I'm under the impression that you are looking to a real use case, I don't > think > I like the idea. Also, accessing the image from other processes while QEMU is > using it is strongly discouraged, and there is the coming image locking > mechanism to prevent this from happening. Thinking a bit more about this, it looks to me as if what we really want is inactivating the image. Should we add an option to the 'stop' command (or introduce another command to be used on an already stopped VM) that inactivates all/some images? And then 'cont' regains control over the images, just like after migration. Automatically stopping the VM while the snapshot is taken also makes it work with IDE and prevents guests running into timeouts, which makes it look much more like a proper solution to me. But then, stop/cont would already achieve something very similar today (as 'stop' calls bdrv_drain_all(); just the locking part wouldn't be covered), so maybe there is a reason not to use it in your case, Peter? Kevin