Re: [PATCH v2 5/7] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-13 Thread Christian Schoenebeck
On Samstag, 12. März 2022 12:06:47 CET Christian Schoenebeck wrote:
> Current implementation of 'Twalk' request handling always sends an 'Rerror'
> response if any error occured. The 9p2000 protocol spec says though:
> 
>   "
>   If the first element cannot be walked for any reason, Rerror is returned.
>   Otherwise, the walk will return an Rwalk message containing nwqid qids
>   corresponding, in order, to the files that are visited by the nwqid
>   successful elementwise walks; nwqid is therefore either nwname or the
> index of the first elementwise walk that failed.
>   "
> 
>   http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
> 
> For that reason we are no longer leaving from an error path in function
> v9fs_walk(), unless really no path component could be walked successfully or
> if the request has been interrupted.
> 
> Local variable 'nwalked' counts and reflects the number of path components
> successfully processed by background I/O thread, whereas local variable
> 'name_idx' subsequently counts and reflects the number of path components
> eventually accepted successfully by 9p server controller portion.
> 
> New local variable 'any_err' is an aggregate variable reflecting whether any
> error occurred at all, while already existing variable 'err' only reflects
> the last error.
> 
> Despite QIDs being delivered to client in a more relaxed way now, it is
> important to note though that fid still must remain unaffected if any error
> occurred.
> 
> Signed-off-by: Christian Schoenebeck 
> ---
>  hw/9pfs/9p.c | 29 +
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 298f4e6548..e8d838f6fd 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
>  {
>  int name_idx, nwalked;
>  g_autofree V9fsQID *qids = NULL;
> -int i, err = 0;
> +int i, err = 0, any_err = 0;
>  V9fsPath dpath, path;
>  P9ARRAY_REF(V9fsPath) pathes = NULL;
>  uint16_t nwnames;
> @@ -1832,6 +1832,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
>   * driver code altogether inside the following block.
>   */
>  v9fs_co_run_in_worker({
> +nwalked = 0;
>  if (v9fs_request_cancelled(pdu)) {
>  err = -EINTR;
>  break;
> @@ -1842,7 +1843,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
>  break;
>  }
>  stbuf = fidst;
> -for (nwalked = 0; nwalked < nwnames; nwalked++) {
> +for (; nwalked < nwnames; nwalked++) {
>  if (v9fs_request_cancelled(pdu)) {
>  err = -EINTR;
>  break;
> @@ -1874,12 +1875,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
>  /*
>   * Handle all the rest of this Twalk request on main thread ...
>   */
> -if (err < 0) {
> +if ((err < 0 && !nwalked) || err == -EINTR) {
>  goto out;
>  }
> 
> +any_err |= err;
>  err = stat_to_qid(pdu, &fidst, &qid);
> -if (err < 0) {
> +if (err < 0 && !nwalked) {
>  goto out;
>  }

That's missing to update 'any_err' variable here, i.e. if nwalked > 0 and 
stat_to_qid() fails here, then ...

>  stbuf = fidst;
> @@ -1888,20 +1890,30 @@ static void coroutine_fn v9fs_walk(void *opaque)
>  v9fs_path_copy(&dpath, &fidp->path);
>  v9fs_path_copy(&path, &fidp->path);
> 
> -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> +for (name_idx = 0; name_idx < nwalked; name_idx++) {
>  if (!same_stat_id(&pdu->s->root_st, &stbuf) ||
>  strcmp("..", wnames[name_idx].data))
>  {
>  stbuf = stbufs[name_idx];
>  err = stat_to_qid(pdu, &stbuf, &qid);

... this stat_to_qid() call ^ here would cause the previous stat_to_qid() 
error to not make into 'any_err' variable.

For that reason I will change the 'any_err' code like this at all places:

any_err |= err = ...

Because that will make the code significantly less error prone and it will be 
less lines of code as well.

>  if (err < 0) {
> -goto out;
> +break;
>  }
>  v9fs_path_copy(&path, &pathes[name_idx]);
>  v9fs_path_copy(&dpath, &path);
>  }
>  memcpy(&qids[name_idx], &qid, sizeof(qid));
>  }
> +any_err |= err;
> +if (any_err < 0) {
> +if (!name_idx) {
> +/* don't send any QIDs, send Rlerror instead */
> +goto out;
> +} else {
> +/* send QIDs (not Rlerror), but fid MUST remain unaffected */
> +goto send_qids;
> +}
> +}
>  if (fid == newfid) {
>  if (fidp->fid_type != P9_FID_NONE) {
>  err = -EINVAL;
> @@ -1919,8 +1931,9 @@ static void coroutine_fn v9fs_walk(void *opaque)
>  newfidp->uid = fidp->uid;
>  v9fs_path_copy(&newfidp->path, &path);
>  }
> -err = v

[PATCH v3 0/7] 9pfs: fix 'Twalk' protocol violation

2022-03-13 Thread Christian Schoenebeck
Currently the implementation of 'Twalk' does not behave exactly as specified
by the 9p2000 protocol specification. Actual fix is patch 5; see the
description of that patch for details of what this overall fix and series is
about.

Patch 4 is a preparatory (pure) refactoring change to make patch 5 better
readable.

All the other patches are just additional test cases for guarding 'Twalk'
behaviour.

v2 -> v3:

  * Use 'any_err |= err = ...' at all occurrences to make v9fs_walk() function
less error prone, which also reduces lines of code. [patch 5]

Christian Schoenebeck (7):
  tests/9pfs: walk to non-existent dir
  tests/9pfs: Twalk with nwname=0
  tests/9pfs: compare QIDs in fs_walk_none() test
  9pfs: refactor 'name_idx' -> 'nwalked' in v9fs_walk()
  9pfs: fix 'Twalk' to only send error if no component walked
  tests/9pfs: guard recent 'Twalk' behaviour fix
  tests/9pfs: check fid being unaffected in fs_walk_2nd_nonexistent

 hw/9pfs/9p.c |  57 ++-
 tests/qtest/virtio-9p-test.c | 191 ++-
 2 files changed, 221 insertions(+), 27 deletions(-)

-- 
2.30.2




[PATCH v3 2/7] tests/9pfs: Twalk with nwname=0

2022-03-13 Thread Christian Schoenebeck
Send Twalk request with nwname=0. In this case no QIDs should
be returned by 9p server; this is equivalent to walking to dot.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
---
 tests/qtest/virtio-9p-test.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index 22bdd74bc1..6c00da03f4 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -1002,6 +1002,27 @@ static void fs_walk_nonexistent(void *obj, void *data, 
QGuestAllocator *t_alloc)
 do_walk_expect_error(v9p, "non-existent", ENOENT);
 }
 
+static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+QVirtio9P *v9p = obj;
+alloc = t_alloc;
+v9fs_qid root_qid;
+g_autofree v9fs_qid *wqid = NULL;
+P9Req *req;
+
+do_version(v9p);
+req = v9fs_tattach(v9p, 0, getuid(), 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rattach(req, &root_qid);
+
+req = v9fs_twalk(v9p, 0, 1, 0, NULL, 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rwalk(req, NULL, &wqid);
+
+/* special case: no QID is returned if nwname=0 was sent */
+g_assert(wqid == NULL);
+}
+
 static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 QVirtio9P *v9p = obj;
@@ -1435,6 +1456,7 @@ static void register_virtio_9p_test(void)
 qos_add_test("synth/walk/basic", "virtio-9p", fs_walk,  &opts);
 qos_add_test("synth/walk/no_slash", "virtio-9p", fs_walk_no_slash,
   &opts);
+qos_add_test("synth/walk/none", "virtio-9p", fs_walk_none, &opts);
 qos_add_test("synth/walk/dotdot_from_root", "virtio-9p",
  fs_walk_dotdot,  &opts);
 qos_add_test("synth/walk/non_existent", "virtio-9p", fs_walk_nonexistent,
-- 
2.30.2




[PATCH v3 3/7] tests/9pfs: compare QIDs in fs_walk_none() test

2022-03-13 Thread Christian Schoenebeck
Extend previously added fs_walk_none() test by comparing the QID
of the root fid with the QID of the cloned fid. They should be
equal.

Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/virtio-9p-test.c | 87 
 1 file changed, 87 insertions(+)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index 6c00da03f4..a1160f4659 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -371,8 +371,15 @@ static P9Req *v9fs_tattach(QVirtio9P *v9p, uint32_t fid, 
uint32_t n_uname,
 return req;
 }
 
+/* type[1] version[4] path[8] */
 typedef char v9fs_qid[13];
 
+static inline bool is_same_qid(v9fs_qid a, v9fs_qid b)
+{
+/* don't compare QID version for checking for file ID equalness */
+return a[0] == b[0] && memcmp(&a[5], &b[5], 8) == 0;
+}
+
 /* size[4] Rattach tag[2] qid[13] */
 static void v9fs_rattach(P9Req *req, v9fs_qid *qid)
 {
@@ -425,6 +432,79 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid, 
v9fs_qid **wqid)
 v9fs_req_free(req);
 }
 
+/* size[4] Tgetattr tag[2] fid[4] request_mask[8] */
+static P9Req *v9fs_tgetattr(QVirtio9P *v9p, uint32_t fid, uint64_t 
request_mask,
+uint16_t tag)
+{
+P9Req *req;
+
+req = v9fs_req_init(v9p, 4 + 8, P9_TGETATTR, tag);
+v9fs_uint32_write(req, fid);
+v9fs_uint64_write(req, request_mask);
+v9fs_req_send(req);
+return req;
+}
+
+typedef struct v9fs_attr {
+uint64_t valid;
+v9fs_qid qid;
+uint32_t mode;
+uint32_t uid;
+uint32_t gid;
+uint64_t nlink;
+uint64_t rdev;
+uint64_t size;
+uint64_t blksize;
+uint64_t blocks;
+uint64_t atime_sec;
+uint64_t atime_nsec;
+uint64_t mtime_sec;
+uint64_t mtime_nsec;
+uint64_t ctime_sec;
+uint64_t ctime_nsec;
+uint64_t btime_sec;
+uint64_t btime_nsec;
+uint64_t gen;
+uint64_t data_version;
+} v9fs_attr;
+
+#define P9_GETATTR_BASIC0x07ffULL /* Mask for fields up to BLOCKS */
+
+/*
+ * size[4] Rgetattr tag[2] valid[8] qid[13] mode[4] uid[4] gid[4] nlink[8]
+ *  rdev[8] size[8] blksize[8] blocks[8]
+ *  atime_sec[8] atime_nsec[8] mtime_sec[8] mtime_nsec[8]
+ *  ctime_sec[8] ctime_nsec[8] btime_sec[8] btime_nsec[8]
+ *  gen[8] data_version[8]
+ */
+static void v9fs_rgetattr(P9Req *req, v9fs_attr *attr)
+{
+v9fs_req_recv(req, P9_RGETATTR);
+
+v9fs_uint64_read(req, &attr->valid);
+v9fs_memread(req, &attr->qid, 13);
+v9fs_uint32_read(req, &attr->mode);
+v9fs_uint32_read(req, &attr->uid);
+v9fs_uint32_read(req, &attr->gid);
+v9fs_uint64_read(req, &attr->nlink);
+v9fs_uint64_read(req, &attr->rdev);
+v9fs_uint64_read(req, &attr->size);
+v9fs_uint64_read(req, &attr->blksize);
+v9fs_uint64_read(req, &attr->blocks);
+v9fs_uint64_read(req, &attr->atime_sec);
+v9fs_uint64_read(req, &attr->atime_nsec);
+v9fs_uint64_read(req, &attr->mtime_sec);
+v9fs_uint64_read(req, &attr->mtime_nsec);
+v9fs_uint64_read(req, &attr->ctime_sec);
+v9fs_uint64_read(req, &attr->ctime_nsec);
+v9fs_uint64_read(req, &attr->btime_sec);
+v9fs_uint64_read(req, &attr->btime_nsec);
+v9fs_uint64_read(req, &attr->gen);
+v9fs_uint64_read(req, &attr->data_version);
+
+v9fs_req_free(req);
+}
+
 /* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
 static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
 uint32_t count, uint16_t tag)
@@ -1009,6 +1089,7 @@ static void fs_walk_none(void *obj, void *data, 
QGuestAllocator *t_alloc)
 v9fs_qid root_qid;
 g_autofree v9fs_qid *wqid = NULL;
 P9Req *req;
+struct v9fs_attr attr;
 
 do_version(v9p);
 req = v9fs_tattach(v9p, 0, getuid(), 0);
@@ -1021,6 +1102,12 @@ static void fs_walk_none(void *obj, void *data, 
QGuestAllocator *t_alloc)
 
 /* special case: no QID is returned if nwname=0 was sent */
 g_assert(wqid == NULL);
+
+req = v9fs_tgetattr(v9p, 1, P9_GETATTR_BASIC, 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rgetattr(req, &attr);
+
+g_assert(is_same_qid(root_qid, attr.qid));
 }
 
 static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc)
-- 
2.30.2




[PATCH v3 5/7] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-13 Thread Christian Schoenebeck
Current implementation of 'Twalk' request handling always sends an 'Rerror'
response if any error occured. The 9p2000 protocol spec says though:

  "
  If the first element cannot be walked for any reason, Rerror is returned.
  Otherwise, the walk will return an Rwalk message containing nwqid qids
  corresponding, in order, to the files that are visited by the nwqid
  successful elementwise walks; nwqid is therefore either nwname or the index
  of the first elementwise walk that failed.
  "

  http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33

For that reason we are no longer leaving from an error path in function
v9fs_walk(), unless really no path component could be walked successfully or
if the request has been interrupted.

Local variable 'nwalked' counts and reflects the number of path components
successfully processed by background I/O thread, whereas local variable
'name_idx' subsequently counts and reflects the number of path components
eventually accepted successfully by 9p server controller portion.

New local variable 'any_err' is an aggregate variable reflecting whether any
error occurred at all, while already existing variable 'err' only reflects
the last error.

Despite QIDs being delivered to client in a more relaxed way now, it is
important to note though that fid still must remain unaffected if any error
occurred.

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 43 +++
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 298f4e6548..e770972a71 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
 {
 int name_idx, nwalked;
 g_autofree V9fsQID *qids = NULL;
-int i, err = 0;
+int i, err = 0, any_err = 0;
 V9fsPath dpath, path;
 P9ARRAY_REF(V9fsPath) pathes = NULL;
 uint16_t nwnames;
@@ -1832,19 +1832,20 @@ static void coroutine_fn v9fs_walk(void *opaque)
  * driver code altogether inside the following block.
  */
 v9fs_co_run_in_worker({
+nwalked = 0;
 if (v9fs_request_cancelled(pdu)) {
-err = -EINTR;
+any_err |= err = -EINTR;
 break;
 }
 err = s->ops->lstat(&s->ctx, &dpath, &fidst);
 if (err < 0) {
-err = -errno;
+any_err |= err = -errno;
 break;
 }
 stbuf = fidst;
-for (nwalked = 0; nwalked < nwnames; nwalked++) {
+for (; nwalked < nwnames; nwalked++) {
 if (v9fs_request_cancelled(pdu)) {
-err = -EINTR;
+any_err |= err = -EINTR;
 break;
 }
 if (!same_stat_id(&pdu->s->root_st, &stbuf) ||
@@ -1854,16 +1855,16 @@ static void coroutine_fn v9fs_walk(void *opaque)
wnames[nwalked].data,
&pathes[nwalked]);
 if (err < 0) {
-err = -errno;
+any_err |= err = -errno;
 break;
 }
 if (v9fs_request_cancelled(pdu)) {
-err = -EINTR;
+any_err |= err = -EINTR;
 break;
 }
 err = s->ops->lstat(&s->ctx, &pathes[nwalked], &stbuf);
 if (err < 0) {
-err = -errno;
+any_err |= err = -errno;
 break;
 }
 stbufs[nwalked] = stbuf;
@@ -1874,12 +1875,12 @@ static void coroutine_fn v9fs_walk(void *opaque)
 /*
  * Handle all the rest of this Twalk request on main thread ...
  */
-if (err < 0) {
+if ((err < 0 && !nwalked) || err == -EINTR) {
 goto out;
 }
 
-err = stat_to_qid(pdu, &fidst, &qid);
-if (err < 0) {
+any_err |= err = stat_to_qid(pdu, &fidst, &qid);
+if (err < 0 && !nwalked) {
 goto out;
 }
 stbuf = fidst;
@@ -1888,20 +1889,29 @@ static void coroutine_fn v9fs_walk(void *opaque)
 v9fs_path_copy(&dpath, &fidp->path);
 v9fs_path_copy(&path, &fidp->path);
 
-for (name_idx = 0; name_idx < nwnames; name_idx++) {
+for (name_idx = 0; name_idx < nwalked; name_idx++) {
 if (!same_stat_id(&pdu->s->root_st, &stbuf) ||
 strcmp("..", wnames[name_idx].data))
 {
 stbuf = stbufs[name_idx];
-err = stat_to_qid(pdu, &stbuf, &qid);
+any_err |= err = stat_to_qid(pdu, &stbuf, &qid);
 if (err < 0) {
-goto out;
+break;
 }
 v9fs_path_copy(&path, &pathes[name_idx]);
 v9fs_path_copy(&dpath, &path);
 }
 memcpy(&qids[name_idx], &qid, sizeof(qid));
 }
+if (any_err < 0) {
+if (!name_idx) {
+/* don't send any QIDs, send Rlerror instead */
+goto out;
+} el

[PATCH v3 1/7] tests/9pfs: walk to non-existent dir

2022-03-13 Thread Christian Schoenebeck
Expect ENOENT Rlerror response when trying to walk to a
non-existent directory.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
---
 tests/qtest/virtio-9p-test.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index 01ca076afe..22bdd74bc1 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -606,6 +606,25 @@ static uint32_t do_walk(QVirtio9P *v9p, const char *path)
 return fid;
 }
 
+/* utility function: walk to requested dir and expect passed error response */
+static void do_walk_expect_error(QVirtio9P *v9p, const char *path, uint32_t 
err)
+{
+char **wnames;
+P9Req *req;
+uint32_t _err;
+const uint32_t fid = genfid();
+
+int nwnames = split(path, "/", &wnames);
+
+req = v9fs_twalk(v9p, 0, fid, nwnames, wnames, 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rlerror(req, &_err);
+
+g_assert_cmpint(_err, ==, err);
+
+split_free(&wnames);
+}
+
 static void fs_version(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 alloc = t_alloc;
@@ -974,6 +993,15 @@ static void fs_walk_no_slash(void *obj, void *data, 
QGuestAllocator *t_alloc)
 g_free(wnames[0]);
 }
 
+static void fs_walk_nonexistent(void *obj, void *data, QGuestAllocator 
*t_alloc)
+{
+QVirtio9P *v9p = obj;
+alloc = t_alloc;
+
+do_attach(v9p);
+do_walk_expect_error(v9p, "non-existent", ENOENT);
+}
+
 static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 QVirtio9P *v9p = obj;
@@ -1409,6 +1437,8 @@ static void register_virtio_9p_test(void)
   &opts);
 qos_add_test("synth/walk/dotdot_from_root", "virtio-9p",
  fs_walk_dotdot,  &opts);
+qos_add_test("synth/walk/non_existent", "virtio-9p", fs_walk_nonexistent,
+  &opts);
 qos_add_test("synth/lopen/basic", "virtio-9p", fs_lopen,  &opts);
 qos_add_test("synth/write/basic", "virtio-9p", fs_write,  &opts);
 qos_add_test("synth/flush/success", "virtio-9p", fs_flush_success,
-- 
2.30.2




[PATCH v3 7/7] tests/9pfs: check fid being unaffected in fs_walk_2nd_nonexistent

2022-03-13 Thread Christian Schoenebeck
Extend previously added test case by checking that fid is unaffected
by 'Twalk' request (i.e. when 2nd path component of request being
invalid). Do that by comparing the QID of root fid with QID of walked
fid; they should be identical.

Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/virtio-9p-test.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index f6e78d388e..b9c6819d01 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -721,14 +721,19 @@ static void fs_version(void *obj, void *data, 
QGuestAllocator *t_alloc)
 do_version(obj);
 }
 
-static void do_attach(QVirtio9P *v9p)
+static void do_attach_rqid(QVirtio9P *v9p, v9fs_qid *qid)
 {
 P9Req *req;
 
 do_version(v9p);
 req = v9fs_tattach(v9p, 0, getuid(), 0);
 v9fs_req_wait_for_reply(req, NULL);
-v9fs_rattach(req, NULL);
+v9fs_rattach(req, qid);
+}
+
+static void do_attach(QVirtio9P *v9p)
+{
+do_attach_rqid(v9p, NULL);
 }
 
 static void fs_attach(void *obj, void *data, QGuestAllocator *t_alloc)
@@ -1101,19 +1106,22 @@ static void fs_walk_2nd_nonexistent(void *obj, void 
*data,
 {
 QVirtio9P *v9p = obj;
 alloc = t_alloc;
+v9fs_qid root_qid;
 uint16_t nwqid;
 g_autofree v9fs_qid *wqid = NULL;
 g_autofree char *path = g_strdup_printf(
 QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
 );
 
-do_attach(v9p);
+do_attach_rqid(v9p, &root_qid);
 do_walk_rqids(v9p, path, &nwqid, &wqid);
 /*
  * The 9p2000 protocol spec says: "nwqid is therefore either nwname or the
  * index of the first elementwise walk that failed."
  */
 assert(nwqid == 1);
+/* expect fid being unaffected by walk */
+g_assert(wqid && wqid[0] && is_same_qid(root_qid, wqid[0]));
 }
 
 static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
-- 
2.30.2




[PATCH v3 4/7] 9pfs: refactor 'name_idx' -> 'nwalked' in v9fs_walk()

2022-03-13 Thread Christian Schoenebeck
The local variable 'name_idx' is used in two loops in function v9fs_walk().
Let the first loop use its own variable 'nwalked' instead, which we will
use in subsequent patch as the number of (requested) path components
successfully walked by background I/O thread.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
---
 hw/9pfs/9p.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index a6d6b3f835..298f4e6548 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1764,7 +1764,7 @@ static bool same_stat_id(const struct stat *a, const 
struct stat *b)
 
 static void coroutine_fn v9fs_walk(void *opaque)
 {
-int name_idx;
+int name_idx, nwalked;
 g_autofree V9fsQID *qids = NULL;
 int i, err = 0;
 V9fsPath dpath, path;
@@ -1842,17 +1842,17 @@ static void coroutine_fn v9fs_walk(void *opaque)
 break;
 }
 stbuf = fidst;
-for (name_idx = 0; name_idx < nwnames; name_idx++) {
+for (nwalked = 0; nwalked < nwnames; nwalked++) {
 if (v9fs_request_cancelled(pdu)) {
 err = -EINTR;
 break;
 }
 if (!same_stat_id(&pdu->s->root_st, &stbuf) ||
-strcmp("..", wnames[name_idx].data))
+strcmp("..", wnames[nwalked].data))
 {
 err = s->ops->name_to_path(&s->ctx, &dpath,
-   wnames[name_idx].data,
-   &pathes[name_idx]);
+   wnames[nwalked].data,
+   &pathes[nwalked]);
 if (err < 0) {
 err = -errno;
 break;
@@ -1861,13 +1861,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
 err = -EINTR;
 break;
 }
-err = s->ops->lstat(&s->ctx, &pathes[name_idx], &stbuf);
+err = s->ops->lstat(&s->ctx, &pathes[nwalked], &stbuf);
 if (err < 0) {
 err = -errno;
 break;
 }
-stbufs[name_idx] = stbuf;
-v9fs_path_copy(&dpath, &pathes[name_idx]);
+stbufs[nwalked] = stbuf;
+v9fs_path_copy(&dpath, &pathes[nwalked]);
 }
 }
 });
-- 
2.30.2




[PATCH v3 6/7] tests/9pfs: guard recent 'Twalk' behaviour fix

2022-03-13 Thread Christian Schoenebeck
Previous 9p patch fixed 'Twalk' request handling, which was previously not
behaving as specified by the 9p2000 protocol spec. This patch adds a new test
case which guards the new 'Twalk' behaviour in question.

More specifically: it sends a 'Twalk' request where the 1st path component
is valid, whereas the 2nd path component transmitted to server does not
exist. The expected behaviour is that 9p server would respond by sending
a 'Rwalk' response with exactly 1 QID (instead of 'Rlerror' response).

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
---
 tests/qtest/virtio-9p-test.c | 42 +---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index a1160f4659..f6e78d388e 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -669,8 +669,12 @@ static void do_version(QVirtio9P *v9p)
 g_assert_cmpmem(server_version, server_len, version, strlen(version));
 }
 
-/* utility function: walk to requested dir and return fid for that dir */
-static uint32_t do_walk(QVirtio9P *v9p, const char *path)
+/*
+ * utility function: walk to requested dir and return fid for that dir and
+ * the QIDs of server response
+ */
+static uint32_t do_walk_rqids(QVirtio9P *v9p, const char *path, uint16_t 
*nwqid,
+  v9fs_qid **wqid)
 {
 char **wnames;
 P9Req *req;
@@ -680,12 +684,18 @@ static uint32_t do_walk(QVirtio9P *v9p, const char *path)
 
 req = v9fs_twalk(v9p, 0, fid, nwnames, wnames, 0);
 v9fs_req_wait_for_reply(req, NULL);
-v9fs_rwalk(req, NULL, NULL);
+v9fs_rwalk(req, nwqid, wqid);
 
 split_free(&wnames);
 return fid;
 }
 
+/* utility function: walk to requested dir and return fid for that dir */
+static uint32_t do_walk(QVirtio9P *v9p, const char *path)
+{
+return do_walk_rqids(v9p, path, NULL, NULL);
+}
+
 /* utility function: walk to requested dir and expect passed error response */
 static void do_walk_expect_error(QVirtio9P *v9p, const char *path, uint32_t 
err)
 {
@@ -1079,9 +1089,33 @@ static void fs_walk_nonexistent(void *obj, void *data, 
QGuestAllocator *t_alloc)
 alloc = t_alloc;
 
 do_attach(v9p);
+/*
+ * The 9p2000 protocol spec says: "If the first element cannot be walked
+ * for any reason, Rerror is returned."
+ */
 do_walk_expect_error(v9p, "non-existent", ENOENT);
 }
 
+static void fs_walk_2nd_nonexistent(void *obj, void *data,
+QGuestAllocator *t_alloc)
+{
+QVirtio9P *v9p = obj;
+alloc = t_alloc;
+uint16_t nwqid;
+g_autofree v9fs_qid *wqid = NULL;
+g_autofree char *path = g_strdup_printf(
+QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
+);
+
+do_attach(v9p);
+do_walk_rqids(v9p, path, &nwqid, &wqid);
+/*
+ * The 9p2000 protocol spec says: "nwqid is therefore either nwname or the
+ * index of the first elementwise walk that failed."
+ */
+assert(nwqid == 1);
+}
+
 static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 QVirtio9P *v9p = obj;
@@ -1548,6 +1582,8 @@ static void register_virtio_9p_test(void)
  fs_walk_dotdot,  &opts);
 qos_add_test("synth/walk/non_existent", "virtio-9p", fs_walk_nonexistent,
   &opts);
+qos_add_test("synth/walk/2nd_non_existent", "virtio-9p",
+ fs_walk_2nd_nonexistent, &opts);
 qos_add_test("synth/lopen/basic", "virtio-9p", fs_lopen,  &opts);
 qos_add_test("synth/write/basic", "virtio-9p", fs_write,  &opts);
 qos_add_test("synth/flush/success", "virtio-9p", fs_flush_success,
-- 
2.30.2




Re: [PATCH v3 7/7] tests/9pfs: check fid being unaffected in fs_walk_2nd_nonexistent

2022-03-13 Thread Christian Schoenebeck
On Sonntag, 13. März 2022 10:28:32 CET Christian Schoenebeck wrote:
> Extend previously added test case by checking that fid is unaffected
> by 'Twalk' request (i.e. when 2nd path component of request being
> invalid). Do that by comparing the QID of root fid with QID of walked
> fid; they should be identical.
> 
> Signed-off-by: Christian Schoenebeck 
> ---
>  tests/qtest/virtio-9p-test.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index f6e78d388e..b9c6819d01 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -721,14 +721,19 @@ static void fs_version(void *obj, void *data,
> QGuestAllocator *t_alloc) do_version(obj);
>  }
> 
> -static void do_attach(QVirtio9P *v9p)
> +static void do_attach_rqid(QVirtio9P *v9p, v9fs_qid *qid)
>  {
>  P9Req *req;
> 
>  do_version(v9p);
>  req = v9fs_tattach(v9p, 0, getuid(), 0);
>  v9fs_req_wait_for_reply(req, NULL);
> -v9fs_rattach(req, NULL);
> +v9fs_rattach(req, qid);
> +}
> +
> +static void do_attach(QVirtio9P *v9p)
> +{
> +do_attach_rqid(v9p, NULL);
>  }
> 
>  static void fs_attach(void *obj, void *data, QGuestAllocator *t_alloc)
> @@ -1101,19 +1106,22 @@ static void fs_walk_2nd_nonexistent(void *obj, void
> *data, {
>  QVirtio9P *v9p = obj;
>  alloc = t_alloc;
> +v9fs_qid root_qid;
>  uint16_t nwqid;
>  g_autofree v9fs_qid *wqid = NULL;
>  g_autofree char *path = g_strdup_printf(
>  QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
>  );
> 
> -do_attach(v9p);
> +do_attach_rqid(v9p, &root_qid);
>  do_walk_rqids(v9p, path, &nwqid, &wqid);
>  /*
>   * The 9p2000 protocol spec says: "nwqid is therefore either nwname or
> the * index of the first elementwise walk that failed."
>   */
>  assert(nwqid == 1);
> +/* expect fid being unaffected by walk */
> +g_assert(wqid && wqid[0] && is_same_qid(root_qid, wqid[0]));

Mmm, that's actually not checking whether fid was unaffected by the walk. It 
just checks whether the QID returned by Rwalk equals the root QID, period.

I suggest I leave this check here (just remove the false comment there), it's 
still OK to do that check, but additionally I would send a Tgetatrr on the 
walked fid to do the actual "fid unaffected" check?

I'll wait to see if you spot something else before posting any v4.

>  }
> 
>  static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)





RE: [PATCH v5 41/48] target/nios2: Introduce shadow register sets

2022-03-13 Thread Amir Gonnen
(Continue discussion from Re: [PATCH v4 24/33] target/nios2: Introduce shadow 
register sets)

> > How does "cpu_crs_R" work?
> > ... Otherwise, each gpr access would be indirect. I'm probably missing 
> > something here.

> They are indirect, but with some optimization.

Why not always access directly?  
With an EIC each interrupt handler is associated with a specific shadow 
register set, so we can expect that (on a sane use case) each block always 
executes on the same register set.  
If we update cpu_get_tb_cpu_state to translate differently based on STATUS.CRS 
we would still end up with a single translation for each block.  
This way the translator could emit direct registers access for shadow 
registers, and we won't need to rely on optimizations to lower indirect access.

Thanks,
Amir





Re: [PATCH v3 7/7] tests/9pfs: check fid being unaffected in fs_walk_2nd_nonexistent

2022-03-13 Thread Christian Schoenebeck
On Sonntag, 13. März 2022 11:27:56 CET Christian Schoenebeck wrote:
> On Sonntag, 13. März 2022 10:28:32 CET Christian Schoenebeck wrote:
> > Extend previously added test case by checking that fid is unaffected
> > by 'Twalk' request (i.e. when 2nd path component of request being
> > invalid). Do that by comparing the QID of root fid with QID of walked
> > fid; they should be identical.
> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> > 
> >  tests/qtest/virtio-9p-test.c | 14 +++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > index f6e78d388e..b9c6819d01 100644
> > --- a/tests/qtest/virtio-9p-test.c
> > +++ b/tests/qtest/virtio-9p-test.c
> > @@ -721,14 +721,19 @@ static void fs_version(void *obj, void *data,
> > QGuestAllocator *t_alloc) do_version(obj);
> > 
> >  }
> > 
> > -static void do_attach(QVirtio9P *v9p)
> > +static void do_attach_rqid(QVirtio9P *v9p, v9fs_qid *qid)
> > 
> >  {
> >  
> >  P9Req *req;
> >  
> >  do_version(v9p);
> >  req = v9fs_tattach(v9p, 0, getuid(), 0);
> >  v9fs_req_wait_for_reply(req, NULL);
> > 
> > -v9fs_rattach(req, NULL);
> > +v9fs_rattach(req, qid);
> > +}
> > +
> > +static void do_attach(QVirtio9P *v9p)
> > +{
> > +do_attach_rqid(v9p, NULL);
> > 
> >  }
> >  
> >  static void fs_attach(void *obj, void *data, QGuestAllocator *t_alloc)
> > 
> > @@ -1101,19 +1106,22 @@ static void fs_walk_2nd_nonexistent(void *obj,
> > void
> > *data, {
> > 
> >  QVirtio9P *v9p = obj;
> >  alloc = t_alloc;
> > 
> > +v9fs_qid root_qid;
> > 
> >  uint16_t nwqid;
> >  g_autofree v9fs_qid *wqid = NULL;
> >  g_autofree char *path = g_strdup_printf(
> >  
> >  QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
> >  
> >  );
> > 
> > -do_attach(v9p);
> > +do_attach_rqid(v9p, &root_qid);
> > 
> >  do_walk_rqids(v9p, path, &nwqid, &wqid);
> >  /*
> >  
> >   * The 9p2000 protocol spec says: "nwqid is therefore either nwname
> >   or
> > 
> > the * index of the first elementwise walk that failed."
> > 
> >   */
> >  
> >  assert(nwqid == 1);
> > 
> > +/* expect fid being unaffected by walk */
> > +g_assert(wqid && wqid[0] && is_same_qid(root_qid, wqid[0]));
> 
> Mmm, that's actually not checking whether fid was unaffected by the walk. It
> just checks whether the QID returned by Rwalk equals the root QID, period.
> 
> I suggest I leave this check here (just remove the false comment there),
> it's still OK to do that check, but additionally I would send a Tgetatrr on
> the walked fid to do the actual "fid unaffected" check?
> 
> I'll wait to see if you spot something else before posting any v4.

There is something more cheesy: wqid[0] (i.e. first subdir) should actually 
*NOT* be equal to the root QID. The g_assert check above however does not 
fail.

After debugging this, the root cause seems to be that the 'synth' driver's 
root node has the same inode number (zero) as the first subdirectory. The 
following fixes this synth driver bug for me:

--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -92,7 +92,7 @@ int qemu_v9fs_synth_mkdir(V9fsSynthNode *parent, int mode,
 }
 }
 /* Add the name */
-node = v9fs_add_dir_node(parent, mode, name, NULL, synth_node_count++);
+node = v9fs_add_dir_node(parent, mode, name, NULL, ++synth_node_count);
 v9fs_add_dir_node(node, parent->attr->mode, "..",
   parent->attr, parent->attr->inode);
 v9fs_add_dir_node(node, node->attr->mode, ".",
@@ -130,7 +130,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int 
mode,
 mode = ((mode & 0777) | S_IFREG);
 node = g_malloc0(sizeof(V9fsSynthNode));
 node->attr = &node->actual_attr;
-node->attr->inode  = synth_node_count++;
+node->attr->inode  = ++synth_node_count;
 node->attr->nlink  = 1;
 node->attr->read   = read;
 node->attr->write  = write;

That way root node would have inode number zero, 1st subdir inode 1, and so 
on.

Best regards,
Christian Schoenebeck





Re: [PATCH v2 04/11] edk2: .git can be a file

2022-03-13 Thread Philippe Mathieu-Daudé

On 11/3/22 06:37, Gerd Hoffmann wrote:

Signed-off-by: Gerd Hoffmann 
---
  roms/Makefile.edk2 | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 09/11] edk2/docker: install python3

2022-03-13 Thread Philippe Mathieu-Daudé

On 11/3/22 06:37, Gerd Hoffmann wrote:

python2 is not supported any more,
so go install python3 instead.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Alex Bennée 
---
  .gitlab-ci.d/edk2/Dockerfile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH v2 10/11] edk2/docker: use ubuntu 18.04

2022-03-13 Thread Philippe Mathieu-Daudé

On 11/3/22 06:37, Gerd Hoffmann wrote:

Upstream CI uses ubuntu 18.04 too, so pick
that version (instead of something newer).

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Alex Bennée 
---
  .gitlab-ci.d/edk2/Dockerfile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [RFC PATCH] mailmap/gitdm: more fixes for bad tags and authors

2022-03-13 Thread Philippe Mathieu-Daudé

On 11/3/22 14:52, Alex Bennée wrote:

I was running some historical tags for the last 10 years and got the
following warnings:

   git log --use-mailmap --numstat --since "June 2010" | ~/src/gitdm.git/gitdm 
-n -l 5
   alar...@ddci.com is an author name, probably not what you want
   bad utf-8 ('utf-8' codec can't decode byte 0xe4 in position 552: invalid 
continuation byte) in patchm skipping
   bad utf-8 ('utf-8' codec can't decode byte 0xe4 in position 342: invalid 
continuation byte) in patchm skipping
   mich...@ozlabs.org  is an author name, probably not what you want
   Oops...funky email nicta.com.au
   bad utf-8 ('utf-8' codec can't decode byte 0xe9 in position 232: invalid 
continuation byte) in patchm skipping
   Oops...funky email andreas.faerber
   Grabbing changesets...done
   Processed 76422 csets from 1902 developers

The following fixes try and alleviate that although I still get a
warning for Aaron which I think is from 9743cd5736.

Signed-off-by: Alex Bennée 
Cc: Aaron Larson 
Cc: Andreas Färber 
Cc: Jason Wang 
Cc: Michael Ellerman 
Cc: Peter Chubb 
---
  .mailmap  | 6 ++
  contrib/gitdm/aliases | 5 -
  2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/contrib/gitdm/aliases b/contrib/gitdm/aliases
index 4792413ce7..5b31635c15 100644
--- a/contrib/gitdm/aliases
+++ b/contrib/gitdm/aliases
@@ -34,8 +34,11 @@ malc@c046a42c-6fe2-441c-8c8c-71466251a162 av1...@comtv.ru
  # canonical emails
  liq...@163.com liq...@gmail.com
  
-# some broken tags

+# some broken DCO tags



+alar...@ddci.com alar...@ddci.com


Is this one useful? Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH] ppc/xive2: Make type Xive2EndSource not user creatable

2022-03-13 Thread Philippe Mathieu-Daudé

On 11/3/22 08:43, Cédric Le Goater wrote:

Xive2EndSource objects can only be instantiated through a Xive2Router
(PnvXive2).

Suggested-by: Thomas Huth 


Reported-by?


Signed-off-by: Cédric Le Goater 
---
  hw/intc/xive2.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH experiment 00/35] stackless coroutine backend

2022-03-13 Thread Paolo Bonzini

On 3/11/22 13:17, Daniel P. Berrangé wrote:

Only files that define or use a coroutine_fn (which includes callers of
qemu_coroutine_create) would have to be compiled as C++.

Unless I'm misunderstanding what you mean, "define a coroutine_fn"
is a very large number of functions/files

   $ git grep coroutine_fn | wc -l
   806
   $ git grep -l coroutine_fn | wc -l
   132

Dominated by the block layer of course, but tentacles spreading
out into alot of other code.


The main other user is 9pfs, then there is:

hw/remote/message.c
io/channel.c
job.c
migration/savevm.c
monitor/hmp-cmds.c
monitor/monitor-internal.h
monitor/qmp.c
nbd/client-connection.c
nbd/client.c
nbd/server.c
net/colo-compare.c
net/filter-mirror.c
scsi/pr-manager.c
scsi/qemu-pr-helper.c
ui/console.c
util/vhost-user-server.c


Feels like identifying all callers would be tedious/unpleasant enough,
that practically speaking we would have to just compile all of QEMU
as C++.


Yes, it's a large amount of code, but it's relatively self-contained. 
In io/ for example only three functions would have to become C++ 
(qio_channel_readv_full_all_eof, qio_channel_writev_full_all, 
qio_channel_yield), and it's easy to move them to a separate file 
io/channel-coroutine.cc.


Likewise for e.g. util/async.c or util/thread-pool.c (one function each).

The block layer would almost entirely move to C++, that's for sure.  The 
monitor would be a bit more in the middle, but hardware emulation can 
remain 100% C.


I haven't gotten the thing to compile or run yet, and I'm not sure how 
much time I'll have this week, but the change for test-coroutine.c to 
run should be in the ballpark of this:


 include/qemu/coroutine.h |  26
 tests/unit/meson.build   |   6
 tests/unit/{test-coroutine.c => test-coroutine.cc}   | 106
 util/meson.build |   4
 util/{qemu-coroutine-lock.c => qemu-coroutine-lock.cc}   |  65
 util/{qemu-coroutine-sleep.c => qemu-coroutine-sleep.cc} |  10

where the changes are for a good part mechanical: switching from "x 
coroutine_fn" to CoroutineFn entirely so, while adding co_await in 
front of coroutine calls is half mechanical.  For non-void functions, 
the compiler can identify all callers (because the old type "int" is not 
compatible with the new type CoroutineFn).  For void function one 
could use warn_unused_result.


The question is what is easier to maintain, stack switching code that is 
becoming less and less portable (status quo with SafeStack, CET and the 
TLS issues that Stefan has worked on), a mixed C/C++ codebase (C++ 
coroutines), a custom source-to-source translator (this series).  The 
third might be more fun, but it would be quite a large enterprise and 
the C++ compiler writers have already done the work.


A part of the changes is common in both cases, since you cannot have 
code that can run both inside or outside a coroutine.


Paolo



Re: Question about atomics

2022-03-13 Thread Richard Henderson

On 3/12/22 20:59, Warner Losh wrote:

FreeBSD's pthread_mutex is shared between the kernel and user land.
So it does a compare and set to take the lock. Uncontested and unheld
locks will mean we've taken the lock and return. Contested locks
are kicked to the kernel to wait. When userland releases the lock
it signals the kernel to wakeup via a system call. The kernel then
does a cas to try to acquire the lock. It either returns with the lock
held, or goes back to sleep. This we have atomics operating both in
the kernel (via standard host atomics) and userland atomics done
via start/end_exclusive.


You need to use standard host atomics for this case.

r~



Re: [PATCH v5 41/48] target/nios2: Introduce shadow register sets

2022-03-13 Thread Richard Henderson

On 3/13/22 04:55, Amir Gonnen wrote:

(Continue discussion from Re: [PATCH v4 24/33] target/nios2: Introduce shadow 
register sets)


How does "cpu_crs_R" work?
... Otherwise, each gpr access would be indirect. I'm probably missing 
something here.



They are indirect, but with some optimization.


Why not always access directly?
With an EIC each interrupt handler is associated with a specific shadow 
register set, so we can expect that (on a sane use case) each block always 
executes on the same register set.
If we update cpu_get_tb_cpu_state to translate differently based on STATUS.CRS 
we would still end up with a single translation for each block.
This way the translator could emit direct registers access for shadow 
registers, and we won't need to rely on optimizations to lower indirect access.


We could do that if we support fewer than 64 shadow register sets:

#define TCG_MAX_TEMPS 512

Global temps (e.g. cpu_R[]) count against that limit, and 64 * 32 == 2048.
The maximum number of shadow reg sets you could support like this would be 
about 8.


r~



Re: Question about atomics

2022-03-13 Thread Warner Losh
On Sun, Mar 13, 2022, 10:47 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 3/12/22 20:59, Warner Losh wrote:
> > FreeBSD's pthread_mutex is shared between the kernel and user land.
> > So it does a compare and set to take the lock. Uncontested and unheld
> > locks will mean we've taken the lock and return. Contested locks
> > are kicked to the kernel to wait. When userland releases the lock
> > it signals the kernel to wakeup via a system call. The kernel then
> > does a cas to try to acquire the lock. It either returns with the lock
> > held, or goes back to sleep. This we have atomics operating both in
> > the kernel (via standard host atomics) and userland atomics done
> > via start/end_exclusive.
>
> You need to use standard host atomics for this case.
>

Or use the start/end_exclusive for both by emulating the kernel call, I
presume? It's the mixing that's the problem, right?

Warner

>


Re: Question about atomics

2022-03-13 Thread Richard Henderson

On 3/13/22 09:57, Warner Losh wrote:



On Sun, Mar 13, 2022, 10:47 AM Richard Henderson > wrote:


On 3/12/22 20:59, Warner Losh wrote:
 > FreeBSD's pthread_mutex is shared between the kernel and user land.
 > So it does a compare and set to take the lock. Uncontested and unheld
 > locks will mean we've taken the lock and return. Contested locks
 > are kicked to the kernel to wait. When userland releases the lock
 > it signals the kernel to wakeup via a system call. The kernel then
 > does a cas to try to acquire the lock. It either returns with the lock
 > held, or goes back to sleep. This we have atomics operating both in
 > the kernel (via standard host atomics) and userland atomics done
 > via start/end_exclusive.

You need to use standard host atomics for this case.


Or use the start/end_exclusive for both by emulating the kernel call, I presume? It's the 
mixing that's the problem, right?


Well, preferably no.  Use start/end_exclusive only when you have no alternative, which for 
a simple CAS should not be the case on any FreeBSD host.


Using start/end_exclusive is entirely local to the current process, and means you don't 
have atomicity across processes.  Which can cause problems when emulating an entire chroot.



r~



Re: Question about atomics

2022-03-13 Thread Warner Losh
On Sun, Mar 13, 2022 at 11:03 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 3/13/22 09:57, Warner Losh wrote:
> >
> >
> > On Sun, Mar 13, 2022, 10:47 AM Richard Henderson <
> richard.hender...@linaro.org
> > > wrote:
> >
> > On 3/12/22 20:59, Warner Losh wrote:
> >  > FreeBSD's pthread_mutex is shared between the kernel and user
> land.
> >  > So it does a compare and set to take the lock. Uncontested and
> unheld
> >  > locks will mean we've taken the lock and return. Contested locks
> >  > are kicked to the kernel to wait. When userland releases the lock
> >  > it signals the kernel to wakeup via a system call. The kernel then
> >  > does a cas to try to acquire the lock. It either returns with the
> lock
> >  > held, or goes back to sleep. This we have atomics operating both
> in
> >  > the kernel (via standard host atomics) and userland atomics done
> >  > via start/end_exclusive.
> >
> > You need to use standard host atomics for this case.
> >
> >
> > Or use the start/end_exclusive for both by emulating the kernel call, I
> presume? It's the
> > mixing that's the problem, right?
>
> Well, preferably no.  Use start/end_exclusive only when you have no
> alternative, which for
> a simple CAS should not be the case on any FreeBSD host.
>
> Using start/end_exclusive is entirely local to the current process, and
> means you don't
> have atomicity across processes.  Which can cause problems when emulating
> an entire chroot.
>

So I was assuming that the cas instructions for arm use start/end_exclsive
under the covers.
Is that the case? Or is there something clever there I've overlooked and
the start/end_exclusive
stuff is only used for fallbacks?

Warner


Re: Question about atomics

2022-03-13 Thread Richard Henderson

On 3/13/22 11:29, Warner Losh wrote:

So I was assuming that the cas instructions for arm use start/end_exclsive 
under the covers.
Is that the case?


Nope.  The store exclusive guest insn is implemented with a host cmpxchg.

Oh, I'd forgotten about the old arm cmpxchg64 syscall, which is still implemented with 
start/end_exclusive.  That should get fixed...



r~



Re: Question about atomics

2022-03-13 Thread Warner Losh
On Sun, Mar 13, 2022 at 2:19 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 3/13/22 11:29, Warner Losh wrote:
> > So I was assuming that the cas instructions for arm use
> start/end_exclsive under the covers.
> > Is that the case?
>
> Nope.  The store exclusive guest insn is implemented with a host cmpxchg.
>

Oh? Out of paranoia, how can I verify that this is the case when compiled
on FreeBSD?
Perhaps the atomic sequence FreeBSD uses differs a little from Linux and we
don't trigger
that code? Or there's some adjustment that I've not made yet... the code
seems to work
on 3.1 but not on latest, and there's been a lot of changes to tcg, so I'd
like to rule it
out since there's a lot of other change too and there's too many
variables...


> Oh, I'd forgotten about the old arm cmpxchg64 syscall, which is still
> implemented with
> start/end_exclusive.  That should get fixed...
>

FreeBSD doesn't have this helper. So bsd-user doesn't implement it... The
only
good thing about that is that there's nothing for me to fix :/...

Warner


>
> r~
>


Re: Question about atomics

2022-03-13 Thread Richard Henderson

On 3/13/22 21:09, Warner Losh wrote:

Oh? Out of paranoia, how can I verify that this is the case when compiled on 
FreeBSD?
Perhaps the atomic sequence FreeBSD uses differs a little from Linux and we 
don't trigger
that code?


$ objdump -dr libqemu-arm-*-user.fa.p/accel_tcg_user-exec.c.o

1490 :
...
14b7:   e8 04 ec ff ff  callq  c0 

14bc:   48 89 c2mov%rax,%rdx
14bf:   44 89 e0mov%r12d,%eax
14c2:   f0 44 0f b1 32  lock cmpxchg %r14d,(%rdx)


r~



[PATCH 0/3] linux-user/arm: Improvements for commpage atomics

2022-03-13 Thread Richard Henderson
Only the memory_barrer one is a bug; the rest improve system-wide
interaction, and would only affect pre-armv6.


r~


Richard Henderson (3):
  linux-user/arm: Implement __kernel_memory_barrier
  linux-user/arm: Implement __kernel_cmpxchg with host atomics
  linux-user/arm: Implement __kernel_cmpxchg64 with host atomics

 linux-user/arm/cpu_loop.c | 166 +++---
 1 file changed, 99 insertions(+), 67 deletions(-)

-- 
2.25.1




[PATCH 3/3] linux-user/arm: Implement __kernel_cmpxchg64 with host atomics

2022-03-13 Thread Richard Henderson
If CONFIG_ATOMIC64, we can use a host cmpxchg and provide
atomicity across processes; otherwise we have no choice but
to continue using start/end_exclusive.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/cpu_loop.c | 79 +++
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
index 0122bb34f7..d9651f199f 100644
--- a/linux-user/arm/cpu_loop.c
+++ b/linux-user/arm/cpu_loop.c
@@ -136,7 +136,7 @@ static void arm_kernel_cmpxchg32_helper(CPUARMState *env)
 }
 
 /*
- * See the Linux kernel's Documentation/arm/kernel_user_helpers.txt
+ * See the Linux kernel's Documentation/arm/kernel_user_helpers.rst
  * Input:
  * r0 = pointer to oldval
  * r1 = pointer to newval
@@ -153,57 +153,54 @@ static void arm_kernel_cmpxchg64_helper(CPUARMState *env)
 {
 uint64_t oldval, newval, val;
 uint32_t addr, cpsr;
+uint64_t *host_addr;
 
-/* Based on the 32 bit code in do_kernel_trap */
+addr = env->regs[0];
+if (get_user_u64(oldval, addr)) {
+goto segv;
+}
 
-/* XXX: This only works between threads, not between processes.
-   It's probably possible to implement this with native host
-   operations. However things like ldrex/strex are much harder so
-   there's not much point trying.  */
-start_exclusive();
-cpsr = cpsr_read(env);
+addr = env->regs[1];
+if (get_user_u64(newval, addr)) {
+goto segv;
+}
+
+mmap_lock();
 addr = env->regs[2];
-
-if (get_user_u64(oldval, env->regs[0])) {
-env->exception.vaddress = env->regs[0];
-goto segv;
-};
-
-if (get_user_u64(newval, env->regs[1])) {
-env->exception.vaddress = env->regs[1];
-goto segv;
-};
-
-if (get_user_u64(val, addr)) {
-env->exception.vaddress = addr;
-goto segv;
+host_addr = atomic_mmu_lookup(env, addr, 8);
+if (!host_addr) {
+mmap_unlock();
+return;
 }
 
+#ifdef CONFIG_ATOMIC64
+val = qatomic_cmpxchg__nocheck(host_addr, oldval, newval);
+cpsr = (val == oldval) * CPSR_C;
+#else
+/*
+ * This only works between threads, not between processes, but since
+ * the host has no 64-bit cmpxchg, it is the best that we can do.
+ */
+start_exclusive();
+val = *host_addr;
 if (val == oldval) {
-val = newval;
-
-if (put_user_u64(val, addr)) {
-env->exception.vaddress = addr;
-goto segv;
-};
-
-env->regs[0] = 0;
-cpsr |= CPSR_C;
+*host_addr = newval;
+cpsr = CPSR_C;
 } else {
-env->regs[0] = -1;
-cpsr &= ~CPSR_C;
+cpsr = 0;
 }
-cpsr_write(env, cpsr, CPSR_C, CPSRWriteByInstr);
 end_exclusive();
+#endif
+mmap_unlock();
+
+cpsr_write(env, cpsr, CPSR_C, CPSRWriteByInstr);
+env->regs[0] = cpsr ? 0 : -1;
 return;
 
-segv:
-end_exclusive();
-/* We get the PC of the entry address - which is as good as anything,
-   on a real kernel what you get depends on which mode it uses. */
-/* XXX: check env->error_code */
-force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR,
-env->exception.vaddress);
+ segv:
+force_sig_fault(TARGET_SIGSEGV,
+page_get_flags(addr) & PAGE_VALID ?
+TARGET_SEGV_ACCERR : TARGET_SEGV_MAPERR, addr);
 }
 
 /* Handle a jump to the kernel code page.  */
-- 
2.25.1




[PATCH 1/3] linux-user/arm: Implement __kernel_memory_barrier

2022-03-13 Thread Richard Henderson
This fallback syscall was stubbed out.
It would only matter for emulating pre-armv6.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/cpu_loop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
index 032e1ffddf..a0e43b261c 100644
--- a/linux-user/arm/cpu_loop.c
+++ b/linux-user/arm/cpu_loop.c
@@ -158,7 +158,7 @@ do_kernel_trap(CPUARMState *env)
 
 switch (env->regs[15]) {
 case 0x0fa0: /* __kernel_memory_barrier */
-/* ??? No-op. Will need to do better for SMP.  */
+smp_mb();
 break;
 case 0x0fc0: /* __kernel_cmpxchg */
  /* XXX: This only works between threads, not between processes.
-- 
2.25.1




Re: Question about atomics

2022-03-13 Thread Richard Henderson

On 3/13/22 21:09, Warner Losh wrote:

Oh? Out of paranoia, how can I verify that this is the case when compiled on 
FreeBSD?
Perhaps the atomic sequence FreeBSD uses differs a little from Linux and we 
don't trigger
that code? Or there's some adjustment that I've not made yet... the code seems 
to work
on 3.1 but not on latest, and there's been a lot of changes to tcg, so I'd like 
to rule it
out since there's a lot of other change too and there's too many variables...


Can you point me at this code on your branch?


r~



[PATCH 2/3] linux-user/arm: Implement __kernel_cmpxchg with host atomics

2022-03-13 Thread Richard Henderson
The existing implementation using start/end_exclusive
does not provide atomicity across processes.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/cpu_loop.c | 85 +++
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
index a0e43b261c..0122bb34f7 100644
--- a/linux-user/arm/cpu_loop.c
+++ b/linux-user/arm/cpu_loop.c
@@ -75,7 +75,65 @@
 put_user_u16(__x, (gaddr)); \
 })
 
-/* Commpage handling -- there is no commpage for AArch64 */
+/*
+ * Similar to code in accel/tcg/user-exec.c, but outside the execution loop.
+ * Must be called with mmap_lock.
+ */
+static void *atomic_mmu_lookup(CPUArchState *env, uint32_t addr, int size)
+{
+int need_flags = PAGE_READ | PAGE_WRITE_ORG | PAGE_VALID;
+int page_flags;
+
+/* Enforce guest required alignment.  */
+if (unlikely(addr & (size - 1))) {
+force_sig_fault(TARGET_SIGBUS, TARGET_BUS_ADRALN, addr);
+return NULL;
+}
+
+page_flags = page_get_flags(addr);
+if (unlikely((page_flags & need_flags) != need_flags)) {
+force_sig_fault(TARGET_SIGSEGV,
+page_flags & PAGE_VALID ?
+TARGET_SEGV_ACCERR : TARGET_SEGV_MAPERR, addr);
+return NULL;
+}
+
+return g2h(env_cpu(env), addr);
+}
+
+/*
+ * See the Linux kernel's Documentation/arm/kernel_user_helpers.rst
+ * Input:
+ * r0 = oldval
+ * r1 = newval
+ * r2 = pointer to target value
+ *
+ * Output:
+ * r0 = 0 if *ptr was changed, non-0 if no exchange happened
+ * C set if *ptr was changed, clear if no exchange happened
+ */
+static void arm_kernel_cmpxchg32_helper(CPUARMState *env)
+{
+uint32_t oldval, newval, val, addr, cpsr, *host_addr;
+
+oldval = env->regs[0];
+newval = env->regs[1];
+addr = env->regs[2];
+
+mmap_lock();
+host_addr = atomic_mmu_lookup(env, addr, 8);
+if (!host_addr) {
+mmap_unlock();
+return;
+}
+
+val = qatomic_cmpxchg__nocheck(host_addr, oldval, newval);
+mmap_unlock();
+
+cpsr = (val == oldval) * CPSR_C;
+cpsr_write(env, cpsr, CPSR_C, CPSRWriteByInstr);
+env->regs[0] = cpsr ? 0 : -1;
+}
 
 /*
  * See the Linux kernel's Documentation/arm/kernel_user_helpers.txt
@@ -153,36 +211,13 @@ static int
 do_kernel_trap(CPUARMState *env)
 {
 uint32_t addr;
-uint32_t cpsr;
-uint32_t val;
 
 switch (env->regs[15]) {
 case 0x0fa0: /* __kernel_memory_barrier */
 smp_mb();
 break;
 case 0x0fc0: /* __kernel_cmpxchg */
- /* XXX: This only works between threads, not between processes.
-It's probably possible to implement this with native host
-operations. However things like ldrex/strex are much harder so
-there's not much point trying.  */
-start_exclusive();
-cpsr = cpsr_read(env);
-addr = env->regs[2];
-/* FIXME: This should SEGV if the access fails.  */
-if (get_user_u32(val, addr))
-val = ~env->regs[0];
-if (val == env->regs[0]) {
-val = env->regs[1];
-/* FIXME: Check for segfaults.  */
-put_user_u32(val, addr);
-env->regs[0] = 0;
-cpsr |= CPSR_C;
-} else {
-env->regs[0] = -1;
-cpsr &= ~CPSR_C;
-}
-cpsr_write(env, cpsr, CPSR_C, CPSRWriteByInstr);
-end_exclusive();
+arm_kernel_cmpxchg32_helper(env);
 break;
 case 0x0fe0: /* __kernel_get_tls */
 env->regs[0] = cpu_get_tls(env);
-- 
2.25.1




Re: Question about atomics

2022-03-13 Thread Warner Losh
On Sun, Mar 13, 2022 at 10:43 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 3/13/22 21:09, Warner Losh wrote:
> > Oh? Out of paranoia, how can I verify that this is the case when
> compiled on FreeBSD?
> > Perhaps the atomic sequence FreeBSD uses differs a little from Linux and
> we don't trigger
> > that code? Or there's some adjustment that I've not made yet... the code
> seems to work
> > on 3.1 but not on latest, and there's been a lot of changes to tcg, so
> I'd like to rule it
> > out since there's a lot of other change too and there's too many
> variables...
>
> Can you point me at this code on your branch?
>

I just pushed this to gitlab:

https://gitlab.com/bsdimp/qemu/-/tree/blitz

Thanks for any insight that you can provide...

Warner


Re: Question about atomics

2022-03-13 Thread Warner Losh
On Sun, Mar 13, 2022 at 10:36 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 3/13/22 21:09, Warner Losh wrote:
> > Oh? Out of paranoia, how can I verify that this is the case when
> compiled on FreeBSD?
> > Perhaps the atomic sequence FreeBSD uses differs a little from Linux and
> we don't trigger
> > that code?
>
> $ objdump -dr libqemu-arm-*-user.fa.p/accel_tcg_user-exec.c.o
>
> 1490 :
> ...
>  14b7:   e8 04 ec ff ff  callq  c0
> 
>  14bc:   48 89 c2mov%rax,%rdx
>  14bf:   44 89 e0mov%r12d,%eax
>  14c2:   f0 44 0f b1 32  lock cmpxchg %r14d,(%rdx)
>

Looks like this compiles correctly on FreeBSD... We have something similar:

1f69:   41 89 f1mov%esi,%r9d
1f6c:   48 8b 3d 00 00 00 00mov0x0(%rip),%rdi# 1f73

1f73:   64 48 8b 34 25 00 00mov%fs:0x0,%rsi
1f7a:   00 00
1f7c:   48 89 8e 00 00 00 00mov%rcx,0x0(%rsi)
1f83:   89 d0   mov%edx,%eax
1f85:   f0 46 0f b1 04 0f   lock cmpxchg %r8d,(%rdi,%r9,1)

Warner


> r~
>


Re: [RFC v4 01/21] vfio-user: introduce vfio-user protocol specification

2022-03-13 Thread John Johnson


> On Mar 9, 2022, at 2:34 PM, Alex Williamson  
> wrote:
> 
>> 
>> +
>> +VFIO region info type cap
>> +"
>> +
>> +The VFIO region info type is defined in 
>> +(``struct vfio_region_info_cap_type``).
>> +
>> ++-++--+
>> +| Name| Offset | Size |
>> ++=++==+
>> +| type| 0  | 4|
>> ++-++--+
>> +| subtype | 4  | 4|
>> ++-++--+
>> +
>> +The only device-specific region type and subtype supported by vfio-user is
>> +``VFIO_REGION_TYPE_MIGRATION`` (3) and ``VFIO_REGION_SUBTYPE_MIGRATION`` 
>> (1).
> 
> These should be considered deprecated from the kernel interface.  I
> hope there are plans for vfio-user to adopt the new interface that's
> currently available in linux-next and intended for v5.18.
> 


We will follow the interfaces that QEMU uses.  Do you have an idea
when the VFIO v2 changes will be pulled into QEMU?  Will the v2 code be
experimental like v1 was?



> ...
>> +Unused VFIO ``ioctl()`` commands
>> +
>> +
>> +The following VFIO commands do not have an equivalent vfio-user command:
>> +
>> +* ``VFIO_GET_API_VERSION``
>> +* ``VFIO_CHECK_EXTENSION``
>> +* ``VFIO_SET_IOMMU``
>> +* ``VFIO_GROUP_GET_STATUS``
>> +* ``VFIO_GROUP_SET_CONTAINER``
>> +* ``VFIO_GROUP_UNSET_CONTAINER``
>> +* ``VFIO_GROUP_GET_DEVICE_FD``
>> +* ``VFIO_IOMMU_GET_INFO``
>> +
>> +However, once support for live migration for VFIO devices is finalized some
>> +of the above commands may have to be handled by the client in their
>> +corresponding vfio-user form. This will be addressed in a future protocol
>> +version.
> 
> As above, I'd go ahead and drop the migration region interface support,
> it's being removed from the kernel.  Dirty page handling might also be
> something you want to pull back on as we're expecting in-kernel vfio to
> essentially deprecate its iommu backends in favor of a new shared
> userspace iommufd interface.  We expect to have backwards compatibility
> via that interface, but as QEMU migration support for vfio-pci devices
> is experimental and there are desires not to consolidate dirty page
> tracking behind the iommu interface in the new model, it's not clear if
> the kernel will continue to expose the current dirty page tracking.
> 
> AIUI, we're expecting to see patches officially proposing the iommufd
> interface in the kernel "soon".  Thanks,
> 


I’m not very concerned about which host kernel drivers QEMU will
interact with, as vfio-user doesn’t talk to any of them.  I am interested
in the data structures being used.  Will they remain dirty bitmap based,
or is a different structure being entertained?

JJ




Re: [PATCH v2 0/3] hw/arm/virt: Fix CPU's default NUMA node ID

2022-03-13 Thread Gavin Shan

Hi Igor,

On 3/3/22 11:11 AM, Gavin Shan wrote:

When the CPU-to-NUMA association isn't provided by user, the default NUMA
node ID for the specific CPU is returned from virt_get_default_cpu_node_id().
Unfortunately, the default NUMA node ID breaks socket boundary and leads to
the broken CPU topology warning message in Linux guest. This series intends
to fix the issue.

PATCH[1/3]: Fixes the broken CPU topology by considering the socket boundary
 when the default NUMA node ID is calculated.
PATCH[2/3]: Use the existing CPU topology to build PPTT table. However, the
 cluster ID has to be calculated dynamically because there is no
 corresponding information in CPU instance properties.
PATCH[3/3]: Take thread ID as the ACPI processor ID in MDAT and SRAT tables.

Changelog
=
v2:
* Populate the CPU topology in virt_possible_cpu_arch_ids() so that it
  can be reused in virt_get_default_cpu_node_id()  
(Igor)
* Added PATCH[2/3] to use the existing CPU topology when PPTT table
  is built 
(Igor)
* Added PATCH[3/3] to take thread ID as ACPI processor ID in MADT and
  SRAT table   
(Gavin)



Kindly ping. Could you help to review when you have free cycles? :)

Thanks,
Gavin