Hello all, We have found an issue in the 9p implementation of QEMU, with how qid paths are generated, which can cause qid path collisions and several issues caused by them. In our use case (running containers under VMs) these have proven to be critical.
In particular, stat_to_qid in hw/9pfs/9p.c generates a qid path using the inode number of the file as input. According to the 9p spec the path should be able to uniquely identify a file, distinct files should not share a path value. The current implementation that defines qid.path = inode nr works fine as long as there are not files from multiple partitions visible under the 9p share. In that case, distinct files from different devices are allowed to have the same inode number. So with multiple partitions, we have a very high probability of qid path collisions. How to demonstrate the issue: 1) Prepare a problematic share: - mount one partition under share/p1/ with some files inside - mount another one *with identical contents* under share/p2/ - confirm that both partitions have files with same inode nr, size, etc 2) Demonstrate breakage: - start a VM with a virtio-9p pointing to the share - mount 9p share with FSCACHE on - keep open share/p1/file - open and write to share/p2/file What should happen is, the guest will consider share/p1/file and share/p2/file to be the same file, and since we are using the cache it will not reopen it. We intended to write to partition 2, but we just wrote to partition 1. This is just one example on how the guest may rely on qid paths being unique. In the use case of containers where we commonly have a few containers per VM, all based on similar images, these kind of qid path collisions are very common and they seem to cause all kinds of funny behavior (sometimes very subtle). To avoid this situation, the device id of a file needs to be also taken as input for generating a qid path. Unfortunately, the size of both inode nr + device id together would be 96 bits, while we have only 64 bits for the qid path, so we can't just append them and call it a day :( We have thought of a few approaches, but we would definitely like to hear what the upstream maintainers and community think: * Full fix: Change the 9p protocol We would need to support a longer qid path, based on a virtio feature flag. This would take reworking of host and guest parts of virtio-9p, so both QEMU and Linux for most users. * Fallback and/or interim solutions A virtio feature flag may be refused by the guest, so we think we still need to make collisions less likely even with 64 bit paths. E.g. 1. XOR the device id with inode nr to produce the qid path (we attach a proof of concept patch) 2. 64 bit hash of device id and inode nr 3. other ideas, such as allocating new qid paths and keep a look up table of some sorts (possibly too expensive) With our proof of concept patch, the issues caused by qid path collisions go away, so it can be seen as an interim solution of sorts. However, the chance of collisions is not eliminated, we are just replacing the current strategy, which is almost guaranteed to cause collisions in certain use cases, with one that makes them more rare. We think that a virtio feature flag for longer qid paths is the only way to eliminate these issues completely. This is the extent that we were able to analyze the issue from our side, we would like to hear if there are some better ideas, or which approach would be more appropriate for upstream. Best regards, -- Antonios Motakis Virtualization Engineer Huawei Technologies Duesseldorf GmbH European Research Center Riesstrasse 25, 80992 München
From bd59f504e6806dac5b3c1bd9c626de085987f1e0 Mon Sep 17 00:00:00 2001 From: Veaceslav Falico <veaceslav.fal...@huawei.com> Date: Fri, 12 Jan 2018 19:26:18 +0800 Subject: [PATCH] 9pfs: stat_to_qid: use device id as input to qid.path Currently, only the inode number of a file is being used as input to the qid.path field. The 9p RFC specifies that the path needs to be unique per file in the directory hierarchy, however on the host side the inode alone does not suffice to uniquely identify a file, as another file on a different device may have the same inode number. To avoid qid path collisions, we take the device id as input as well. Signed-off-by: Veaceslav Falico <veaceslav.fal...@huawei.com> Signed-off-by: Antonios Motakis <antonios.mota...@huawei.com> --- hw/9pfs/9p.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 393a2b2..a810d13 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -583,11 +583,7 @@ static void virtfs_reset(V9fsPDU *pdu) /* This is the algorithm from ufs in spfs */ static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) { - size_t size; - - memset(&qidp->path, 0, sizeof(qidp->path)); - size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path)); - memcpy(&qidp->path, &stbuf->st_ino, size); + qidp->path = stbuf->st_ino ^ ((int64_t)stbuf->st_dev << 16); qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8); qidp->type = 0; if (S_ISDIR(stbuf->st_mode)) { -- 1.8.3.1