On Fri, 9 Feb 2018 10:06:05 -0600 Eric Blake <ebl...@redhat.com> wrote:
> On 02/09/2018 09:13 AM, Greg Kurz wrote: > > On Thu, 8 Feb 2018 19:00:18 +0100 > > <antonios.mota...@huawei.com> wrote: > > > >> From: Antonios Motakis <antonios.mota...@huawei.com> > >> > >> To support multiple devices on the 9p share, and avoid > >> qid path collisions we take the device id as input > >> to generate a unique QID path. The lowest 48 bits of > >> the path will be set equal to the file inode, and the > >> top bits will be uniquely assigned based on the top > >> 16 bits of the inode and the device id. > > >> +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) > >> + * to a unique QID path (64 bits). To avoid having to map and keep track > >> + * of up to 2^64 objects, we map only the 16 highest bits of the inode > >> plus > >> + * the device id to the 16 highest bits of the QID path. The 48 lowest > >> bits > >> + * of the QID path equal to the lowest bits of the inode number. > >> + * > >> + * This takes advantage of the fact that inode number are usually not > >> + * random but allocated sequentially, so we have fewer items to keep > >> + * track of. > >> + */ > >> +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, > >> + uint64_t *path) > >> +{ > >> + QppEntry lookup = { > >> + .dev = stbuf->st_dev, > >> + .ino_prefix = (uint16_t) (stbuf->st_ino >> 48) > >> + }, *val; > >> + uint32_t hash = qpp_hash(lookup); > >> + > >> + val = qht_lookup(&pdu->s->qpp_table, qpp_lookup_func, &lookup, hash); > >> + > >> + if (!val) { > >> + if (pdu->s->qp_prefix_next == 0) { > >> + /* we ran out of prefixes */ > >> + return -ENFILE; > > > > Not sure this errno would make sense for guest syscalls that don't open > > file descriptors... Maybe ENOENT ? > > > > Cc'ing Eric for insights. > > Actually, it makes sense to me: > > ENFILE 23 Too many open files in system > > You only get here if the hash table filled up, which means there are > indeed too many open files (even if this syscall wasn't opening a file). > In fact, ENFILE is usually associated with running into ulimit > restrictions, and come to think of it, you are more likely to hit your > ulimit than you are to run into a hash collision (so this code may be > very hard to reach in practice, but if you do reach it, it behaves > similarly to what you were more likely to hit in the first place). > > ENOENT implies the file doesn't exist - but here, the file exists but we > can't open it because we're out of resources for tracking it. > Only the host knows that the file exists actually. If stat_to_qid() for this path returns ENOENT, then the file shouldn't be visible in the guest. I say "shouldn't" instead of "isn't" because I now realize that v9fs_do_readdir(), which is used to implement getdents() in the guest, sends a degraded QID, hand-crafted without calling stat_to_qid() at all... :-\ /* * Fill up just the path field of qid because the client uses * only that. To fill the entire qid structure we will have * to stat each dirent found, which is expensive */ size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); memcpy(&qid.path, &dent->d_ino, size); /* Fill the other fields with dummy values */ qid.type = 0; qid.version = 0; Antonios, your patchset should probably also address this. The problem is that the dirent we get from v9fs_co_readdir() only has the inode number, so I guess we must build a dummy struct stat with: stbuf.st_ino = dent->d_ino stbuf.st_dev = st_dev of the parent directory The st_dev of the parent directory could be obtained in v9fs_readdir() and passed to v9fs_do_readdir(). Another solution could be to cache the QID in the V9fsFidState of the parent directory when it is open. Also, if we hit a collision while reading the directory, I'm afraid the remaining entries won't be read at all. I'm not sure this is really what we want. > Also, POSIX permits returning specific errno codes that aren't otherwise > listed for a syscall if the usual semantics of that errno code are > indeed the reason for the failure (you should prefer to fail with errno > codes documented by POSIX where possible, but POSIX does not limit you > to just that set). > Ok, then ENFILE wouldn't be that bad in the end. Thanks for your POSIX expertise :) Cheers, -- Greg