On 25/04/24 07:38AM, Darrick J. Wong wrote:
> On Thu, Apr 24, 2025 at 08:43:33AM -0500, John Groves wrote:
> > On 25/04/20 08:33PM, John Groves wrote:
> > > On completion of GET_FMAP message/response, set up the full famfs
> > > metadata such that it's possible to handle read/write/mmap directly to
> > > dax. Note that the devdax_iomap plumbing is not in yet...
> > >
> > > Update MAINTAINERS for the new files.
> > >
> > > Signed-off-by: John Groves <j...@groves.net>
> > > ---
> > >  MAINTAINERS               |   9 +
> > >  fs/fuse/Makefile          |   2 +-
> > >  fs/fuse/dir.c             |   3 +
> > >  fs/fuse/famfs.c           | 344 ++++++++++++++++++++++++++++++++++++++
> > >  fs/fuse/famfs_kfmap.h     |  63 +++++++
> > >  fs/fuse/fuse_i.h          |  16 +-
> > >  fs/fuse/inode.c           |   2 +-
> > >  include/uapi/linux/fuse.h |  42 +++++
> > >  8 files changed, 477 insertions(+), 4 deletions(-)
> > >  create mode 100644 fs/fuse/famfs.c
> > >  create mode 100644 fs/fuse/famfs_kfmap.h
> > >
> > <snip>
>
> > > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > > index d85fb692cf3b..0f6ff1ffb23d 100644
> > > --- a/include/uapi/linux/fuse.h
> > > +++ b/include/uapi/linux/fuse.h
> > > @@ -1286,4 +1286,46 @@ struct fuse_uring_cmd_req {
> > >  	uint8_t padding[6];
> > >  };
> > >
> > > +/* Famfs fmap message components */
> > > +
> > > +#define FAMFS_FMAP_VERSION 1
> > > +
> > > +#define FUSE_FAMFS_MAX_EXTENTS 2
> > > +#define FUSE_FAMFS_MAX_STRIPS 16
> >
> > FYI, after thinking through the conversation with Darrick, I'm planning
> > to drop FUSE_FAMFS_MAX_(EXTENTS|STRIPS) in the next version. In the
> > response to GET_FMAP, it's the structures below serialized into a message
> > buffer. If it fits, it's good - and if not it's invalid. When the
> > in-memory metadata (defined in famfs_kfmap.h) gets assembled, if there is
> > a reason to apply limits it can be done - but I don't currently see a reason
> > to do that (so if I'm currently enforcing limits there, I'll probably drop
> > that).
>
> You could also define GET_FMAP to have an offset in the request buffer,
> and have the famfs daemon send back the next offset at the end of its
> reply (or -1ULL to stop).  Then the kernel can call GET_FMAP again with
> that new offset to get more mappings.
>
> Though at this point maybe it should go the /other/ way, where the fuse
> server can send a "notification" to the kernel to populate its mapping
> data?  fuse already defines a handful of notifications for invalidating
> pagecache and directory links.
>
> (Ugly wart: notifications aren't yet implemented for the iouring channel)
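
For concreteness, the offset idea quoted above might look roughly like
the following. This is only a sketch under loud assumptions: every
demo_* name, the two-extent reply size, and the choice to express the
offset as an extent index are made up for illustration. None of it is
from the patch or from the existing fuse protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FMAP_END  UINT64_MAX  /* the "-1ULL to stop" marker */
#define DEMO_REPLY_MAX 2           /* deliberately tiny reply buffer */

/* Hypothetical wire format; NOT the structs from the famfs patch. */
struct demo_ext {
	uint64_t dev_offset;
	uint64_t len;
};

struct demo_fmap_reply {
	uint32_t nr;                          /* extents in this reply */
	struct demo_ext ext[DEMO_REPLY_MAX];
	uint64_t next_offset;                 /* resume point, or DEMO_FMAP_END */
};

/* Stand-in for the daemon's view of one file: five extents. */
static const struct demo_ext demo_server_map[5] = {
	{ 0x00000, 0x1000 }, { 0x04000, 0x2000 }, { 0x08000, 0x1000 },
	{ 0x0a000, 0x3000 }, { 0x10000, 0x1000 },
};

/* Stand-in for the daemon: fill one reply starting at extent 'offset'. */
static void demo_get_fmap(uint64_t offset, struct demo_fmap_reply *r)
{
	const uint64_t total =
		sizeof(demo_server_map) / sizeof(demo_server_map[0]);

	r->nr = 0;
	while (offset < total && r->nr < DEMO_REPLY_MAX)
		r->ext[r->nr++] = demo_server_map[offset++];
	r->next_offset = (offset < total) ? offset : DEMO_FMAP_END;
}

/* Stand-in for the kernel side: keep asking until told to stop. */
static size_t demo_fetch_all(struct demo_ext *out, size_t cap)
{
	uint64_t off = 0;
	size_t n = 0;

	while (off != DEMO_FMAP_END) {
		struct demo_fmap_reply r;

		demo_get_fmap(off, &r);
		assert(r.nr <= DEMO_REPLY_MAX);
		/* Require forward progress so a bad reply cannot loop forever. */
		assert(r.next_offset == DEMO_FMAP_END || r.next_offset > off);
		for (uint32_t i = 0; i < r.nr && n < cap; i++)
			out[n++] = r.ext[i];
		off = r.next_offset;
	}
	return n;
}
```

Note that this loop still assembles the *whole* fmap before using it,
so it only addresses a reply-size overflow; it says nothing about
partially cached fmaps.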
I don't have fully-formed thoughts about notifications yet; thinking...

If the fmap stuff may be shared by more than one use case (as has always
seemed possible), it's a good idea to think through a couple of things:
1) is there anything important missing from this general approach, and
2) do you need to *partially* cache fmaps? (or is the "offset" idea above
just to deal with an fmap that might otherwise overflow a response size?)

The current approach lets the kernel retrieve and cache simple and
interleaved fmaps (and BTW interleaved can be multi-dev or single-dev -
there are current weird cases where that's useful). Also, FWIW, everything
that can be done with simple ext list fmaps can be done with a collection
of interleaved extents, each with strip count = 1. But I think there is a
worthwhile clarity to having both.

But the current implementation does not contemplate partially cached
fmaps.

Adding notification could address revoking them post-haste (is that why
you're thinking about notifications? And if not can you elaborate on what
you're after there?).

>
> --D

Cheers,
John
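
P.S. On the point above that a simple extent is just an interleaved
extent with strip count = 1, here is a minimal sketch of the offset
arithmetic. It assumes conventional round-robin striping in fixed-size
chunks, which may not match the famfs layout exactly. All demo_* names
and fields are hypothetical and are not taken from famfs_kfmap.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form; NOT the structs from famfs_kfmap.h. */
struct demo_strip {
	uint32_t dev_index;  /* which dax device backs this strip */
	uint64_t base;       /* starting byte offset on that device */
};

struct demo_interleaved_ext {
	uint64_t nstrips;
	uint64_t chunk_size;               /* bytes per chunk */
	const struct demo_strip *strips;   /* array of nstrips entries */
};

struct demo_target {
	uint32_t dev_index;
	uint64_t dev_offset;
};

/* Map a byte offset within the extent to (device, device offset). */
static struct demo_target
demo_resolve(const struct demo_interleaved_ext *e, uint64_t off)
{
	assert(e->nstrips > 0 && e->chunk_size > 0);

	uint64_t chunk  = off / e->chunk_size;  /* global chunk number */
	uint64_t within = off % e->chunk_size;  /* byte within that chunk */
	uint64_t strip  = chunk % e->nstrips;   /* round-robin strip pick */
	uint64_t row    = chunk / e->nstrips;   /* chunk index in the strip */
	const struct demo_strip *s = &e->strips[strip];

	struct demo_target t = {
		s->dev_index,
		s->base + row * e->chunk_size + within,
	};
	return t;
}
```

With nstrips == 1, strip is always 0 and row equals chunk, so the result
collapses to base + off, which is exactly the simple-extent lookup.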