On Sun, Jan 22, 2017 at 10:37 PM, Matthew Wilcox <mawil...@microsoft.com> wrote:
> From: Christoph Hellwig [mailto:h...@lst.de]
>> On Sun, Jan 22, 2017 at 06:39:28PM +0000, Matthew Wilcox wrote:
>> > Two guests on the same physical machine (or a guest and a host) have access
>> > to the same set of physical addresses. This might be an NV-DIMM, or it might
>> > just be DRAM (for the purposes of reducing guest overhead). The network
>> > filesystem has been enhanced with a call to allow the client to ask the server
>> > "What is the physical address for this range of bytes in this file?"
>> >
>> > We don't want to use the guest pagecache here. That's antithetical to the
>> > second usage, and it's inefficient for the first usage.
>>
>> And the answer is that you need a dax device for whatever memory is exposed
>> in this way, as it needs to show up in the memory map, for example.
>
> Wow, DAX devices look painful and awful. I certainly don't want to be
> exposing the memory fronted by my network filesystem to userspace to access.
> That just seems like a world of pain and bad experiences. Absolutely the
> filesystem (or perhaps better, the ACPI tables) need to mark that chunk of
> memory as reserved, but it's definitely not available for anyone to access
> without the filesystem being aware.
>
> Even if we let the filesystem create a DAX device that doesn't show up in
> /dev (for example), Dan's patches don't give us a way to go from a file on
> the filesystem to a set of dax_ops. And it does need to be a per-file
> operation, e.g. to support a file on an XFS volume which might be on a RT
> device or a normal device. That was why I leaned towards an address_space
> operation, but I'd be happy to see an inode_operation instead.
How about we solve the copy_from_user() abuse first, before we hijack this
thread for some future feature that, afaics, has no patches posted yet.

An incremental step towards disentangling filesystem-dax from block_devices
is a lookup mechanism to go from a block_device to a dax object that holds
dax_ops. When this brave new filesystem enabling appears, it can grow a
mechanism to look up, or mount on, the dax object directly.

One idea is to just hang a pointer to this dax object off of bdev_inode, set
at bdev open() time.
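A minimal userspace sketch of the block_device to dax object lookup described above, under loudly stated assumptions: `struct dax_operations`, `struct dax_device`, `dax_lookup_by_host()`, `bdev_open()` and `bdev_dax_get()` are hypothetical, simplified names for illustration only, not the kernel API. `struct bdev_inode` is modeled loosely on the kernel structure of that name, with an added `dax_dev` pointer. The idea it illustrates is that resolution happens once at open time, after which filesystem-dax only follows a cached pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096L
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-in for the dax_ops table. */
struct dax_operations {
	/* Map a page offset to an address; returns pages available, or -1. */
	long (*direct_access)(void *priv, long pgoff, long nr_pages, void **kaddr);
};

/* Hypothetical dax object: the ops plus driver-private data. */
struct dax_device {
	const char *host;                 /* name of the block device it backs */
	const struct dax_operations *ops;
	void *priv;
};

struct block_device {
	const char *name;
};

/* Modeled loosely on bdev_inode; dax_dev is the proposed addition. */
struct bdev_inode {
	struct block_device bdev;
	struct dax_device *dax_dev;       /* set at open() time, NULL if no DAX */
};

/* Fake "pmem" region: four pages of ordinary memory. */
static char pmem_region[4 * PAGE_SIZE];

static long pmem_direct_access(void *priv, long pgoff, long nr_pages, void **kaddr)
{
	char *base = priv;
	long avail = (long)(sizeof(pmem_region) / PAGE_SIZE) - pgoff;

	if (pgoff < 0 || avail <= 0)
		return -1;
	*kaddr = base + pgoff * PAGE_SIZE;
	return nr_pages < avail ? nr_pages : avail;
}

static const struct dax_operations pmem_dax_ops = {
	.direct_access = pmem_direct_access,
};

/* Hypothetical registry of dax objects, keyed by host block device name. */
static struct dax_device dax_registry[] = {
	{ .host = "pmem0", .ops = &pmem_dax_ops, .priv = pmem_region },
};

static struct dax_device *dax_lookup_by_host(const char *host)
{
	for (size_t i = 0; i < sizeof(dax_registry) / sizeof(dax_registry[0]); i++)
		if (strcmp(dax_registry[i].host, host) == 0)
			return &dax_registry[i];
	return NULL;
}

/* Models bdev open(): resolve the dax object once, cache it on bdev_inode. */
static void bdev_open(struct bdev_inode *bi)
{
	bi->dax_dev = dax_lookup_by_host(bi->bdev.name);
}

/* What filesystem-dax would call, given only a struct block_device. */
static struct dax_device *bdev_dax_get(struct block_device *bdev)
{
	assert(bdev != NULL);
	return container_of(bdev, struct bdev_inode, bdev)->dax_dev;
}
```

A filesystem holding only a `struct block_device *` would call `bdev_dax_get()` and then `dax->ops->direct_access()`. A device with no registered dax object gets NULL, so the non-DAX path needs only a NULL check. A future filesystem with no block device underneath could look up, or mount on, the `dax_device` directly and skip `bdev_inode` entirely.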