On Thu, Feb 17, 2022 at 11:09:35AM -0800, Andy Lutomirski wrote:
> On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote:
> > On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote:
> >> On 1/18/22 05:21, Chao Peng wrote:
> >> > From: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
> >> >
> >> > Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
> >> > the file is inaccessible from userspace through ordinary MMU access
> >> > (e.g., read/write/mmap). However, the file content can be accessed
> >> > via a different mechanism (e.g. KVM MMU) indirectly.
> >> >
> >> > It provides semantics required for KVM guest private memory support
> >> > that a file descriptor with this seal set is going to be used as the
> >> > source of guest memory in confidential computing environments such
> >> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >> >
> >> > At this time only shmem implements this seal.
> >> >
> >>
> >> I don't dislike this *that* much, but I do dislike this.
> >> F_SEAL_INACCESSIBLE essentially transmutes a memfd into a different
> >> type of object. While this can apparently be done successfully and
> >> without races (as in this code), it's at least awkward. I think that
> >> either creating a special inaccessible memfd should be a single
> >> operation that creates the correct type of object or there should be
> >> a clear justification for why it's a two-step process.
> >
> > Now one justification may be from Steven's comment to patch-00: for ARM
> > usage it can be used with creating a normal memfd, (partially) populate
> > it with initial guest memory content (e.g. firmware), and then
> > F_SEAL_INACCESSIBLE it just before the first launch of the guest in
> > KVM (definitely the current code needs to be changed to support that).
>
> Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right? So
> this won't work.

Hmm, right, if we set F_SEAL_INACCESSIBLE on a non-empty file, we will
need to make sure that access to existing mmap-ed areas is prevented,
but that is hard.

>
> In any case, the whole confidential VM initialization story is a bit muddy.
> From the earlier emails, it sounds like ARM expects the host to fill in guest
> memory and measure it. From my recollection of Intel's scheme (which may
> well be wrong, and I could easily be confusing it with SGX), TDX instead
> measures what is essentially a transcript of the series of operations that
> initializes the VM. These are fundamentally not the same thing even if they
> accomplish the same end goal. For TDX, we unavoidably need an operation
> (ioctl or similar) that initializes things according to the VM's
> instructions, and ARM ought to be able to use roughly the same mechanism.

Yes, TDX requires an ioctl. Steven may comment on the ARM part.

Chao
>
> Also, if we ever get fancy and teach the page allocator about memory with
> reduced directmap permissions, it may well be more efficient for userspace to
> shove data into a memfd via ioctl than it is to mmap it and write the data.
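
For reference, the difficulty with sealing a file that is already mapped
can be seen with an existing seal. The sketch below is only an illustration
and is not taken from the patch: F_SEAL_INACCESSIBLE is not available in
mainline headers, so F_SEAL_WRITE stands in for it, and the function name
seal_fails_while_mapped() and the memfd name "guest-mem-demo" are made up
for the example. It assumes Linux with glibc 2.27 or later. The kernel
refuses F_SEAL_WRITE with EBUSY while a shared writable mapping exists and
accepts it once the mapping is gone; a seal that has to cut off all
existing mappings would need at least this kind of check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if sealing was refused while the mapping was live and
 * accepted after munmap(); returns -1 on any other outcome.
 */
static int seal_fails_while_mapped(void)
{
	const size_t len = 4096;
	int refused, rc;
	void *p;

	/* Step 1: create an ordinary memfd that allows seals to be added. */
	int fd = memfd_create("guest-mem-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	/* Make the file non-empty and map it shared and writable. */
	if (ftruncate(fd, len) < 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/*
	 * Step 2: try to seal while the mapping exists. F_SEAL_WRITE is a
	 * stand-in for the proposed F_SEAL_INACCESSIBLE; the kernel is
	 * expected to reject it with EBUSY.
	 */
	rc = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
	refused = (rc == -1 && errno == EBUSY);

	/* Drop the mapping and retry; the seal should now be accepted. */
	munmap(p, len);
	rc = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);

	close(fd);
	return (refused && rc == 0) ? 0 : -1;
}
```

Calling seal_fails_while_mapped() from a small main() and checking for a
zero return is enough to try this out.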