From: Roman Kisel <rom...@linux.microsoft.com> Sent: Wednesday, May 7, 2025 12:21 PM > > On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote: > > > [..] > > >> + } > >> + > >> + local_irq_save(flags); > >> + in = *this_cpu_ptr(hyperv_pcpu_input_arg); > >> + out = *this_cpu_ptr(hyperv_pcpu_output_arg); > >> + > >> + if (copy_from_user(in, (void __user *)hvcall.input_ptr, > >> hvcall.input_size)) { > > > > Here is an issue related to usage of user copy functions when interrupt are > > disabled. > > It was reported by Michael K here: > > > > https://github.com/microsoft/OHCL-Linux-Kernel/issues/33 > > From the practical point of view, that memory will be touched by the > user mode by virtue of Rust requiring initialization so the a possible > page fault would be resolved before the IOCTL. OpenHCL runs without swap > so the the memory will not be paged out to require page faults to be > brought in back. > > I do agree that might be turned into a footgun by the user land if > they malloc a page w/o prefaulting (so it's just a VA range, not backed > with the physical page), and then send its address straight over here > right after w/o writing any data to it. Perhaps likelier with the output > data. Anyway, yes, relying on the user land doing sane things isn't > the best approach to the kernel programming. > > If we're inclined to fix this, I'd encourage to take an approach that > works for the confidential VMs as well so we don't have to fix that > again when start upstreaming what we have for SNP and TDX. The > allocation *must* be visible to the hypervisor in the confidential > scenarios. > > Or, maybe we could avoid the allocations by reading the first byte > of the user land buffer to "pre-fault" the page outside of the > scope that disables interrupts. Why allocate if we can avoid that? > Could set up also the SMP remote calls to run this on the desired > CPU. > > Summarizing for the case you want to change this: > > 1. Keep interrupts disabled when reading/writing to/from the Hyper-V > driver allocated input and output pages. > 2. If you decide to allocate separate pages, make sure they are > visible to the hypervisor in the confidential scenarios. I know > we're not talking SNP and TDX here just yet but it would be > a waste of time imho to build something here and scrape that > later. The issues with allocations are: > a) If allocating on-demand, we might fail the hypercall > because of OOM. That's certainly bad as the whole VM > will break down. > b) If allocating for the whole lifetime of the VM, > let us remember that we avoid using hypercalls > due to their runtime cost. We'll be keeping around > 2 pages per CPU for the few times we need them. > 3. Consider reading a byte from the user land buffers to make the page > fault happen outside of disabling interrupts. There is no > outswap (maybe could have disabling swap in Kconfig) so the page > will stay in the memory. > > If you're not changing this, feel free to keep my "Reviewed-by". >
Regardless of what might be done to prevent a page fault, I don't see an option to not fix this. copy_from_user() contains a call to might_fault(), which in turn calls might_sleep(). The intent of these runtime "annotations" is precisely for the kernel to check for such errors and complain about them. The complaining is suppressed unless CONFIG_DEBUG_ATOMIC_SLEEP is set, but we want to be able to set that option for debugging purposes and not have this code generating complaints. Michael