From: Roman Kisel <rom...@linux.microsoft.com> Sent: Wednesday, May 7, 2025 
12:21 PM
> 
> On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote:
> >
> [..]
> 
> >> +  }
> >> +
> >> +  local_irq_save(flags);
> >> +  in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> +  out = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >> +
> >> +  if (copy_from_user(in, (void __user *)hvcall.input_ptr,
> >> hvcall.input_size)) {
> >
> > Here is an issue related to usage of user copy functions when interrupt are 
> > disabled.
> > It was reported by Michael K here:
> >
> > https://github.com/microsoft/OHCL-Linux-Kernel/issues/33
> 
>  From the practical point of view, that memory will be touched by the
> user mode by virtue of Rust requiring initialization so the a possible
> page fault would be resolved before the IOCTL. OpenHCL runs without swap
> so the the memory will not be paged out to require page faults to be
> brought in back.
> 
> I do agree that might be turned into a footgun by the user land if
> they malloc a page w/o prefaulting (so it's just a VA range, not backed
> with the physical page), and then send its address straight over here
> right after w/o writing any data to it. Perhaps likelier with the output
> data. Anyway, yes, relying on the user land doing sane things isn't
> the best approach to the kernel programming.
> 
> If we're inclined to fix this, I'd encourage to take an approach that
> works for the confidential VMs as well so we don't have to fix that
> again when start upstreaming what we have for SNP and TDX. The
> allocation *must* be visible to the hypervisor in the confidential
> scenarios.
> 
> Or, maybe we could avoid the allocations by reading the first byte
> of the user land buffer to "pre-fault" the page outside of the
> scope that disables interrupts. Why allocate if we can avoid that?
> Could set up also the SMP remote calls to run this on the desired
> CPU.
> 
> Summarizing for the case you want to change this:
> 
> 1. Keep interrupts disabled when reading/writing to/from the Hyper-V
>     driver allocated input and output pages.
> 2. If you decide to allocate separate pages, make sure they are
>     visible to the hypervisor in the confidential scenarios. I know
>     we're not talking SNP and TDX here just yet but it would be
>     a waste of time imho to build something here and scrape that
>     later. The issues with allocations are:
>         a) If allocating on-demand, we might fail the hypercall
>            because of OOM. That's certainly bad as the whole VM
>            will break down.
>         b) If allocating for the whole lifetime of the VM,
>            let us remember that we avoid using hypercalls
>            due to their runtime cost. We'll be keeping around
>            2 pages per CPU for the few times we need them.
> 3. Consider reading a byte from the user land buffers to make the page
>     fault happen outside of disabling interrupts. There is no
>     outswap (maybe could have disabling swap in Kconfig) so the page
>     will stay in the memory.
> 
> If you're not changing this, feel free to keep my "Reviewed-by".
> 

Regardless of what might be done to prevent a page fault, I don't
see an option to not fix this. copy_from_user() contains a call to
might_fault(), which in turn calls might_sleep(). The intent of these
runtime "annotations" is precisely for the kernel to check for such
errors and complain about them. The complaining is suppressed unless
CONFIG_DEBUG_ATOMIC_SLEEP is set, but we want to be able to
set that option for debugging purposes and not have this code
generating complaints.

Michael

Reply via email to