On 8/26/2025 5:07 AM, Peter Zijlstra wrote:
On Tue, Aug 26, 2025 at 05:00:31PM +0530, Naman Jain wrote:

Peter,

Naman is OOF; genuinely hoping it's not presumptuous of me to reply
to the thread - seems time sensitive.

[...]


I do not know what OpenHCL is. Nor is it clear from the code what NMIs
can't happen. Anyway, same can be achieved with breakpoints / kprobes.
You can get a trap after setting CR2 and scribble it.

You simply cannot use CR2 this way.


The code in question runs with interrupts disabled, and the kernel runs
without the memory swapping when using that module - the kernel is
a firmware to host a vTPM for virtual machines. Somewhat similar to SMM.
That should've been reflected somewhere in the comments and in Kconfig,
we could do better. All in all, the page fault cannot happen in that
path thus CR2 won't be trashed.

Nor this kind of code can be stepped through in a self-hosted
kernel debugger like kgdb. There are other examples of such code iiuc:
the asm glue code in the interrupt exception entries where the trap
frames are handled, likely return from the kernel to the user land.
The thread context-swap code hardly can be stepped through with
convenience if at all in the self-hosted debugger, too.

And an rax:rcx return, I though the canonical pair was AX,DX !?!?

Here, the code uses rax and rcx not as a means to return one 128 bit
value. The code uses them in that way as an ABI.

Still daft. Creating an ABI that goes against pre-existing conventions
is weird.


It is weird. It really sucks to have to conform to ABIs introduced a
decade+ ago when Hyper-V just appeared in the kernel and just for x86.
As weird as the pair ax:cx looks for anyone who takes joy and pride in
writing x86 asm code, it still works for the customers, we have to care
about the backward compat.

From the discussion, it doesn't appear we can ask for much as you're
right: the asm chunk looks abominable due to being essentially a
context-swap with an entity foreign to the Linux/System-V ABI. Same
must be true for the calls into the UEFI runtime services to a certain
extent due to different calling conventions.

We have a much cleaner story on ARM64 due to no legacy and using
their calling conventions aka AAPCS64 and other standards everywhere
(not only in Linux Hyper-V code) to the best of my knowledge.

If nothing of that saves the patches from the death row, maybe it'd be possible to give the patches the experimental status or get some time
extension to learn what can be improved? I am asking to save the
time spent by folks reviewing the parts that you don't see as being
prohibitively bad.

Also, that STACK_FRAME_NON_STANDARD() annotation is broken, this must
not be used for anything that can end up in vmlinux.o -- that is, the
moment you built-in this driver (=y) this comes unstuck.

The reason you're getting warnings is because you're violating the
normal calling convention and scribbling BP, we yelled at the TDX guys
for doing this, now you're getting yelled at. WTF !?!

Please explain how just shutting up objtool makes the unwind work when
the NMI hits your BP scribble?

Returning to a lower VTL treats the base pointer register as a general
purpose one and this VTL transition function makes sure to preserve the
bp register due to which the objtool trips over on the assembly touching
the bp register. We considered this warning harmless and thus we are
using this macro. Moreover NMIs are not an issue here as they won't be
coming as mentioned other. If there are alternate approaches that I should
be using, please suggest.

Using BP in an ABI like that is ridiculous and broken. We told the same
to the TDX folks when they tried, IIRC TDX was fixed.

It is simply not acceptable to break the regular calling convention with
a new ABI.

Again, I can put a breakpoint or kprobe in the region where BP is
scribbled.

Basically the argument is really simple: you run in Linux, you play by
the Linux rules. Using BP as argument is simply not possible. If your
ABI requires that, your ABI is broken and will not be supported. Rev the
ABI and try again. Same for CR2, that is not an available register in
any way.

I now understand that as part of your effort to enable IBT config on
x64, you changed the indirect calls to direct calls in Hyper-V.

Yeah, I was cleaning up indirect calls, and this really didn't need to
be one.

As of today, there is no requirement to enable IBT in OpenHCL kernel
as this runs as a paravisor in VTL2 and it does not effect the guest
VMs running
in VTL0.

I do not know what OpenHCL or VTLn means and as such pretty much the
whole of your statement makes no sense.

Anyway, AFAICT the whole idea of a hypercall page is to 'abtract' out
the VMCALL vs VMMCALL nonsense Intel/AMD inflicted on us. Surely you
don't actually need that. HyperV already knows about all the gazillion
of ways to do hypercalls.

I can disable CONFIG_X86_KERNEL_IBT when CONFIG_MSHV_VTL is enabled in
Kconfig in next version.

Or you can just straight up say: "We at microsoft don't care about
security." :-/

Doing that will ensure no distro will build your module, most build bots
will not build your module, nobody cares about your module.

And no, the problems with BP and CR2 are not related to IBT, that is
separate and no less broken. They violate the basic rules of x86_64.

--
Thank you,
Roman


Reply via email to