On Mon, Aug 19, 2024 at 3:30 PM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 19.08.2024 16:16, Frediano Ziglio wrote:
> > On Thu, Aug 8, 2024 at 9:54 AM Jan Beulich <jbeul...@suse.com> wrote:
> >> On 08.08.2024 10:00, Frediano Ziglio wrote:
> >>> On Thu, Aug 8, 2024 at 8:34 AM Jan Beulich <jbeul...@suse.com> wrote:
> >>>> On 07.08.2024 15:48, Alejandro Vallejo wrote:
> >>>>> This change allows to put the trampoline in a separate, not executable
> >>>>> section. The trampoline contains a mix of code and data (data which
> >>>>> is modified from C code during early start so must be writable).
> >>>>> This is in preparation for W^X patch in order to satisfy UEFI CA
> >>>>> memory mitigation requirements.
> >>>>
> >>>> Which, aiui, has the downside of disassembly of the section no longer
> >>>> happening by default, when using objdump or similar tools, which go from
> >>>> section attributes. Why is it being in .init.text (and hence RX) not
> >>>> appropriate? It should - in principle at least - be possible to avoid
> >>>> all in-place writing to it, but instead only ever write to its relocated
> >>>> copy. Quite a bit more code churn of course.
> >>>>
> >>>> I wonder if we shouldn't put the trampoline in its own section, RWX in
> >>>> the object file, and switched to whatever appropriate in the binary
> >>>> (which really may be RX, not RW).
> >>>
> >>> We cannot have RWX to satisfy UEFI CA memory mitigation, that's why I
> >>> had to move it, code sections should not be writeable. We can mark
> >>> either RX or RW but we use the data very early so we are not able to
> >>> change the permissions (we can try with all complications that this
> >>> could bring like how to report an error at so early stages).
> >>
> >> The early writing could be done away with, as indicated. There's not
> >> really any strict requirement to write to the trampoline region within
> >> the Xen image. All updates to it could in principle be done after it
> >> was copied into low memory. Then (and of course only then) could it be
> >> part of an RX section in the image, maybe .init.text, maybe a separate
> >> .trampoline section.
> >
> >    how strong are you on this? Is this "objdump" thing such a big
> > issue? The code contains a lot of 16 bit code which would require
> > additional options anyway. Won't be an assembly listing output more
> > helpful instead?
>
> Well. Whether a listing can serve as a stand-in depends on the situation.
> Not being able to disassemble code (e.g. also in the final executable)
> can be pretty limiting. The need to pass extra options is related, but
> not really an argument against.
>

If some code ends up inside a data section (in the final binary) you can
use the -D option to disassemble everything, data included. For instance,
run "objdump -D xen-syms -m i8086" and look for the "trampoline" symbols.
Granted, the output of -D is much longer than that of -d.

> > I tried to change the code to change only the final copy of the
> > trampoline but it looks like lot of code assumes it can change the
> > source of it (that is requiring it to be in a writeable section). For
> > instance EFI change settings directly and then allocate space for the
> > copy later. The allocation could be moved but there's a fallback on
> > code that assumes that early allocation can fail.
>
> Right, if there's too much standing in the way then we need to look at
> possible alternatives.
>
> > The trampoline relocation is done with PC relative addressing which is
> > helpful if you are changing the source directly, not the copy.
>
> I'm afraid I can't make a connection between this and what we're
> discussing.
>

The current C code (EFI, xen/arch/x86/efi/efi-boot.h) to relocate the
trampoline is
    for ( trampoline_ptr = __trampoline_rel_start;
          trampoline_ptr < __trampoline_rel_stop;
          ++trampoline_ptr )
        *(u32 *)(*trampoline_ptr + (long)trampoline_ptr) += phys;
The formula is simple because each entry is relative to its own location,
but to patch the copy instead you would need to change it to something like
    long trampoline_offset = phys - (long)trampoline_start;
    for ( trampoline_ptr = __trampoline_rel_start;
          trampoline_ptr < __trampoline_rel_stop;
          ++trampoline_ptr )
        *(u32 *)(*trampoline_ptr + (long)trampoline_ptr +
                 trampoline_offset) += phys;
which is surely more confusing. You would probably want to change the
relocations (code in trampoline.S) to offsets from trampoline_start,
resulting in
    for ( trampoline_ptr = __trampoline_rel_start;
          trampoline_ptr < __trampoline_rel_stop;
          ++trampoline_ptr )
        *(u32 *)(*trampoline_ptr + phys) += phys;

Well, it's not impossible, but you would need to change the trampoline
code and both pieces of code that relocate it.
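
To make the comparison concrete, here is a minimal user-space sketch (not
Xen code) that runs the three variants on a toy image and checks that they
produce the same relocated copy. The names struct image, field_off, add32,
build_image, the variant_* helpers and variants_agree are all hypothetical,
as are the sizes and field values. It also assumes the copy lives in an
ordinary buffer "dest", whereas in the real boot code the destination
address and phys coincide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRAMPOLINE_SIZE 16u   /* hypothetical trampoline size in bytes */
#define NREL 2u               /* hypothetical number of relocations */

/* Hypothetical stand-in for the Xen image: trampoline plus its table. */
struct image {
    uint8_t trampoline[TRAMPOLINE_SIZE];
    int32_t rel[NREL];        /* PC-relative: field address - entry address */
};

/* Byte offsets (from the trampoline start) of the fields to relocate. */
static const uint32_t field_off[NREL] = { 4, 12 };

/* Add v to the 32-bit value at addr (memcpy avoids aliasing problems). */
static void add32(intptr_t addr, uint32_t v)
{
    uint32_t tmp;

    memcpy(&tmp, (void *)addr, sizeof(tmp));
    tmp += v;
    memcpy((void *)addr, &tmp, sizeof(tmp));
}

/* Fill the toy trampoline and compute its PC-relative relocation table. */
static void build_image(struct image *img)
{
    memset(img, 0, sizeof(*img));
    for (unsigned i = 0; i < NREL; ++i) {
        uint32_t val = 0x10u * (i + 1);   /* arbitrary pre-relocation value */

        assert(field_off[i] + sizeof(val) <= TRAMPOLINE_SIZE);
        memcpy(&img->trampoline[field_off[i]], &val, sizeof(val));
        img->rel[i] = (int32_t)((intptr_t)&img->trampoline[field_off[i]] -
                                (intptr_t)&img->rel[i]);
    }
}

/* Variant 1 (current scheme): patch the source in place, then copy it. */
static void variant_patch_source(struct image *img, uint8_t *dest,
                                 uint32_t phys)
{
    for (int32_t *p = img->rel; p < img->rel + NREL; ++p)
        add32((intptr_t)p + *p, phys);
    memcpy(dest, img->trampoline, TRAMPOLINE_SIZE);
}

/* Variant 2: copy first, then patch the copy with PC-relative entries. */
static void variant_patch_copy(const struct image *img, uint8_t *dest,
                               uint32_t phys)
{
    intptr_t offset = (intptr_t)dest - (intptr_t)img->trampoline;

    memcpy(dest, img->trampoline, TRAMPOLINE_SIZE);
    for (const int32_t *p = img->rel; p < img->rel + NREL; ++p)
        add32((intptr_t)p + *p + offset, phys);
}

/* Variant 3: copy first, entries are offsets from the trampoline start. */
static void variant_offsets(const struct image *img, uint8_t *dest,
                            uint32_t phys)
{
    memcpy(dest, img->trampoline, TRAMPOLINE_SIZE);
    for (unsigned i = 0; i < NREL; ++i)
        add32((intptr_t)dest + field_off[i], phys);
}

/* Returns 1 if all three variants produce the same relocated copy. */
int variants_agree(uint32_t phys)
{
    struct image img;
    uint8_t a[TRAMPOLINE_SIZE], b[TRAMPOLINE_SIZE], c[TRAMPOLINE_SIZE];

    build_image(&img);
    variant_patch_source(&img, a, phys);
    build_image(&img);
    variant_patch_copy(&img, b, phys);
    build_image(&img);
    variant_offsets(&img, c, phys);

    return memcmp(a, b, sizeof(a)) == 0 && memcmp(a, c, sizeof(a)) == 0;
}
```

Calling variants_agree(0x9000) from a small main() is enough to try it.
Only variant 1 writes to the source image. Variants 2 and 3 leave it
untouched, which is what an RX section in the image would need.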

> > Could I output the trampoline in a code section ("ax" instead of "aw")
> > and then later move it into .init.data section assuring .init.data is
> > writeable but not executable?
>
> Could you go into a little more detail on what you mean here? At the
> first glance my reaction is "yes, sure, why not", but much depends on
> what exactly is meant.
>

For instance you could put the trampoline into a
    .section .init.trampoline, "awx", @progbits
section (with the "x" flag, objdump -d head.o will disassemble it).
Then, in xen/arch/x86/xen.lds.S, add it to the .init.data section with
something like
...
  DECL_SECTION(.init.data) {
       *(.init.bss.stack_aligned)
       *(.init.trampoline)
   ...
This will put the trampoline in the .init.data section of the final
object. At this point .init.data, because it contains code, will have
execute permission, which you would have to fix with an objcopy command.
The final trampoline will then be in a non-executable data section, so
objdump needs the -D option there, though not when disassembling head.o.
In theory we could keep the temporary object file from before the objcopy
adjustment to avoid the -D, but I don't think it would save a lot of
burden.

> Jan

On a related subject, I'm trying to come up with a solution in order to:
- write more boot code in C instead of assembly;
- avoid duplication between C and assembly code (like trampoline
relocation or page table initialization);
- avoid having to pass pointers to C code (like we do for
xen/arch/x86/boot/reloc.c);
- avoid bugs like
https://lists.xenproject.org/archives/html/xen-devel/2024-08/msg00784.html;
I'd prefer compilation to fail in such a case rather than have a bug
hidden in some code path that is potentially seldom exercised;
- make it possible to reuse code across 32 bit C code (like the code in
copy_string in xen/arch/x86/boot/reloc.c and strlen in
xen/arch/x86/boot/cmdline.c).
I have an idea about it, but I'm not sure how easy and nice it would be.

Frediano
