On 15.08.2025 10:40, Nicola Vetrini wrote: > On 2025-08-15 10:30, Jan Beulich wrote: >> On 14.08.2025 20:20, Andrew Cooper wrote: >>> On 14/08/2025 8:26 am, Jan Beulich wrote: >>>> On 13.08.2025 13:36, Andrew Cooper wrote: >>>>> On 12/08/2025 10:43 am, Nicola Vetrini wrote: >>>>>> On 2025-08-08 22:23, Andrew Cooper wrote: >>>>>>> diff --git a/xen/arch/x86/traps-setup.c >>>>>>> b/xen/arch/x86/traps-setup.c >>>>>>> index 8ca379c9e4cb..13b8fcf0ba51 100644 >>>>>>> --- a/xen/arch/x86/traps-setup.c >>>>>>> +++ b/xen/arch/x86/traps-setup.c >>>>>>> @@ -19,6 +20,124 @@ boolean_param("ler", opt_ler); >>>>>>> >>>>>>> void nocall entry_PF(void); >>>>>>> >>>>>>> +/* >>>>>>> + * Sets up system tables and descriptors for IDT devliery. >>>>>>> + * >>>>>>> + * - Sets up TSS with stack pointers, including ISTs >>>>>>> + * - Inserts TSS selector into regular and compat GDTs >>>>>>> + * - Loads GDT, IDT, TR then null LDT >>>>>>> + * - Sets up IST references in the IDT >>>>>>> + */ >>>>>>> +static void load_system_tables(void) >>>>>>> +{ >>>>>>> + unsigned int i, cpu = smp_processor_id(); >>>>>>> + unsigned long stack_bottom = get_stack_bottom(), >>>>>>> + stack_top = stack_bottom & ~(STACK_SIZE - 1); >>>>>>> + /* >>>>>>> + * NB: define tss_page as a local variable because clang 3.5 >>>>>>> doesn't >>>>>>> + * support using ARRAY_SIZE against per-cpu variables. >>>>>>> + */ >>>>>>> + struct tss_page *tss_page = &this_cpu(tss_page); >>>>>>> + idt_entry_t *idt = this_cpu(idt); >>>>>>> + >>>>>> Given the clang baseline this might not be needed anymore? >>>>> Hmm. While true, looking at 51461114e26, the code is definitely >>>>> better >>>>> written with the tss_page variable and we wouldn't want to go back >>>>> to >>>>> the old form. >>>>> >>>>> I think that I'll simply drop the comment. >>>>> >>>>> ~Andrew >>>>> >>>>> P.S. >>>>> >>>>> Generally speaking, because of the RELOC_HIDE() in this_cpu(), any >>>>> time >>>>> you ever want two accesses to a variable, it's better (code gen >>>>> wise) to >>>>> construct a pointer to it and use the point multiple times. >>>>> >>>>> I don't understand why there's a RELOC_HIDE() in this_cpu(). The >>>>> justification doesn't make sense, but I've not had time to explore >>>>> what >>>>> happens if we take it out. >>>> There's no justification in xen/percpu.h? >>> >>> Well, it's given in compiler.h by RELOC_HIDE(). >>> >>> /* This macro obfuscates arithmetic on a variable address so that gcc >>> shouldn't recognize the original var, and make assumptions about it >>> */ >>> >>> >>> But this is far from convincing. >>> >>>> >>>> My understanding is that we simply may not expose any accesses to >>>> per_cpu_* >>>> variables directly to the compiler, or there's a risk that it might >>>> access >>>> the "master" variable (i.e. CPU0's on at least x86). >>> >>> RELOC_HIDE() doesn't do anything about the correctness of the pointer >>> arithmetic expression to make the access work. >>> >>> I don't see how a correct expression can ever access CPU0's data by >>> accident. >> >> Hmm, upon another look I agree. I wonder whether we inherited this from >> Linux, where in turn it may have been merely a workaround to deal with >> preemptible code not correctly accessing per-CPU data (i.e. not >> accounting for get_per_cpu_offset() not being stable across >> preemption). >> Yet then per_cpu() would have been of similar concern when "cpu" isn't >> properly re-fetched after any possible preemption point ... > > Probably inherited with a stripped-down comment on top of RELOC_HIDE, > see [1]. In a way it does make sense that the compiler may decide to > optimize based on this assumption, though I don't know whether wrapping > is meant to happen with per-CPU variables.
I wouldn't call it "meant to", but wrapping certainly is possible. This is arch-independent code, and hence whether any wrapping would occur depends on the VA layout of the individual architectures. Jan