On Mon, 2023-06-12 at 15:48 +0200, Jan Beulich wrote:
> On 06.06.2023 21:55, Oleksii Kurochko wrote:
> > The way how switch to virtual address was implemented in the
> > commit e66003e7be ("xen/riscv: introduce setup_initial_pages")
> > wasn't safe enough so identity mapping was introduced and
> > used.
> 
> I don't think this is sufficient as a description. You want to make
> clear what the "not safe enough" is, and you also want to go into
> more detail as to the solution chosen. I'm particularly puzzled that
> you map just two singular pages ...
I'll update the description.
I map two pages as that is likely to be enough to switch away from the
1:1 mapping world. My point is that we need the 1:1 mapping only for the
few instructions which are located in start() [ in the .text.header
section ]:

    li      t0, XEN_VIRT_START
    la      t1, start
    sub     t1, t1, t0

    /* Calculate proper VA after jump from 1:1 mapping */
    la      t0, .L_primary_switched
    sub     t0, t0, t1

    /* Jump from 1:1 mapping world */
    jr      t0

And cpu0_boot_stack needs to be mapped 1:1 so that we don't fault when
executing the epilogue of the enable_mmu() function, as it accesses a
value on the stack:

    ffffffffc00001b0: 6422      ld      s0,8(sp)
    ffffffffc00001b2: 0141      addi    sp,sp,16
    ffffffffc00001b4: 8082      ret

> 
> > @@ -35,8 +40,10 @@ static unsigned long phys_offset;
> >   *
> >   * It might be needed one more page table in case when Xen load address
> >   * isn't 2 MB aligned.
> > + *
> > + * 3 additional page tables are needed for identity mapping.
> >   */
> > -#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
> > +#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1 + 3)
> 
> What is this 3 coming from? It feels like the value should (again)
> somehow depend on CONFIG_PAGING_LEVELS.
3 is too much in the current case; it should be 2 more. The linker
address is 0xFFFFFFFC00000000, so for mapping the linker to load
addresses we need 3 page tables ( with the condition that the size of
Xen won't be larger than 2 MB ), plus 1 page table in case the Xen load
address isn't 2 MB aligned.

We need 2 more page tables for the identity mapping because Xen is
loaded to address 0x80200000 by OpenSBI and the load address isn't equal
to the linker address; only 2 additional pages are needed because the L2
table was already allocated when mapping the linker to load addresses.

It also looks like 2 is enough for Sv48 and Sv57, as the L4, L3 and L2
page tables will already be allocated during mapping of the linker to
load addresses. It will only be too much for Sv32 ( which has only L1
and L0 in general ), but I am not sure that anyone cares about Sv32.
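To make the dependency on the number of paging levels explicit, here is a
minimal C sketch ( the function name and the "shared levels" framing are
mine, not the actual Xen code ) of how the extra page-table count for the
identity mapping could be derived: a mapping needs one table per level,
and tables for upper levels already allocated by the linker-to-load
mapping are reused.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the actual Xen code: number of extra page
 * tables a second (identity) mapping needs when the top `shared_levels`
 * levels of the walk reuse tables already allocated for the
 * linker-to-load mapping.
 */
static unsigned int extra_identity_pgtbls(unsigned int paging_levels,
                                          unsigned int shared_levels)
{
    return paging_levels - shared_levels;
}
```

For Sv39 ( 3 levels ) only the root L2 table is shared, giving 2; for
Sv48 and Sv57 more upper levels are shared, so the result stays at 2,
matching the reasoning above.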
> > @@ -108,16 +116,18 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
> >           {
> >               unsigned long paddr = (page_addr - map_start) + pa_start;
> >               unsigned int permissions = PTE_LEAF_DEFAULT;
> > +             unsigned long addr = (is_identity_mapping) ?
> 
> Nit: No need for parentheses here.
Thanks.

> > +                                  page_addr : LINK_TO_LOAD(page_addr);
> 
> As a remark, while we want binary operators at the end of lines when
> wrapping, we usually do things differently for the ternary operator:
> Either
> 
>     unsigned long addr = is_identity_mapping
>                          ? page_addr :
>                          LINK_TO_LOAD(page_addr);
> 
> or
> 
>     unsigned long addr = is_identity_mapping
>                          ? page_addr
>                          : LINK_TO_LOAD(page_addr);
It looks better. Thanks.

> > @@ -232,22 +242,27 @@ void __init setup_initial_pagetables(void)
> >                             linker_start,
> >                             linker_end,
> >                             load_start);
> > +
> > +    if ( linker_start == load_start )
> > +        return;
> > +
> > +    setup_initial_mapping(&mmu_desc,
> > +                          load_start,
> > +                          load_start + PAGE_SIZE,
> > +                          load_start);
> > +
> > +    setup_initial_mapping(&mmu_desc,
> > +                          (unsigned long)cpu0_boot_stack,
> > +                          (unsigned long)cpu0_boot_stack + PAGE_SIZE,
> 
> Shouldn't this be STACK_SIZE (and then also be prepared for
> STACK_SIZE > PAGE_SIZE)?
Yes, it would be better to use STACK_SIZE, but for the 1:1 mapping
PAGE_SIZE is enough: as I mentioned above, this is only needed for the
epilogue of the enable_mmu() function.

> > +                          (unsigned long)cpu0_boot_stack);
> >   }
> >   
> > -void __init noreturn noinline enable_mmu()
> > +/*
> > + * enable_mmu() can't be __init because __init section isn't part of identity
> > + * mapping so it will cause an issue after MMU will be enabled.
> > + */
> 
> As hinted at above already - perhaps the identity mapping wants to be
> larger, up to covering the entire Xen image? Since it's temporary
> only anyway, you could even consider using a large page (and RWX
> permission).
> You already require no overlap of link and load addresses,
> so at least small page mappings ought to be possible for the entire
> image.
It makes sense; I started to think about that after I applied the patch
for Dom0 running... I think we can map a 1 GB page which will cover the
whole Xen image. Also, in that case we won't have to allocate 2 more
page tables.

> > @@ -255,25 +270,41 @@ void __init noreturn noinline enable_mmu()
> >       csr_write(CSR_SATP,
> >                 PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> >                 RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +}
> > +
> > +void __init remove_identity_mapping(void)
> > +{
> > +    int i, j;
> 
> Nit: unsigned int please.
Thanks.

> > +    pte_t *pgtbl;
> > +    unsigned int index, xen_index;
> 
> These would all probably better be declared in the narrowest possible
> scope.
Sure.

> > -    asm volatile ( ".p2align 2" );
> > -mmu_is_enabled:
> >       /*
> > -     * Stack should be re-inited as:
> > -     * 1. Right now an address of the stack is relative to load time
> > -     *    addresses what will cause an issue in case of load start
> > -     *    address isn't equal to linker start address.
> > -     * 2. Addresses in stack are all load time relative which can be an
> > -     *    issue in case when load start address isn't equal to linker
> > -     *    start address.
> > -     *
> > -     * We can't return to the caller because the stack was reseted
> > -     * and it may have stash some variable on the stack.
> > -     * Jump to a brand new function as the stack was reseted
> > +     * id_addrs should be in sync with id mapping in
> > +     * setup_initial_pagetables()
> 
> What is "id" meant to stand for here? Also if things need keeping in
> sync, then a similar comment should exist on the other side.
"id" here means identity.
> > +     */
> > +    unsigned long id_addrs[] = {
> > +                                LINK_TO_LOAD(_start),
> > +                                LINK_TO_LOAD(cpu0_boot_stack),
> > +                               };
> >   
> > -    switch_stack_and_jump((unsigned long)cpu0_boot_stack + STACK_SIZE,
> > -                          cont_after_mmu_is_enabled);
> > +    pgtbl = stage1_pgtbl_root;
> > +
> > +    for ( j = 0; j < ARRAY_SIZE(id_addrs); j++ )
> > +    {
> > +        for ( pgtbl = stage1_pgtbl_root, i = CONFIG_PAGING_LEVELS - 1; i >= 0; i-- )
> > +        {
> > +            index = pt_index(i, id_addrs[j]);
> > +            xen_index = pt_index(i, XEN_VIRT_START);
> > +
> > +            if ( index != xen_index )
> > +            {
> > +                pgtbl[index].pte = 0;
> > +                break;
> > +            }
> 
> Up to here I understand index specifies a particular PTE within
> pgtbl[].
> 
> > +            pgtbl = &pgtbl[index];
> 
> But then how can this be the continuation of the page table walk? Don't
> you need to read the address out of the PTE?
You are right. It should be:

    pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);

> > --- a/xen/arch/riscv/riscv64/head.S
> > +++ b/xen/arch/riscv/riscv64/head.S
> > @@ -31,6 +31,36 @@ ENTRY(start)
> >   
> >           jal     calc_phys_offset
> >   
> > +        jal     setup_initial_pagetables
> > +
> > +        jal     enable_mmu
> > +
> > +        /*
> > +         * Calculate physical offset
> > +         *
> > +         * We can't re-use a value in phys_offset variable here as
> > +         * phys_offset is located in .bss and this section isn't
> > +         * 1:1 mapped and an access to it will cause MMU fault
> > +         */
> > +        li      t0, XEN_VIRT_START
> > +        la      t1, start
> > +        sub     t1, t1, t0
> 
> If I'm not mistaken this calculates start - XEN_VIRT_START, and ...
> 
> > +        /* Calculate proper VA after jump from 1:1 mapping */
> > +        la      t0, .L_primary_switched
> > +        sub     t0, t0, t1
> 
> ... then this .L_primary_switched - (start - XEN_VIRT_START). Which can
> also be expressed as (.L_primary_switched - start) + XEN_VIRT_START,
> the first part of which gas ought to be able to resolve for you. But
> upon experimenting a little it looks like it can't.
> (I had thought of something along the lines of
> 
>     li      t0, .L_primary_switched - start
>     li      t1, XEN_VIRT_START
>     add     t0, t0, t1
> 
> or even
> 
>     li      t1, XEN_VIRT_START
>     add     t0, t1, %pcrel_lo(.L_primary_switched - start)
> 
> .)
Calculation of ".L_primary_switched - start" might be an issue for gas.
I tried to do something similar and received either
"Error: can't resolve .L_primary_switched - start" or
"Error: illegal operands `li t0,.L_primary_switched-start'".

~ Oleksii
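P.S. For completeness, here is a standalone C sketch of the corrected
remove_identity_mapping() walk discussed above. The constants, pt_index()
and pte_to_paddr() are simplified stand-ins for the real Xen helpers
( Sv39 geometry assumed ), and the descent assumes an identity map so
that a physical address can be used directly as a pointer:

```c
#include <stdint.h>

#define LEVELS        3    /* Sv39: L2..L0 (assumed for this sketch) */
#define ENTRIES       512  /* 4K pages, 8-byte PTEs */
#define PTE_PPN_SHIFT 10
#define PAGE_SHIFT    12

typedef struct { uint64_t pte; } pte_t;

/* Stand-in for Xen's pte_to_paddr(): extract the PPN, shift to a paddr. */
static uint64_t pte_to_paddr(pte_t p)
{
    return (p.pte >> PTE_PPN_SHIFT) << PAGE_SHIFT;
}

/* Stand-in for Xen's pt_index(): VPN slice of va for the given level. */
static unsigned int pt_index(unsigned int level, uint64_t va)
{
    return (va >> (PAGE_SHIFT + 9 * level)) & (ENTRIES - 1);
}

/* Zap the identity mapping of va, keeping the subtrees shared with xen_va. */
static void zap_mapping(pte_t *root, uint64_t va, uint64_t xen_va)
{
    pte_t *pgtbl = root;
    int i;

    for ( i = LEVELS - 1; i >= 0; i-- )
    {
        unsigned int index = pt_index((unsigned int)i, va);

        if ( index != pt_index((unsigned int)i, xen_va) )
        {
            /* Not shared with the Xen mapping: clear and stop. */
            pgtbl[index].pte = 0;
            break;
        }
        /*
         * Shared so far: descend via the physical address stored in the
         * PTE (not &pgtbl[index]), per the fix above.
         */
        pgtbl = (pte_t *)(uintptr_t)pte_to_paddr(pgtbl[index]);
    }
}
```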