On Thu, 1 Feb 2024 at 20:20, Caleb Connolly <caleb.conno...@linaro.org> wrote: > > > > On 01/02/2024 12:07, Sumit Garg wrote: > > On Tue, 30 Jan 2024 at 19:35, Caleb Connolly <caleb.conno...@linaro.org> > > wrote: > >> > >> On Qualcomm platforms, the TZ may already have certain memory regions > >> under protection by the time U-Boot starts. There is a rare case on some > >> platforms where the prefetcher might speculatively access one of these > >> regions resulting in a board crash (TZ traps and then resets the board). > >> > >> We shouldn't be accessing these regions from within U-Boot anyway, so > >> let's mark them all with PTE_TYPE_FAULT to prevent any speculative > >> access and correctly trap in EL1 rather than EL3. > >> > >> This is quite costly with caches off (takes ~2 seconds on SDM845 vs 35ms > >> with caches on). So to minimise the impact this is only enabled on > >> QCS404 for now (where the issue is known to occur). > > > > It only took 65ms with caches off on QCS404 as per below print: > > > > carveout time: 65ms > > > > It's probably due to some missing bits as pointed below to get it > > working fast on SDM845 too.. > > Ah I didn't consider that the difference might really just be the 2M vs > 4K alignment. > > [snip] > > >> + > >> +/* Mark all no-map regions as PTE_TYPE_FAULT to prevent speculative > >> access. > >> + * On some platforms this is enough to trigger a security violation and > >> trap > >> + * to EL3. > >> + */ > >> +static void carve_out_reserved_memory(void) > >> +{ > >> + static struct fdt_resource res[N_RESERVED_REGIONS] = { 0 }; > >> + int parent, rmem, count, i = 0; > >> + phys_addr_t start; > >> + size_t size; > >> + > >> + /* Some reserved nodes must be carved out, as the cache-prefetcher > >> may otherwise > >> + * attempt to access them, causing a security exception. > >> + */ > >> + parent = fdt_path_offset(gd->fdt_blob, "/reserved-memory"); > >> + if (parent <= 0) { > >> + log_err("No reserved memory regions found\n"); > >> + return; > >> + } > >> + count = fdtdec_get_child_count(gd->fdt_blob, parent); > >> + if (count > N_RESERVED_REGIONS) { > >> + log_err("Too many reserved memory regions!\n"); > >> + count = N_RESERVED_REGIONS; > >> + } > >> + > >> + /* Collect the reserved memory regions */ > >> + fdt_for_each_subnode(rmem, gd->fdt_blob, parent) { > >> + if (!fdt_getprop(gd->fdt_blob, rmem, "no-map", NULL)) > >> + continue; > >> + if (fdt_getprop(gd->fdt_blob, rmem, "compatible", NULL)) > >> + continue; > >> + /* Skip regions that don't have a fixed address/size */ > >> + if (fdt_get_resource(gd->fdt_blob, rmem, "reg", 0, > >> &res[i++])) > >> + i--; > >> + } > >> + > >> + /* Sort the reserved memory regions by address */ > >> + count = i; > >> + qsort(res, count, sizeof(struct fdt_resource), fdt_cmp_res); > >> + > >> + /* Now set the right attributes for them. Often a lot of the > >> regions are tightly packed together > >> + * so we can optimise the number of calls to > >> mmu_change_region_attr() by combining adjacent > >> + * regions. > >> + */ > >> + start = ALIGN_DOWN(res[0].start, SZ_2M); > >> + size = ALIGN(res[0].end, SZ_4K) - start; > >> + for (i = 1; i < count; i++) { > > > > I suppose this loop fails to configure attributes for the last no-map > > region like uefi_mem region on QCS404. > > Right. > > > >> + /* We ideally want to 2M align everything for more > >> efficient pagetables, but we must avoid > >> + * overwriting reserved memory regions which shouldn't be > >> mapped as FAULT (like those with > >> + * compatible properties). > >> + * If within 2M of the previous region, bump the size to > >> include this region. Otherwise > >> + * start a new region. > >> + */ > >> + if (start + size < res[i].start - SZ_2M) { > >> + debug(" 0x%016llx - 0x%016llx FAULT\n", > >> + start, start + size); > >> + mmu_change_region_attr(start, size, > >> PTE_TYPE_FAULT); > > > > Here the size won't always be 2M aligned which is the case for SDM845. > > I would suggest you do: > > > > mmu_change_region_attr(start, ALIGN(size, > > SZ_2M), PTE_TYPE_FAULT); > > This isn't an option as I want to avoid unmapping reserved memory > regions which have a compatible property; on SDM845 for example this > includes the cmd-db and rmtfs regions. These are not 2M aligned and > might be directly adjacent to other regions, so doing an align here > could result in unmapping part of these regions.
The regions with a compatible property and "no-map" should be mapped by corresponding drivers later on when caches are enabled. The default memory attributes or cache policy may not make sense for them for eg. a shared memory region with coprocessors which has to be marked un-cached etc. > > This will be especially relevant for SM8250 where I'll be enabling > cmd-db and RPMh drivers in order to turn on the USB VBUS regulator on RB5. So the cmd-db and RPMh drivers should map corresponding reserved memory appropriately as per their requirements via mmu_change_region_attr(). Since we have the caches enabled at that point it won't be an expensive operation. -Sumit > > > >> + start = ALIGN_DOWN(res[i].start, SZ_2M); > >> + size = ALIGN(res[i].end, SZ_4K) - start; > >> + } else { > >> + /* Bump size if this region is immediately after > >> the previous one */ > >> + size = ALIGN(res[i].end, SZ_4K) - start; > >> + } > >> + } > >> +} > >> + > >> +/* This function open-codes setup_all_pgtables() so that we can > >> + * insert additional mappings *before* turning on the MMU. > >> + */ > >> void enable_caches(void) > >> { > >> + u64 tlb_addr = gd->arch.tlb_addr; > >> + u64 tlb_size = gd->arch.tlb_size; > >> + ulong carveout_start; > >> + > >> + gd->arch.tlb_fillptr = tlb_addr; > >> + > >> build_mem_map(); > >> > >> icache_enable(); > >> + > >> + /* Create normal system page tables */ > >> + setup_pgtables(); > >> + > >> + /* Create emergency page tables */ > >> + gd->arch.tlb_size -= (uintptr_t)gd->arch.tlb_fillptr - > >> + (uintptr_t)gd->arch.tlb_addr; > >> + gd->arch.tlb_addr = gd->arch.tlb_fillptr; > >> + setup_pgtables(); > >> + gd->arch.tlb_emerg = gd->arch.tlb_addr; > >> + gd->arch.tlb_addr = tlb_addr; > >> + gd->arch.tlb_size = tlb_size; > >> + > >> + /* Doing this has a noticeable impact on boot time, so until we can > >> + * find a more efficient solution, just enable the workaround for > >> + * the one board where it's necessary. Without this there seems to > >> + * be a speculative access to a region which is protected by the > >> TZ. > >> + */ > >> + if (of_machine_is_compatible("qcom,qcs404")) { > >> + carveout_start = get_timer(0); > >> + carve_out_reserved_memory(); > >> + debug("carveout time: %lums\n", get_timer(carveout_start)); > >> + } > >> + > >> dcache_enable(); > >> } > >> > >> -- > >> 2.43.0 > >> > > -- > // Caleb (they/them)