Hi!

On Friday, 30 April 2021 at 15:34:28 CEST, Andre Przywara wrote:
> On Fri, 30 Apr 2021 14:02:52 +0200 (CEST)
> Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>
> Hi Mark,
>
> thanks for the reply!
>
> (CC:ing Alex and Heinrich for the UEFI questions below)
>
> > > Date: Fri, 30 Apr 2021 12:21:21 +0100
> > > From: Andre Przywara <andre.przyw...@arm.com>
> > >
> > > Hi,
> > >
> > > We now see the first Allwinner devices [1] having DRAM located above
> > > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > > this works somewhat fine, but the sun8i-emac network device is still
> > > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > > on those machines, so not giving pure 32-bit addresses for buffers
> > > anymore.
> > > In Linux we handle this easily by just keeping the default DMA
> > > mask at 32 bits, and letting the DMA framework deal with the nasty
> > > details.
> > >
> > > I was wondering how this should be handled in U-Boot? The
> > > straightforward solution would be:
> > > - Let the driver allocate the RX and TX buffers separately, placing them
> > >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > > - Use those RX buffers and hand the addresses back to the upper layers.
> > > - We already copy TX packets, so this would also be covered, in this
> > >   situation. Other drivers might need to introduce copying.
> >
> > What you describe here is called a bounce buffer approach. I believe
> > Linux developers also refer to this as swiotlb.
>
> Yes, but it's not entirely the same as bounce buffering in Linux,
> since we allocate the buffers ourselves, in the driver, so we have full
> control over it. The problem I face is that malloc() works on the heap
> (which is high), or we use the automatic priv_alloc mechanism, which
> uses the heap as well, IIUC.
>
> > > This sounds like a common problem, so I was wondering if there is a
> > > more generic solution to this? Maybe there are already platforms or
> > > devices affected? Or should the whole heap and stack be moved below 4GB
> > > (if this is easily possible)?
> > > In our case we make the buffers part of our priv struct, so should
> > > there be an option to let the priv_auto allocation come from below 4GB?
> > >
> > > Grateful for any input on this!
> >
> > I looked into this a bit when I was trying to figure out what to do on
> > Apple M1 systems where I have a somewhat related issue. These systems
> > have an IOMMU that can't be bypassed. Since I don't want to add IOMMU
> > infrastructure to U-Boot, I set up the IOMMU to map a fixed block of
> > physical memory and make sure that all allocations of memory come from
> > that block of memory. In this case this is fairly easy to achieve.
> > U-Boot allocates memory from the top of usable memory, so as long as I
> > let the IOMMU map that high memory, things work. U-Boot doesn't need
> > a lot of memory, so a block of 512MB is more than sufficient.
>
> I'd rather not play around with the visible memory size (see below).
> And while technically there is a (scatter/gather) IOMMU in the SoC, it
> would be too big guns for that small problem.

The IOMMU is connected only to the video-related cores, so it's not an
option here.

Best regards,
Jernej

>
> > In your case this means that as long as you set the top of usable
> > memory to an address < 4G, U-Boot itself should be fine and no bounce
> > buffers are needed. You have to make sure the addresses in the U-Boot
> > environment for loading things like the kernel and the FDT are set to
> > an address < 4G as well.
> >
> > For EFI things are different though. You want to expose all physical
> > memory in the EFI memory map.
>
> Not only for UEFI, since U-Boot populates the DT memory node even for
> booti/bootm, in arch/arm/lib/bootm-fdt.c:arch_fixup_fdt().
> So limiting the memory is not an option, since this would be passed on
> to the OS.
>
> > This means that an EFI application
> > (such as an OS loader) may pick memory > 4G and use it to do I/O.
>
> I think we should be safe here, as the driver has full control over the
> buffers: For TX we copy already, to use "fire-and-forget", so we
> just start the DMA and return. And for RX U-Boot network drivers
> return the buffer address, so it's our own buffer again. So wherever
> higher layers put the packets, we should be good (given our own buffers
> are).
>
>
> So I guess my question boils down to: How can I best allocate buffers
> from "low" memory? And do those buffer carveouts make it into the UEFI
> memory map, as reserved regions? Or can UEFI differentiate between
> boot services and runtime services allocations? The buffers would be
> needed during boot services, for the UEFI network protocol. But later
> on they can be abandoned.
>
> > For this purpose U-Boot already implements bounce buffers. See the
> > CONFIG_EFI_LOADER_BOUNCE_BUFFER option.
>
> Interesting, thanks, I will have a look at that. Maybe that contains
> some useful traces to other code.
>
> Cheers,
> Andre
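
To make the driver-owned bounce buffer idea from the thread a bit more concrete, here is a rough sketch in plain hosted C. It is not U-Boot code: struct low_pool, dma_reachable(), bounce_tx() and bounce_rx() are hypothetical names invented for illustration, and the static array only stands in for a region that a real driver would somehow have to obtain below 4GB (which is exactly the open question above). The base address 0x40000000 follows the "DRAM starting at 1GB" layout mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_LIMIT   0x100000000ULL  /* assumption: device addresses 32 bits only */
#define PKT_SIZE    2048
#define NUM_BUFS    4

/* Hypothetical pool of packet buffers owned by the driver. */
struct low_pool {
	uint64_t bus_base;              /* bus address of bufs[0] as the device sees it */
	uint8_t bufs[NUM_BUFS][PKT_SIZE];
};

/* Nonzero if [addr, addr + len) lies entirely below the 4GB DMA limit. */
static int dma_reachable(uint64_t addr, size_t len)
{
	return (uint64_t)len <= DMA_LIMIT && addr <= DMA_LIMIT - (uint64_t)len;
}

/* Bus address of one slot in the pool. */
static uint64_t pool_bus_addr(const struct low_pool *p, unsigned int slot)
{
	assert(slot < NUM_BUFS);
	return p->bus_base + (uint64_t)slot * PKT_SIZE;
}

/*
 * TX path: copy the caller's packet (which may live above 4GB) into a
 * pool slot and report the bus address to put into the DMA descriptor.
 * Returns 0 on success, -1 if the slot or length is invalid or the slot
 * is not reachable by the device.
 */
static int bounce_tx(struct low_pool *p, unsigned int slot,
		     const void *pkt, size_t len, uint64_t *bus_out)
{
	if (slot >= NUM_BUFS || len > PKT_SIZE)
		return -1;
	if (!dma_reachable(pool_bus_addr(p, slot), PKT_SIZE))
		return -1;
	memcpy(p->bufs[slot], pkt, len);
	*bus_out = pool_bus_addr(p, slot);
	return 0;
}

/*
 * RX path: the device has written into a pool slot, so the driver hands
 * its own buffer to the upper layers without copying.
 * Returns NULL for an invalid slot.
 */
static const uint8_t *bounce_rx(const struct low_pool *p, unsigned int slot)
{
	return slot < NUM_BUFS ? p->bufs[slot] : NULL;
}
```

The point of the sketch is only that the reachability check and the copy live entirely inside the driver, so upper layers can keep their packets wherever they like. Whether the pool would come from lmb, a reserved carveout, or something else is left open here, as it is in the thread.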