Hi Quentin, On Fri, 28 Feb 2025 at 04:04, Quentin Schulz <quentin.sch...@cherry.de> wrote: > > Hi Simon, > > On 2/27/25 5:24 PM, Simon Glass wrote: > > Hi Quentin, > > > > On Wed, 26 Feb 2025 at 03:53, Quentin Schulz <quentin.sch...@cherry.de> > > wrote: > >> > >> Hi Moteen, > >> > >> On 2/26/25 6:57 AM, Moteen Shah wrote: > >>> > >>> On 17/02/25 20:32, Quentin Schulz wrote: > >>>> Hi Moteen, > >>>> > >>>> On 2/12/25 10:18 AM, Moteen Shah wrote: > >>>>> [You don't often get email from m-s...@ti.com. Learn why this is > >>>>> important at https://aka.ms/LearnAboutSenderIdentification ] > >>>>> > >>>>> In the U-Boot pre-relocation stage, if the parent node lacks bootph* > >>>>> property and the driver lacks a pre-reloc flag, all of its subsequent > >>>>> subnodes gets skipped over from driver binding—even if they have a > >>>>> bootph* property. > >>>>> > >>>>> This series addresses the issue by scanning through all the subnodes > >>>>> of the current node for the bootph* property and propagate it to all > >>>>> of its supernodes, ensuring that all of the applicable drivers are > >>>>> bound and probed prior to relocation. This series implements one of > >>>>> the solutions mentioned in [0]. > >>>>> > >>>>> Since, all the nodes which are not having any bootph* property will > >>>>> also be traversed, we will have to incur some overheads in boot time, > >>>>> hence protecting the feature under a config. > >>>>> > >>>>> Boot time overheads: > >>>>> Baseline: Upstream u-boot > >>>>> > >>>>> Patch test: Baseline + remove all bootph-all properties from > >>>>> *-u-boot.dtsi except the ones which are supposed to be probed > >>>>> but have no bootph in any of its subnode. > >>>>> > >>>>> J7200 delta from baseline : ~100ms > >>>>> J784S4 delta from baseline : ~350ms > >>>>> > >>>> > >>>> Pfew, that's a lot of time. Can you tell us what's the delta in > >>>> percentage from baseline? Because if your system is usually booting in > >>>> one minute, 100ms isn't too bad :) > >>>> > >>> > >>> Our system's boot time is about 2.2s (J7200) and that of J784s4 is 2.7s > >>> (owing to a larger DT). > >>> > >> > >> OK so respectively 4.5 and 12.9% boot time increase if I'm doing maths > >> the right way, that's really a lot :/ > >> > >>> > >>>> FYI, I believe we've been hit by this issue on Rockchip but cannot > >>>> find the thread or patch right now. > >>>> > >>>> For TPL and SPL, the Device Tree is parsed and looked for appropriate > >>>> bootph properties. Any node which doesn't have a bootph property and > >>>> doesn't have any children with a bootph property is removed from the > >>>> tree. However, the bootph property (if only present in a children > >>>> node) isn't propagated (meaning the node doesn't get the property). > >>>> This is done by fdtgrep. > >>>> > >>>> The issue is that for U-Boot proper pre-relocation, the whole DT is > >>>> taken and only nodes with the appropriate bootph property is probed > >>>> and children nodes do NOT count as opposed to the TPL/SPL case. > >>>> > >>>> My idea was that maybe we should rather propagate the property, at the > >>>> very least in U-Boot proper pre-relocation. This does mean we will > >>>> increase (by which amount?) the size of the DT in U-Boot proper > >>>> because we would add this property recursively up the tree from a node > >>>> that has the bootph property for U-Boot proper pre-relocation. This > >>>> **could** be an issue as the DT could be passed between stages and we > >>>> would then hit the size limit. Sadly, I didn't take the time to look > >>>> into adding support for that in fdtgrep nor will I have the time to do > >>>> it, so take this as me sharing my wish list with you. > >>>> > >>> > >>> Thanks for sharing this, the size increase this patch introduces for 48 > >>> such bootph-* tags is about 1.5KB's, we can go ahead and bind the super > >>> parent to bypass the part of adding the tag, but for the next parent we > >>> will have to traverse again down the DT adding in unnecessary > >>> traversals(considering a case that we are 4-5 levels deep in the DT). > >>> > >> > >> j784s4-evm/u-boot.dtb is 131616B > >> j7200-evm/u-boot.dtb is 88368B > >> > >> so 1.5KiB would mean respectively, 1.1 and 1.7% in **DTB** **size** > >> increase, not sure how that translate in terms of boot time though. > >> > >> Reading Tom's notes from the meeting a few days ago where this was > >> seemingly discussed, I believe the issue is that the kernel wants the > >> bootph- property only in the child node. But I assume this applies only > >> to the DTS, which would be fine with this build time propagation of the > >> bootph- properties to node parents recursively. Would the kernel also > >> want the same limitation for the DTB? > > > > It doesn't matter to the kernel, since there is no restriction as to > > which nodes such a property can be added. > > > > I like the binman route as it lends itself to further optimisations in > > the future. For example, we might provide an ordered list of node > > offsets to process before relocation. > > > > We already have bootph- properties for pre-reloc, why would we need an > ordered list of node offsets to process before relocation? What's the > idea here?
Just to save execution time. > > > But we could also have an algorithm which maintains a small list of > > node-offsets which have been visited and have the required tag, so > > avoid constant re-traversal. > > > > Why can't we use the same behavior we have for bootph- properties for > TPL/SPL in pre-reloc, why do we have to have a completely different > implementation there? > > In the worst case, we could have a separate DTB for pre-reloc too, > stripped down the same way it's done for TPL and SPL for example. The reason we (already) have a completely separate implementation is that the unwanted nodes simply aren't present in xPL. So there is no need to check for tags. > > > Ideally, pre-relocation, we should not need a lot of drivers, since > > SPL has done much of the early-init work already. Perhaps a UART and > > not even any pinctrl / clocks / power? > > > > I can tell you I need the storage medium used by SPL to load proper to > be in the pre-reloc phase of proper for everything to work properly, so > "just UART" is not good enough for me. > > So, SPI flash, SPI controller, eMMC controller, SD controller, pinconf, > pinmux, clocks, ... c.f. 100f489f58a6 ("rockchip: rk3399: Fix loading > FIT from SD-card when booting from eMMC") (yes the commit is about > fixing FIT loading, but see the end of the commit log). In principle I don't see why. If there is a full-featured SPL, then U-Boot really doesn't need to do much in its pre-relocation phase. One of the challenges is that we can't remove clock/power/etc. because drivers expect them. For example the UART will use the clock to set the baud rate. But we can check for -ENOSYS and assume that we don't need to do it. > > I don't think it makes sense to have an automagic solution that decides > which nodes should appear or not. We already have bootph- properties for > that. Are you referring to my 'ordered list of nodes'? It's just that we can do a little pre-processing and save time. It would need to be part of the DT though, so it's a bit tricky.... > > I feel like I'm missing something from the big picture, can someone tell > me what it is :) ? We're just trying to reduce the time taken by board_init_f(), which seems too high on some platforms. Regards, Simon