Hi Quentin,

On Fri, 28 Feb 2025 at 04:04, Quentin Schulz <quentin.sch...@cherry.de> wrote:
>
> Hi Simon,
>
> On 2/27/25 5:24 PM, Simon Glass wrote:
> > Hi Quentin,
> >
> > On Wed, 26 Feb 2025 at 03:53, Quentin Schulz <quentin.sch...@cherry.de> 
> > wrote:
> >>
> >> Hi Moteen,
> >>
> >> On 2/26/25 6:57 AM, Moteen Shah wrote:
> >>>
> >>> On 17/02/25 20:32, Quentin Schulz wrote:
> >>>> Hi Moteen,
> >>>>
> >>>> On 2/12/25 10:18 AM, Moteen Shah wrote:
> >>>>> [You don't often get email from m-s...@ti.com. Learn why this is
> >>>>> important at https://aka.ms/LearnAboutSenderIdentification ]
> >>>>>
> >>>>> In the U-Boot pre-relocation stage, if the parent node lacks bootph*
> >>>>> property and the driver lacks a pre-reloc flag, all of its subsequent
> >>>>> subnodes gets skipped over from driver binding—even if they have a
> >>>>> bootph* property.
> >>>>>
> >>>>> This series addresses the issue by scanning through all the subnodes
> >>>>> of the current node for the bootph* property and propagate it to all
> >>>>> of its supernodes, ensuring that all of the applicable drivers are
> >>>>> bound and probed prior to relocation. This series implements one of
> >>>>> the solutions mentioned in [0].
> >>>>>
> >>>>> Since, all the nodes which are not having any bootph* property will
> >>>>> also be traversed, we will have to incur some overheads in boot time,
> >>>>> hence protecting the feature under a config.
> >>>>>
> >>>>> Boot time overheads:
> >>>>>       Baseline: Upstream u-boot
> >>>>>
> >>>>>       Patch test: Baseline + remove all bootph-all properties from
> >>>>>       *-u-boot.dtsi except the ones which are supposed to be probed
> >>>>>       but have no bootph in any of its subnode.
> >>>>>
> >>>>>       J7200 delta from baseline  : ~100ms
> >>>>>       J784S4 delta from baseline : ~350ms
> >>>>>
> >>>>
> >>>> Pfew, that's a lot of time. Can you tell us what's the delta in
> >>>> percentage from baseline? Because if your system is usually booting in
> >>>> one minute, 100ms isn't too bad :)
> >>>>
> >>>
> >>> Our system's boot time is about 2.2s (J7200) and that of J784s4 is 2.7s
> >>> (owing to a larger DT).
> >>>
> >>
> >> OK so respectively 4.5 and 12.9% boot time increase if I'm doing maths
> >> the right way, that's really a lot :/
> >>
> >>>
> >>>> FYI, I believe we've been hit by this issue on Rockchip but cannot
> >>>> find the thread or patch right now.
> >>>>
> >>>> For TPL and SPL, the Device Tree is parsed and looked for appropriate
> >>>> bootph properties. Any node which doesn't have a bootph property and
> >>>> doesn't have any children with a bootph property is removed from the
> >>>> tree. However, the bootph property (if only present in a children
> >>>> node) isn't propagated (meaning the node doesn't get the property).
> >>>> This is done by fdtgrep.
> >>>>
> >>>> The issue is that for U-Boot proper pre-relocation, the whole DT is
> >>>> taken and only nodes with the appropriate bootph property is probed
> >>>> and children nodes do NOT count as opposed to the TPL/SPL case.
> >>>>
> >>>> My idea was that maybe we should rather propagate the property, at the
> >>>> very least in U-Boot proper pre-relocation. This does mean we will
> >>>> increase (by which amount?) the size of the DT in U-Boot proper
> >>>> because we would add this property recursively up the tree from a node
> >>>> that has the bootph property for U-Boot proper pre-relocation. This
> >>>> **could** be an issue as the DT could be passed between stages and we
> >>>> would then hit the size limit. Sadly, I didn't take the time to look
> >>>> into adding support for that in fdtgrep nor will I have the time to do
> >>>> it, so take this as me sharing my wish list with you.
> >>>>
> >>>
> >>> Thanks for sharing this, the size increase this patch introduces for 48
> >>> such bootph-* tags is about 1.5KB's, we can go ahead and bind the super
> >>> parent to bypass the part of adding the tag, but for the next parent we
> >>> will have to traverse again down the DT adding in unnecessary
> >>> traversals(considering a case that we are 4-5 levels deep in the DT).
> >>>
> >>
> >> j784s4-evm/u-boot.dtb is 131616B
> >> j7200-evm/u-boot.dtb is 88368B
> >>
> >> so 1.5KiB would mean respectively, 1.1 and 1.7% in **DTB** **size**
> >> increase, not sure how that translate in terms of boot time though.
> >>
> >> Reading Tom's notes from the meeting a few days ago where this was
> >> seemingly discussed, I believe the issue is that the kernel wants the
> >> bootph- property only in the child node. But I assume this applies only
> >> to the DTS, which would be fine with this build time propagation of the
> >> bootph- properties to node parents recursively. Would the kernel also
> >> want the same limitation for the DTB?
> >
> > It doesn't matter to the kernel, since there is no restriction as to
> > which nodes such a property can be added.
> >
> > I like the binman route as it lends itself to further optimisations in
> > the future. For example, we might provide an ordered list of node
> > offsets to process before relocation.
> >
>
> We already have bootph- properties for pre-reloc, why would we need an
> ordered list of node offsets to process before relocation? What's the
> idea here?

Just to save execution time.

>
> > But we could also have an algorithm which maintains a small list of
> > node-offsets which have been visited and have the required tag, so
> > avoid constant re-traversal.
> >
>
> Why can't we use the same behavior we have for bootph- properties for
> TPL/SPL in pre-reloc, why do we have to have a completely different
> implementation there?
>
> In the worst case, we could have a separate DTB for pre-reloc too,
> stripped down the same way it's done for TPL and SPL for example.

The reason we (already) have a completely separate implementation is
that the unwanted nodes simply aren't present in xPL. So there is no
need to check for tags.

>
> > Ideally, pre-relocation, we should not need a lot of drivers, since
> > SPL has done much of the early-init work already. Perhaps a UART and
> > not even any pinctrl / clocks / power?
> >
>
> I can tell you I need the storage medium used by SPL to load proper to
> be in the pre-reloc phase of proper for everything to work properly, so
> "just UART" is not good enough for me.
>
> So, SPI flash, SPI controller, eMMC controller, SD controller, pinconf,
> pinmux, clocks, ... c.f. 100f489f58a6 ("rockchip: rk3399: Fix loading
> FIT from SD-card when booting from eMMC") (yes the commit is about
> fixing FIT loading, but see the end of the commit log).

In principle I don't see why. If there is a full-featured SPL, then
U-Boot really doesn't need to do much in its pre-relocation phase.

One of the challenges is that we can't remove clock/power/etc. because
drivers expect them. For example the UART will use the clock to set
the baud rate. But we can check for -ENOSYS and assume that we don't
need to do it.

>
> I don't think it makes sense to have an automagic solution that decides
> which nodes should appear or not. We already have bootph- properties for
> that.

Are you referring to my 'ordered list of nodes'? It's just that we can
do a little pre-processing and save time. It would need to be part of
the DT though, so it's a bit tricky....

>
> I feel like I'm missing something from the big picture, can someone tell
> me what it is :) ?

We're just trying to reduce the time taken by board_init_f(), which
seems too high on some platforms.

Regards,
Simon

Reply via email to