Hi Svyatoslav,

On Mon, 7 Apr 2025 at 08:43, Svyatoslav Ryhel <clamo...@gmail.com> wrote:
>
>
> 07.04.25 4:53 пп, Tom Rini:
> > On Mon, Apr 07, 2025 at 01:07:34PM +1200, Simon Glass wrote:
> >> Hi Tom,
> >>
> >> On Mon, 7 Apr 2025 at 10:48, Tom Rini <tr...@konsulko.com> wrote:
> >>> On Mon, Apr 07, 2025 at 09:12:32AM +1200, Simon Glass wrote:
> >>>> Hi Tom,
> >>>>
> >>>> On Sun, 6 Apr 2025 at 11:23, Tom Rini <tr...@konsulko.com> wrote:
> >>>>> On Sun, Apr 06, 2025 at 08:46:31AM +1200, Simon Glass wrote:
> >>>>>> (question for Heinrich below)
> >>>>>>
> >>>>>> Hi Tom,
> >>>>>>
> >>>>>> On Sun, 6 Apr 2025 at 02:48, Tom Rini <tr...@konsulko.com> wrote:
> >>>>>>> On Sat, Apr 05, 2025 at 08:40:08AM -0600, Tom Rini wrote:
> >>>>>>>> On Fri, Apr 04, 2025 at 04:48:01PM -0600, Tom Rini wrote:
> >>>>>>>>> On Fri, Apr 04, 2025 at 11:32:12PM +0200, Jonas Karlman wrote:
> >>>>>>>>>> Hi Simon,
> >>>>>>>>>>
> >>>>>>>>>> On 2025-04-04 00:30, Simon Glass wrote:
> >>>>>>>>>>> Hi Jonas,
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 4 Apr 2025 at 09:57, Jonas Karlman <jo...@kwiboo.se> 
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> Hi Tom and Simon,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2025-03-19 00:21, Tom Rini wrote:
> >>>>>>>>>>>>> On Wed, 05 Mar 2025 17:24:54 -0700, Simon Glass wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> This series includes some patches related to allowing 
> >>>>>>>>>>>>>> read_all() to be
> >>>>>>>>>>>>>> used with the extlinux / PXE bootmeths.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> These patches were split out from the stb4 series, since it 
> >>>>>>>>>>>>>> will need to
> >>>>>>>>>>>>>> have additional patches for LWIP, to avoid breaking PXE 
> >>>>>>>>>>>>>> booting when
> >>>>>>>>>>>>>> LWIP is used.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [...]
> >>>>>>>>>>>>> Applied to u-boot/next, thanks!
> >>>>>>>>>>>> This series broke booting a compressed arm64 defconfig Linux 
> >>>>>>>>>>>> kernel
> >>>>>>>>>>>> (without module loading) due to changes in decompression buffer 
> >>>>>>>>>>>> length.
> >>>>>>>>>>>>
> >>>>>>>>>>>> My arm64 defconfig kernel (Image.gz) is ~24 MiB compressed and 
> >>>>>>>>>>>> ~85 MiB
> >>>>>>>>>>>> uncompressed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Before this series the decompression buffer was 10x the 
> >>>>>>>>>>>> kernel_comp_size
> >>>>>>>>>>>> and now it is instead limited by the SYS_BOOTM_LEN Kconfig 
> >>>>>>>>>>>> symbol.
> >>>>>>>>>>>>
> >>>>>>>>>>>> A broken boot using current next branch:
> >>>>>>>>>>>>
> >>>>>>>>>>>>    Retrieving file: /Image.gz
> >>>>>>>>>>>>    Retrieving file: /initramfs.cpio.gz
> >>>>>>>>>>>>    append: earlycon
> >>>>>>>>>>>>    cmd 'booti' states 1f1f addr_img '0x02080000' conf_ramdisk 
> >>>>>>>>>>>> '0x06000000:394d3c' conf_fdt '3df16350' images 000000003ffe53a0
> >>>>>>>>>>>>       kernel data at 0x02080000, len = 0x00000000 (0)
> >>>>>>>>>>>>    load 2080000 start 2080000 len 0 ep 0 os 5 comp 1
> >>>>>>>>>>>>    find_other type 2 os 5
> >>>>>>>>>>>>    ## Flattened Device Tree blob at 3df16350
> >>>>>>>>>>>>       Booting using the fdt blob at 0x3df16350
> >>>>>>>>>>>>    Working FDT set to 3df16350
> >>>>>>>>>>>>    load_os load 8000000 image_start 2080000 image_len 2000000
> >>>>>>>>>>>>       Uncompressing Kernel Image to 8000000
> >>>>>>>>>>>>    Error: inflate() returned -5
> >>>>>>>>>>>>    Image too large: increase CONFIG_SYS_BOOTM_LEN
> >>>>>>>>>>>>    Must RESET board to recover
> >>>>>>>>>>>>    Resetting the board...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Changing SYS_BOOTM_LEN from default 64 MiB to 128 MiB fixed the 
> >>>>>>>>>>>> boot:
> >>>>>>>>>>>>
> >>>>>>>>>>>>    Retrieving file: /Image.gz
> >>>>>>>>>>>>    Retrieving file: /initramfs.cpio.gz
> >>>>>>>>>>>>    append: earlycon
> >>>>>>>>>>>>    cmd 'booti' states 1f1f addr_img '0x02080000' conf_ramdisk 
> >>>>>>>>>>>> '0x06000000:394d3c' conf_fdt '3df16430' images 000000003ffe5480
> >>>>>>>>>>>>       kernel data at 0x02080000, len = 0x00000000 (0)
> >>>>>>>>>>>>    load 2080000 start 2080000 len 0 ep 0 os 5 comp 1
> >>>>>>>>>>>>    find_other type 2 os 5
> >>>>>>>>>>>>    ## Flattened Device Tree blob at 3df16430
> >>>>>>>>>>>>       Booting using the fdt blob at 0x3df16430
> >>>>>>>>>>>>    Working FDT set to 3df16430
> >>>>>>>>>>>>    load_os load 8000000 image_start 2080000 image_len 2000000
> >>>>>>>>>>>>       Uncompressing Kernel Image to 8000000
> >>>>>>>>>>>>       kernel loaded at 0x08000000, end = 0x0d3b5a00
> >>>>>>>>>>>>       Loading Ramdisk to 3cb51000, end 3cee5d3c ... OK
> >>>>>>>>>>>>       Loading Device Tree to 000000003cb3f000, end 
> >>>>>>>>>>>> 000000003cb509df ... OK
> >>>>>>>>>>>>    Working FDT set to 3cb3f000
> >>>>>>>>>>>>
> >>>>>>>>>>>>    Starting kernel ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do we need to increase the default SYS_BOOTM_LEN for ARM64 now?
> >>>>>>>>>>> Are you using the 'booti' command? Can you post a bit more console
> >>>>>>>>>>> output or a script here as it isn't clear what boot command you 
> >>>>>>>>>>> are
> >>>>>>>>>>> using? For now I'm going to assume booti
> >>>>>>>>>> Above was using an extlinux.conf with bootstd, not using the booti 
> >>>>>>>>>> cmd.
> >>>>>>>>>>
> >>>>>>>>>> Use of booti cmd is not affected as it continue to use a 10x 
> >>>>>>>>>> multiple
> >>>>>>>>>> of the kernel_comp_size env var for buffer size.
> >>>>>>>>>>
> >>>>>>>>>> The change this series introduced was instead of the 10x multiple 
> >>>>>>>>>> of
> >>>>>>>>>> kernel_comp_size now use SYS_BOOTM_LEN for buffer size (320 vs 32 
> >>>>>>>>>> MiB).
> >>>>>>>>>>
> >>>>>>>>>> And because cmd/booti.c was not converted there is now two 
> >>>>>>>>>> different
> >>>>>>>>>> handling of this decompression buffer size.
> >>>>>>>>>>
> >>>>>>>>>> If the change to now use SYS_BOOTM_LEN was intended this need to be
> >>>>>>>>>> changed as it has mostly been irrelevant for booting Linux on 
> >>>>>>>>>> AArch64
> >>>>>>>>>> boards prior to this series.
> >>>>>>>>> We need to clean things up such that the path which is using 10x
> >>>>>>>>> multiple (and we lmb_alloc an area iirc) is used in all cases.
> >>>>>>>> Looking at the code this morning, I was confusing this with what we 
> >>>>>>>> do
> >>>>>>>> for compressed FIT images, oops.
> >>>>>>> Ugh. Simon, can you go and fix this whole problem? Both do_bootz and
> >>>>>>> do_booti have non-trivial amounts of code that's missed by changing 
> >>>>>>> from
> >>>>>>> calling do_bootX() to bootX_run().
> >>>>>> For do_bootz() I believe the current code is correct. It does call
> >>>>>> bootz_start() and then doesn't call it again when bootz_run() starts.
> >>>>>> It looks like you have taken my Ubuntu test so perhaps I can add a
> >>>>>> bootz test for another board in my lab.
> >>>>>>
> >>>>>> For do_booti(), the code in booti_start() is used for the
> >>>>>> non-compressed case, so I believe it is still needed and that bootm
> >>>>>> doesn't currently honour 'Image' relocation when using an uncompressed
> >>>>>> image. But then there is the issue of SYS_BOOTM_LEN, below.
> >>>>>>
> >>>>>> bootm_load_os() has code for ARM at the end, which I suspect needs to
> >>>>>> run on RISC-V too, in case the image needs to be relocated. Marek
> >>>>>> added this 7 years ago[1] so it predates RISC-V and Heinrich perhaps
> >>>>>> didn't notice it when adding his support , if it is actually needed.
> >>>>>>
> >>>>>> SYS_BOOTM_LEN is a 'last bastion' security feature to avoid people
> >>>>>> loading images that wrap around memory, etc. I could put bootm_len
> >>>>>> into struct bootm_info so that a 10x value can be used for the booti
> >>>>>> case.
> >>> I'm not sure any of the above is 100% correct. My very quick look was
> >>> that it seems like if we want to have the command part of bootX be able
> >>> to be equivalently called via API, more of most of the commands needs to
> >>> be moved out of cmd/cmdX.c and to boot/bootX.c and put in to some state.
> >> Yes, my intent has been to move code out of commands into boot/ and
> >> have things work as before.
> > Then none of this should be in a "pxe" series, and you should have
> > been removing a bunch more code out of the command part of the file.
> >
> >> Also don't forget that booting a FIT with a compressed boot kernel did
> >> not work before my series. The series was largely written to fix bugs
> >> (in one place) rather that just to refactor for the fun it.
> > I'm not sure which FIT use case you're talking about here exactly and
> > when it got broken (because if it was in use already, someone would have
> > reported it).
> >
> >>>>>> Tom, we also have the issue that you haven't taken the rest of the PXE
> >>>>>> series (due to the lwip case), so our trees are out of sync in this
> >>>>>> area. It would really help if you could take that series, and again, I
> >>>>>> am happy to do the lwip case once it is landed.
> >>> Your further PXE rework is irrelevant to this discussion. Except in that
> >>> it points out there's still further changes that could break other
> >>> things even more.
> >> Yes, any change in untested code has the potential to create bugs. I
> >> do the best I can, but I could use a little more support in this
> >> effort.
> > And the best way to get support for that is to slow down and make
> > smaller more easily testable series. There's yet another bug in this
> > series too that Svyatoslav has or will post about shortly.
> >
> >>>>>> So for now, if people agree, I can do a small series to use a
> >>>>>> different SYS_BOOTM_LEN for the booti command and perhaps add RISC-V
> >>>>>> as above.
> >>> Well, no, that sounds like a step in the wrong direction. I'd like to
> >>> instead restore the expected behavior, and then perhaps we could clean
> >>> it up such that the compressed case for booti-Images works like FIT
> >>> images where we lmb allocate a larger area if needed.
> >> Yes, restoring the 10x decompression size is what I am talking about
> >> here. Is there anything else?
> > Functional wise, the being reported now issue Svyatoslav found too.
> >
> > Implementation wise, yes, I don't want to refactor and re-wrtie a bunch
> > of this as the first step. The first step should be just what's needed
> > to move the working functionality out of cmd/bootX.c and in to
> > boot/bootX.c and let people verify things haven't regressed. Then some
> > rework can be reviewed on its own.
> >
> >>>>>> Later, I could continue the boot-unification effort, e.g. delete the
> >>>>>> booti_start() function. and incorporate the uncompressed case into
> >>>>>> bootm code.
> >>>>> I'm not sure your reading of the situation is right. I'm more inclined
> >>>>> to revert this merge rather than take on more rework in the area.
> >>>> Well that would be a pretty poor show. No one else has spent effort
> >>>> trying to clean up and rationalise this code in the last 10 years, so
> >>>> far as I can see. This is in -next so there is plenty of time to
> >>>> resolve any issues.
> >>> Yes, things haven't broken in 10 years. This is why I don't value "tidy
> >>> up and refactor" in and of itself. Massive reworks then require a long
> >>> time to check all of the corner cases. This has been true of many of the
> >>> massive reworks you've done over the years.
> >> Well, do you want to be able to boot without CMDLINE, or not?
> > It's an interesting long term goal. But since there's always some
> > distribution or another that ships with each release we do, not having
> > functional regressions is more important than speed of new features.
> > Being able to practically disable the command line has been a feature
> > for forever. Being able to literally has implications for both security
> > and end user rights that need to be considered too.
> >
> >>>> I believe my reading of the situation is right, but if you disagree
> >>>> with some of it, please indicate which.
> >>> Your answer is not "Oh, I'll go understand what I broke and restore
> >>> expected behavior". It's "you should take more of my big rework and I'll
> >>> rework this area even more". That is not convincing that we won't have
> >>> v2025.07 broken here.
> >> You only took half of the series, remember. We are talking about a
> >> tiny corner case here, with no test coverage. I'm offering to fix the
> >> case that has been reported. I can also even add a test for it.
> > It wasn't half of the series. I didn't take more because it was either
> > the stuff that broke other functionality or because it wasn't used until
> > some further series from you. It's not a tiny corner case. A test case
> > is nice, sure, but we'll never have 100% test coverage and need physical
> > testing too. Which most people only have time for around releases. Which
> > is why we need to go slow. Which again is why you need to split up your
> > changes more and expect them to go slower. So there's time to test
> > things.
> Hello there! I have recently faced an issue with booting Linux on my
> LG P895 (Tegra30 smartphone). It has installed PostmarketOS with extlinux
> bootmethod and boots perfectly fine with current master but fails with next.
>
> Here is fragment of the boot log
>
>    Retrieving file: /Image
>    Retrieving file: /initramfs
>    append: rw gpt
>    Retrieving file: /tegra30-lg-p895.dtb
>    ERROR: booting os 'Invalid OS' (0) is not supported
>    Boot failed (err=-14)
>
> I have debugged a bit and so far reverting:
>
> e2e87b84 "boot: pxe: Refactor label_run_boot() to avoid cmdline"
> feb8d7fd "pxe_utils: Simplify default fdt in label_run_boot()"
> b1340802 "boot: pxe: Use bootm_...() functions where possible"
>
> Fixed booting for my device.

Thanks for the report and debugging. I'll take a look. I suspect the
global 'images' is somehow not being updated.

Regards,
Simon

Reply via email to