On Mon, Apr 07, 2025 at 01:07:34PM +1200, Simon Glass wrote:
Hi Tom,
On Mon, 7 Apr 2025 at 10:48, Tom Rini <tr...@konsulko.com> wrote:
On Mon, Apr 07, 2025 at 09:12:32AM +1200, Simon Glass wrote:
Hi Tom,
On Sun, 6 Apr 2025 at 11:23, Tom Rini <tr...@konsulko.com> wrote:
On Sun, Apr 06, 2025 at 08:46:31AM +1200, Simon Glass wrote:
(question for Heinrich below)
Hi Tom,
On Sun, 6 Apr 2025 at 02:48, Tom Rini <tr...@konsulko.com> wrote:
On Sat, Apr 05, 2025 at 08:40:08AM -0600, Tom Rini wrote:
On Fri, Apr 04, 2025 at 04:48:01PM -0600, Tom Rini wrote:
On Fri, Apr 04, 2025 at 11:32:12PM +0200, Jonas Karlman wrote:
Hi Simon,
On 2025-04-04 00:30, Simon Glass wrote:
Hi Jonas,
On Fri, 4 Apr 2025 at 09:57, Jonas Karlman <jo...@kwiboo.se> wrote:
Hi Tom and Simon,
On 2025-03-19 00:21, Tom Rini wrote:
On Wed, 05 Mar 2025 17:24:54 -0700, Simon Glass wrote:
This series includes some patches related to allowing read_all() to be
used with the extlinux / PXE bootmeths.
These patches were split out from the stb4 series, since it will need to
have additional patches for LWIP, to avoid breaking PXE booting when
LWIP is used.
[...]
Applied to u-boot/next, thanks!
This series broke booting a compressed arm64 defconfig Linux kernel
(without module loading) due to changes in decompression buffer length.
My arm64 defconfig kernel (Image.gz) is ~24 MiB compressed and ~85 MiB
uncompressed.
Before this series the decompression buffer was 10x the kernel_comp_size
and now it is instead limited by the SYS_BOOTM_LEN Kconfig symbol.
A broken boot using current next branch:
Retrieving file: /Image.gz
Retrieving file: /initramfs.cpio.gz
append: earlycon
cmd 'booti' states 1f1f addr_img '0x02080000' conf_ramdisk
'0x06000000:394d3c' conf_fdt '3df16350' images 000000003ffe53a0
kernel data at 0x02080000, len = 0x00000000 (0)
load 2080000 start 2080000 len 0 ep 0 os 5 comp 1
find_other type 2 os 5
## Flattened Device Tree blob at 3df16350
Booting using the fdt blob at 0x3df16350
Working FDT set to 3df16350
load_os load 8000000 image_start 2080000 image_len 2000000
Uncompressing Kernel Image to 8000000
Error: inflate() returned -5
Image too large: increase CONFIG_SYS_BOOTM_LEN
Must RESET board to recover
Resetting the board...
Changing SYS_BOOTM_LEN from default 64 MiB to 128 MiB fixed the boot:
Retrieving file: /Image.gz
Retrieving file: /initramfs.cpio.gz
append: earlycon
cmd 'booti' states 1f1f addr_img '0x02080000' conf_ramdisk
'0x06000000:394d3c' conf_fdt '3df16430' images 000000003ffe5480
kernel data at 0x02080000, len = 0x00000000 (0)
load 2080000 start 2080000 len 0 ep 0 os 5 comp 1
find_other type 2 os 5
## Flattened Device Tree blob at 3df16430
Booting using the fdt blob at 0x3df16430
Working FDT set to 3df16430
load_os load 8000000 image_start 2080000 image_len 2000000
Uncompressing Kernel Image to 8000000
kernel loaded at 0x08000000, end = 0x0d3b5a00
Loading Ramdisk to 3cb51000, end 3cee5d3c ... OK
Loading Device Tree to 000000003cb3f000, end 000000003cb509df ... OK
Working FDT set to 3cb3f000
Starting kernel ...
Do we need to increase the default SYS_BOOTM_LEN for ARM64 now?
Are you using the 'booti' command? Can you post a bit more console
output or a script here as it isn't clear what boot command you are
using? For now I'm going to assume booti
Above was using an extlinux.conf with bootstd, not using the booti cmd.
Use of booti cmd is not affected as it continue to use a 10x multiple
of the kernel_comp_size env var for buffer size.
The change this series introduced was instead of the 10x multiple of
kernel_comp_size now use SYS_BOOTM_LEN for buffer size (320 vs 32 MiB).
And because cmd/booti.c was not converted there is now two different
handling of this decompression buffer size.
If the change to now use SYS_BOOTM_LEN was intended this need to be
changed as it has mostly been irrelevant for booting Linux on AArch64
boards prior to this series.
We need to clean things up such that the path which is using 10x
multiple (and we lmb_alloc an area iirc) is used in all cases.
Looking at the code this morning, I was confusing this with what we do
for compressed FIT images, oops.
Ugh. Simon, can you go and fix this whole problem? Both do_bootz and
do_booti have non-trivial amounts of code that's missed by changing from
calling do_bootX() to bootX_run().
For do_bootz() I believe the current code is correct. It does call
bootz_start() and then doesn't call it again when bootz_run() starts.
It looks like you have taken my Ubuntu test so perhaps I can add a
bootz test for another board in my lab.
For do_booti(), the code in booti_start() is used for the
non-compressed case, so I believe it is still needed and that bootm
doesn't currently honour 'Image' relocation when using an uncompressed
image. But then there is the issue of SYS_BOOTM_LEN, below.
bootm_load_os() has code for ARM at the end, which I suspect needs to
run on RISC-V too, in case the image needs to be relocated. Marek
added this 7 years ago[1] so it predates RISC-V and Heinrich perhaps
didn't notice it when adding his support , if it is actually needed.
SYS_BOOTM_LEN is a 'last bastion' security feature to avoid people
loading images that wrap around memory, etc. I could put bootm_len
into struct bootm_info so that a 10x value can be used for the booti
case.
I'm not sure any of the above is 100% correct. My very quick look was
that it seems like if we want to have the command part of bootX be able
to be equivalently called via API, more of most of the commands needs to
be moved out of cmd/cmdX.c and to boot/bootX.c and put in to some state.
Yes, my intent has been to move code out of commands into boot/ and
have things work as before.
Then none of this should be in a "pxe" series, and you should have
been removing a bunch more code out of the command part of the file.
Also don't forget that booting a FIT with a compressed boot kernel did
not work before my series. The series was largely written to fix bugs
(in one place) rather that just to refactor for the fun it.
I'm not sure which FIT use case you're talking about here exactly and
when it got broken (because if it was in use already, someone would have
reported it).
Tom, we also have the issue that you haven't taken the rest of the PXE
series (due to the lwip case), so our trees are out of sync in this
area. It would really help if you could take that series, and again, I
am happy to do the lwip case once it is landed.
Your further PXE rework is irrelevant to this discussion. Except in that
it points out there's still further changes that could break other
things even more.
Yes, any change in untested code has the potential to create bugs. I
do the best I can, but I could use a little more support in this
effort.
And the best way to get support for that is to slow down and make
smaller more easily testable series. There's yet another bug in this
series too that Svyatoslav has or will post about shortly.
So for now, if people agree, I can do a small series to use a
different SYS_BOOTM_LEN for the booti command and perhaps add RISC-V
as above.
Well, no, that sounds like a step in the wrong direction. I'd like to
instead restore the expected behavior, and then perhaps we could clean
it up such that the compressed case for booti-Images works like FIT
images where we lmb allocate a larger area if needed.
Yes, restoring the 10x decompression size is what I am talking about
here. Is there anything else?
Functional wise, the being reported now issue Svyatoslav found too.
Implementation wise, yes, I don't want to refactor and re-wrtie a bunch
of this as the first step. The first step should be just what's needed
to move the working functionality out of cmd/bootX.c and in to
boot/bootX.c and let people verify things haven't regressed. Then some
rework can be reviewed on its own.
Later, I could continue the boot-unification effort, e.g. delete the
booti_start() function. and incorporate the uncompressed case into
bootm code.
I'm not sure your reading of the situation is right. I'm more inclined
to revert this merge rather than take on more rework in the area.
Well that would be a pretty poor show. No one else has spent effort
trying to clean up and rationalise this code in the last 10 years, so
far as I can see. This is in -next so there is plenty of time to
resolve any issues.
Yes, things haven't broken in 10 years. This is why I don't value "tidy
up and refactor" in and of itself. Massive reworks then require a long
time to check all of the corner cases. This has been true of many of the
massive reworks you've done over the years.
Well, do you want to be able to boot without CMDLINE, or not?
It's an interesting long term goal. But since there's always some
distribution or another that ships with each release we do, not having
functional regressions is more important than speed of new features.
Being able to practically disable the command line has been a feature
for forever. Being able to literally has implications for both security
and end user rights that need to be considered too.
I believe my reading of the situation is right, but if you disagree
with some of it, please indicate which.
Your answer is not "Oh, I'll go understand what I broke and restore
expected behavior". It's "you should take more of my big rework and I'll
rework this area even more". That is not convincing that we won't have
v2025.07 broken here.
You only took half of the series, remember. We are talking about a
tiny corner case here, with no test coverage. I'm offering to fix the
case that has been reported. I can also even add a test for it.
It wasn't half of the series. I didn't take more because it was either
the stuff that broke other functionality or because it wasn't used until
some further series from you. It's not a tiny corner case. A test case
is nice, sure, but we'll never have 100% test coverage and need physical
testing too. Which most people only have time for around releases. Which
is why we need to go slow. Which again is why you need to split up your
changes more and expect them to go slower. So there's time to test
things.