On Mon, Jan 13, 2025 at 12:01:36PM -0700, Simon Glass wrote: > Hi Tom, > > On Fri, 10 Jan 2025 at 09:48, Tom Rini <tr...@konsulko.com> wrote: > > > > On Fri, Jan 10, 2025 at 06:40:37AM -0700, Simon Glass wrote: > > > Hi Tom, > > > > > > On Thu, 9 Jan 2025 at 09:51, Tom Rini <tr...@konsulko.com> wrote: > > > > > > > > On Thu, Jan 09, 2025 at 08:02:01AM -0700, Simon Glass wrote: > > > > > Hi Tom, > > > > > > > > > > On Wed, 8 Jan 2025 at 12:15, Tom Rini <tr...@konsulko.com> wrote: > > > > > > > > > > > > On Wed, Jan 08, 2025 at 10:02:52AM -0700, Simon Glass wrote: > > > > > > > Hi Heinrich, Tom, > > > > > > > > > > > > > > On Tue, 7 Jan 2025 at 08:47, Heinrich Schuchardt > > > > > > > <xypron.g...@gmx.de> wrote: > > > > > > > > > > > > > > > > On 07.01.25 16:11, Tom Rini wrote: > > > > > > > > > On Tue, Jan 07, 2025 at 06:57:50AM -0700, Simon Glass wrote: > > > > > > > > >> Hi Heinrich, > > > > > > > > >> > > > > > > > > >> On Tue, 7 Jan 2025 at 06:11, Heinrich Schuchardt > > > > > > > > >> <xypron.g...@gmx.de> wrote: > > > > > > > > >>> > > > > > > > > >>> On 07.01.25 13:15, Simon Glass wrote: > > > > > > > > >>>> Hi Heinrich, > > > > > > > > >>>> > > > > > > > > >>>> On Mon, 6 Jan 2025 at 10:00, Heinrich Schuchardt > > > > > > > > >>>> <xypron.g...@gmx.de> wrote: > > > > > > > > >>>>> > > > > > > > > >>>>> On 06.01.25 15:47, Simon Glass wrote: > > > > > > > > >>>>>> This test was hamstrung in code review so this series is > > > > > > > > >>>>>> an attempt to > > > > > > > > >>>>>> complete the intended functionality: > > > > > > > > >>>>>> > > > > > > > > >>>>>> - Check memory allocations look correct > > > > > > > > >>>>>> - Check that exit-boot-services removes active-DMA > > > > > > > > >>>>>> devices > > > > > > > > >>>>>> - Check that the bootflow is still present after testapp > > > > > > > > >>>>>> finishes > > > > > > > > >>>>>> > > > > > > > > >>>>>> The EFI functionality duplicates > > > > > > > > >>>>>> bootm_announce_and_cleanup() and still > > > > > > > > >>>>>> uses the defunct board_quiesce_devices() so a nice > > > > > > > > >>>>>> cleanup would be to > > > > > > > > >>>>>> call the bootm function instead, with suitable > > > > > > > > >>>>>> modifications. That would > > > > > > > > >>>>>> allow bootstage to work too. > > > > > > > > >>>>>> > > > > > > > > >>>>>> This series is based on sjg/master since the EFI logging > > > > > > > > >>>>>> was rejected so > > > > > > > > >>>>>> far. > > > > > > > > >>>>> > > > > > > > > >>>>> Yes, it was rejected because a solution at the lib/log.c > > > > > > > > >>>>> level would be > > > > > > > > >>>>> more generic. > > > > > > > > >>>> > > > > > > > > >>>> As I mentioned, that idea isn't suitable for programmatic > > > > > > > > >>>> use. > > > > > > > > >>> > > > > > > > > >>> What can be done with show_addr("mem", rec->memory); that > > > > > > > > >>> log_debug() > > > > > > > > >>> does not offer or which you could not do with a new log > > > > > > > > >>> function in > > > > > > > > >>> lib/log.c that takes variadic arguments? > > > > > > > > >> > > > > > > > > >> There are asserts in [1], for example. How do you propose to > > > > > > > > >> handle > > > > > > > > >> that? See [2] for my previous explanation, quoted here: > > > > > > > > >> > > > > > > > > >>> CONFIG_LOG with a bloblist option would be a great idea, > > > > > > > > >>> but it's hard > > > > > > > > >>> to programmatically scan text...plus only the external call > > > > > > > > >>> sites are > > > > > > > > >>> actually logged. > > > > > > > > >> > > > > > > > > >> Also see the discussion on the original patch [3]. There was > > > > > > > > >> also your > > > > > > > > >> reply at [4], but I think you missed that this is intended > > > > > > > > >> for use in > > > > > > > > >> unit tests (i.e. with ut_assert()). > > > > > > > > >> > > > > > > > > >> You also requested that this be generalised, rather than > > > > > > > > >> being > > > > > > > > >> EFI-loader-specific. I have no objection to that, but don't > > > > > > > > >> have a use > > > > > > > > >> case for it yet, so have deferred that to later. It's a > > > > > > > > >> fairly simple > > > > > > > > >> change, if/when needed. If the series was not NAKed, I'd be > > > > > > > > >> happy to > > > > > > > > >> do it now. > > > > > > > > >> > > > > > > > > >>>> > > > > > > > > >>>>> > > > > > > > > >>>>> Tom suggested not to send patches that are for private > > > > > > > > >>>>> enjoyment to the > > > > > > > > >>>>> mailing list. > > > > > > > > >>>> > > > > > > > > >>>> My contributions to U-Boot are only ever about private > > > > > > > > >>>> enjoyment :-) > > > > > > > > >>>> > > > > > > > > >>>> Do you have any comments on the patches? > > > > > > > > >> > > > > > > > > >> Regards, > > > > > > > > >> Simon > > > > > > > > >> > > > > > > > > >> [1] > > > > > > > > >> https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-6-...@chromium.org/ > > > > > > > > >> [2] > > > > > > > > >> https://lore.kernel.org/u-boot/CAFLszTjxOE_037+kR0jgdax80sBombYo_k0YgiuVnP=kzco...@mail.gmail.com/ > > > > > > > > >> [3] > > > > > > > > >> https://lore.kernel.org/u-boot/CAC_iWjKtaN54B98OKbkoXkC_GmKJ=x+M4=uy_o6rosopzad...@mail.gmail.com/ > > > > > > > > >> [4] > > > > > > > > >> https://lore.kernel.org/u-boot/d513d326-41a6-425e-b11f-85958065b...@gmx.de/ > > > > > > > > > > > > > > > > > > Looking at the logging portions of the original series again, > > > > > > > > > especially > > > > > > > > > if this was made generic, we probably don't want to print to > > > > > > > > > actual > > > > > > > > > console every time we're making a note of some memory > > > > > > > > > allocation for > > > > > > > > > example, that would be unreadable outside of a debug context. > > > > > > > > > The point > > > > > > > > > of this really seems to be "log things for verifying in tests > > > > > > > > > later". > > > > > > > > > Does that end up being useful? I don't know. Heinrich or > > > > > > > > > Ilias, do the > > > > > > > > > tests in [1] look generally useful? > > > > > > > > > > > > > > > > > > > > > > > > > The tests in [1] are not documented, not even in the commit > > > > > > > > message. So > > > > > > > > the reasoning behind the tests remains Simon's secret. > > > > > > > > > > > > > > Are you asking for code comments in the test? If so, I can add > > > > > > > some. > > > > > > > > > > > > > > > > > > > > > > > At first sight the tests in [1] don't make much sense. E.g. > > > > > > > > that only a > > > > > > > > subset of memory types have been used does not tell that the > > > > > > > > right > > > > > > > > memory type has been used for the right object. > > > > > > > > > > > > > > It is a pretty good start, though. It makes sure that the memory > > > > > > > types > > > > > > > are sane, checks addresses are within DRAM, etc. With [5] it makes > > > > > > > sure that devices are removed. > > > > > > > > > > > > > > > > > > > > > > > Implementing a specific tracing functionality for EFI is > > > > > > > > definitively > > > > > > > > the wrong way forward as it will lead to code duplication. > > > > > > > > > > > > > > We can cross that bridge when we come to it. > > > > > > > > > > > > Well, no. It's backwards to make a bridge in one place when everyone > > > > > > agrees it needs to be moved somewhere else. I mean [5] is a generic > > > > > > issue and test/py/tests/test_net_boot.py or some other test we > > > > > > already > > > > > > have which tests booting an OS should confirm that we've quiesced > > > > > > devices before moving on. And as a bonus it's in python where > > > > > > dealing > > > > > > with strings doesn't suck. > > > > > > > > > > I really don't want to write C tests in Python. CI is slow enough as > > > > > it is, something realy want to fix. I'm also not sure how you can tell > > > > > if a device has been removed. Run 'dm tree' and look for the missing > > > > > 'star' in the resulting 300 lines of text? > > > > > > > > As I'm in a bisect-hell in our C tests you'll have to forgive me for not > > > > thinking the C tests are noticeably faster than python tests. Or that > > > > they aren't their own potential source of corner-case bugs. But I > > > > digress.. > > > > > > Welcome to my world. I bisected my lab devices so many times to try to > > > isolate all the breakages that have crept in. What is the problem, > > > maybe I can help? > > > > Sure. test/cmd/hash.c::dm_test_cmd_hash_md5 fails randomly, in maybe 1 > > out of 100 runs, via pytest, in sandbox. Not via "./u-boot -T -c 'ut dm > > dm_test_cmd_hash_md5'" however (I stopped checking after 1000 > > iterations). I was iterating over "and built with clang" but I think it > > happens with gcc too, from the actual failures in CI. And you can use > > "-k ut" to limit to just what's matched there, so it's a quicker > > iteration. > > Hmmm do you have a link? It's hard to imagine what it is, but perhaps > a dependency on a previous test.
Sure: https://source.denx.de/u-boot/u-boot/-/jobs/993200#L286 My current gut feeling is that we've got some section overlap issue somewhere. And, FWIW, if I turn off EFI_LOADER I still see it. And if you use the same binary it won't always fail. > At present 'ut all' fails so I am going to take a look at that. Quite > a bit of clean-up needed in test system, though. Ideally we could run > the tests in random order so we can find and fix the dependencies. For > driver model we reinit as needed, but that's not the case for EFI, for > example. Personally, for making pytest faster I'd look at the general recommendations various blog posts about "make your pytest run faster" and then go from there. > > > > And yes, taking a bunch of text and parsing it, is what python is fast > > > > at. And easier to write. > > > > > > > > > But actually [5] is not generic, since EFI uses its own code to remove > > > > > devices. This test is solely focussed on EFI. > > > > > > > > Yes, you're testing the EFI version of the code in > > > > arch/$(ARCH)/lib/bootm.c. The remove devices functions being called in > > > > both cases are generic. > > > > > > The code in EFI is: > > > > > > if (!efi_st_keep_devices) { > > > bootm_disable_interrupts(); > > > if (IS_ENABLED(CONFIG_USB_DEVICE)) > > > udc_disconnect(); > > > board_quiesce_devices(); > > > dm_remove_devices_active(); > > > } > > > > > > It does call somewhat the same functions, but is doing its own thing, > > > not even using the arch-specific code. As I mentioned, a nice clean-up > > > would be to make bootm_announce_and_cleanup() common. > > > > Yes, we almost agree? Both the EFI code, and arch/$(ARCH)/lib/bootm.c > > have functions that make the above calls. A nice clean-up would be to > > have something common. > > Yes indeed. It still does not provide a test for the EFI bootmeth, > though, where I found half a dozen bugs. Yes, the bootmeth code is it's own thing. > > > Actually, now that I see efi_st_keep_devices, I wonder why Heinrich > > > didn't want my ANSI patch[6] which serves a similar function. > > > > No? Your patch disables ANSI output in those tests, that variable is for > > making sure those tests can accomplish (if I skim things right) similar > > kinds of tests you've asked for before, but with an EFI app instead? But > > perhaps better to not start yet another tangent here... > > I wouldn't know where to start, anyway... > > > > > > If you want the logging to be renamed and placed centrally I don't > > > > > mind doing it now. But note that only EFI will use it for now. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We already have function _log() which is variadic. > > > > > > > > > > > > > > > > Simon could write a new log driver that parses the `format` > > > > > > > > parameter > > > > > > > > and saves the binary data in an appropriate format for analysis > > > > > > > > by the > > > > > > > > unit tests: > > > > > > > > > > > > > > > > * For %s the driver should save the string and not the address > > > > > > > > of the > > > > > > > > string. > > > > > > > > * For %pD the driver should save the device path instead of the > > > > > > > > pointer. > > > > > > > > * ... > > > > > > > > > > > > > > > > Some changes to the log driver interface will be needed to pass > > > > > > > > the > > > > > > > > variadic arguments instead of the formatted message. > > > > > > > > > > > > > > Perhaps the word 'log' is confusing people. But the above > > > > > > > suggestion > > > > > > > is quite a complicated way of handling things. We have no way to > > > > > > > decode printf() strings in this way. See log_dispatch() for how > > > > > > > this > > > > > > > is handled today. It uses sprintf(). Trying to test based on text > > > > > > > output would be very clumsy (lots of regexes and sscan() calls?) > > > > > > > and > > > > > > > result in a huge amount of parsing code, highly dependent on the > > > > > > > printf() format, etc. > > > > > > > > > > > > > > I very-much doubt that would produce a useful implementation, but > > > > > > > if > > > > > > > you would like to try it out then I would be happy to look at it. > > > > > > > > > > > > > > I mentioned this several times, but even if we did go that way, we > > > > > > > only have logging on the external calls, so much of the EFI-memory > > > > > > > allocation in U-Boot would not be logged. > > > > > > > > > > > > > > Regards, > > > > > > > Simon > > > > > > > > > > > > > > [5] > > > > > > > https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-9-...@chromium.org/ > > > > > > > > > > > > Yes, calling this a "log" when it's intended for capturing > > > > > > information > > > > > > for tests got some of this off on the wrong track. But that also > > > > > > helps > > > > > > explain now that this is still on the wrong track and should > > > > > > instead be > > > > > > following normal design practices for testing and expanding existing > > > > > > infrastructure and not inventing a new everything. So if you don't > > > > > > like > > > > > > Heinrich's suggestion, take a look at Caleb's suggestion. > > > > > > > > > > I don't have the energy to port the tracing framework from Linux to > > > > > U-Boot, although I agree it would be useful. Still, function tracing > > > > > is quite fragile and confusing to work with when refactoring code. I > > > > > don't like that idea much for this use case, although if function > > > > > tracing did exist in U-Boot I would likely have used it. > > > > > > > > I mean yes, it would be good if you went back and expanded on the trace > > > > functionality you did before. > > > > > > I still don't believe it is the best solution and seems like yet > > > another ocean I should avoid sticking my heater into. > > > > I strongly disagree. If you go back to the trace code you brought in to > > start with and make it more useful / include newer features existing > > elsewhere you're not going to end up in conflict with everyone asking > > why you're doing something subsystem specific. > > Perhaps someone else could do this? It would be a substantial amount > of work to bring runtime tooling into U-Boot, bpf and the like. It > would be quite a pain to use, I suspect, and certainly not possible to > write a simple C test as I have done here. I'm not sure where bpf and the like comes from in what I said here. But again, if you find that logging thing you wrote handy, keep it in a topic branch somewhere and then later on you can "Told You So" the rest of us later on. > > > > > > And if you > > > > > > don't like Caleb's suggestion, go put this in a topic branch you can > > > > > > merge when you need to debug some problem that seemingly nothing > > > > > > else > > > > > > will catch. > > > > > > > > > > Here we are over a year after I reported the bug and we still don't > > > > > have a test to cover it. This series is better than the available > > > > > alternatives, IMO. > > > > > > > > Well, no. We have commit dabaa4ae3206 ("dm: Add > > > > dm_remove_devices_active() for ordered device removal") we have a test > > > > for the underlying problem. We need more functional boot tests, but we > > > > need those to be in python too, and not more C code. > > > > > > That is a nice improvement, but did not fix the underlying problem. > > > The underlying problem was that EFI was calling exit-boot-services, > > > causing U-Boot to free up data structures which were needed to boot. > > > This was on x86_64. I never quite figured out which one (very hard > > > when you cannot get back to U-Boot to check). > > > > > > There were quite a lot of problems, actually. There v2 series is at [7] > > > > > > Only a C test can check what actually happens inside U-Boot. > > > > Yes, I think now we get back to disagreeing on which symptoms lead to > > which code problems and then what to do about them. > > OK > > > > > > > And you're not just coming up with a test, you're refactoring a bunch of > > > > code and introducing new subsystems in order to do that. When as I keep > > > > pointing out, we don't need that. We could easily extend the existing OS > > > > boot tests we have to script booting an ISO. And we only run those when > > > > say "ENABLE_SLOW_TESTS" is set, and only do that on tagged releases. > > > > > > Yes of course we need to refactor to make tests work. This is not > > > necessarily a bad thing, as it helps us break code down into testable > > > chunks. We cannot rely only on large functional-tests, not that you > > > are suggesting that. See [8], but they are too slow, too hard to debug > > > when they fail. They also tend to devolve into chaos as people get > > > lazy and stop writing unit/smaller tests. > > > > I'll just note that I don't ever even think to use "make tests" or > > "qcheck" or any of the others since they never work for me. > > Would you mind filing an issue on that? I use 'make pcheck' all the time. Done: https://source.denx.de/u-boot/custodians/u-boot-dm/-/issues/33 I didn't file one for "qcheck" which fails differently in that same environment. -- Tom
signature.asc
Description: PGP signature