On 3/26/20 5:19 PM, Simon Glass wrote:
> Hi Patrick,

Hi,

> On Wed, 25 Mar 2020 at 09:57, Patrick DELAUNAY <patrick.delau...@st.com> wrote:
>>
>> Hi,
>>
>>> From: Marek Vasut <ma...@denx.de>
>>> Sent: Wednesday, 25 March 2020 00:39
>>>
>>> Hi,
>>>
>>> I was looking at the STM32MP1 boot time and I noticed it takes about
>>> 2 seconds to get to U-Boot.
>>
>> Thanks for the feedback.
>>
>> To be clear, the SPL is not the ST priority as we have many limitations
>> (mainly on power management) for the SPL boot chain
>> (stm32mp15_basic_defconfig):
>> Rom code => SPL => U-Boot
>>
>> The recommended boot chain for STM32MP1 is Rom code => TF-A => U-Boot
>> (stm32mp15_trusted_defconfig).
>>
>>> One problem is the insane I2C timing calculation in stm32f7 i2c driver,
>>> which is almost a mallocator and CPU stress test and takes about
>>> 1 second to complete in SPL -- we need some simpler replacement for
>>> that, possibly the one in DWC I2C driver might do?
>>
>> Our first idea to manage these I2C settings (prescaler/timings setting)
>> was to set these values in the device tree, but this binding was refused
>> so this function stm32_i2c_choose_solution()
>
> Was the binding refused in Linux? Could we add something
> U-Boot-specific then? I think having 'early' timings, etc. is very
> handy. We are doing this on x86.
>
> Of course it has traditionally been impossible to convince Linux
> people to add this sort of thing. Still, I think we should do it. Our
> U-Boot-specific files allow this.

Or reuse the DWC I2C driver timing calculation, which is real simple,
fast, and should be accurate enough.

>> provides the best settings for any input clock and I2C frequency
>> (called for each probe).
>>
>> But it is brutal and not an optimal solution: it tries all the solutions
>> to find the best one.
>> And the performance problem of this loop (shared code between Linux /
>> U-Boot / TF-A drivers) had already been seen/checked on ST side in
>> TF-A context.
>
> We should be able to calculate it, like with dw-i2c.

Yes

>> We tried to improve the solution, without success, but finally the
>> performance issue was solved by dcache activation in TF-A before
>> executing this loop.
>
> I would like to see patches to enable the cache. We did this some
> years ago in a Chromebook and it made a big difference. It is not that
> hard.

ACK. Why did the chromebook patches never make it upstream?

>> But as in SPL the data cache is not activated, this loop has terrible
>> performance.
>>
>> We need to dig into this topic again from the U-Boot point of view
>> (SPL & also in U-Boot, before relocation and after relocation).
>>
>> And I had shared this issue with the ST owner of this code.
>>
>> For information, I added some traces and measured the same code
>> execution on the DK2 board:
>> - 440 ms in SPL (dcache OFF)
>> - 36 ms in U-Boot (dcache ON)
>>
>>> Another item I found is that, in U-Boot, initf_dm() takes about half a
>>> second and so does serial_init(). I didn't dig into it to find out why,
>>> but I suspect it has to do with the massive amount of UCLASSes the DM
>>> has to traverse OR with the CPU being slow at that point, as the clock
>>> driver didn't get probed just yet.
>>>
>>> Thoughts?
>>
>> Yes, it is the first parsing of device tree, and it is really slow...
>> directly linked to device tree size and libfdt.
>
> I wonder if we can improve this. There was a change to how the drivers
> were bound (changing the ordering). We could perhaps revert that for
> SPL.

Link?

[...]

--
Best regards,
Marek Vasut
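
To illustrate the "calculate it directly" idea from the thread, here is a
rough sketch of deriving SCL high/low cycle counts from the input clock in
closed form instead of searching all prescaler/timing combinations. This is
not the dw-i2c or stm32f7 driver code: struct scl_counts, ns_to_cycles() and
scl_counts_direct() are names made up for this example, and it assumes a
controller that takes the SCL high and low periods as plain input-clock
cycle counts. A real driver would also have to handle rise/fall times,
filters and register range limits, which are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two SCL period counts (not a U-Boot type). */
struct scl_counts {
	uint32_t high;	/* input-clock cycles SCL is held high */
	uint32_t low;	/* input-clock cycles SCL is held low */
};

/*
 * Convert nanoseconds to input-clock cycles, rounding up so that the
 * programmed period is never shorter than the requested minimum.
 */
static uint32_t ns_to_cycles(uint32_t clk_hz, uint32_t ns)
{
	const uint64_t ns_per_sec = 1000000000ULL;

	return (uint32_t)(((uint64_t)clk_hz * ns + ns_per_sec - 1) / ns_per_sec);
}

/*
 * Direct computation: one multiply and one divide per value, with no loop
 * over candidate settings. The minimum high/low times come from the caller
 * (for example, the I2C specification values for the chosen bus speed).
 */
static struct scl_counts scl_counts_direct(uint32_t clk_hz,
					   uint32_t t_high_min_ns,
					   uint32_t t_low_min_ns)
{
	struct scl_counts c;

	assert(clk_hz > 0);
	c.high = ns_to_cycles(clk_hz, t_high_min_ns);
	c.low = ns_to_cycles(clk_hz, t_low_min_ns);
	return c;
}
```

With a 100 MHz input clock and the standard-mode minimums (tHIGH 4.0 us,
tLOW 4.7 us), scl_counts_direct(100000000, 4000, 4700) yields 400 and 470
cycles. The cost is constant, so it should not depend on the dcache state
the way the exhaustive search does.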