On Fri, May 03, 2019 at 10:10:57AM +1000, Stewart Smith wrote:
> David Gibson <da...@gibson.dropbear.id.au> writes:
> > On Wed, May 01, 2019 at 01:42:21PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment, on 256CPU + 256 PCI devices guest, it takes the guest
> >> about 8.5sec to fetch the entire device tree via the client interface
> >> as the DT is traversed twice - for strings blob and for struct blob.
> >> Also, "getprop" is quite slow too as SLOF stores properties in a linked
> >> list.
> >>
> >> However, since [1] SLOF builds flattened device tree (FDT) for another
> >> purpose. [2] adds a new "fdt-fetch" client interface for the OS to fetch
> >> the FDT.
> >>
> >> This tries the new method; if not supported, this falls back to
> >> the old method.
> >>
> >> There is a change in the FDT layout - the old method produced
> >> (reserved map, strings, structs), the new one receives only strings and
> >> structs from the firmware and adds the final reserved map to the end,
> >> so it is (fw reserved map, strings, structs, reserved map).
> >> This still produces the same unflattened device tree.
> >>
> >> This merges the reserved map from the firmware into the kernel's reserved
> >> map. At the moment SLOF generates an empty reserved map so this does not
> >> change the existing behaviour in regard of reservations.
> >>
> >> This supports only v17 onward as only that version provides dt_struct_size
> >> which works as "fdt-fetch" only produces v17 blobs.
> >>
> >> If "fdt-fetch" is not available, the old method of fetching the DT is used.
> >>
> >> [1] https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=e6fc84652c9c00
> >> [2] https://git.qemu.org/?p=SLOF.git;a=commit;h=ecda95906930b80
> >>
> >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> >
> > Hrm. I've gotta say I'm not terribly convinced that it's worth adding
> > a new interface we'll need to maintain to save 8s on a somewhat
> > contrived testcase.
>
> 256CPUs aren't that many anymore though. Although I guess that many PCI
> devices is still a little uncommon.
>
> A 4 socket POWER8 or POWER9 can easily be that large, and a small test
> kernel/userspace will boot in ~2.5-4 seconds. So it's possible that
> the device tree fetch could be surprisingly non-trivial percentage of boot
> time at least on some machines.

All client interface calls are really heavy, and you need to do a lot of
them if you have a big device tree. This takes time, even if the linked
list stuff does not kill you :-)

Segher
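
To put a rough shape on the "a lot of them" point, here is a back-of-the-envelope sketch of how the number of client interface calls could grow with tree size. It is not taken from the patch or from SLOF: the per-node and per-property call counts, the function names, and the 600-node example tree are all assumptions chosen for illustration, and it says nothing about how long any individual call takes.

```python
# Rough model only: the per-node and per-property call counts below are
# assumptions for illustration, not measurements of SLOF or the kernel.


def calls_old_method(nodes, props_per_node):
    """Estimate client interface calls for the two-pass tree traversal."""
    # Assumed: walking the tree costs one "child" and one "peer" call per node.
    walk = 2 * nodes

    # Pass 1 (strings blob): assumed one "nextprop" per property, plus one
    # final "nextprop" per node that reports the end of the property list.
    strings_pass = walk + nodes * (props_per_node + 1)

    # Pass 2 (struct blob): assumed the same "nextprop" walk, plus a length
    # query and a "getprop" for every property.
    struct_pass = walk + nodes * (props_per_node + 1) + nodes * 2 * props_per_node

    return strings_pass + struct_pass


def calls_fdt_fetch():
    """Assumed: a single "fdt-fetch" call returns the whole flattened tree."""
    return 1


if __name__ == "__main__":
    # Hypothetical tree: 600 nodes with 10 properties each.
    nodes, props = 600, 10
    print(calls_old_method(nodes, props), "calls vs", calls_fdt_fetch())
```

Under these assumptions the old method scales with nodes times properties, while the new one stays constant, so the gap widens as the tree grows regardless of how the properties are stored.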