On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <s...@redhat.com> wrote:
> Montes, Julio <julio.mon...@intel.com> writes:
>
> > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <s...@redhat.com> wrote:
> >> > Stefan Hajnoczi <stefa...@gmail.com> writes:
> >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >> > > > Stefan Hajnoczi <stefa...@gmail.com> writes:
> >> > > >
> >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> > > >
> >> > > > --------------
> >> > > > | Conclusion |
> >> > > > --------------
> >> > > >
> >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> >> > > > 363ms), and is smaller on all sections (QEMU initialization,
> >> > > > firmware overhead and kernel start-to-user).
> >> > > >
> >> > > > Microvm's memory tree is also visibly simpler, significantly
> >> > > > reducing the surface exposed to the guest.
> >> > > >
> >> > > > While we can certainly work on making Q35 smaller, I definitely
> >> > > > think it's better (and way safer!) to have a specialized machine
> >> > > > type for a specific use case than a minimal Q35 whose behavior
> >> > > > significantly diverges from a conventional Q35.
> >> > >
> >> > > Interesting, so not a 10x difference! This might be amenable to
> >> > > optimization.
> >> > >
> >> > > My concern with microvm is that it's so limited that few users
> >> > > will be able to benefit from the reduced attack surface and faster
> >> > > startup time. I think it's worth investigating slimming down Q35
> >> > > further first.
> >> > >
> >> > > In terms of startup time, the first step would be profiling Q35
> >> > > kernel startup to find out what's taking so long (firmware
> >> > > initialization, PCI probing, etc.).
> >> >
> >> > Some findings:
> >> >
> >> > 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves
> >> >    a whopping 120ms by avoiding the APIC timer calibration at
> >> >    arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> >> >
> >> > Average boot time with "-cpu host":
> >> >   qemu_init_end: 76.408950
> >> >   linux_start_kernel: 116.166142 (+39.757192)
> >> >   linux_start_user: 242.954347 (+126.788205)
> >> >
> >> > Average boot time with the default CPU model:
> >> >   qemu_init_end: 77.467852
> >> >   linux_start_kernel: 116.688472 (+39.22062)
> >> >   linux_start_user: 363.033365 (+246.344893)
> >>
> >> \o/
> >>
> >> > 2. The other 130ms are a direct result of PCI and ACPI presence
> >> >    (tested with a kernel without support for those elements). I'll
> >> >    publish some detailed numbers next week.
> >>
> >> Here are the Kata Containers kernel parameters:
> >>
> >> var kernelParams = []Param{
> >>         {"tsc", "reliable"},
> >>         {"no_timer_check", ""},
> >>         {"rcupdate.rcu_expedited", "1"},
> >>         {"i8042.direct", "1"},
> >>         {"i8042.dumbkbd", "1"},
> >>         {"i8042.nopnp", "1"},
> >>         {"i8042.noaux", "1"},
> >>         {"noreplace-smp", ""},
> >>         {"reboot", "k"},
> >>         {"console", "hvc0"},
> >>         {"console", "hvc1"},
> >>         {"iommu", "off"},
> >>         {"cryptomgr.notests", ""},
> >>         {"net.ifnames", "0"},
> >>         {"pci", "lastbus=0"},
> >> }
> >>
> >> pci lastbus=0 looks interesting and so do some of the others :).
> >>
> >
> > Yeah, pci=lastbus=0 is very helpful to reduce the boot time in Q35;
> > the kernel won't scan the 255 buses :)
>
> I can confirm that adding pci=lastbus=0 makes a significant
> improvement. In fact, it is the only option from Kata's kernel
> parameter list that has an impact, probably because the kernel is
> already quite minimalistic.
>
> Average boot time with "-cpu host" and "pci=lastbus=0":
>   qemu_init_end: 73.711569
>   linux_start_kernel: 113.414311 (+39.702742)
>   linux_start_user: 190.949939 (+77.535628)
>
> That's still ~40% slower than microvm, and the gap quickly widens when
> adding more PCI devices (each one adds 10-15ms), but it's certainly an
> improvement over the original numbers.
>
> On the other hand, there isn't much we can do here from QEMU's
> perspective, as this is basically guest OS tuning.
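[For reference, the tuned Q35 configuration described above can be sketched as a direct kernel boot command line. This is only a sketch: the kernel/initrd paths, memory size, and console setup are placeholders, not taken from the measurements in the thread.]

```shell
# Q35 direct kernel boot with the two tunings discussed in the thread:
#   -cpu host      -> exposes TSC_DEADLINE, skipping APIC timer calibration
#   pci=lastbus=0  -> the guest kernel stops PCI probing after bus 0
# ./vmlinuz and ./initrd.img are placeholder paths.
qemu-system-x86_64 \
    -machine q35 \
    -enable-kvm \
    -cpu host \
    -m 512 \
    -nodefaults -no-user-config \
    -display none \
    -serial stdio \
    -kernel ./vmlinuz \
    -initrd ./initrd.img \
    -append "console=ttyS0 pci=lastbus=0"
```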
fw_cfg could expose this information so guest kernels know when to stop
enumerating the PCI bus. This would make all PCI guests with new kernels
boot ~50 ms faster, regardless of machine type.

The difference between microvm and tuned Q35 is 76 ms now:

microvm:
  qemu_init_end: 64.043264
  linux_start_kernel: 65.481782 (+1.438518)
  linux_start_user: 114.938353 (+49.456571)

Q35 with -cpu host and pci=lastbus=0:
  qemu_init_end: 73.711569
  linux_start_kernel: 113.414311 (+39.702742)
  linux_start_user: 190.949939 (+77.535628)

There is a ~39 ms difference before linux_start_kernel. SeaBIOS is
loading the PVH Option ROM.

Stefano: any recommendations for profiling or tuning SeaBIOS?

Stefan
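[For comparison, a minimal microvm invocation, using the machine type proposed in this series, would look roughly like the sketch below. Paths are placeholders, and details such as the exact device set are assumptions: microvm exposes virtio-mmio devices rather than PCI, so no PCI tuning applies.]

```shell
# microvm direct kernel boot; the proposed machine type has no PCI and
# minimal ACPI, so tunings like pci=lastbus=0 are unnecessary here.
# ./vmlinux is a placeholder path (an uncompressed ELF kernel, booted
# via the PVH entry point mentioned above).
qemu-system-x86_64 \
    -machine microvm \
    -enable-kvm \
    -cpu host \
    -m 512 \
    -nodefaults -no-user-config \
    -display none \
    -serial stdio \
    -kernel ./vmlinux \
    -append "console=ttyS0"
```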