Hi!

> >> +	u64 p0, p1;
> >>  	int ret;
> >>
> >>  	atomic_set(&late_cpus_in,  0);
> >>  	atomic_set(&late_cpus_out, 0);
> >>
> >> +	p0 = rdtsc_ordered();
> >> +
> >>  	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
> >> +
> >> +	p1 = rdtsc_ordered();
> >> +
> >>  	if (ret > 0)
> >>  		microcode_check();
> >>
> >>  	pr_info("Reload completed, microcode revision: 0x%x\n",
> >>  		boot_cpu_data.microcode);
> >>
> >> +	pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
> >> +
> >>  	return ret;
> >>  }
> >>
> >> We have used a machine with a broken microcode in BIOS and no microcode in
> >> initramfs (to bypass early loading).
> >>
> >> Here are the results for parallel loading (we made two measurements):
> >
> >> [   18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [   18.201225] x86/CPU: CPU features have changed after loading microcode,
> >> but might not take effect.
> >> [   18.201230] microcode: Reload completed, microcode revision: 0x200005e
> >> [   18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff:
> >> 29889604
> >
> >> Here are the results of serial loading:
> >>
> >> [   17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [   17.898365] x86/CPU: CPU features have changed after loading microcode,
> >> but might not take effect.
> >> [   17.898370] microcode: Reload completed, microcode revision: 0x200005e
> >> [   17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff:
> >> 842898034
> >>
> >> One can see that the difference is an order magnitude.
> >
> > Well, that's impressive, but it seems to finish 300 msec later? Where does
> > that difference come from / how much real time do you gain by this?
>
> The difference comes from the large amount of cores/threads the machine has:
> 72 in this case, but there are machines with more. As the commit message says
> initially the microcode was applied serially one by one and now the microcode
> is updated in parallel on all cores.
>
> 300ms seems nothing but it is enough to cause disruption in some critical
> services (e.g. storage) - 300ms in which we do not execute anything on CPUs.
> Also this 300ms is increasing when the machine is fully loaded with guests.
>

Yes, but if you look at the dmesgs I quoted, the parallel microcode update
actually finished 300 msec _later_.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
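
For anyone who wants to cross-check the figures in the quoted logs, the arithmetic can be sketched in a few lines of C. This is only an illustration: the helper names below are hypothetical and not part of the quoted patch, and nothing here converts TSC cycles to wall-clock time, since the TSC frequency of the test machine is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the quoted patch. */

/* Elapsed TSC cycles between two rdtsc_ordered() readings. */
static uint64_t tsc_delta(uint64_t p0, uint64_t p1)
{
	assert(p1 >= p0);
	return p1 - p0;
}

/* How many times larger the serial cycle count is than the parallel one. */
static double cycle_ratio(uint64_t serial, uint64_t parallel)
{
	assert(parallel != 0);
	return (double)serial / (double)parallel;
}

/* Gap between two dmesg timestamps, both given in microseconds. */
static int64_t dmesg_gap_us(int64_t first_us, int64_t second_us)
{
	return second_us - first_us;
}
```

With the quoted values, tsc_delta() reproduces the logged diffs (29889604 cycles for the parallel run, 842898034 for the serial run), and cycle_ratio() of the two comes out at roughly 28. The gap between the "updated to revision" and "CPU features have changed" lines is 3465 usec in the parallel log and 355847 usec in the serial log.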