On 08/10/17 05:45, Alex Williamson wrote:
> On Thu, 10 Aug 2017 00:29:36 +0200
> Laszlo Ersek <ler...@redhat.com> wrote:
>
>> On 08/09/17 23:37, Alex Williamson wrote:
>>> On Wed, 09 Aug 2017 21:55:00 +0100
>>> "Patrick O'Callaghan" <p...@usb.ve> wrote:
>>>
>>>> On Wed, 2017-08-09 at 13:24 -0500, David wrote:
>>>>> Anyone else having trouble with a recent version of KVM / QEMU?
>>>>> Also I am still a Linux newbie, how should I troubleshoot this?
>>>>
>>>> For one thing, you could start by looking in the QEMU log file
>>>> and/or the system journal. You don't give any information about the
>>>> VM (e.g. I assume you're using GPU passthrough but you don't say
>>>> anything about it) so it's going to be hard for anyone else to
>>>> guess what the problem is. Perhaps if you post the XML file it
>>>> might give someone a clue.
>>>>
>>>> Also, note that Fedora 24 was EOL-ed today. You should update your
>>>> system to at least F25 as soon as possible. I'm on F26 and having
>>>> no problems.
>>>
>>> Yep, really hard to act on the limited information here. Is it by
>>> chance a GPU assigned VM running OVMF and does that OVMF come from
>>> the kraxel repo rather than the base fedora repo? Thanks,
>>
>> Yes, someone who can reproduce the problem -- from the reports, there
>> are several users -- will have to bite the bullet, and bisect OVMF,
>> and/or bisect the host kernel.
>
> Done. As with David, I hit the problem that my previously working VM
> just hangs with all the vCPUs pegged. Replacing Gerd's OVMF build
> with an older one from the virt-preview repo resolves the issue.
> Bisecting OVMF lands here:
>
> commit 3b2928b46987693caaaeefbb7b799d1e1de803c0
> Author: Michael Kinney <michael.d.kin...@intel.com>
> Date:   Wed May 17 12:19:16 2017 -0700
>
>     UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues
>
>     https://bugzilla.tianocore.org/show_bug.cgi?id=565
>
>     Fix NASM compatibility issues with XCODE5 tool chain.
>
>     The XCODE5 tool chain for X64 builds using PIE (Position
>     Independent Executable). For most assembly sources using
>     PIE mode does not cause any issues.
>
>     However, if assembly code is copied to a different address
>     (such as AP startup code in the MpInitLib), then the
>     X64 assembly source must be implemented to be compatible
>     with PIE mode that uses RIP relative addressing.
>
>     The specific changes in this patch are:
>
>     * Use LEA instruction instead of MOV instruction to lookup
>       the addresses of functions.
>
>     * The assembly function RendezvousFunnelProc() is copied
>       below 1MB so it can be executed as part of the MpInitLib
>       AP startup sequence. RendezvousFunnelProc() calls the
>       external function InitializeFloatingPointUnits(). The
>       absolute address of InitializeFloatingPointUnits() is
>       added to the MP_CPU_EXCHANGE_INFO structure that is passed
>       to RendezvousFunnelProc().
>
>     Cc: Andrew Fish <af...@apple.com>
>     Cc: Jeff Fan <jeff....@intel.com>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Michael D Kinney <michael.d.kin...@intel.com>
>     Reviewed-by: Jeff Fan <jeff....@intel.com>
>     Reviewed-by: Andrew Fish <af...@apple.com>
>
> Reverting this patch against current HEAD (7ef0dae092af) also gives me
> a working image.
> When it fails, it only gets this far:
>
> SecCoreStartupWithStack(0xFFFCC000, 0x818000)
> Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
> Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
> Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
> The 0th FV start address is 0x00000820000, size is 0x000E0000, handle is 0x820000
> Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
> Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
> Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
> Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
> Loading PEIM at 0x0000082B880 EntryPoint=0x0000082E8F9 PcdPeim.efi
> Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
> Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
> Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
> Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
> Loading PEIM at 0x00000830040 EntryPoint=0x00000831415 ReportStatusCodeRouterPei.efi
> Install PPI: 0065D394-9951-4144-82A3-0AFC8579C251
> Install PPI: 229832D3-7A30-4B36-B827-F40CB7D45436
> Loading PEIM at 0x00000831F40 EntryPoint=0x0000083318A StatusCodeHandlerPei.efi
> Loading PEIM at 0x00000833DC0 EntryPoint=0x00000837D0E PlatformPei.efi
> Select Item: 0x0
> FW CFG Signature: 0x554D4551
> Select Item: 0x1
> FW CFG Revision: 0x3
> QemuFwCfg interface (DMA) is supported.
> Platform PEIM Loaded
> CMOS:
> 00: 17 00 30 00 21 00 04 09 08 17 26 02 10 80 00 00
> 10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: FF FF 20 00 00 BF 00 20 30 00 00 00 00 12 00 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 05
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Select Item: 0x19
> Select Item: 0x28
> S3 support was detected on QEMU
> Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
> Select Item: 0x19
> Select Item: 0x24
> Select Item: 0x19
> Select Item: 0x19
> GetFirstNonAddress: Pci64Base=0x800000000 Pci64Size=0x800000000
> Select Item: 0x5
> MaxCpuCountInitialization: QEMU reports 6 processor(s)
> PublishPeiMemory: mPhysMemAddressWidth=36 PeiMemoryCap=65800 KB
> PeiInstallPeiMemory MemoryBegin 0xBBF0E000, MemoryLength 0x4042000
> QemuInitializeRam called
> Select Item: 0x19
> Select Item: 0x24
> Reserved variable store memory: 0xBFECC000; size: 528kb
> Platform PEI Firmware Volume Initialization
> Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
> Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: 826922
> The 1th FV start address is 0x00000900000, size is 0x00A00000, handle is 0x900000
> Select Item: 0x19
> Select Item: 0x19
> Select Item: 0x19
> Select Item: 0x25
> Register PPI Notify: EE16160A-E8BE-47A6-820A-C6900DB0250A
> Temp Stack : BaseAddress=0x814000 Length=0x4000
> Temp Heap : BaseAddress=0x810000 Length=0x4000
> Total temporary memory: 32768 bytes.
> temporary memory stack ever used: 16384 bytes.
> temporary memory heap used: 8000 bytes.
> Old Stack size 16384, New stack size 131072
> Stack Hob: BaseAddress=0xBBF0E000 Length=0x20000
> Heap Offset = 0xBB71E000 Stack Offset = 0xBB716000
> TemporaryRamMigration(0x810000, 0xBBF2A000, 0x8000)
> Loading PEIM at 0x000BFEBF000 EntryPoint=0x000BFEC7C48 PeiCore.efi
> Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
> Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
> Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
> Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
> Loading PEIM at 0x000BFEBB000 EntryPoint=0x000BFEBD941 DxeIpl.efi
> Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
> Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
> Loading PEIM at 0x000BFEB7000 EntryPoint=0x000BFEB9304 S3Resume2Pei.efi
> Install PPI: 6D582DBC-DB85-4514-8FCC-5ADF6227B147
> Loading PEIM at 0x000BFEAF000 EntryPoint=0x000BFEB3189 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> <hang>
>
> Without the above patch, we continue on as:
>
> APIC MODE is 1
> MpInitLib: Find 6 processors in system.
> Does not find any stored CPU BIST information from PPI!
> APICID - 0x00000000, BIST - 0x00000000
> APICID - 0x00000001, BIST - 0x00000000
> APICID - 0x00000002, BIST - 0x00000000
> APICID - 0x00000003, BIST - 0x00000000
> APICID - 0x00000004, BIST - 0x00000000
> APICID - 0x00000005, BIST - 0x00000000
> Install PPI: 9E9F374B-8F16-4230-9824-5846EE766A97
> Install PPI: EE16160A-E8BE-47A6-820A-C6900DB0250A
> Notify: PPI Guid: EE16160A-E8BE-47A6-820A-C6900DB0250A, Peim notify entry point: 835C29
> DXE IPL Entry
> Loading PEIM at 0x000BFE5B000 EntryPoint=0x000BFE605E2 DxeCore.efi
> Loading DXE CORE at 0x000BFE5B000 EntryPoint=0x000BFE605E2
> Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
> CoreInitializeMemoryServices:
> BaseAddress - 0xBBF32000 Length - 0x3EC7000 MinimalMemorySizeNeeded - 0x10F4000
> ...
>
>
> Given the patch identified by bisect, I'll also note that my build
> environment is recent F26 system, I don't see any toolchain stuff
> available for update.
>
> $ nasm --version
> NASM version 2.13.01 compiled on May 22 2017

Thus far it doesn't seem to be related to NASM.

I built NASM 2.13.01 from source, and then built CpuMpPei using both
NASM 2.10.07-7.el7 and upstream 2.13.01. I compared two files between
the build outputs:

- MpFuncs.obj, which comes directly from the assembly source modified by
  commit 3b2928b46987,

- CpuMpPei.efi, which is the final linked-together PEIM binary that gets
  executed during boot.

Regarding MpFuncs.obj, there is a single byte difference between the
binaries built by the two versions of NASM. From "cmp -l" (which prints
the differing byte values in octal, so "10" is 8):

> 305   4  10

The difference is easier to see with "readelf --all --wide":

> --- MpFuncs.obj.el7.readelf     2017-08-10 13:36:38.494517531 +0200
> +++ MpFuncs.obj.2.13.01.readelf 2017-08-10 13:37:02.063251900 +0200
> @@ -22,11 +22,11 @@
>  Section Headers:
>    [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
>    [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
>    [ 1] .text     PROGBITS 0000000000000000 000180 000295 00  AX  0   0 16
>    [ 2] .shstrtab STRTAB   0000000000000000 000420 000021 00      0   0  1
> -  [ 3] .symtab   SYMTAB   0000000000000000 000450 0004b0 18      4  45  4
> +  [ 3] .symtab   SYMTAB   0000000000000000 000450 0004b0 18      4  45  8
>    [ 4] .strtab   STRTAB   0000000000000000 000900 0003c9 00      0   0  1
>  Key to Flags:
>    W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
>    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
>    O (extra OS processing required) o (OS specific), p (processor specific)

That is, the difference is between the Alignment fields of the SYMTAB
section header. It goes from 4 to 8.

Regarding CpuMpPei.efi (i.e., the linked-together binary), there is *no*
difference. And indeed, I cannot reproduce the boot failure on RHEL-7,
despite using NASM 2.13.01.

All the above was done with a NOOPT build (compiler optimizations
disabled, DEBUGs and ASSERT()s built into the code).

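As a sanity check on the "cmp -l" line quoted above, here is a minimal Python sketch; it is not part of the build or test procedure described in this mail. It assumes that the section header table starts right after the 64-byte ELF64 file header. That placement is not stated anywhere in this thread; it is only consistent with .text sitting at file offset 0x180, i.e. after five 64-byte section headers. fake_object() and cmp_l() are hypothetical helper names, and the "objects" are dummy byte strings, not real NASM output.

```python
import struct

EHDR_SIZE = 64  # size of the ELF64 file header
# Elf64_Shdr, little-endian: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
# sh_size, sh_link, sh_info, sh_addralign, sh_entsize
SHDR_FMT = "<IIQQQQIIQQ"
SHDR_SIZE = struct.calcsize(SHDR_FMT)  # 64 bytes


def fake_object(symtab_align):
    """Dummy header area: zeroed file header plus section headers 0..3.

    ASSUMPTION: the section header table begins at file offset 64.
    Only sh_addralign of section 3 (.symtab in the readelf dump) is set.
    """
    blob = bytearray(EHDR_SIZE)
    for index in range(4):
        align = symtab_align if index == 3 else 0
        blob += struct.pack(SHDR_FMT, 0, 0, 0, 0, 0, 0, 0, 0, align, 0)
    return bytes(blob)


def cmp_l(a, b):
    """Mimic "cmp -l": 1-based byte offset, then both byte values in octal."""
    return [(i + 1, format(x, "o"), format(y, "o"))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = cmp_l(fake_object(4), fake_object(8))
print(diffs)
```

Under that assumption the single differing byte lands at 1-based offset 305, with octal values 4 and 10 (decimal 4 and 8), which matches the change in the Al column of the .symtab row.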
I repeated the exercise with DEBUG too (which is what both kraxel and
Fedora ship, meaning compiler optimizations enabled, DEBUGs and
ASSERT()s built into the code). I experienced no boot failure this way
either.

So I think it comes down to another toolchain difference between RHEL-7
and Fedora-26. Likely gcc -- the commit you identified is also related
to a C compiler. In particular:

- on RHEL-7, the system compiler (gcc-4.8.5) is mapped to the "GCC48"
  toolchain settings of edk2, which lack support for LTO (link time
  optimization),

- in the Fedora package's SPEC file, the "GCC49" toolchain settings are
  selected for the build in a fixed manner, which also lack support for
  LTO,

- in kraxel's SPEC file, Fedora-26's gcc-7 compiler is mapped to the
  "GCC5" toolchain settings, which *enable* LTO (for DEBUG).

The test you suggested to David elsewhere in this thread confirms that
the Fedora -- well, "virt-preview" -- package, built with GCC49
settings, works, and that Gerd's package, built with GCC5 settings,
breaks.

I'll spin up a Fedora-26 guest and build OVMF there.

Thanks
Laszlo

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users