On 08.01.21 03:20, Nick Desaulniers wrote: > On Thu, Jan 7, 2021 at 3:27 PM David Hildenbrand <dhild...@redhat.com> wrote: >> >> >>> Am 08.01.2021 um 00:21 schrieb Nick Desaulniers <ndesaulni...@google.com>: >>> >>> On Thu, Jan 7, 2021 at 3:13 PM David Hildenbrand <da...@redhat.com> wrote: >>>> >>>> RISBHG is broken and currently hinders clang builds of upstream kernels >>>> from booting: the kernel crashes early, while decompressing the image. >>>> >>>> [...] >>>> Kernel fault: interruption code 0005 ilc:2 >>>> Kernel random base: 0000000000000000 >>>> PSW : 0000200180000000 0000000000017a1e >>>> R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3 >>>> GPRS: 0000000000000001 0000000c00000000 00000003fffffff4 00000000fffffff0 >>>> 0000000000000000 00000000fffffff4 000000000000000c 00000000fffffff0 >>>> 00000000fffffffc 0000000000000000 00000000fffffff8 00000000008e25a8 >>>> 0000000000000009 0000000000000002 0000000000000008 000000000000bce0 >>>> >>>> One example of a buggy instruction is: >>>> >>>> 17dde: ec 1e 00 9f 20 5d risbhg %r1,%r14,0,159,32 >>>> >>>> With %r14 = 0x9 and %r1 = 0x7 should result in %r1 = 0x900000007, however, >>>> results in %r1 = 0. >>>> >>>> Let's interpret values of i3/i4 as documented in the PoP and make >>>> computation of "mask" only based on i3 and i4 and use "pmask" only at the >>>> very end to make sure wrapping is only applied to the high/low doubleword. >>>> >>>> With this patch, I can successfully boot a v5.10 kernel built with >>>> clang, and gcc builds keep on working. >>>> >>>> Fixes: 2d6a869833d9 ("target-s390: Implement RISBG") >>>> Reported-by: Nick Desaulniers <ndesaulni...@google.com> >>>> Cc: Guenter Roeck <li...@roeck-us.net> >>>> Cc: Christian Borntraeger <borntrae...@de.ibm.com> >>>> Signed-off-by: David Hildenbrand <da...@redhat.com> >>>> --- >>>> >>>> This BUG was a nightmare to debug and the code a nightmare to understand. >>>> >>>> To make clang/gcc builds boot, the following fix is required as well on >>>> top of current master: "[PATCH] target/s390x: Fix ALGSI" >>>> https://lkml.kernel.org/r/20210107202135.52379-1-da...@redhat.com >>> >>> In that case, a huge thank you!!! for this work! ++beers_owed. >>> >> >> :) a kernel build for z13 should work with the (default) „-cpu qemu“ cpu >> type. > > Hmm...so I don't think clang can build a Linux kernel image with > CONFIG_MARCH_Z13=y just yet; just defconfig. Otherwise looks like > clang barfs on some of the inline asm constraints. >
Ah, right. I overwrote my manual config by a temporary defconfig :) So, I'm on x86-64 F33. clang version 11.0.0 (Fedora 11.0.0-2.fc33) LLVM version 11.0.0 I cannot directly use "LLVM=1" for cross-compilation, as I keep getting "error: unknown emulation: elf64_s390" from ld.lld and "error: invalid output format: 'elf64-s390'" from llvm-objcopy. I assume that's fixed in llvm12? 1. I patch around it (strange, I remember CC= .. used to work, but it no longer does) --- index e30cf02da8b8..89c57062ed5d 100644 --- a/Makefile +++ b/Makefile @@ -427,13 +427,13 @@ KBUILD_HOSTLDLIBS := $(HOST_LFS_LIBS) $(HOSTLDLIBS) CPP = $(CC) -E ifneq ($(LLVM),) CC = clang -LD = ld.lld -AR = llvm-ar -NM = llvm-nm -OBJCOPY = llvm-objcopy -OBJDUMP = llvm-objdump -READELF = llvm-readelf -STRIP = llvm-strip +LD = $(CROSS_COMPILE)ld +AR = $(CROSS_COMPILE)ar +NM = $(CROSS_COMPILE)nm +OBJCOPY = $(CROSS_COMPILE)objcopy +OBJDUMP = $(CROSS_COMPILE)objdump +READELF = $(CROSS_COMPILE)readelf +STRIP = $(CROSS_COMPILE)strip else CC = $(CROSS_COMPILE)gcc LD = $(CROSS_COMPILE)ld --- 2. Compile using clang Using latest linux-next (1c925d2030afd354a02c23500386e620e662622b) + above patch --- #!/bin/bash export ARCH=s390; export CROSS_COMPILE=s390x-linux-gnu- export LLVM=1 make distclean make defconfig # Make F32 initrd boot without inserting modules ./scripts/config -e CONFIG_SCSI_ISCSI_ATTRS ./scripts/config -e CONFIG_ISCSI_TCP make -j40 > /dev/null --- 3. Run it via QEMU. I boot a full Fedora 32 using the cloud-image + initrd from Fedora 32 (tried to stick to your cmdline where possible) ./build/qemu-system-s390x \ -m 512M \ -cpu qemu \ -display none \ -nodefaults \ -kernel ../linux-cross/arch/s390/boot/bzImage \ -append "root=/dev/vda1 conmode=sclp console=ttyS0" \ -initrd ../Fedora-Cloud-Base-32-1.6.x86_64-initrd.img \ -hda ../Fedora-Cloud-Base-32-1.6.x86_64-initrd.img \ -serial mon:stdio KASLR disabled: CPU has no PRNG [ 0.408769] Linux version 5.11.0-rc2-next-20210108-dirty (dhildenb@desktop) (clang version 11.0.0 (Fedora 11.0.0-2.fc33), GNU ld version 2.35.1-1.fc33) #1 SMP Fri Jan 8 10:23:01 CET 2021 [ 0.410266] setup: Linux is running under KVM in 64-bit mode [ 0.415840] setup: The maximum memory size is 512MB [ 0.417278] cpu: 1 configured CPUs, 0 standby CPUs ... Fedora 32 (Cloud Edition) Kernel 5.11.0-rc2-next-20210108-dirty on an s390x (ttysclp0) atomic-00 login: > It looks like with your patch applied we get further into the boot! > I'm not seeing any output with: > $ /android0/qemu/build/qemu-system-s390x -cpu qemu -append > 'conmode=sclp console=ttyS0' -display none -initrd > /<path/to>/boot-utils/images/s390/rootfs.cpio -kernel > arch/s390/boot/bzImage -m 512m -nodefaults -serial mon:stdio > > (Based on a quick skim through > https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.ludd/ludd_r_lmtkernelparameter.html). > Do I have all of those right? > > If I attach GDB to QEMU running that kernel image, I was able to view > the print banner once via `lx-dmesg` gdb macro in the kernel, but it > seems on subsequent runs control flow gets diverted unexpected post > entry to start_kernel() always to `s390_base_pgm_handler` ...errr..at > least when I try to single step in GDB. Tried with linux-5.10.y, > mainline, and linux-next. > > qemu: 470dd6bd360782f5137f7e3376af6a44658eb1d3 + your patch > llvm: 106e66f3f555c8f887e82c5f04c3e77bdaf345e8 > linux-5.10.y: d1988041d19dc8b532579bdbb7c4a978391c0011 > linux: 71c061d2443814de15e177489d5cc00a4a253ef3 > linux-next: f87684f6470f5f02bd47d4afb900366e5d2f31b6 > > > (gdb) hbreak setup_arch > Hardware assisted breakpoint 1 at 0x142229e: file > arch/s390/kernel/setup.c, line 1091. > (gdb) c > Continuing. > > Program received signal SIGTRAP, Trace/breakpoint trap. > 0x00000000014222a0 in setup_arch (cmdline_p=0x11d7ed8) at > arch/s390/kernel/setup.c:1091 > 1091 if (MACHINE_IS_VM) > (gdb) lx-dmesg > [ 0.376351] Linux version 5.11.0-rc2-00157-ga2885c701c30 > (ndesaulni...@ndesaulniers1.mtv.corp.google.com) (Nick Desaulniers > clang version 12.0.0 (g...@github.com:llvm/llvm-project.git > e75fec2b238f0e26cfb7645f2208baebe3440d41), GNU ld (GNU Binutils for > Debian) 2.35.1) #81 SMP Thu Jan 7 17:57:34 PST 2021 So you're using llvm 12. Maybe that makes a difference. Or we have an issue with our arm64 backend. Or using ld.lld and friends make a difference. Guess I'd have to custom-compile llvm12 (gah) ... maybe I can find some rpms somewhere. -- Thanks, David / dhildenb