Kristof, With commit 367705+367706 and if_bridge statically linked. It crashes while adding an epair of a jail.
With commit 367705+367706 and if_bridge dynamically loaded there is a crash at reboot #0 0xffffffff8069ddc5 at kdb_backtrace+0x65 #1 0xffffffff80652c8b at vpanic+0x17b #2 0xffffffff80652b03 at panic+0x43 #3 0xffffffff809c8951 at trap_fatal+0x391 #4 0xffffffff809c89af at trap_pfault+0x4f #5 0xffffffff809c7ff6 at trap+0x286 #6 0xffffffff809a1ec8 at calltrap+0x8 #7 0xffffffff8079f7ed at ip_input+0x63d #8 0xffffffff8077a07a at netisr_dispatch_src+0xca #9 0xffffffff8075a6f8 at ether_demux+0x138 #10 0xffffffff8075b9bb at ether_nh_input+0x33b #11 0xffffffff8077a07a at netisr_dispatch_src+0xca #12 0xffffffff8075ab1b at ether_input+0x4b #13 0xffffffff8077a80b at swi_net+0x12b #14 0xffffffff8061e10c at ithread_loop+0x23c #15 0xffffffff8061afbe at fork_exit+0x7e #16 0xffffffff809a2efe at fork_trampoline+0xe Peter > On 21 Nov 2020, at 17:22, Peter Blok <pb...@bsd4all.org> wrote: > > Kristof, > > With a GENERIC kernel it does NOT happen. I do have a different iflib related > panic at reboot, but I’ll report that separately. > > I brought the two config files closer together and found out that if I remove > if_bridge from the config file and have it loaded dynamically when the bridge > is created, the problem no longer happens and everything works ok. > > Peter > >> On 20 Nov 2020, at 15:53, Kristof Provost <k...@freebsd.org> wrote: >> >> I still can’t reproduce that panic. >> >> Does it happen immediately after you start a vnet jail? >> >> Does it also happen with a GENERIC kernel? >> >> Regards, >> Kristof >> >> On 20 Nov 2020, at 14:53, Peter Blok wrote: >> >>> The panic with ipsec code in the backtrace was already very strange. I was >>> using IPsec, but only on one interface totally separate from the members of >>> the bridge as well as the bridge itself. The jails were not doing any ipsec >>> as well. Note that panic was a while ago and it was after the 1st bridge >>> epochification was done on stable-12 which was later backed out >>> >>> Today the system is no longer using ipsec, but it is still compiled in. I >>> can remove it if need be for a test >>> >>> >>> src.conf >>> WITHOUT_KERBEROS=yes >>> WITHOUT_GSSAPI=yes >>> WITHOUT_SENDMAIL=true >>> WITHOUT_MAILWRAPPER=true >>> WITHOUT_DMAGENT=true >>> WITHOUT_GAMES=true >>> WITHOUT_IPFILTER=true >>> WITHOUT_UNBOUND=true >>> WITHOUT_PROFILE=true >>> WITHOUT_ATM=true >>> WITHOUT_BSNMP=true >>> #WITHOUT_CROSS_COMPILER=true >>> WITHOUT_DEBUG_FILES=true >>> WITHOUT_DICT=true >>> WITHOUT_FLOPPY=true >>> WITHOUT_HTML=true >>> WITHOUT_HYPERV=true >>> WITHOUT_NDIS=true >>> WITHOUT_NIS=true >>> WITHOUT_PPP=true >>> WITHOUT_TALK=true >>> WITHOUT_TESTS=true >>> WITHOUT_WIRELESS=true >>> #WITHOUT_LIB32=true >>> WITHOUT_LPR=true >>> >>> make.conf >>> KERNCONF=BHYVE >>> MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp >>> if_vxlan pflog libmchain libiconv smbfs linux linux64 linux_common linuxkpi >>> linprocfs linsysfs ext2fs >>> DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8 >>> OPTIONS_UNSET=DOCS NLS MANPAGES >>> >>> BHYVE >>> cpu HAMMER >>> ident BHYVE >>> >>> makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols >>> makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support >>> >>> options CAMDEBUG >>> >>> options SCHED_ULE # ULE scheduler >>> options PREEMPTION # Enable kernel thread preemption >>> options INET # InterNETworking >>> options INET6 # IPv6 communications protocols >>> options IPSEC >>> options TCP_OFFLOAD # TCP offload >>> options TCP_RFC7413 # TCP FASTOPEN >>> options SCTP # Stream Control Transmission Protocol >>> options FFS # Berkeley Fast Filesystem >>> options SOFTUPDATES # Enable FFS soft updates support >>> options UFS_ACL # Support for access control lists >>> options UFS_DIRHASH # Improve performance on big directories >>> options UFS_GJOURNAL # Enable gjournal-based UFS journaling >>> options QUOTA # Enable disk quotas for UFS >>> options SUIDDIR >>> options NFSCL # Network Filesystem Client >>> options NFSD # Network Filesystem Server >>> options NFSLOCKD # Network Lock Manager >>> options MSDOSFS # MSDOS Filesystem >>> options CD9660 # ISO 9660 Filesystem >>> options FUSEFS >>> options NULLFS # NULL filesystem >>> options UNIONFS >>> options FDESCFS # File descriptor filesystem >>> options PROCFS # Process filesystem (requires PSEUDOFS) >>> options PSEUDOFS # Pseudo-filesystem framework >>> options GEOM_PART_GPT # GUID Partition Tables. >>> options GEOM_RAID # Soft RAID functionality. >>> options GEOM_LABEL # Provides labelization >>> options GEOM_ELI # Disk encryption. >>> options COMPAT_FREEBSD32 # Compatible with i386 binaries >>> options COMPAT_FREEBSD4 # Compatible with FreeBSD4 >>> options COMPAT_FREEBSD5 # Compatible with FreeBSD5 >>> options COMPAT_FREEBSD6 # Compatible with FreeBSD6 >>> options COMPAT_FREEBSD7 # Compatible with FreeBSD7 >>> options COMPAT_FREEBSD9 # Compatible with FreeBSD9 >>> options COMPAT_FREEBSD10 # Compatible with FreeBSD10 >>> options COMPAT_FREEBSD11 # Compatible with FreeBSD11 >>> options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI >>> options KTRACE # ktrace(1) support >>> options STACK # stack(9) support >>> options SYSVSHM # SYSV-style shared memory >>> options SYSVMSG # SYSV-style message queues >>> options SYSVSEM # SYSV-style semaphores >>> options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time >>> extensions >>> options PRINTF_BUFR_SIZE=128 # Prevent printf output being >>> interspersed. >>> options KBD_INSTALL_CDEV # install a CDEV entry in /dev >>> options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4) >>> options AUDIT # Security event auditing >>> options CAPABILITY_MODE # Capsicum capability mode >>> options CAPABILITIES # Capsicum capabilities >>> options MAC # TrustedBSD MAC Framework >>> options MAC_PORTACL >>> options MAC_NTPD >>> options KDTRACE_FRAME # Ensure frames are compiled in >>> options KDTRACE_HOOKS # Kernel DTrace hooks >>> options DDB_CTF # Kernel ELF linker loads CTF data >>> options INCLUDE_CONFIG_FILE # Include this file in kernel >>> >>> # Debugging support. Always need this: >>> options KDB # Enable kernel debugger support. >>> options KDB_TRACE # Print a stack trace for a panic. >>> options KDB_UNATTENDED >>> >>> # Make an SMP-capable kernel by default >>> options SMP # Symmetric MultiProcessor Kernel >>> options EARLY_AP_STARTUP >>> >>> # CPU frequency control >>> device cpufreq >>> device cpuctl >>> device coretemp >>> >>> # Bus support. >>> device acpi >>> options ACPI_DMAR >>> device pci >>> options PCI_IOV # PCI SR-IOV support >>> >>> device iicbus >>> device iicbb >>> >>> device iic >>> device ic >>> device iicsmb >>> >>> device ichsmb >>> device smbus >>> device smb >>> >>> #device jedec_dimm >>> >>> # ATA controllers >>> device ahci # AHCI-compatible SATA >>> controllers >>> device mvs # Marvell >>> 88SX50XX/88SX60XX/88SX70XX/SoC SATA >>> >>> # SCSI Controllers >>> device mps # LSI-Logic MPT-Fusion 2 >>> >>> # ATA/SCSI peripherals >>> device scbus # SCSI bus (required for >>> ATA/SCSI) >>> device da # Direct Access (disks) >>> device cd # CD >>> device pass # Passthrough device (direct >>> ATA/SCSI access) >>> device ses # Enclosure Services (SES and >>> SAF-TE) >>> device sg >>> >>> device cfiscsi >>> device ctl # CAM Target Layer >>> device iscsi >>> >>> # atkbdc0 controls both the keyboard and the PS/2 mouse >>> device atkbdc # AT keyboard controller >>> device atkbd # AT keyboard >>> device psm # PS/2 mouse >>> >>> device kbdmux # keyboard multiplexer >>> >>> # vt is the new video console driver >>> device vt >>> device vt_vga >>> device vt_efifb >>> >>> # Serial (COM) ports >>> device uart # Generic UART driver >>> >>> # PCI/PCI-X/PCIe Ethernet NICs that use iflib infrastructure >>> device iflib >>> device em # Intel PRO/1000 Gigabit >>> Ethernet Family >>> device ix # Intel PRO/10GbE PCIE PF >>> Ethernet >>> >>> # Network stack virtualization. >>> options VIMAGE >>> >>> # Pseudo devices. >>> device crypto >>> device cryptodev >>> device loop # Network loopback >>> device random # Entropy device >>> device padlock_rng # VIA Padlock RNG >>> device rdrand_rng # Intel Bull Mountain RNG >>> device ipmi >>> device smbios >>> device vpd >>> device aesni # AES-NI OpenCrypto module >>> device ether # Ethernet support >>> device lagg >>> device vlan # 802.1Q VLAN support >>> device tuntap # Packet tunnel. >>> device md # Memory "disks" >>> device gif # IPv6 and IPv4 tunneling >>> device firmware # firmware assist module >>> >>> device pf >>> #device pflog >>> #device pfsync >>> >>> # The `bpf' device enables the Berkeley Packet Filter. >>> # Be aware of the administrative consequences of enabling this! >>> # Note that 'bpf' is required for DHCP. >>> device bpf # Berkeley packet filter >>> >>> # The `epair' device implements a virtual back-to-back connected Ethernet >>> # like interface pair. >>> device epair >>> >>> # USB support >>> options USB_DEBUG # enable debug msgs >>> device uhci # UHCI PCI->USB interface >>> device ohci # OHCI PCI->USB interface >>> device ehci # EHCI PCI->USB interface (USB >>> 2.0) >>> device xhci # XHCI PCI->USB interface (USB >>> 3.0) >>> device usb # USB Bus (required) >>> device uhid >>> device ukbd # Keyboard >>> device umass # Disks/Mass storage - Requires >>> scbus and da >>> device ums >>> >>> device filemon >>> >>> device if_bridge >>> >>>> On 20 Nov 2020, at 12:53, Kristof Provost <k...@freebsd.org> wrote: >>>> >>>> Can you share your kernel config file (and src.conf / make.conf if they >>>> exist)? >>>> >>>> This second panic is in the IPSec code. My current thinking is that your >>>> kernel config is triggering a bug that’s manifesting in multiple places, >>>> but not actually caused by those places. >>>> >>>> I’d like to be able to reproduce it so we can debug it. >>>> >>>> Best regards, >>>> Kristof >>>> >>>> On 20 Nov 2020, at 12:02, Peter Blok wrote: >>>>> Hi Kristof, >>>>> >>>>> This is 12-stable. With the previous bridge epochification that was >>>>> backed out my config had a panic too. >>>>> >>>>> I don’t have any local modifications. I did a clean rebuild after >>>>> removing /usr/obj/usr >>>>> >>>>> My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and >>>>> nmdm.ko as modules. Everything else is statically linked. I have removed >>>>> all drivers not needed for the hardware at hand. >>>>> >>>>> My bridge is between two vlans from the same trunk and the jail epair >>>>> devices as well as the bhyve tap devices. >>>>> >>>>> The panic happens when the jails are starting. >>>>> >>>>> I can try to narrow it down over the weekend and make the crash dump >>>>> available for analysis. >>>>> >>>>> Previously I had the following crash with 363492 >>>>> >>>>> kernel trap 12 with interrupts disabled >>>>> >>>>> >>>>> Fatal trap 12: page fault while in kernel mode >>>>> cpuid = 2; apic id = 02 >>>>> fault virtual address = 0xffffffff00000410 >>>>> fault code = supervisor read data, page not present >>>>> instruction pointer = 0x20:0xffffffff80692326 >>>>> stack pointer = 0x28:0xfffffe00c06097b0 >>>>> frame pointer = 0x28:0xfffffe00c06097f0 >>>>> code segment = base 0x0, limit 0xfffff, type 0x1b >>>>> = DPL 0, pres 1, long 1, def32 0, gran 1 >>>>> processor eflags = resume, IOPL = 0 >>>>> current process = 2030 (ifconfig) >>>>> trap number = 12 >>>>> panic: page fault >>>>> cpuid = 2 >>>>> time = 1595683412 >>>>> KDB: stack backtrace: >>>>> #0 0xffffffff80698165 at kdb_backtrace+0x65 >>>>> #1 0xffffffff8064d67b at vpanic+0x17b >>>>> #2 0xffffffff8064d4f3 at panic+0x43 >>>>> #3 0xffffffff809cc311 at trap_fatal+0x391 >>>>> #4 0xffffffff809cc36f at trap_pfault+0x4f >>>>> #5 0xffffffff809cb9b6 at trap+0x286 >>>>> #6 0xffffffff809a5b28 at calltrap+0x8 >>>>> #7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d >>>>> #8 0xffffffff8069213a at epoch_wait_preempt+0xaa >>>>> #9 0xffffffff807615b7 at ipsec_ioctl+0x3a7 >>>>> #10 0xffffffff8075274f at ifioctl+0x47f >>>>> #11 0xffffffff806b5ea7 at kern_ioctl+0x2b7 >>>>> #12 0xffffffff806b5b4a at sys_ioctl+0xfa >>>>> #13 0xffffffff809ccec7 at amd64_syscall+0x387 >>>>> #14 0xffffffff809a6450 at fast_syscall_common+0x101 >>>>> >>>>> >>>>> >>>>> >>>>>> On 20 Nov 2020, at 11:30, Kristof Provost <k...@freebsd.org> wrote: >>>>>> >>>>>> On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org >>>>>> <mailto:peter.b...@bsd4all.org> wrote: >>>>>>> I’m afraid the last Epoch fix for bridge is not solving the problem ( >>>>>>> or perhaps creates a new ). >>>>>>> >>>>>> We’re talking about the stable/12 branch, right? >>>>>> >>>>>>> This seems to happen when the jail epair is added to the bridge. >>>>>>> >>>>>> There must be something more to it than that. I’ve run the bridge tests >>>>>> on stable/12 without issue, and this is a problem we didn’t see when the >>>>>> bridge epochification initially went into stable/12. >>>>>> >>>>>> Do you have a custom kernel config? Other patches? What exact commands >>>>>> do you run to trigger the panic? >>>>>> >>>>>>> kernel trap 12 with interrupts disabled >>>>>>> >>>>>>> >>>>>>> Fatal trap 12: page fault while in kernel mode >>>>>>> cpuid = 6; apic id = 06 >>>>>>> fault virtual address = 0xc10 >>>>>>> fault code = supervisor read data, page not present >>>>>>> instruction pointer = 0x20:0xffffffff80695e76 >>>>>>> stack pointer = 0x28:0xfffffe00bf14e6e0 >>>>>>> frame pointer = 0x28:0xfffffe00bf14e720 >>>>>>> code segment = base 0x0, limit 0xfffff, type 0x1b >>>>>>> = DPL 0, pres 1, long 1, def32 0, gran 1 >>>>>>> processor eflags = resume, IOPL = 0 >>>>>>> current process = 1686 (jail) >>>>>>> trap number = 12 >>>>>>> panic: page fault >>>>>>> cpuid = 6 >>>>>>> time = 1605811310 >>>>>>> KDB: stack backtrace: >>>>>>> #0 0xffffffff8069bb85 at kdb_backtrace+0x65 >>>>>>> #1 0xffffffff80650a4b at vpanic+0x17b >>>>>>> #2 0xffffffff806508c3 at panic+0x43 >>>>>>> #3 0xffffffff809d0351 at trap_fatal+0x391 >>>>>>> #4 0xffffffff809d03af at trap_pfault+0x4f >>>>>>> #5 0xffffffff809cf9f6 at trap+0x286 >>>>>>> #6 0xffffffff809a98c8 at calltrap+0x8 >>>>>>> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d >>>>>>> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa >>>>>>> #9 0xffffffff80757d40 at vnet_if_init+0x120 >>>>>>> #10 0xffffffff8078c994 at vnet_alloc+0x114 >>>>>>> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7 >>>>>>> #12 0xffffffff80620190 at sys_jail_set+0x40 >>>>>>> #13 0xffffffff809d0f07 at amd64_syscall+0x387 >>>>>>> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8 >>>>>> >>>>>> This panic is rather odd. This isn’t even the bridge code. This is >>>>>> during initial creation of the vnet. I don’t really see how this could >>>>>> even trigger panics. >>>>>> That panic looks as if something corrupted the net_epoch_preempt, by >>>>>> overwriting the epoch->e_epoch. The bridge patches only access this >>>>>> variable through the well-established functions and macros. I see no >>>>>> obvious way that they could corrupt it. >>>>>> >>>>>> Best regards, >>>>>> Kristof >>>> >>>> >>>> _______________________________________________ >>>> freebsd-stable@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >>>> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" >> _______________________________________________ >> freebsd-stable@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" >
smime.p7s
Description: S/MIME cryptographic signature