On Fri 09/06/2017 12:07, Martin Pieuchot wrote:
> On 08/06/17(Thu) 20:38, Björn Ketelaars wrote:
> > On Thu 08/06/2017 16:55, Martin Pieuchot wrote:
> > > On 07/06/17(Wed) 09:43, Björn Ketelaars wrote:
> > > > On Sat 03/06/2017 08:44, Björn Ketelaars wrote:
> > > > > 
> > > > > Reverting back to the previous kernel fixed the issue above. 
> > > > > Question: can
> > > > > someone give a hint on how to track this issue?
> > > > 
> > > > After a bit of experimenting I'm able to reproduce the problem. Summary 
> > > > is
> > > > that queueing in pf and use of a current (after May 30), multi processor
> > > > kernel (bsd.mp from snapshots) causes these specific watchdog timeouts
> > > > followed by a system freeze.
> > > > 
> > > > Issue is 'gone' when:
> > > > 1.) using an older kernel (before May 30);
> > > > 2.) removal of queueing statements from pf.conf. Included below the 
> > > > specific
> > > >     snippet;
> > > > 3.) switch from MP kernel to SP kernel.
> > > > 
> > > > New observation is that while queueing, using a MP kernel, the download
> > > > bandwidth is only a fraction of what is expected. Exchanging the MP 
> > > > kernel
> > > > with a SP kernel restores the download bandwidth to expected level.
> > > > 
> > > > I'm guessing that this issue is related to recent work on PF?
> > > 
> > > It's certainly a problem in, or exposed by, re(4) with the recent MP work
> > > in the network stack.
> > > 
> > > It would help if you could build a kernel with MP_LOCKDEBUG defined and
> > > see if the resulting kernel enters ddb(4) instead of freezing.
> > > 
> > > Thanks,
> > > Martin
> > 
> > Thanks for the hint! It helped in entering ddb. I collected a bit of output,
> > which you can find below. If I read the trace correctly the crash is related
> > to line 1750 of sys/dev/ic/re.c:
> > 
> >     d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
> 
> Could you test the diff below, always with a MP_LOCKDEBUG kernel and
> tell us if you can reproduce the freeze or if the kernel enters ddb(4)?
> 
> Another question, how often do you see "watchdog timeout" messages?
> 
> Index: re.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/ic/re.c,v
> retrieving revision 1.201
> diff -u -p -r1.201 re.c
> --- re.c      24 Jan 2017 03:57:34 -0000      1.201
> +++ re.c      9 Jun 2017 10:04:43 -0000
> @@ -2074,9 +2074,6 @@ re_watchdog(struct ifnet *ifp)
>       s = splnet();
>       printf("%s: watchdog timeout\n", sc->sc_dev.dv_xname);
>  
> -     re_txeof(sc);
> -     re_rxeof(sc);
> -
>       re_init(ifp);
>  
>       splx(s);

The diff (with a MP_LOCKDEBUG kernel) resulted in similar traces as before.
ddb Output is included below.

With your diff the number of timeout messages decreased from 9 to 2 before
entering ddb.


ddb{1}> show panic
the kernel did not panic

ddb{1}> machine ddbcpu 0
Stopped at      db_enter+0x9:   leave

ddb{0}> trace
db_enter(ffffffff8196b640,200,ffff8000004d1600,10,ffff8000220e9938,282) at db_e
nter+0x9
x86_ipi_handler(ffff800000074010,ffffffff819e7160,102860,c,4,1813a9522) at x86_
ipi_handler+0x85
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
___mp_lock(ffffffff819e7160,ffffffff8187acf0,1,1,ffff8000220a38c0,ffff8000220a7
ab0) at ___mp_lock+0x4a
___mp_acquire_count(ffffffff819e7160,1,ffff8000220a38c0,ffff8000220a7ab0,ffffff
011da1e1c8,ffffffff81019942) at ___mp_acquire_count+0x33
mi_switch(ffffff011b4a93c0,ffffffff818bf7e7,9cd,ffff8000220e9bb0,ffff8000220a7a
b0,ffff8000220e9ba0) at mi_switch+0x22b
sleep_finish(ffff8000220e9bb0,1,118,ffff8000220e9bb0,ffffff011b4a93c0,ffffffff8
18bf7e7) at sleep_finish+0xc2
tsleep(ffffff011b4a93c0,118,ffffffff818bf7e7,9cd,0,40) at tsleep+0x164
kqueue_scan(ffffff011b4a93c0,40,143d48714800,ffff8000220e9dd0,ffff8000220a7ab0,
ffff8000220e9de4) at kqueue_scan+0x15b
sys_kevent(ffff8000220a7ab0,ffff8000220e9e60,ffff8000220e9eb0,ffff8000220a7ab0,
48,1) at sys_kevent+0x2f6
syscall() at syscall+0x29f
--- syscall (number 72) ---
end of kernel
end trace frame: 0x143d48714800, count: -11
0x143c524f280a:

ddb{0}> machine ddbcpu 1
Stopped at      re_encap+0x24d: movl    0(%r15),%eax

ddb{1}> trace
re_encap(ffff800000084000,dd,ffffff00d6e03f00,ffff8000000842c0,400,ffff80000008
4000) at re_encap+0x24d
re_start(ffff8000000842c0,7,ffffff00d6dd7200,ffff800000084338,ffff8000000842c0,
ffffffff81091d70) at re_start+0x8c
ifq_serialize(ffff8000000842c0,ffff800000084360,ffff800000084090,ffffff00d6a515
00,ffff8000000842c0,0) at ifq_serialize+0xf8
if_enqueue(ffff800000084090,ffffff00d6a51500,ffff80000001dc80,ffffff00d6a51500,
37,2) at if_enqueue+0x90
ether_output(ffff800000084090,ffffff00d6dd7200,ffff80000001dc80,ffffff011b3f49a
0,ffff800000933400,ffff80000001dc80) at ether_output+0x1d6
ip_output(ffffff00d6dd7200,0,ffff800022032cc0,1,0,0) at ip_output+0x8a1
ip_forward(ffffff00d6dd7200,ffff8000002a4090,0,0,8244c78a,8244c78a) at ip_forwa
rd+0x1e7
ipv4_input(ffff8000002a4090,ffffff00d6dd7200,800,800,ffffff0009d00200,ffff80000
02a4090) at ipv4_input+0x4f7
ether_input(ffff8000002a4090,ffffff00d6dd7200,0,ffffffff81221640,ffff8000002a42
88,ffff8000002a4090) at ether_input+0xbd
if_input_process(2,ffff800022032eb0,0,0,ffff800022032eb0,ffff800000019080) at i
f_input_process+0xfa
taskq_thread(ffff800000019080,2a2,ffff800000019080,ffffffff81524ec0,0,ffff80002
2032f10) at taskq_thread+0x79
end trace frame: 0x0, count: -11

ddb{1}> boot reboot
panic: rw_enter: netlock locking against myself
Stopped at      db_enter+0x9:   leave
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  18657  98624     73    0x100010       0x80    0  syslogd
*294902  94762      0     0x14000      0x200    1  softnet
db_enter(ffffffff8173ec8e,ffff800022032130,10,ffff800022032110,282,8) at db_ent
er+0x9
panic(ffffffff8172c108,14,ffff8000ffffed50,ffffffff8197d750,ffff8000ffffed54,ff
ff800022032240) at panic+0x102
_rw_enter(ffffffff8197d750,1,ffff80002202e000,0,ffffffff819685a0,ffff8000220324
18) at _rw_enter+0x20f
if_downall(ffffffff819685a0,ffff800022032418,4804,ffff8000220323b8,82fd78f8e65d
236c,0) at if_downall+0x2d
boot(4804,ffffffff81945ea0,4804,ffffffff819685a0,ffff800022032418,4804) at boot
+0xeb
reboot(4804,5,0,ffff8000220323f0,ffff8000220323b8,d) at reboot+0x37
db_boot_reboot_cmd(ffffffff811c1c2d,0,ffffffffffffffff,ffff800022032420,0,82fd7
8f8e65d236c) at db_boot_reboot_cmd+0x22
db_command(ffffffff819685a0,ffffffff819681e0,ffff800022032690,6,0,78) at db_com
mand+0x13a
db_command_loop(82fd78f8e65d236c,0,ffffffff8143247f,ffff80002202e000,0,ffff8000
22032690) at db_command_loop+0xb0
db_trap(6,0,7,82fd78f8e65d236c,81186c47,ffffffff8166e3a4) at db_trap+0xf4
db_ktrap(6,0,ffff800022032690,1,ffff800022032690,ffff8000ffffed50) at db_ktrap+
0x10f
trap() at trap+0x16d
--- trap (number 6) ---
re_encap(ffff800000084000,dd,ffffff00d6e03f00,ffff8000000842c0,400,ffff80000008
4000) at re_encap+0x24d
re_start(ffff8000000842c0,7,ffffff00d6dd7200,ffff800000084338,ffff8000000842c0,
ffffffff81091d70) at re_start+0x8c
end trace frame: 0xffff8000220329c0, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> dmesg
OpenBSD 6.1-current (GENERIC.MP) #9: Fri Jun  9 13:42:45 CEST 2017
    b...@gateway.lan:/home/code/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4245995520 (4049MB)
avail mem = 4111519744 (3921MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdf16d820 (6 entries)
bios0: vendor coreboot version "SageBios_PCEngines_APU-45" date 04/05/2014
bios0: PC Engines APU
acpi0 at bios0: rev 0
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP SPCR HPET APIC HEST SSDT SSDT SSDT
acpi0: wakeup devices AGPB(S4) HDMI(S4) PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PE2
0(S4) PE21(S4) PE22(S4) PE23(S4) PIBR(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) U
OH5(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD G-T40E Processor, 1000.16 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB
,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,IT
SC
cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 1
6-way L2 cache
cpu0: 8 4MB entries fully associative
cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu0: TSC frequency 1000155810 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD G-T40E Processor, 1000.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB
,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,IT
SC
cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 1
6-way L2 cache
cpu1: 8 4MB entries fully associative
cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 21, 24 pins
acpiprt0 at acpi0: bus -1 (AGPB)
acpiprt1 at acpi0: bus -1 (HDMI)
acpiprt2 at acpi0: bus 1 (PBR4)
acpiprt3 at acpi0: bus 2 (PBR5)
acpiprt4 at acpi0: bus 3 (PBR6)
acpiprt5 at acpi0: bus -1 (PBR7)
acpiprt6 at acpi0: bus 5 (PE20)
acpiprt7 at acpi0: bus -1 (PE21)
acpiprt8 at acpi0: bus -1 (PE22)
acpiprt9 at acpi0: bus -1 (PE23)
acpiprt10 at acpi0: bus 0 (PCI0)
acpiprt11 at acpi0: bus 4 (PIBR)
acpicpu0 at acpi0: C2(0@100 io@0x841), C1(@1 halt!), PSS
acpicpu1 at acpi0: C2(0@100 io@0x841), C1(@1 halt!), PSS
acpibtn0 at acpi0: PWRB
cpu0: 1000 MHz: speeds: 1000 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD AMD64 14h Host" rev 0x00
ppb0 at pci0 dev 4 function 0 "AMD AMD64 14h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), m
si, address 00:0d:b9:33:8e:a4
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ppb1 at pci0 dev 5 function 0 "AMD AMD64 14h PCIE" rev 0x00: msi
pci2 at ppb1 bus 2
re1 at pci2 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), m
si, address 00:0d:b9:33:8e:a5
rgephy1 at re1 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ppb2 at pci0 dev 6 function 0 "AMD AMD64 14h PCIE" rev 0x00: msi
pci3 at ppb2 bus 3
re2 at pci3 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), m
si, address 00:0d:b9:33:8e:a6
rgephy2 at re2 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ahci0 at pci0 dev 17 function 0 "ATI SBx00 SATA" rev 0x40: apic 2 int 19, AHCI 1
.2
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, SAMSUNG SSD PM83, CXM1> SCSI3 0/direct fixe
d naa.5002538043584d30
sd0: 30533MB, 512 bytes/sector, 62533296 sectors, thin
ohci0 at pci0 dev 18 function 0 "ATI SB700 USB" rev 0x00: apic 2 int 18, versio
n 1.0, legacy support
ehci0 at pci0 dev 18 function 2 "ATI SB700 USB2" rev 0x00: apic 2 int 17
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "ATI EHCI root hub" rev 2.00/1.00 add
r 1
ohci1 at pci0 dev 19 function 0 "ATI SB700 USB" rev 0x00: apic 2 int 18, versio
n 1.0, legacy support
ehci1 at pci0 dev 19 function 2 "ATI SB700 USB2" rev 0x00: apic 2 int 17
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "ATI EHCI root hub" rev 2.00/1.00 add
r 1
piixpm0 at pci0 dev 20 function 0 "ATI SBx00 SMBus" rev 0x42: polling
iic0 at piixpm0
pcib0 at pci0 dev 20 function 3 "ATI SB700 ISA" rev 0x40
ppb3 at pci0 dev 20 function 4 "ATI SB600 PCI" rev 0x40
pci4 at ppb3 bus 4
ohci2 at pci0 dev 20 function 5 "ATI SB700 USB" rev 0x00: apic 2 int 18, versio
n 1.0, legacy support
ppb4 at pci0 dev 21 function 0 "ATI SB800 PCIE" rev 0x00
pci5 at ppb4 bus 5
ohci3 at pci0 dev 22 function 0 "ATI SB700 USB" rev 0x00: apic 2 int 18, versio
n 1.0, legacy support
ehci2 at pci0 dev 22 function 2 "ATI SB700 USB2" rev 0x00: apic 2 int 17
usb2 at ehci2: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "ATI EHCI root hub" rev 2.00/1.00 add
r 1
pchb1 at pci0 dev 24 function 0 "AMD AMD64 14h Link Cfg" rev 0x43
pchb2 at pci0 dev 24 function 1 "AMD AMD64 14h Address Map" rev 0x00
pchb3 at pci0 dev 24 function 2 "AMD AMD64 14h DRAM Cfg" rev 0x00
km0 at pci0 dev 24 function 3 "AMD AMD64 14h Misc Cfg" rev 0x00
pchb4 at pci0 dev 24 function 4 "AMD AMD64 14h CPU Power" rev 0x00
pchb5 at pci0 dev 24 function 5 "AMD AMD64 14h Reserved" rev 0x00
pchb6 at pci0 dev 24 function 6 "AMD AMD64 14h NB Power" rev 0x00
pchb7 at pci0 dev 24 function 7 "AMD AMD64 14h Reserved" rev 0x00
usb3 at ohci0: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "ATI OHCI root hub" rev 1.00/1.00 add
r 1
usb4 at ohci1: USB revision 1.0
uhub4 at usb4 configuration 1 interface 0 "ATI OHCI root hub" rev 1.00/1.00 add
r 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x52
usb5 at ohci2: USB revision 1.0
uhub5 at usb5 configuration 1 interface 0 "ATI OHCI root hub" rev 1.00/1.00 add
r 1
usb6 at ohci3: USB revision 1.0
uhub6 at usb6 configuration 1 interface 0 "ATI OHCI root hub" rev 1.00/1.00 add
r 1
vmm0 at mainbus0: SVM/RVI
umass0 at uhub2 port 1 configuration 1 interface 0 "Generic Flash Card Reader/W
riter" rev 2.01/1.00 addr 2
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <Multiple, Card Reader, 1.00> SCSI2 0/direct remo
vable serial.058f6366058F63666485
sd1: 7600MB, 512 bytes/sector, 15564800 sectors
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (28fcdc10008babff.a) swap on sd0b dump on sd0b
re1: watchdog timeout
re1: watchdog timeout
uvm_fault(0xffffffff819f14a0, 0x400, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      re_encap+0x24d: movl    0(%r15),%eax

Reply via email to