Hi folks, We have been trying to track down a bad mbuf management for about two weeks on a customized 7.1 base. I have finally been able to reproduce it with a stock FreeBSD 7-STABLE (kernel from r225276, userland from 7.4).
With the help of the attached patches, I have just been able to trigger the following panic: panic: Corrupted unused flags, expected 0xffffffff00000000, got 0x0, flags 0x3 cpuid = 1 Uptime: 3d10h5m3s Cannot dump. No dump device defined The patches taints high order 32bits of m_flags (extended to 64bits; those are thus unused) right before the mbuf is referenced in the TX ring of the igb(4) driver, I expect this value to never change until right before the mbuf is freed igb_txeof(). [About this, Jack, am I correct expecting the mbuf's flags not to be touched between igb_xmit() and igb_txeof() ?] I have strong suspicions that this is the cause of PR/155597, eventually a few others PR. My current assumption about the root cause of this behavior is that the same mbuf ends up being queued for TX twice. After the first TX, it gets released, eventually reused in socket's buffer, but ends up being freed again after the second TX, screwing the chains at the same time, leading to crashes of the box. On the crashes happens in multiple locations. We have seen crashes (both clean panic() and NULL-pointer dereference) in various places over the last weeks, first an almost daily panic() in sbdrop() or sbsndptr(), and NULL-pointer dereferences in hfsc_dequeue(), m_ext_free(), m_copym, and the list goes on. On the driver p.o.v, crashes happened with igb(4) end of last year. At the time, dropping the number of queue to 1 mitigated the problem... so far. Now, the daily crashes happens with em(4) (single queue by default on 7). We also have records of crashes in sbsndptr() on vr(4)-based devices. Crashes seen on em(4) configuration are almost always preceded by one or many: emX: discard frame w/o packet header which we agree, should not happen. On the traffic p.o.v., crashes happens on a 24h basis with a box handling about 30Mbps over a couple of thousands TCP connections. I have been able to get consistent crashes in about 1h with ALTQ enabled, proxying about 200 TCP connection over 200Mbps of traffic (unidirectional, sub-ms RTT). Crashes becomes a lot faster with ALTQ disabled, down to a reliable crash within 5min. The box is running a few hundreds ipfw rules. Without any ipfw rules loaded, crashes happens within 30min. Now, the FreeBSD 7-STABLE machine has been able to handle about 900Mbps of traffic, over 2*500 TCP connections (500 receiving, 500 sending), over 24h without crashing. It crashed almost instantly when I restarted the test today. The kernel has been built with INVARIANTS and INVARIANT_SUPPORT. Hardware enumerate as follow: CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2493.76-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model = 17 Stepping = 10 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE> AMD Features=0x20100000<NX,LM> AMD Features2=0x1<LAHF> Cores per package: 4 real memory = 3757834240 (3583 MB) avail memory = 3678064640 (3507 MB) ACPI APIC Table: <100509 APIC1714> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu (AP): APIC ID: 4 (disabled) cpu (AP): APIC ID: 5 (disabled) cpu (AP): APIC ID: 6 (disabled) cpu (AP): APIC ID: 7 (disabled) [...] igb0: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec00-0xec1f mem 0xfdf60000-0xfdf7ffff,0xfdf40000-0xfdf5ffff,0xfdfb8000-0xfdfbbfff irq 16 at device 0.0 on pci7 igb0: Using MSIX interrupts with 5 vectors igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: Ethernet address: 00:15:b2:xx:xx:xx igb1: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec80-0xec9f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff,0xfdfbc000-0xfdfbffff irq 17 at device 0.1 on pci7 igb1: Using MSIX interrupts with 5 vectors igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: Ethernet address: 00:15:b2:xx:xx:xx em(4) and igb(4) are a direct backport from HEAD, plus the build fix I posted a few days ago. Custom mbuf debugging is attached, as well as the config of the kernel. The goal of the changes was first to enforce mbuf trashing, then locate the bogus m_freem()[0], thus modifying the mbuf free path to taint the mbuf not with the static 0xdeadc0de, but with the IP of the call-site. Then add some consistency check. At this point, any help is appreciated! Thanks in advance, - Arnaud [0]: the reason behind that is that I first got tons of crashes related 0xdeadc0de, in particular, an mbuf being tagged M_PROMISC, while the interface was not in promiscuous mode crashing on an unwanted call to m_freem() in ether_demub()
GENERIC
Description: Binary data
From 5221519fc7d4c31b6c1ea2c7581bd998b7505e65 Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacom...@gmail.com> Date: Tue, 30 Aug 2011 21:55:47 -0400 Subject: [PATCH 1/4] inlinize m_freem() --- sys/kern/uipc_mbuf.c | 12 ------------ sys/sys/mbuf.h | 14 +++++++++++++- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index f823e26..9f9288f 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -151,18 +151,6 @@ m_getm2(struct mbuf *m, int len, int how, short type, int flags) return (m); } -/* - * Free an entire chain of mbufs and associated external buffers, if - * applicable. - */ -void -m_freem(struct mbuf *mb) -{ - - while (mb != NULL) - mb = m_free(mb); -} - /*- * Configure a provided mbuf to refer to the provided external storage * buffer and setup a reference count for said buffer. If the setting diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 8529cca..f3f98b0 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -358,6 +358,7 @@ static __inline struct mbuf *m_getjcl(int how, short type, int flags, int size); static __inline struct mbuf *m_getclr(int how, short type); /* XXX */ static __inline struct mbuf *m_free(struct mbuf *m); +static __inline void m_freem(struct mbuf *m); static __inline void m_clget(struct mbuf *m, int how); static __inline void *m_cljget(struct mbuf *m, int how, int size); static __inline void m_chtype(struct mbuf *m, short new_type); @@ -520,6 +521,18 @@ m_free(struct mbuf *m) return (n); } +/* + * Free an entire chain of mbufs and associated external buffers, if + * applicable. + */ +static __inline void +m_freem(struct mbuf *m) +{ + + while (m != NULL) + m = m_free(m); +} + static __inline void m_clget(struct mbuf *m, int how) { @@ -767,7 +780,6 @@ struct mbuf *m_dup(struct mbuf *, int); int m_dup_pkthdr(struct mbuf *, struct mbuf *, int); u_int m_fixhdr(struct mbuf *); struct mbuf *m_fragment(struct mbuf *, int, int); -void m_freem(struct mbuf *); struct mbuf *m_getm2(struct mbuf *, int, int, short, int); struct mbuf *m_getptr(struct mbuf *, int, int *); u_int m_length(struct mbuf *, struct mbuf **); -- 1.7.6.153.g78432
From 0f2ada7a85c9e665be23f4801e1722e776492fdc Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacom...@gmail.com> Date: Wed, 31 Aug 2011 12:16:58 -0400 Subject: [PATCH 2/4] mbuf use-after-free marking --- sys/kern/uipc_mbuf.c | 4 ++-- sys/sys/mbuf.h | 37 ++++++++++++++++++++++++++++++++----- sys/vm/uma_dbg.c | 6 ++++-- 3 files changed, 38 insertions(+), 9 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index 9f9288f..9b92fb0 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -197,7 +197,7 @@ m_extadd(struct mbuf *mb, caddr_t buf, u_int size, * storage attached to them if the reference count hits 1. */ void -mb_free_ext(struct mbuf *m) +mb_free_ext(struct mbuf *m, void *arg) { int skipmbuf; @@ -264,7 +264,7 @@ mb_free_ext(struct mbuf *m) m->m_ext.ext_size = 0; m->m_ext.ext_type = 0; m->m_flags &= ~M_EXT; - uma_zfree(zone_mbuf, m); + uma_zfree_arg(zone_mbuf, m, arg); } /* diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index f3f98b0..19ea5e8 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -362,7 +362,7 @@ static __inline void m_freem(struct mbuf *m); static __inline void m_clget(struct mbuf *m, int how); static __inline void *m_cljget(struct mbuf *m, int how, int size); static __inline void m_chtype(struct mbuf *m, short new_type); -void mb_free_ext(struct mbuf *); +void mb_free_ext(struct mbuf *, void *); static __inline struct mbuf *m_last(struct mbuf *m); static __inline int @@ -506,21 +506,39 @@ m_free_fast(struct mbuf *m) { KASSERT(SLIST_EMPTY(&m->m_pkthdr.tags), ("doing fast free of mbuf with tags")); +#ifdef INVARIANTS + uma_zfree_arg(zone_mbuf, m, (void *)(0xf00f0000 | MB_NOTAGS)); +#else uma_zfree_arg(zone_mbuf, m, (void *)MB_NOTAGS); +#endif } static __inline struct mbuf * -m_free(struct mbuf *m) +m_free_arg(struct mbuf *m, void *arg) { struct mbuf *n = m->m_next; if (m->m_flags & M_EXT) - mb_free_ext(m); + mb_free_ext(m, arg); else if ((m->m_flags & M_NOFREE) == 0) - uma_zfree(zone_mbuf, m); + uma_zfree_arg(zone_mbuf, m, arg); + return (n); } +static __inline struct mbuf * +m_free(struct mbuf *m) +{ + + return m_free_arg(m, 0); +} + +#ifdef INVARIANTS +#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) +#else +#define _THIS_IP_ 0 +#endif + /* * Free an entire chain of mbufs and associated external buffers, if * applicable. @@ -528,9 +546,18 @@ m_free(struct mbuf *m) static __inline void m_freem(struct mbuf *m) { + unsigned long this_ip = (_THIS_IP_ & 0x00ffff00) | (_THIS_IP_ & 0xff) << 24; + + while (m != NULL) + m = m_free_arg(m, (void *)this_ip); +} + +static __inline void +m_freem_arg(struct mbuf *m, void *arg) +{ while (m != NULL) - m = m_free(m); + m = m_free_arg(m, arg); } static __inline void diff --git a/sys/vm/uma_dbg.c b/sys/vm/uma_dbg.c index 9075bf9..669b1f5 100644 --- a/sys/vm/uma_dbg.c +++ b/sys/vm/uma_dbg.c @@ -61,6 +61,7 @@ static const u_int32_t uma_junk = 0xdeadc0de; int trash_ctor(void *mem, int size, void *arg, int flags) { +#if 0 int cnt; u_int32_t *p; @@ -72,6 +73,7 @@ trash_ctor(void *mem, int size, void *arg, int flags) mem, size, *p, p); return (0); } +#endif return (0); } @@ -90,7 +92,7 @@ trash_dtor(void *mem, int size, void *arg) cnt = size / sizeof(uma_junk); for (p = mem; cnt > 0; cnt--, p++) - *p = uma_junk; + *p = (unsigned long)arg; } /* @@ -102,7 +104,7 @@ trash_dtor(void *mem, int size, void *arg) int trash_init(void *mem, int size, int flags) { - trash_dtor(mem, size, NULL); + trash_dtor(mem, size, (void *)uma_junk); return (0); } -- 1.7.6.153.g78432
From c58e8728e4dd75b542fa870bfc7420a3c555d2ca Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacom...@gmail.com> Date: Wed, 31 Aug 2011 22:08:04 -0400 Subject: [PATCH 3/4] 64bits mbuf flags --- sys/kern/uipc_mbuf.c | 6 +++++- sys/sys/mbuf.h | 46 +++++++++++++++++++++++----------------------- 2 files changed, 28 insertions(+), 24 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index 9b92fb0..24f9ec8 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -50,6 +50,8 @@ __FBSDID("$FreeBSD$"); #include <security/mac/mac_framework.h> +#include <machine/_inttypes.h> + int max_linkhdr; int max_protohdr; int max_hdr; @@ -1405,10 +1407,12 @@ m_print(const struct mbuf *m, int maxlen) pdata = m2->m_len; if (maxlen != -1 && pdata > maxlen) pdata = maxlen; +#if 0 printf("mbuf: %p len: %d, next: %p, %b%s", m2, m2->m_len, m2->m_next, m2->m_flags, "\20\20freelist\17skipfw" "\11proto5\10proto4\7proto3\6proto2\5proto1\4rdonly" "\3eor\2pkthdr\1ext", pdata ? "" : "\n"); +#endif if (pdata) printf(", %*D\n", pdata, (u_char *)m2->m_data, "-"); if (len != -1) @@ -1835,7 +1839,7 @@ m_unshare(struct mbuf *m0, int how) * it anyway, we try to reduce the number of mbufs and * clusters so that future work is easier). */ - KASSERT(m->m_flags & M_EXT, ("m_flags 0x%x", m->m_flags)); + KASSERT(m->m_flags & M_EXT, ("m_flags 0x%" PRIx64, m->m_flags)); /* NB: we only coalesce into a cluster or larger */ if (mprev != NULL && (mprev->m_flags & M_EXT) && m->m_len <= M_TRAILINGSPACE(mprev)) { diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 19ea5e8..8ad09bb 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -91,7 +91,7 @@ struct m_hdr { struct mbuf *mh_nextpkt; /* next chain in queue/record */ caddr_t mh_data; /* location of data */ int mh_len; /* amount of data in this mbuf */ - int mh_flags; /* flags; see below */ + uint64_t mh_flags; /* flags; see below */ short mh_type; /* type of data in this mbuf */ uint8_t pad[M_HDR_PAD];/* word align */ }; @@ -169,28 +169,28 @@ struct mbuf { /* * mbuf flags. */ -#define M_EXT 0x00000001 /* has associated external storage */ -#define M_PKTHDR 0x00000002 /* start of record */ -#define M_EOR 0x00000004 /* end of record */ -#define M_RDONLY 0x00000008 /* associated data is marked read-only */ -#define M_PROTO1 0x00000010 /* protocol-specific */ -#define M_PROTO2 0x00000020 /* protocol-specific */ -#define M_PROTO3 0x00000040 /* protocol-specific */ -#define M_PROTO4 0x00000080 /* protocol-specific */ -#define M_PROTO5 0x00000100 /* protocol-specific */ -#define M_BCAST 0x00000200 /* send/received as link-level broadcast */ -#define M_MCAST 0x00000400 /* send/received as link-level multicast */ -#define M_FRAG 0x00000800 /* packet is a fragment of a larger packet */ -#define M_FIRSTFRAG 0x00001000 /* packet is first fragment */ -#define M_LASTFRAG 0x00002000 /* packet is last fragment */ -#define M_SKIP_FIREWALL 0x00004000 /* skip firewall processing */ -#define M_FREELIST 0x00008000 /* mbuf is on the free list */ -#define M_VLANTAG 0x00010000 /* ether_vtag is valid */ -#define M_PROMISC 0x00020000 /* packet was not for us */ -#define M_NOFREE 0x00040000 /* do not free mbuf, embedded in cluster */ -#define M_PROTO6 0x00080000 /* protocol-specific */ -#define M_PROTO7 0x00100000 /* protocol-specific */ -#define M_PROTO8 0x00200000 /* protocol-specific */ +#define M_EXT 0x00000001ULL /* has associated external storage */ +#define M_PKTHDR 0x00000002ULL /* start of record */ +#define M_EOR 0x00000004ULL /* end of record */ +#define M_RDONLY 0x00000008ULL /* associated data is marked read-only */ +#define M_PROTO1 0x00000010ULL /* protocol-specific */ +#define M_PROTO2 0x00000020ULL /* protocol-specific */ +#define M_PROTO3 0x00000040ULL /* protocol-specific */ +#define M_PROTO4 0x00000080ULL /* protocol-specific */ +#define M_PROTO5 0x00000100ULL /* protocol-specific */ +#define M_BCAST 0x00000200ULL /* send/received as link-level broadcast */ +#define M_MCAST 0x00000400ULL /* send/received as link-level multicast */ +#define M_FRAG 0x00000800ULL /* packet is a fragment of a larger packet */ +#define M_FIRSTFRAG 0x00001000ULL /* packet is first fragment */ +#define M_LASTFRAG 0x00002000ULL /* packet is last fragment */ +#define M_SKIP_FIREWALL 0x00004000ULL /* skip firewall processing */ +#define M_FREELIST 0x00008000ULL /* mbuf is on the free list */ +#define M_VLANTAG 0x00010000ULL /* ether_vtag is valid */ +#define M_PROMISC 0x00020000ULL /* packet was not for us */ +#define M_NOFREE 0x00040000ULL /* do not free mbuf, embedded in cluster */ +#define M_PROTO6 0x00080000ULL /* protocol-specific */ +#define M_PROTO7 0x00100000ULL /* protocol-specific */ +#define M_PROTO8 0x00200000ULL /* protocol-specific */ /* * For RELENG_{6,7} steal these flags for limited multiple routing table * support. In RELENG_8 and beyond, use just one flag and a tag. -- 1.7.6.153.g78432
From d0530867a44e8029a4e64e69cacdcaa998a7e76f Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacom...@gmail.com> Date: Wed, 31 Aug 2011 22:10:56 -0400 Subject: [PATCH 4/4] verify mbuf's flags consistency after tx --- sys/dev/e1000/if_em.c | 14 +++++++++++++- sys/dev/e1000/if_igb.c | 14 +++++++++++++- sys/sys/mbuf.h | 3 +++ 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c index c5e936d..ca1647e 100644 --- a/sys/dev/e1000/if_em.c +++ b/sys/dev/e1000/if_em.c @@ -74,6 +74,7 @@ #include <netinet/udp.h> #include <machine/in_cksum.h> +#include <machine/_inttypes.h> #include <dev/led/led.h> #include <dev/pci/pcivar.h> #include <dev/pci/pcireg.h> @@ -1907,6 +1908,8 @@ em_xmit(struct tx_ring *txr, struct mbuf **m_headp) ctxd->lower.data |= htole32(E1000_TXD_CMD_VLE); } + m_head->m_flags |= M_UNUSED; + tx_buffer->m_head = m_head; tx_buffer_mapped->map = tx_buffer->map; tx_buffer->map = map; @@ -3535,7 +3538,16 @@ em_txeof(struct tx_ring *txr) BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txr->txtag, tx_buffer->map); - m_freem(tx_buffer->m_head); + { + KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED, + ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64, + M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED, + tx_buffer->m_head->m_flags)); + tx_buffer->m_head->m_flags &= ~M_UNUSED; + + m_freem_arg(tx_buffer->m_head, (void *)0x00babe00); + } + tx_buffer->m_head = NULL; } tx_buffer->next_eop = -1; diff --git a/sys/dev/e1000/if_igb.c b/sys/dev/e1000/if_igb.c index 4700829..9ad616c 100644 --- a/sys/dev/e1000/if_igb.c +++ b/sys/dev/e1000/if_igb.c @@ -80,6 +80,7 @@ #include <netinet/udp.h> #include <machine/in_cksum.h> +#include <machine/_inttypes.h> #include <dev/led/led.h> #include <dev/pci/pcivar.h> #include <dev/pci/pcireg.h> @@ -1655,6 +1656,8 @@ igb_xmit(struct tx_ring *txr, struct mbuf **m_headp) txr->next_avail_desc = i; txr->tx_avail -= nsegs; + m_head->m_flags |= M_UNUSED; + tx_buffer->m_head = m_head; tx_buffer_mapped->map = tx_buffer->map; tx_buffer->map = map; @@ -3374,7 +3377,16 @@ igb_txeof(struct tx_ring *txr) bus_dmamap_unload(txr->txtag, tx_buffer->map); - m_freem(tx_buffer->m_head); + { + KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED, + ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64, + M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED, + tx_buffer->m_head->m_flags)); + tx_buffer->m_head->m_flags &= ~M_UNUSED; + + m_freem_arg(tx_buffer->m_head, (void *)0x00babe00); + } + tx_buffer->m_head = NULL; } tx_buffer->next_eop = -1; diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 8ad09bb..299e691 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -191,6 +191,9 @@ struct mbuf { #define M_PROTO6 0x00080000ULL /* protocol-specific */ #define M_PROTO7 0x00100000ULL /* protocol-specific */ #define M_PROTO8 0x00200000ULL /* protocol-specific */ + +#define M_UNUSED (~(0xffffffffULL)) + /* * For RELENG_{6,7} steal these flags for limited multiple routing table * support. In RELENG_8 and beyond, use just one flag and a tag. -- 1.7.6.153.g78432
_______________________________________________ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"