On Wed, Dec 11, 2024 at 02:14:39AM +0100, nick wrote: > Thank you! I just ran a quick test and encountered the same crash at the > same line. I’ll have more time to investigate tomorrow and can provide > additional details then. Do you have any other ideas I could try in the > meantime?
A part of my previous explanation was wrong, the issue is not related to net_copy(), as the crash happened even before. For some reason, struct bgp_prefix is u64-aligned. That makes sense in the original code, where u64 alignment is forced by struct net_addr, but i have no idea why this is true even after the patch removing the forced alignment. The slab allocator in sl_alloc() just returns u32-aligned memory, because it is configured for sizeof(struct bgp_prefix) + sizeof(net_addr_ip6) and the second is 20 as net_addr_ip6 is u32-aligned. The sl_alloc() internally only enforces word-size alignment, wich is u32 in your case. Please disregard the previous patch. Can you try the first attached patch (log-alignment.patch) to log alignment of different structures with or without the second patch (net-addr-u32-align.patch) and report me results? > On 12/11/24 1:41 AM, Ondrej Zajicek wrote: > > On Tue, Dec 10, 2024 at 09:15:46PM +0100, nick via Bird-users wrote: > > > I also uploaded the coredumpfile: > > > https://github.com/PolynomialDivision/coredumpupload/blob/main/bird_coredump > > Thanks. This seems like an interesting issue. In BIRD, generic net_addr > > structure is explicitly u64-aligned (to accomodate VPN variants), while > > specific net_addr_ip4 and net_addr_ip6 are just u32-aligned. In this case > > net_addr_ip6 is allocated with u32 alignment, but then copied with > > net_copy(), which assumes generic net_addr for arguments, and compiler > > probably used some u64-optimized copying, which required 64-bit alignment > > despite being on 32-bit platform, > > > > For starters, try the attached patch. But it is preliminary, we will revisit > > alignment of these structures. > > > > > > > > > > The root cause appears to be insufficient alignment of memory > > > > > > allocated for > > > > > > structures, specifically in this line: > > > > > > > > > > > > ```c > > > > > > px = mb_alloc(c->pool, sizeof(struct bgp_prefix) + net->length); > > > > > > ``` > > Note that it is really allocated two lines above, here: > > > > px = sl_alloc(c->prefix_slab); > > -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org) "To err is human -- to blame it on a computer is even more so."
diff --git a/proto/bgp/bgp.c b/proto/bgp/bgp.c index c18a73fe..3c18f80a 100644 --- a/proto/bgp/bgp.c +++ b/proto/bgp/bgp.c @@ -2771,5 +2771,12 @@ struct protocol proto_bgp = { void bgp_build(void) { + log(L_WARN "ALIGN: %d %d %d %d %d", + (int) _Alignof(net_addr), + (int) _Alignof(net_addr_ip4), + (int) _Alignof(net_addr_ip6), + (int) _Alignof(node), + (int) _Alignof(struct bgp_prefix)); + proto_build(&proto_bgp); }
diff --git a/lib/net.h b/lib/net.h index 61ed37ba..eee59de5 100644 --- a/lib/net.h +++ b/lib/net.h @@ -51,7 +51,7 @@ typedef struct net_addr { u8 pxlen; u16 length; u8 data[20]; - u64 align[0]; + u32 align[0]; } net_addr; typedef struct net_addr_ip4 {