Hi Dave,

We ran a good number of scale tests with the fix. We didn't hit this crash.
Thanks a lot for the fix.

Regards,
Rajith

On Wed, Jun 3, 2020 at 5:40 PM Dave Barach (dbarach) <[email protected]> wrote:
> Please test https://gerrit.fd.io/r/c/vpp/+/27407 and report results.
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Dave Barach via lists.fd.io
> Sent: Wednesday, June 3, 2020 7:08 AM
> To: Benoit Ganne (bganne) <[email protected]>; [email protected]
> Cc: vpp-dev <[email protected]>; Neale Ranns (nranns) <[email protected]>
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> +1, can't tell which poison pattern is involved without a scorecard.
>
> load_balance_alloc_i (...) is clearly not thread-safe due to calls to
> pool_get_aligned (...) and vlib_validate_combined_counter (...).
>
> Judicious use of pool_get_aligned_will_expand (...),
> _vec_resize_will_expand (...) and a manual barrier sync will fix this
> problem without resorting to draconian measures.
>
> It'd sure be nice to hear from Neale before we code something like that.
>
> D.
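A minimal sketch of the pattern Dave describes: ask up front whether the
allocation would expand (and therefore possibly move) the pool, and
barrier-sync the workers only in that case. It assumes VPP's
pool_get_aligned_will_expand() and vlib_worker_thread_barrier_sync()/
_release() APIs; the actual change in https://gerrit.fd.io/r/c/vpp/+/27407
may differ in detail.

/* Hedged sketch of the approach, not the literal content of the change. */
static load_balance_t *
load_balance_alloc_i (void)
{
  load_balance_t *lb;
  u8 need_barrier_sync = 0;
  vlib_main_t *vm = vlib_get_main ();

  ASSERT (vm->thread_index == 0); /* FIB updates run on the main thread */

  /* Will this allocation grow, and therefore possibly move, the pool
   * vector and its free bitmap? Only then must the workers be stopped. */
  pool_get_aligned_will_expand (load_balance_pool, need_barrier_sync,
                                CLIB_CACHE_LINE_BYTES);
  if (need_barrier_sync)
    vlib_worker_thread_barrier_sync (vm);

  pool_get_aligned (load_balance_pool, lb, CLIB_CACHE_LINE_BYTES);

  /* vlib_validate_combined_counter (...) can also resize a vector; the
   * analogous _vec_resize_will_expand (...) check applies to it. */

  if (need_barrier_sync)
    vlib_worker_thread_barrier_release (vm);

  return (lb);
}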
> -----Original Message-----
> From: Benoit Ganne (bganne) <[email protected]>
> Sent: Wednesday, June 3, 2020 3:17 AM
> To: [email protected]; Dave Barach (dbarach) <[email protected]>
> Cc: vpp-dev <[email protected]>; Neale Ranns (nranns) <[email protected]>
> Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> Neale is away and might be slow to react.
> I suspect the issue arises when creating a new load balance entry through
> load_balance_create(), which gets a new element from the load balance
> pool. This in turn updates the pool free bitmap, which can grow. As the
> bitmap is backed by a vector, it can be reallocated somewhere else to fit
> the new size.
> If that happens concurrently with dataplane processing, bad things happen.
> The 0x13 pattern is what dlmalloc free() fills freed memory with, and that
> is what you would see in this case. I think the same could happen to the
> pool itself, not only the bitmap.
> If I am correct, I am not sure how we should fix it: the FIB update API is
> marked mp_safe, so we could create a fixed-size load balance pool to
> prevent runtime reallocation, but that would waste memory and impose a
> maximum size.
>
> ben
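To make the failure mode Ben describes concrete, here is a hypothetical
interleaving (an illustration only, not VPP source):

/* main thread (mp_safe FIB update)      worker thread (dataplane)
 * --------------------------------      -------------------------
 * load_balance_create()
 *   pool_get(load_balance_pool, lb)
 *     free bitmap vector is full:
 *       larger vector allocated,        lb = load_balance_get(lbi)
 *       old vector free()d                -> still dereferences the old
 *         dlmalloc fills the freed           memory, reads 0x131313...,
 *         block with 0x13                    and crashes in clib_bitmap_get()
 *
 * Any vector that can grow while a worker is reading it is a potential
 * use-after-free; the pool vector itself has the same exposure.
 */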
> > -----Original Message-----
> > From: [email protected] <[email protected]> On Behalf Of Rajith PR via lists.fd.io
> > Sent: Wednesday, June 3, 2020 05:46
> > To: Dave Barach (dbarach) <[email protected]>
> > Cc: vpp-dev <[email protected]>; Neale Ranns (nranns) <[email protected]>
> > Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> > Hi Dave/Neale,
> >
> > The adj_poison fill pattern seems to be 0xfefe. Am I looking at the
> > right code, or have I interpreted it incorrectly?
> >
> > Thanks,
> > Rajith
> >
> > On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach) <[email protected]> wrote:
> >
> > The code manages to access a poisoned adjacency – 0x131313 fill
> > pattern – copying Neale for an opinion.
> >
> > D.
> >
> > From: [email protected] <[email protected]> On Behalf Of Rajith PR via lists.fd.io
> > Sent: Tuesday, June 2, 2020 10:00 AM
> > To: vpp-dev <[email protected]>
> > Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> > Hello All,
> >
> > In VPP 19.08 we are seeing a crash while accessing the load_balance_pool
> > in the load_balance_get() function. This happens after enabling worker
> > threads: the FIB programming is done in the main thread, and the crash
> > is seen in one of the worker threads.
> > Also, it is seen only when we scale to 300K+ IPv4 routes.
> >
> > Here is the complete stack:
> >
> > Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
> > 0x00007fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61)
> >     at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> > 201       return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
> >
> > Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
> > #0  0x00007fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61)
> >     at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> > #1  0x00007fbef10676a8 in load_balance_get (lbi=61)
> >     at /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
> > #2  0x00007fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080, node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
> >     at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
> > #3  0x00007fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080, node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
> >     at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
> > #4  0x00007fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080, node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fbe8a5edb40, last_time_stamp=381215594286358)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
> > #5  0x00007fbef0c6b7ad in dispatch_pending_node (vm=0x7fbe8a5aa080, pending_frame_index=2, last_time_stamp=381215594286358)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
> > #6  0x00007fbef0c6d3f0 in vlib_main_or_worker_loop (vm=0x7fbe8a5aa080, is_main=0)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
> > #7  0x00007fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
> > #8  0x00007fbef0cac791 in vlib_worker_thread_fn (arg=0x7fbe8de2a340)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
> > #9  0x00007fbef092da48 in clib_calljmp ()
> >     from /home/ubuntu/Scale/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1
> > #10 0x00007fbe4aa8dec0 in ?? ()
> > #11 0x00007fbef0ca700c in vlib_worker_thread_bootstrap_fn (arg=0x7fbe8de2a340)
> >     at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:573
> >
> > Thanks in advance,
> > Rajith
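A note on why the worker is inside clib_bitmap_get() at all: in a debug
image, load_balance_get() validates the index against the pool's free
bitmap before returning the element. Paraphrasing the 19.08 code (the
macro expansion shown is approximate, not a verbatim quote):

/* src/vnet/dpo/load_balance.h */
static inline load_balance_t *
load_balance_get (index_t lbi)
{
  return (pool_elt_at_index (load_balance_pool, lbi));
}

/* In debug images pool_elt_at_index (p, i) expands to roughly:
 *
 *   ASSERT (! pool_is_free_index (p, i)); // reads pool_header (p)->free_bitmap
 *   return p + i;                         //   via clib_bitmap_get ()
 *
 * ai == 0x1313131313131313 means the free_bitmap pointer itself was loaded
 * from 0x13-poisoned (freed) memory, i.e. from the header of a pool vector
 * that had already been reallocated and moved. That fits Ben's suspicion
 * that the pool itself, not only the bitmap, can be the stale object.
 */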
