Hi Dave,

We ran a good number of scale tests with the fix. We didn't hit this crash.

Thanks a lot for the fix.

Regards,
Rajith



On Wed, Jun 3, 2020 at 5:40 PM Dave Barach (dbarach) <[email protected]>
wrote:

> Please test https://gerrit.fd.io/r/c/vpp/+/27407 and report results.
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Dave Barach
> via lists.fd.io
> Sent: Wednesday, June 3, 2020 7:08 AM
> To: Benoit Ganne (bganne) <[email protected]>; [email protected]
> Cc: vpp-dev <[email protected]>; Neale Ranns (nranns) <[email protected]>
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> +1, can't tell which poison pattern is involved without a scorecard.
>
> load_balance_alloc_i (...) is clearly not thread-safe due to calls to
> pool_get_aligned (...) and vlib_validate_combined_counter(...).
>
> Judicious use of pool_get_aligned_will_expand(...),
> _vec_resize_will_expand(...) and a manual barrier sync will fix this
> problem without resorting to draconian measures.
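
A rough sketch of the pattern described above, in plain C rather than VPP code: ask whether the next allocation would reallocate the backing store, and take the barrier only in that case. Every name here (toy_pool_t, toy_pool_will_expand, toy_barrier_sync, toy_barrier_release) is hypothetical and stands in for the real pool and barrier calls; the barrier is just a counter so the behavior can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector-backed pool; NOT VPP's pool type. */
typedef struct {
    int *elts;   /* backing store, may move when it grows */
    size_t len;  /* elements in use */
    size_t cap;  /* elements allocated */
} toy_pool_t;

/* Hypothetical barrier: a real one would park the worker threads.
 * Here it only counts how often it was taken. */
static int toy_barrier_count;
static void toy_barrier_sync(void) { toy_barrier_count++; }
static void toy_barrier_release(void) { }

/* True if the next allocation would reallocate the backing store. */
static bool toy_pool_will_expand(const toy_pool_t *p)
{
    return p->len == p->cap;
}

/* Allocate one element; take the barrier only when storage may move.
 * Returns the new element's index, or -1 if growing failed. */
static long toy_pool_get(toy_pool_t *p)
{
    bool need_barrier = toy_pool_will_expand(p);
    long idx = -1;

    if (need_barrier)
        toy_barrier_sync();

    if (p->len == p->cap) {
        size_t new_cap = p->cap ? 2 * p->cap : 4;
        int *grown = realloc(p->elts, new_cap * sizeof *grown);
        if (grown) {
            p->elts = grown;
            p->cap = new_cap;
        }
    }

    if (p->len < p->cap) {
        assert(p->elts != NULL);
        idx = (long) p->len++;
        p->elts[idx] = 0;
    }

    if (need_barrier)
        toy_barrier_release();
    return idx;
}
```

With this shape the common case (free space already available) never stops the workers; only the allocations that may move memory pay for the barrier.
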
>
> It'd sure be nice to hear from Neale before we code something like that.
>
> D.
>
> -----Original Message-----
> From: Benoit Ganne (bganne) <[email protected]>
> Sent: Wednesday, June 3, 2020 3:17 AM
> To: [email protected]; Dave Barach (dbarach) <[email protected]>
> Cc: vpp-dev <[email protected]>; Neale Ranns (nranns) <[email protected]>
> Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> Neale is away and might be slow to react.
> I suspect the issue is when creating a new load balance entry through
> load_balance_create(), which will get a new element from the load balance
> pool. This in turn will update the pool free bitmap, which can grow. As it
> is backed by a vector, it can be reallocated somewhere else to fit the new
> size.
> If it is done concurrently with dataplane processing, bad things happen.
> The 0x131313 pattern is written by dlmalloc free() and would show up in that
> case. I think the same could happen to the pool itself, not only the bitmap.
> If I am correct, I am not sure how we should fix that: fib update API is
> marked as mp_safe, so we could create a fixed-size load balance pool to
> prevent runtime reallocation, but it would waste memory and impose a
> maximum size.
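
To make the failure mode above concrete, here is a small plain-C sketch, not VPP or dlmalloc code. It grows a vector by allocate, copy, and poison, so a reader still holding the old base pointer sees only the fill pattern. toy_vec_grow and TOY_POISON are hypothetical names, and the old block is handed back to the caller instead of being freed, purely so it can be inspected without touching freed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POISON 0x13  /* fill byte, matching the pattern seen in the trace */

/* Hypothetical grow: allocate a larger block, copy the contents over,
 * then poison the old block as a poisoning free() would.
 * The old block is returned through stale_out instead of being freed,
 * so the caller can look at it safely (and must free it). */
static uint64_t *toy_vec_grow(uint64_t *old, size_t old_n, size_t new_n,
                              uint64_t **stale_out)
{
    assert(old != NULL && new_n >= old_n);

    uint64_t *fresh = calloc(new_n, sizeof *fresh);
    if (!fresh)
        return NULL;

    memcpy(fresh, old, old_n * sizeof *old);
    memset(old, TOY_POISON, old_n * sizeof *old);
    *stale_out = old;
    return fresh;
}
```

A thread that cached the old pointer before the grow would now read 0x1313131313131313 from every 64-bit word, which is the same shape as the ai value in the backtrace.
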
>
> ben
>
> > -----Original Message-----
> > From: [email protected] <[email protected]> On Behalf Of Rajith PR
> > via lists.fd.io
> > Sent: Wednesday, June 3, 2020 05:46
> > To: Dave Barach (dbarach) <[email protected]>
> > Cc: vpp-dev <[email protected]>; Neale Ranns (nranns)
> > <[email protected]>
> > Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> > Hi Dave/Neale,
> >
> > The adj_poison seems to be a fill pattern, 0xfefe. Am I looking
> > at the right code, or have I interpreted it incorrectly?
> >
> > Thanks,
> > Rajith
> >
> > On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach)
> > <[email protected] <mailto:[email protected]> > wrote:
> >
> >
> >       The code manages to access a poisoned adjacency – 0x131313 fill
> > pattern – copying Neale for an opinion.
> >
> >
> >
> >       D.
> >
> >
> >
> >       From: [email protected] <mailto:[email protected]>  <vpp-
> > [email protected] <mailto:[email protected]> > On Behalf Of Rajith PR
> > via lists.fd.io <http://lists.fd.io>
> >       Sent: Tuesday, June 2, 2020 10:00 AM
> >       To: vpp-dev <[email protected] <mailto:[email protected]> >
> >       Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> >
> >
> >       Hello All,
> >
> >
> >
> >       In VPP version 19.08 we are seeing a crash while accessing the
> > load_balance_pool in the load_balance_get() function. This happens
> > after enabling worker threads.
> >
> >       FIB programming happens in the main thread, and we see this crash
> > in one of the worker threads.
> >
> >       Also, this is seen when we scale to 300K+ IPv4 routes.
> >
> >
> >
> >       Here is the complete stack,
> >
> >
> >
> >       Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> >
> >       [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
> >       0x00007fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> > i=61) at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> >       201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
> >
> >
> >
> >       Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
> >       #0  0x00007fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> > i=61) at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> >       #1  0x00007fbef10676a8 in load_balance_get (lbi=61) at
> > /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
> >       #2  0x00007fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080,
> > node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at
> > /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
> >       #3  0x00007fbef1068ead in ip4_lookup_node_fn_avx2
> > (vm=0x7fbe8a5aa080, node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
> >           at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
> >       #4  0x00007fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080,
> > node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL,
> > dispatch_state=VLIB_NODE_STATE_POLLING,
> >           frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at
> > /home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
> >       #5  0x00007fbef0c6b7ad in dispatch_pending_node
> > (vm=0x7fbe8a5aa080, pending_frame_index=2, last_time_stamp=381215594286358)
> >           at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
> >       #6  0x00007fbef0c6d3f0 in vlib_main_or_worker_loop
> > (vm=0x7fbe8a5aa080, is_main=0) at
> > /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
> >       #7  0x00007fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080) at
> > /home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
> >       #8  0x00007fbef0cac791 in vlib_worker_thread_fn
> > (arg=0x7fbe8de2a340) at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
> >       #9  0x00007fbef092da48 in clib_calljmp () from
> > /home/ubuntu/Scale/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1
> >       #10 0x00007fbe4aa8dec0 in ?? ()
> >       #11 0x00007fbef0ca700c in vlib_worker_thread_bootstrap_fn
> > (arg=0x7fbe8de2a340) at
> > /home/ubuntu/Scale/libvpp/src/vlib/threads.c:573
> >
> >       Thanks in Advance,
> >
> >       Rajith
>
>
