Hi Damjan, Dave,

I tried running the mheap-validation CLI "test heap-validate now", and it
leads to a crash, which means the mheap is already corrupted when the
system is coming up.

Then I created an mheap_validation function and called it in various
places, just to check which code leg is causing the mheap corruption.

I found that the code below, which is called for 12 workers, causes the
issue once "fm->fp_per_worker_sessions" is 1M, but works fine when we set
it to 524288.

mheap_validation_check();
pool_alloc_aligned(wk->fp_sessions_pool, fm->fp_per_worker_sessions,
                   CLIB_CACHE_LINE_BYTES);
mheap_validation_check();   /* <-- panics here */
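
For reference, my mheap_validation_check() is just a thin wrapper; a
minimal sketch of it, assuming the 18.01-era mheap allocator and
mheap_validate() from vppinfra (the same kind of check the "test
heap-validate" CLI performs):

#include <vppinfra/mem.h>
#include <vppinfra/mheap.h>

/* Sketch only: validate the calling thread's current heap.
 * mheap_validate() walks every object and asserts if any header or
 * trailer has been overwritten, so it panics at the first call site
 * past the corruption. */
static void
mheap_validation_check (void)
{
  mheap_validate (clib_mem_get_heap ());
}

One more data point: 1M (1048576) is exactly double 524288, so if a
session entry were (hypothetically) 512 bytes, 12 workers x 1M entries
is already ~6GB before any resize headroom; this could be plain heap
exhaustion rather than a stray write.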

Any suggestion/advice would be really helpful.


Thanks,
Chetan Bhasin

On Tue, Jan 29, 2019 at 5:35 PM Damjan Marion <dmar...@me.com> wrote:

> Please search this mailing list archive; Dave provided some hints some
> time ago.
>
> 90M is not terribly high, but it can also be a victim of something else
> holding memory.
>
>
> On 29 Jan 2019, at 12:54, chetan bhasin <chetan.bhasin...@gmail.com>
> wrote:
>
> Hi Damjan,
>
> Thanks for the reply.
>
> What would be a typical way of debugging a corrupt vector pointer? For
> example, can we set a watchpoint on some field in the vector header that
> is most likely to get disturbed, so that we can nab whoever is corrupting
> the vector?
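>
> What I have in mind (a sketch, assuming the standard vppinfra
> vec_header_t layout) is logging the suspect vector's header address and
> putting a hardware watchpoint on it:
>
> #include <vppinfra/vec.h>
> #include <vppinfra/error.h>
>
> /* _vec_find() returns the header that sits just before the vector
>  * data; hand its address to gdb, e.g.:
>  *   (gdb) watch -l ((vec_header_t *) 0x<printed addr>)->len
>  * gdb then stops on whoever writes that word next. */
> vec_header_t *vh = _vec_find (wk->fp_sessions_pool);
> clib_warning ("pool vector header at %p, len %u", vh, vh->len);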
>
> With 1M entries, do you think 90MB is an issue?
>
>
> Clearly we have a lurking bug somewhere.
>
> Thanks,
> Chetan Bhasin
>
>
> On Tue, Jan 29, 2019, 16:53 Damjan Marion <dmar...@me.com> wrote:
>
>>
>> Typically this happens when you run out of memory (main heap size) or
>> you have a corrupted vector pointer.
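>>
>> A quick way to rule the first one in or out (a sketch; the 4G figure is
>> a guess, size it to your pool math) is to grow the main heap in
>> startup.conf:
>>
>> # startup.conf
>> heapsize 4G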
>>
>> It will be easier to read your traceback if it is captured with a debug
>> image, but according to frame 11, your vector is already 90MB.
>> Is this expected?
>>
>>
>> On 29 Jan 2019, at 11:31, chetan bhasin <chetan.bhasin...@gmail.com>
>> wrote:
>>
>> Hello everyone, I know 18.01 is not supported now, but I just want to
>> understand what could be the reason for the crash below. We are adding
>> entries to a pool using pool_get_aligned, which triggers a vec_resize.
>> The issue appears when the pool reaches around 1M entries. Is this due
>> to limited memory, some memory corruption, or something else?
>>
>> Core was generated by `bin/vpp -c co'.
>> Program terminated with signal 6, Aborted.
>> #0  0x00002ab534028207 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>> Missing separate debuginfos, use: debuginfo-install
>> OPWVmepCR-7.0-el7.x86_64
>> (gdb) bt
>> #0  0x00002ab534028207 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1  0x00002ab5340298f8 in __GI_abort () at abort.c:90
>> #2  0x0000000000405ea9 in os_panic () at
>> /bfs-build/build-area.42/builds/LinuxNBngp_7.X_RH7/2019-01-07-2044/third-party/vpp/vpp_1801/build-data/../src/vpp/vnet/main.c:266
>> #3  0x00002ab53213aad9 in unix_signal_handler (signum=<optimized out>,
>> si=<optimized out>, uc=<optimized out>)
>>     at vpp/vpp_1801/build-data/../src/vlib/unix/main.c:126
>> #4  <signal handler called>
>> #5  _mm_storeu_si128 (__B=..., __P=<optimized out>) at
>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/emmintrin.h:702
>> #6  clib_mov16 (src=<optimized out>, dst=<optimized out>)
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:60
>> #7  clib_mov32 (src=<optimized out>, dst=<optimized out>)
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:66
>> #8  clib_mov64 (src=0x2ab62d1b04e0 "", dst=0x2ab5426e1fe0 "")
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:74
>> #9  clib_mov128 (src=0x2ab62d1b04e0 "", dst=0x2ab5426e1fe0 "")
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:80
>> #10 clib_mov256 (src=0x2ab62d1b04e0 "", dst=0x2ab5426e1fe0 "")
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:87
>> #11 clib_memcpy (n=90646888, src=0x2ab62d1b04e0, dst=0x2ab5426e1fe0)
>>     at vpp/vpp_1801/build-data/../src/vppinfra/memcpy_sse3.h:325
>> #12 vec_resize_allocate_memory (v=<optimized out>,
>> length_increment=length_increment@entry=1, data_bytes=<optimized out>,
>> header_bytes=<optimized out>, header_bytes@entry=48,
>>     data_align=data_align@entry=64) at
>> vpp/vpp_1801/build-data/../src/vppinfra/vec.c:95
>> #13 0x00002ab7b74a61c1 in _vec_resize (data_align=64, header_bytes=48,
>> data_bytes=<optimized out>, length_increment=1, v=<optimized out>)
>>     at include/vppinfra/vec.h:142
>> #14 xxx_allocate_flow (fm=0x2ab7b76c8fc0 <fp_main>)
>>     at vpp/plugins/src/fastpath/fastpath.c:1502
>>
>> Regards,
>> Chetan Bhasin
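>>
>> (Frames #11-#14 show the growth path: pool_get_aligned() found no free
>> slot, vec_resize allocated a bigger block, and the crash hit while
>> copying the old 90MB pool across. A sketch of that pattern, with a
>> hypothetical session_t standing in for your element type:)
>>
>> #include <vppinfra/pool.h>
>> #include <vppinfra/cache.h>
>>
>> typedef struct { u8 data[64]; } session_t;  /* hypothetical element */
>>
>> static session_t *
>> allocate_one (session_t ** pool)
>> {
>>   session_t *s;
>>   /* With the free list empty, pool_get_aligned() grows the backing
>>    * vector: vec_resize_allocate_memory() gets a larger block and
>>    * clib_memcpy()s the old contents over (frame #11's 90MB copy).
>>    * Old and new blocks coexist during the copy, so peak heap demand
>>    * is roughly double the pool's size. */
>>   pool_get_aligned (*pool, s, CLIB_CACHE_LINE_BYTES);
>>   return s;
>> }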
>>
>>
>> --
>> Damjan
>>
>
>
> --
> Damjan
>
>