Just so I understand: does the patch not fix the epoll issues at all, or does it fix those but not select, which apparently crashes in a different way?
Second, what is your use case/app? Are you actually trying to share epoll/select between multiple threads, i.e., multiple threads might want to call epoll_wait/select at the same time? That is not supported. The implicit assumption is that only the dispatcher thread calls those two functions; the rest of the threads do only I/O work. If all the threads must handle async communication via epoll/select, then they should register themselves as workers with vcl and get their own epoll fds. LDP does not support that.
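For reference, per-thread event loops are possible directly against the vppcom API. Here is a minimal sketch of what each worker pthread would do (a sketch only: it assumes vppcom_app_create has already run on the main thread, error handling is elided, and you should check vcl/vppcom.h in your tree for the exact signatures):

#include <pthread.h>
#include <sys/epoll.h>
#include <vcl/vppcom.h>

static void *
vcl_worker_thread (void *arg)
{
  struct epoll_event events[16];
  int epfd, n_evts;

  /* Program vcl for this pthread before any other vppcom call. */
  if (vppcom_worker_register () != VPPCOM_OK)
    return 0;

  /* Each worker gets its own epoll session; it is never shared. */
  epfd = vppcom_epoll_create ();

  /* ... add sessions here with vppcom_epoll_ctl (epfd, EPOLL_CTL_ADD,
     session_handle, &ev) ... */

  for (;;)
    {
      /* last argument is a wait time in seconds */
      n_evts = vppcom_epoll_wait (epfd, events, 16, 10);
      if (n_evts < 0)
        break;
      /* handle n_evts events */
    }
  return 0;
}

With LDP in the picture there is no equivalent of this; the preload shim assumes the single-dispatcher model described above.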
Florin

> On Mar 29, 2019, at 12:13 PM, Sharath Kumar <sharathkumarboyanapa...@gmail.com> wrote:
>
> No, it doesn't work.
>
> Attaching the applications being used.
>
> "Select" also has a similar kind of issue when called from a non-main thread:
>
> Thread 9 "nstack_select" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd77fe700 (LWP 63170)]
> 0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, libc_bits=0x7fffd77fdc28) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
> 601         clib_bitmap_validate (*vclb, minbits);
> (gdb) bt
> #0  0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, libc_bits=0x7fffd77fdc28) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
> #1  0x00007ffff4e1db47 in ldp_pselect (nfds=34, readfds=0x7fffbc0008c0, writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdcb0, sigmask=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:723
> #2  0x00007ffff4e1e5d5 in select (nfds=34, readfds=0x7fffbc0008c0, writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdd20) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:857
> #3  0x00007ffff7b4c42a in nstack_select_thread (arg=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/select/nstack_select.c:651
> #4  0x00007ffff78ed6ba in start_thread (arg=0x7fffd77fe700) at pthread_create.c:333
> #5  0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Before https://gerrit.fd.io/r/#/c/18597/, I had tried to fix the issue myself.
> The changes below fixed the epoll_wait and epoll_ctl issues for me [they don't include the changes of https://gerrit.fd.io/r/#/c/18597/]:
>
> diff --git a/src/vcl/vcl_locked.c b/src/vcl/vcl_locked.c
> index fb19b5d..e6c891b 100644
> --- a/src/vcl/vcl_locked.c
> +++ b/src/vcl/vcl_locked.c
> @@ -564,7 +564,10 @@ vls_attr (vls_handle_t vlsh, uint32_t op, void *buffer, uint32_t * buflen)
>
>    if (!(vls = vls_get_w_dlock (vlsh)))
>      return VPPCOM_EBADFD;
> +
> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>    rv = vppcom_session_attr (vls_to_sh_tu (vls), op, buffer, buflen);
> +  vls_mt_unguard ();
>    vls_get_and_unlock (vlsh);
>    return rv;
>  }
> @@ -773,8 +776,10 @@ vls_epoll_ctl (vls_handle_t ep_vlsh, int op, vls_handle_t vlsh,
>    vls_table_rlock ();
>    ep_vls = vls_get_and_lock (ep_vlsh);
>    vls = vls_get_and_lock (vlsh);
> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>    ep_sh = vls_to_sh (ep_vls);
>    sh = vls_to_sh (vls);
> +  vls_mt_unguard ();
>
>    if (PREDICT_FALSE (!vlsl->epoll_mp_check))
>      vls_epoll_ctl_mp_checks (vls, op);
>
> Thanks,
> Sharath.
>
> On Fri, Mar 29, 2019 at 9:15 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>
> Interesting. What application are you running, and does this [1] fix the issue for you?
>
> In short, many of vls' APIs check whether a call is coming in on a new pthread and program vcl accordingly if so. The patch makes sure vls_attr does that as well.
>
> Thanks,
> Florin
>
> [1] https://gerrit.fd.io/r/#/c/18597/
>
>> On Mar 29, 2019, at 4:29 AM, Dave Barach via Lists.Fd.Io <dbarach=cisco....@lists.fd.io> wrote:
>>
>> For whatever reason, the vls layer received an event notification which didn't end well. vcl_worker_get (wrk_index=4294967295) [aka 0xFFFFFFFF] will never work.
>>
>> I'll let Florin comment further. He's in the PDT time zone, so don't expect to hear from him for a few hours.
>>
>> D.
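To make that mechanism concrete: vcl keeps a per-pthread, thread-local worker index that starts out invalid, and the 4294967295 in the backtraces below is exactly that never-registered value, i.e. ~0. Roughly, and with hypothetical names rather than the actual vcl/vls source, each vls entry point is supposed to do the equivalent of:

#include <stdint.h>

/* Illustration only -- hypothetical names, not the actual vcl/vls code. */
static __thread uint32_t worker_index = ~0u;  /* invalid until registered */

static uint32_t
register_pthread_with_vcl (void)
{
  /* the real code allocates a vcl worker (and its message queue) here;
     stubbed out for the illustration */
  return 0;
}

static void
vls_check_new_pthread (void)
{
  /* first vls call ever made on this pthread? then program vcl for it */
  if (worker_index == ~0u)
    worker_index = register_pthread_with_vcl ();
}

The patch in [1] adds that check to vls_attr, which is the frame sitting right above vppcom_session_attr in the epoll_wait backtrace below.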
>>
>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of sharath kumar
>> Sent: Friday, March 29, 2019 12:18 AM
>> To: vpp-dev@lists.fd.io; csit-...@lists.fd.io
>> Subject: [vpp-dev] multi-threaded application, "epoll_wait" and "epoll_ctl" have "received signal SIGABRT, Aborted".
>>
>> Hello all,
>>
>> I am a newbie to VPP.
>>
>> I am trying to run VPP with a multi-threaded application. "recv" works fine from non-main threads, whereas "epoll_wait" and "epoll_ctl" receive "signal SIGABRT, Aborted".
>>
>> Is this a known issue? Or am I doing something wrong?
>>
>> Attaching backtraces for "epoll_wait" and "epoll_ctl":
>>
>> Thread 9 "dmm_vcl_epoll" received signal SIGABRT, Aborted.
>> [Switching to Thread 0x7fffd67fe700 (LWP 56234)]
>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>> #2  0x00007ffff496d873 in os_panic () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>> #3  0x00007ffff48ce42c in debugger () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff4bfe0e0 "%s:%d (%s) assertion `%s' fails") at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>> #7  0x00007ffff4bd7c49 in vppcom_session_attr (session_handle=4278190080, op=6, buffer=0x0, buflen=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2606
>> #8  0x00007ffff4bfc7fd in vls_attr (vlsh=0, op=6, buffer=0x0, buflen=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:569
>> #9  0x00007ffff4e21736 in ldp_epoll_pwait (epfd=32, events=0x7fffd67fad20, maxevents=1024, timeout=100, sigmask=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2203
>> #10 0x00007ffff4e21948 in epoll_wait (epfd=32, events=0x7fffd67fad20, maxevents=1024, timeout=100) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2257
>> #11 0x00007ffff4e13041 in dmm_vcl_epoll_thread (arg=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:75
>> #12 0x00007ffff78ed6ba in start_thread (arg=0x7fffd67fe700) at pthread_create.c:333
>> #13 0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> Thread 11 "vs_epoll" received signal SIGABRT, Aborted.
>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>> #2  0x00007ffff496d873 in os_panic () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>> #3  0x00007ffff48ce42c in debugger () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff4bfe1a0 "%s:%d (%s) assertion `%s' fails") at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>> #7  0x00007ffff4bd597a in vppcom_epoll_ctl (vep_handle=4278190080, op=1, session_handle=4278190082, event=0x7fffd4dfb3b0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2152
>> #8  0x00007ffff4bfd061 in vls_epoll_ctl (ep_vlsh=0, op=1, vlsh=2, event=0x7fffd4dfb3b0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:787
>> #9  0x00007ffff4e213b6 in epoll_ctl (epfd=32, op=1, fd=34, event=0x7fffd4dfb3b0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2118
>> #10 0x00007ffff4e12f88 in vpphs_ep_ctl_ops (epFD=-1, proFD=34, ctl_ops=0, events=0x7fffd5190078, pdata=0x7fffd53f01d0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:48
>> #11 0x00007ffff7b4d502 in nsep_epctl_triggle (epi=0x7fffd5190018, info=0x7fffd53f01d0, triggle_ops=0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:134
>> #12 0x00007ffff7b4de31 in nsep_insert_node (ep=0x7fffd50bffa8, event=0x7fffd4dfb5a0, fdInfo=0x7fffd53f01d0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:250
>> #13 0x00007ffff7b4e480 in nsep_epctl_add (ep=0x7fffd50bffa8, fd=22, events=0x7fffd4dfb5a0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:294
>> #14 0x00007ffff7b44db0 in nstack_epoll_ctl (epfd=21, op=1, fd=22, event=0x7fffd4dfb630) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/nstack_socket.c:2499
>> #15 0x0000000000401e65 in process_server_msg_thread (pArgv=<optimized out>) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/app_example/perf-test/multi_tcp_epoll_app_Ser.c:369
>> #16 0x00007ffff78ed6ba in start_thread (arg=0x7fffd4dff700) at pthread_create.c:333
>> #17 0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> Thanks and Regards,
>> Sharath.
>
> <multi_tcp_epoll_app_Ser.c> <multi_tcp_select_app_Ser.c>
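For readers without the attachments, the failing pattern in both apps reduces to something like the sketch below (a stripped-down stand-in, not the attached code): an epoll fd created on the main thread, then driven from a spawned pthread. Built natively it runs against the kernel; run under LDP (LD_PRELOAD of libvcl_ldpreload.so, 19.04 without the patch above), the spawned thread's first epoll_ctl/epoll_wait is what trips the vcl_worker_get assert:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>

static int epfd, listen_fd;

static void *
event_thread (void *arg)
{
  struct epoll_event ev = { 0 }, events[16];

  /* First epoll call ever issued by this pthread: with the preloaded
     ldp.c this is where the SIGABRT in vcl_worker_get
     (wrk_index=4294967295) fired. */
  ev.events = EPOLLIN;
  ev.data.fd = listen_fd;
  epoll_ctl (epfd, EPOLL_CTL_ADD, listen_fd, &ev);
  for (;;)
    epoll_wait (epfd, events, 16, 100);
  return 0;
}

int
main (void)
{
  struct sockaddr_in sa = { 0 };
  pthread_t t;

  listen_fd = socket (AF_INET, SOCK_STREAM, 0);
  sa.sin_family = AF_INET;
  sa.sin_port = htons (5000);
  bind (listen_fd, (struct sockaddr *) &sa, sizeof (sa));
  listen (listen_fd, 10);

  epfd = epoll_create1 (0);                 /* created on the main thread... */
  pthread_create (&t, 0, event_thread, 0);  /* ...used on another */
  pthread_join (t, 0);
  return 0;
}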