Hi Sharath,

> On Mar 29, 2019, at 8:47 PM, Sharath Kumar <sharathkumarboyanapa...@gmail.com> wrote:
>
> Hi Florin,
>
> The patch doesn't fix any of the issues with epoll/select.
>
> The usecase is like below:
> 1. The main thread calls epoll_create.
> 2. One of the non-main threads calls epoll_ctl.
> 3. Another non-main thread calls epoll_wait.
>
> All 3 threads above operate on a single epoll fd.
>
> Is this usecase supported?

No, that usecase is not supported.

> How do I register these non-main threads [2 and 3] as workers with vcl?
> I am a newbie to VPP and have no idea about this. Can you give me some input on this?
>
> Would registering these non-main threads [2, 3] as workers with vcl resolve my problem?
> Did you mean LDP doesn't support this kind of registration?

There are several layers that help integrate applications with vpp’s host stack:

- VCL (VPPCOM library): it facilitates interaction with the session layer in vpp by exposing a set of apis similar to the POSIX socket apis. That means applications don’t have to interact with vpp’s binary api or work directly with the shared-memory fifos, and, more importantly, they get implicit support for async communication mechanisms like epoll/select. For performance reasons, VCL avoids locking as much as possible. As a result, it doesn’t allow sharing of sessions (or session handles/fds, from the app’s perspective) between app threads or processes (in case the app forks). However, if they need more workers for performance reasons, applications can register their worker threads with vcl (see vppcom_worker_register). Sessions cannot be shared between workers, but each worker can have its own epoll/select loop (a rough sketch of this model is appended at the end of this mail, after the quoted thread).

- VLS (VCL locked sessions): as the name suggests, it employs a set of locks that allow: 1) multi-threaded apps that have one dispatcher thread and N ‘worker’ threads to transparently work with vcl. In this scenario, vcl “sees” only one worker, and the expectation is that only the dispatcher thread (main thread) interacts with epoll/select (also sketched at the end of this mail). 2) multi-process apps to work with vcl; for that it employs additional logic when applications fork. Every child/forked process is registered with vcl by vls, so vcl sees more workers.

- LDP (LD_PRELOAD): this is a shim that intercepts network-related syscalls and redirects them into vls. Its goal is to have applications work unchanged with vpp’s host stack. Since there are no POSIX apis for registering workers with the kernel, ldp cannot register app workers with vls/vcl.

As far as I can tell, you’re running ldp through dmm. Thus, to support your usecase, you’d probably have to change your app to work directly with vls or vcl.

Hope this helps,
Florin

> Thanks,
> Sharath.
>
> On Sat 30 Mar, 2019, 1:27 AM Florin Coras <fcoras.li...@gmail.com> wrote:
> Just so I understand, does the patch not fix the epoll issues, or does it fix those but not select, which apparently crashes in a different way?
>
> Second, what is your usecase/app? Are you actually trying to share epoll/select between multiple threads? That is, multiple threads might want to call epoll_wait/select at the same time? That is not supported. The implicit assumption is that only the dispatcher thread calls the two functions; the rest of the threads do only io work.
>
> If all the threads must handle async communication via epoll/select, then they should register themselves as workers with vcl and get their own epoll fd. LDP does not support that.
>
> Florin
>
>> On Mar 29, 2019, at 12:13 PM, Sharath Kumar <sharathkumarboyanapa...@gmail.com> wrote:
>>
>> No, it doesn't work.
>>
>> Attaching the applications being used.
>>
>> "Select" also has similar kind of issue when called from non-main thread
>>
>> Thread 9 "nstack_select" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fffd77fe700 (LWP 63170)]
>> 0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, libc_bits=0x7fffd77fdc28)
>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
>> 601     clib_bitmap_validate (*vclb, minbits);
>> (gdb) bt
>> #0  0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, libc_bits=0x7fffd77fdc28)
>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
>> #1  0x00007ffff4e1db47 in ldp_pselect (nfds=34, readfds=0x7fffbc0008c0, writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdcb0, sigmask=0x0)
>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:723
>> #2  0x00007ffff4e1e5d5 in select (nfds=34, readfds=0x7fffbc0008c0, writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdd20)
>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:857
>> #3  0x00007ffff7b4c42a in nstack_select_thread (arg=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/select/nstack_select.c:651
>> #4  0x00007ffff78ed6ba in start_thread (arg=0x7fffd77fe700) at pthread_create.c:333
>> #5  0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> Before https://gerrit.fd.io/r/#/c/18597/ I have tried to fix the issue.
>>
>> The below changes fixed epoll_wait and epoll_ctl issues for me [doesn't include the changes of https://gerrit.fd.io/r/#/c/18597/]:
>>
>> diff --git a/src/vcl/vcl_locked.c b/src/vcl/vcl_locked.c
>> index fb19b5d..e6c891b 100644
>> --- a/src/vcl/vcl_locked.c
>> +++ b/src/vcl/vcl_locked.c
>> @@ -564,7 +564,10 @@ vls_attr (vls_handle_t vlsh, uint32_t op, void *buffer, uint32_t * buflen)
>>
>>    if (!(vls = vls_get_w_dlock (vlsh)))
>>      return VPPCOM_EBADFD;
>> +
>> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>>    rv = vppcom_session_attr (vls_to_sh_tu (vls), op, buffer, buflen);
>> +  vls_mt_unguard ();
>>    vls_get_and_unlock (vlsh);
>>    return rv;
>>  }
>> @@ -773,8 +776,10 @@ vls_epoll_ctl (vls_handle_t ep_vlsh, int op, vls_handle_t vlsh,
>>    vls_table_rlock ();
>>    ep_vls = vls_get_and_lock (ep_vlsh);
>>    vls = vls_get_and_lock (vlsh);
>> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>>    ep_sh = vls_to_sh (ep_vls);
>>    sh = vls_to_sh (vls);
>> +  vls_mt_unguard ();
>>
>>    if (PREDICT_FALSE (!vlsl->epoll_mp_check))
>>      vls_epoll_ctl_mp_checks (vls, op);
>>
>> Thanks,
>> Sharath.
>>
>> On Fri, Mar 29, 2019 at 9:15 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Interesting. What application are you running and does this [1] fix the issue for you?
>>
>> In short, many of vls’ apis check if the call is coming in on a new pthread and program vcl accordingly if yes. The patch makes sure vls_attr does that as well.
>>
>> Thanks,
>> Florin
>>
>> [1] https://gerrit.fd.io/r/#/c/18597/
>>
>>> On Mar 29, 2019, at 4:29 AM, Dave Barach via Lists.Fd.Io <dbarach=cisco....@lists.fd.io> wrote:
>>>
>>> For whatever reason, the vls layer received an event notification which didn’t end well. vcl_worker_get (wrk_index=4294967295) [aka 0xFFFFFFFF] will never work.
>>>
>>> I’ll let Florin comment further. He’s in the PDT time zone, so don’t expect to hear from him for a few hours.
>>>
>>> D.
>>>
>>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of sharath kumar
>>> Sent: Friday, March 29, 2019 12:18 AM
>>> To: vpp-dev@lists.fd.io; csit-...@lists.fd.io
>>> Subject: [vpp-dev] multi-threaded application, "epoll_wait" and "epoll_ctl" have "received signal SIGABRT, Aborted".
>>>
>>> Hello all,
>>>
>>> I am a newbie to VPP.
>>>
>>> I am trying to run VPP with a multi-threaded application.
>>> "recv" works fine from non-main threads,
>>> whereas "epoll_wait" and "epoll_ctl" have "received signal SIGABRT, Aborted".
>>>
>>> Is this a known issue?
>>> Or am I doing something wrong?
>>>
>>> Attaching backtrace for "epoll_wait" and "epoll_ctl"
>>>
>>> Thread 9 "dmm_vcl_epoll" received signal SIGABRT, Aborted.
>>> [Switching to Thread 0x7fffd67fe700 (LWP 56234)]
>>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>>> #2  0x00007ffff496d873 in os_panic () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>>> #3  0x00007ffff48ce42c in debugger () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff4bfe0e0 "%s:%d (%s) assertion `%s' fails")
>>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>>> #7  0x00007ffff4bd7c49 in vppcom_session_attr (session_handle=4278190080, op=6, buffer=0x0, buflen=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2606
>>> #8  0x00007ffff4bfc7fd in vls_attr (vlsh=0, op=6, buffer=0x0, buflen=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:569
>>> #9  0x00007ffff4e21736 in ldp_epoll_pwait (epfd=32, events=0x7fffd67fad20, maxevents=1024, timeout=100, sigmask=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2203
>>> #10 0x00007ffff4e21948 in epoll_wait (epfd=32, events=0x7fffd67fad20, maxevents=1024, timeout=100) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2257
>>> #11 0x00007ffff4e13041 in dmm_vcl_epoll_thread (arg=0x0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:75
>>> #12 0x00007ffff78ed6ba in start_thread (arg=0x7fffd67fe700) at pthread_create.c:333
>>> #13 0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>>
>>> Thread 11 "vs_epoll" received signal SIGABRT, Aborted.
>>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>>> #2  0x00007ffff496d873 in os_panic () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>>> #3  0x00007ffff48ce42c in debugger () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff4bfe1a0 "%s:%d (%s) assertion `%s' fails")
>>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>>> #7  0x00007ffff4bd597a in vppcom_epoll_ctl (vep_handle=4278190080, op=1, session_handle=4278190082, event=0x7fffd4dfb3b0)
>>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2152
>>> #8  0x00007ffff4bfd061 in vls_epoll_ctl (ep_vlsh=0, op=1, vlsh=2, event=0x7fffd4dfb3b0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:787
>>> #9  0x00007ffff4e213b6 in epoll_ctl (epfd=32, op=1, fd=34, event=0x7fffd4dfb3b0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2118
>>> #10 0x00007ffff4e12f88 in vpphs_ep_ctl_ops (epFD=-1, proFD=34, ctl_ops=0, events=0x7fffd5190078, pdata=0x7fffd53f01d0)
>>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:48
>>> #11 0x00007ffff7b4d502 in nsep_epctl_triggle (epi=0x7fffd5190018, info=0x7fffd53f01d0, triggle_ops=0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:134
>>> #12 0x00007ffff7b4de31 in nsep_insert_node (ep=0x7fffd50bffa8, event=0x7fffd4dfb5a0, fdInfo=0x7fffd53f01d0)
>>>     at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:250
>>> #13 0x00007ffff7b4e480 in nsep_epctl_add (ep=0x7fffd50bffa8, fd=22, events=0x7fffd4dfb5a0) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:294
>>> #14 0x00007ffff7b44db0 in nstack_epoll_ctl (epfd=21, op=1, fd=22, event=0x7fffd4dfb630) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/nstack_socket.c:2499
>>> #15 0x0000000000401e65 in process_server_msg_thread (pArgv=<optimized out>) at /home/root1/sharath/2019/vpp_ver/19.04/dmm/app_example/perf-test/multi_tcp_epoll_app_Ser.c:369
>>> #16 0x00007ffff78ed6ba in start_thread (arg=0x7fffd4dff700) at pthread_create.c:333
>>> #17 0x00007ffff741b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>> Thanks and Regards,
>>> Sharath.
>>
>> <multi_tcp_epoll_app_Ser.c><multi_tcp_select_app_Ser.c>
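
P.S. To make the vcl option above a bit more concrete, below is a rough, untested sketch of the per-worker model: each app pthread registers itself with vcl and then runs its own epoll loop on its own sessions. The vppcom calls used (vppcom_app_create, vppcom_worker_register, vppcom_epoll_create, vppcom_session_create, vppcom_epoll_ctl, vppcom_epoll_wait) exist in vcl, but please check vcl/vppcom.h in your tree for the exact signatures, constants and the epoll_wait timeout units; the app name, thread count and worker_fn are made up for illustration, and connect/bind/listen plus error handling are omitted.

/*
 * Rough sketch only (not compiled against your tree): each app thread
 * registers itself as a vcl worker and runs its own epoll loop on its
 * own sessions. Check vcl/vppcom.h for exact signatures and constants.
 */
#include <pthread.h>
#include <sys/epoll.h>
#include <vcl/vppcom.h>

#define N_WORKERS 2

static void *
worker_fn (void *arg)
{
  struct epoll_event ev, events[16];
  int eph, sh, n, i;

  /* Make this pthread known to vcl. Without this, vcl has no worker
   * context for the thread (wrk_index stays invalid) and calls like
   * epoll_ctl/epoll_wait end up asserting, as in your backtraces. */
  if (vppcom_worker_register () != VPPCOM_OK)
    return 0;

  /* Each worker gets its own epoll session handle ... */
  eph = vppcom_epoll_create ();

  /* ... and its own sessions; handles must not be shared across workers */
  sh = vppcom_session_create (VPPCOM_PROTO_TCP, 0 /* is_nonblocking */);
  /* connect, or bind + listen, on sh would go here */

  ev.events = EPOLLIN;
  ev.data.u32 = sh;
  vppcom_epoll_ctl (eph, EPOLL_CTL_ADD, sh, &ev);

  while (1)
    {
      /* last arg is the wait time; check vppcom.h for its type/units */
      n = vppcom_epoll_wait (eph, events, 16, 1.0);
      for (i = 0; i < n; i++)
        {
          /* io work via vppcom_session_read/write on events[i].data.u32 */
        }
    }
  return 0;
}

int
main (void)
{
  pthread_t threads[N_WORKERS];
  int i;

  /* Attach the process to vpp's session layer once, from the main thread */
  if (vppcom_app_create ("multi-worker-app"))
    return -1;

  for (i = 0; i < N_WORKERS; i++)
    pthread_create (&threads[i], 0, worker_fn, 0);
  for (i = 0; i < N_WORKERS; i++)
    pthread_join (threads[i], 0);

  vppcom_app_destroy ();
  return 0;
}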
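
And a second rough sketch of the model vls/ldp supports today: only the dispatcher (main) thread ever touches the epoll fd, while the other threads do io only on the fds handed to them. This uses plain POSIX calls (which ldp intercepts); the single-slot hand-off between threads is deliberately naive and just for illustration, as is the port number.

/*
 * Rough sketch only: one dispatcher thread owns the epoll fd and is the
 * only caller of epoll_ctl/epoll_wait; io_worker threads never touch it
 * and only recv/send on fds handed to them.
 */
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static int ready_fd = -1;	/* naive one-slot hand-off, illustration only */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void *
io_worker (void *arg)
{
  /* io only: this thread never calls epoll_ctl/epoll_wait */
  char buf[4096];
  while (1)
    {
      pthread_mutex_lock (&mtx);
      while (ready_fd < 0)
        pthread_cond_wait (&cond, &mtx);
      int fd = ready_fd;
      ready_fd = -1;
      pthread_mutex_unlock (&mtx);

      if (recv (fd, buf, sizeof (buf), 0) <= 0)
        close (fd);
    }
  return 0;
}

int
main (void)
{
  pthread_t t;
  struct epoll_event ev, events[16];
  struct sockaddr_in addr;
  int epfd, lfd;

  pthread_create (&t, 0, io_worker, 0);

  lfd = socket (AF_INET, SOCK_STREAM, 0);
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons (5000);
  addr.sin_addr.s_addr = INADDR_ANY;
  bind (lfd, (struct sockaddr *) &addr, sizeof (addr));
  listen (lfd, 10);

  /* only this (dispatcher) thread uses the epoll fd */
  epfd = epoll_create1 (0);
  ev.events = EPOLLIN;
  ev.data.fd = lfd;
  epoll_ctl (epfd, EPOLL_CTL_ADD, lfd, &ev);

  while (1)
    {
      int n = epoll_wait (epfd, events, 16, 1000);
      for (int i = 0; i < n; i++)
        {
          if (events[i].data.fd == lfd)
            {
              /* new connection: register it here, in the dispatcher */
              int cfd = accept (lfd, 0, 0);
              ev.events = EPOLLIN;
              ev.data.fd = cfd;
              epoll_ctl (epfd, EPOLL_CTL_ADD, cfd, &ev);
            }
          else
            {
              /* data ready: hand the fd to an io worker */
              pthread_mutex_lock (&mtx);
              ready_fd = events[i].data.fd;
              pthread_cond_signal (&cond);
              pthread_mutex_unlock (&mtx);
            }
        }
    }
  return 0;
}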