Hi Hanlin,

Thanks to Dave, we can now have per-thread binary api connections to vpp. I've updated the socket client and vcl to leverage this, so after [1] we have per vcl worker thread binary api sockets that are used to exchange fds.
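For context, the fd exchange itself is just SCM_RIGHTS ancillary data over an AF_UNIX socket. The sketch below is only the underlying mechanism with a hypothetical helper name, not the actual vl_socket_client code; it shows why a socket per worker matters: whichever thread calls recvmsg on a shared socket picks up the next fd, so with a single shared socket workers can end up with each other's segment fds.

/* Minimal sketch (hypothetical helper, not the actual vl_socket_client
 * code): receive one file descriptor as SCM_RIGHTS ancillary data on a
 * per-worker AF_UNIX socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int
recv_one_fd (int worker_api_sock)
{
  char data[64];		/* message payload, e.g. the api msg header */
  char cbuf[CMSG_SPACE (sizeof (int))];
  struct iovec iov = { .iov_base = data, .iov_len = sizeof (data) };
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			.msg_control = cbuf, .msg_controllen = sizeof (cbuf) };
  struct cmsghdr *cm;
  int fd = -1;

  if (recvmsg (worker_api_sock, &msg, 0) <= 0)
    return -1;

  /* Walk the control messages and pull out the passed fd, if any. */
  for (cm = CMSG_FIRSTHDR (&msg); cm; cm = CMSG_NXTHDR (&msg, cm))
    if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
      memcpy (&fd, CMSG_DATA (cm), sizeof (int));

  return fd;			/* e.g. the memfd for a new segment */
}

With one such socket per vcl worker, the fd that arrives is by construction the one vpp sent to that worker, which is what [1] provides.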
Let me know if you're still hitting the issue.

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/23687

> On Nov 22, 2019, at 10:30 AM, Florin Coras <fcoras.li...@gmail.com> wrote:
>
> Hi Hanlin,
>
> Okay, that's a different issue. The expectation is that each vcl worker has a different binary api transport into vpp. This assumption holds for applications with multiple process workers (like nginx) but is not completely satisfied for applications with thread workers.
>
> Namely, for each vcl worker we connect over the socket api to vpp and initialize the shared memory transport (so binary api messages are delivered over shared memory instead of the socket). However, as you've noted, the socket client is currently not multi-thread capable; consequently we have an overlap of socket client fds between the workers. The first segment is assigned properly but the subsequent ones will fail in this scenario.
>
> I wasn't aware of this, so we'll have to either fix the socket binary api client for multi-threaded apps, or change the session layer to use different fds for exchanging memfd fds.
>
> Regards,
> Florin
>
>> On Nov 21, 2019, at 11:47 PM, wanghanlin <wanghan...@corp.netease.com> wrote:
>>
>> Hi Florin,
>> Regarding 3), I think the main problem may be in the function vl_socket_client_recv_fd_msg called by vcl_session_app_add_segment_handler. Multiple worker threads share the same scm->client_socket.fd, so B2 may receive the segment memfd belonging to A1.
>>
>> Regards,
>> Hanlin
>>
>> wanghanlin
>> wanghan...@corp.netease.com
>>
>> On 11/22/2019 01:44, Florin Coras <fcoras.li...@gmail.com> wrote:
>> Hi Hanlin,
>>
>> As Jon pointed out, you may want to register with gerrit.
>>
>> Your comments with respect to points 1) and 2) are spot on. I've updated the patch to fix them.
>>
>> Regarding 3), if I understood your scenario correctly, it should not happen. The ssvm infra forces applications to map segments at fixed addresses. That is, for the scenario you're describing below, if B2 is processed first, ssvm_slave_init_memfd will map the segment at A2. Note how we first map the segment to read the shared header (sh) and then use sh->ssvm_va (which should be A2) to remap the segment at a fixed virtual address (va).
>>
>> Regards,
>> Florin
>>
>>> On Nov 21, 2019, at 2:49 AM, wanghanlin <wanghan...@corp.netease.com> wrote:
>>>
>>> Hi Florin,
>>> I have applied the patch and found some problems in my case. I don't have the rights to post in gerrit, so I post them here.
>>> 1) evt->event_type should be set to SESSION_CTRL_EVT_APP_DEL_SEGMENT rather than SESSION_CTRL_EVT_APP_ADD_SEGMENT.
>>> File: src/vnet/session/session_api.c, Line: 561, Function: mq_send_del_segment_cb
>>> 2) session_send_fds should be called at the end of mq_send_add_segment_cb, otherwise the app_mq lock can't be freed here.
>>> File: src/vnet/session/session_api.c, Line: 519, Function: mq_send_add_segment_cb
>>> 3) When vcl_segment_attach is called in each worker thread, ssvm_slave_init_memfd is also called in each worker thread, and it picks the map address sequentially by mapping the segment once in advance. That is fine with only one thread, but may go wrong with multiple worker threads. Suppose the following scenario: VPP allocates a segment at address A1 and notifies worker thread B1, expecting B1 to also map the segment at address A1; simultaneously VPP allocates a segment at address A2 and notifies worker thread B2, expecting B2 to map the segment at address A2. If B2 processes its notify message first, then ssvm_slave_init_memfd may map the segment at address A1. Maybe VPP could include the segment map address in the notify message, and then the worker thread would just map the segment at that address.
>>>
>>> Regards,
>>> Hanlin
>>>
>>> wanghanlin
>>> wanghan...@corp.netease.com
>>>
>>> On 11/19/2019 09:50, wanghanlin <wanghan...@corp.netease.com> wrote:
>>> Hi Florin,
>>> VPP version is v19.08.
>>> I'll apply this patch and check it. Thanks a lot!
>>>
>>> Regards,
>>> Hanlin
>>>
>>> wanghanlin
>>> wanghan...@corp.netease.com
>>>
>>> On 11/16/2019 00:50, Florin Coras <fcoras.li...@gmail.com> wrote:
>>> Hi Hanlin,
>>>
>>> Just to make sure, are you running master or some older VPP?
>>>
>>> Regarding the issue you could be hitting below, here's [1] a patch that I have not yet pushed for merging because it leads to api changes for applications that directly use the session layer application interface instead of vcl. I haven't tested it extensively, but the goal with it is to signal segment allocation/deallocation over the mq instead of the binary api.
>>>
>>> Finally, I've never tested LDP with Envoy, so I'm not sure that works properly. There's ongoing work to integrate Envoy with VCL, so you may want to get in touch with the authors.
>>>
>>> Regards,
>>> Florin
>>>
>>> [1] https://gerrit.fd.io/r/c/vpp/+/21497
>>>
>>>> On Nov 15, 2019, at 2:26 AM, wanghanlin <wanghan...@corp.netease.com> wrote:
>>>>
>>>> Hi all,
>>>> I accidentally got the following crash stack when I used VCL with hoststack and memfd.
>>>> But the corresponding invalid rx_fifo address (0x2f42e2480) is valid in the VPP process and can also be found in /proc/<pid>/maps. That is, the shared memfd segment memory is not consistent between the hoststack app and VPP. Generally, VPP allocates/deallocates a memfd segment and then notifies the hoststack app to attach/detach. But what happens if, just after VPP deallocates a memfd segment and notifies the hoststack app, VPP immediately allocates the same memfd segment again because a session connected? Because the hoststack app processes the dealloc message and the connected message in different threads, maybe rx_thread_fn has just detached the memfd segment and not yet attached the same memfd segment when, unfortunately, the worker thread gets the connected message.
>>>>
>>>> This is just my guess; maybe I misunderstand.
>>>>
>>>> (gdb) bt
>>>> #0  0x00007f7cde21ffbf in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #1  0x0000000001190a64 in Envoy::SignalAction::sigHandler (sig=11, info=<optimized out>, context=<optimized out>) at source/common/signal/signal_action.cc:73
>>>> #2  <signal handler called>
>>>> #3  0x00007f7cddc2e85e in vcl_session_connected_handler (wrk=0x7f7ccd4bad00, mp=0x224052f4a) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:471
>>>> #4  0x00007f7cddc37fec in vcl_epoll_wait_handle_mq_event (wrk=0x7f7ccd4bad00, e=0x224052f48, events=0x395000c, num_ev=0x7f7cca49e5e8) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:2658
>>>> #5  0x00007f7cddc3860d in vcl_epoll_wait_handle_mq (wrk=0x7f7ccd4bad00, mq=0x224042480, events=0x395000c, maxevents=63, wait_for_time=0, num_ev=0x7f7cca49e5e8) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:2762
>>>> #6  0x00007f7cddc38c74 in vppcom_epoll_wait_eventfd (wrk=0x7f7ccd4bad00, events=0x395000c, maxevents=63, n_evts=0, wait_for_time=0) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:2823
>>>> #7  0x00007f7cddc393a0 in vppcom_epoll_wait (vep_handle=33554435, events=0x395000c, maxevents=63, wait_for_time=0) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:2880
>>>> #8  0x00007f7cddc5d659 in vls_epoll_wait (ep_vlsh=3, events=0x395000c, maxevents=63, wait_for_time=0) at /home/wanghanlin/vpp-new/src/vcl/vcl_locked.c:895
>>>> #9  0x00007f7cdeb4c252 in ldp_epoll_pwait (epfd=67, events=0x3950000, maxevents=64, timeout=32, sigmask=0x0) at /home/wanghanlin/vpp-new/src/vcl/ldp.c:2334
>>>> #10 0x00007f7cdeb4c334 in epoll_wait (epfd=67, events=0x3950000, maxevents=64, timeout=32) at /home/wanghanlin/vpp-new/src/vcl/ldp.c:2389
>>>> #11 0x0000000000fc9458 in epoll_dispatch ()
>>>> #12 0x0000000000fc363c in event_base_loop ()
>>>> #13 0x0000000000c09b1c in Envoy::Server::WorkerImpl::threadRoutine (this=0x357d8c0, guard_dog=...)
>>>> at source/server/worker_impl.cc:104
>>>> #14 0x0000000001193485 in std::function<void ()>::operator()() const (this=0x7f7ccd4b8544) at /usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
>>>> #15 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::function<void ()>)::$_0::operator()(void*) const (this=<optimized out>, arg=0x2f42e2480) at source/common/common/posix/thread_impl.cc:33
>>>> #16 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::function<void ()>)::$_0::__invoke(void*) (arg=0x2f42e2480) at source/common/common/posix/thread_impl.cc:32
>>>> #17 0x00007f7cde2164a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #18 0x00007f7cddf58d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>> (gdb) f 3
>>>> #3  0x00007f7cddc2e85e in vcl_session_connected_handler (wrk=0x7f7ccd4bad00, mp=0x224052f4a) at /home/wanghanlin/vpp-new/src/vcl/vppcom.c:471
>>>> 471       rx_fifo->client_session_index = session_index;
>>>> (gdb) p rx_fifo
>>>> $1 = (svm_fifo_t *) 0x2f42e2480
>>>> (gdb) p *rx_fifo
>>>> Cannot access memory at address 0x2f42e2480
>>>> (gdb)
>>>>
>>>> Regards,
>>>> Hanlin
>>>>
>>>> wanghanlin
>>>> wanghan...@corp.netease.com
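P.S. For anyone following the fixed-address mapping discussion above (map the segment once to read the shared header, then remap it at sh->ssvm_va): here is a rough sketch of that two-step scheme, with hypothetical names rather than the actual ssvm_slave_init_memfd code.

/* Rough sketch (hypothetical names, not the actual ssvm_slave_init_memfd
 * code): map the memfd once to read the shared header, then remap the
 * whole segment at the virtual address the master (vpp) recorded there. */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct
{
  uint64_t ssvm_va;		/* fixed virtual address chosen by the master */
  uint64_t ssvm_size;		/* total segment size */
} seg_header_t;			/* stand-in for the real shared header */

static void *
map_segment_at_fixed_va (int memfd)
{
  long pagesz = sysconf (_SC_PAGESIZE);
  seg_header_t hdr;
  void *tmp, *va;

  /* Step 1: temporary mapping, only to read the header. */
  tmp = mmap (0, pagesz, PROT_READ, MAP_SHARED, memfd, 0);
  if (tmp == MAP_FAILED)
    return 0;
  hdr = *(seg_header_t *) tmp;
  munmap (tmp, pagesz);

  /* Step 2: remap the whole segment at the master's address so pointers
   * (e.g. fifo addresses) are valid in both processes. */
  va = mmap ((void *) (uintptr_t) hdr.ssvm_va, hdr.ssvm_size,
	     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, memfd, 0);
  return va == MAP_FAILED ? 0 : va;
}

Because the remap address comes from the header the master wrote, the segment lands at the address vpp chose regardless of which worker thread processes the notification first, which is the behavior Florin describes for ssvm_slave_init_memfd above.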