I have noticed that after I kill a stuck ping, the process spawned by cap_init() remains, and I cannot kill it even with SIGKILL. Here is the ps listing, followed by the procstat output for one such process.

vasily  969  0.0  0.1 26428 6532 v0  I    22:43   0:00.00 ping vonbraun.local
vasily  983  0.0  0.1 26428 6532 v0  I    22:43   0:00.00 ping resurrected.local
vasily 1024  0.0  0.1 26428 6532 v0  I    22:49   0:00.00 ping resurrected.local
vasily 1028  0.0  0.1 26428 6532 v0  I    22:49   0:00.00 ping resurrected.local
root   1089  0.0  0.0 12976 2512 v1  S+   22:58   0:00.01 grep ping

  PID    TID COMM   TDNAME   KSTACK
 1028 100579 ping   -        mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _sleep+0x2aa umtxq_sleep+0x19e do_lock_umutex+0x744 __umtx_op_wait_umutex+0x49 sys__umtx_op+0x7a amd64_syscall+0x12e fast_syscall_common+0xf8

I checked ZeroMQ, on which my NSS module is based. It does not use
pthread_atfork(), but it does use many other unusual pthread functions, such
as pthread_setaffinity_np() and pthread_setschedparam(). I do not know
whether that matters.

I also do not quite understand when the code in my module is executed. It
should be executed after the capsicumized sandbox is created, should it not?
Yet the hang occurs while the sandbox is being created, so I do not
understand how my code can affect this process :)

Fri, 8 Jan 2021 at 18:45, Mark Johnston <ma...@freebsd.org>:

> On Wed, Jan 06, 2021 at 07:08:14PM +0300, Vasily Postnicov wrote:
> > That's what I found.
> >
> > At first, ping calls cap_init() in capdns_setup(). cap_init() forks a
> > process, then the parent returns and the child calls casper_main_loop().
> > The child and the parent both have a socket to communicate.
> > casper_main_loop() calls zygote_init() and that one blocks on fork(). I
> > do not know how it could be. How can fork() block?
>
> Does your module somehow use pthread_atfork()?
>
> > The parent process later calls cap_service_open() and that function
> > calls cap_xfer_nvlist(). Because the child process is stuck somewhere in
> > zygote_init() it never sends an nvlist back. So ping blocks.
>
> Can you show output from "procstat -kk <pid>" when this hang occurs?
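
To make the blocking described above concrete, here is a minimal, portable sketch of the same pattern. It is not libcasper's code: helper_roundtrip() and its parameters are hypothetical names made up for illustration. The parent forks a helper connected by a socketpair and waits for a reply, as ping does after cap_init(); when the helper never answers (standing in for a child stuck in zygote_init()), the parent's receive cannot complete. The receive timeout is an addition of this sketch so that the stuck case can be observed; nothing in the thread says libcasper uses one.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the cap_init()/cap_service_open() exchange:
 * fork a helper connected by a socketpair, then wait for its reply.
 * If helper_stuck is nonzero the helper never answers, mimicking a child
 * blocked in zygote_init().
 * Returns 0 on reply, 1 on timeout, -1 on any other error.
 */
int
helper_roundtrip(int helper_stuck, int timeout_sec)
{
	struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
	char buf[8];
	ssize_t n;
	pid_t pid;
	int sv[2];
	int result;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);

	pid = fork();
	if (pid == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Helper: the side that would run casper_main_loop(). */
		close(sv[0]);
		if (helper_stuck) {
			for (;;)
				pause();	/* never replies */
		}
		if (write(sv[1], "ready", 5) != 5)
			_exit(1);
		_exit(0);
	}

	/* Parent: the side that would wait in cap_xfer_nvlist(). */
	close(sv[1]);
	/* Without this timeout, recv() would block forever on a stuck helper. */
	if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
		result = -1;
	} else {
		n = recv(sv[0], buf, sizeof(buf), 0);
		if (n > 0)
			result = 0;	/* helper answered */
		else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
			result = 1;	/* helper is alive but silent */
		else
			result = -1;
	}

	/* Reap the helper so it does not linger after the parent is done. */
	close(sv[0]);
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return (result);
}
```

Calling helper_roundtrip(1, 1) returns 1 after roughly one second. Without the SO_RCVTIMEO option the parent would sit in recv() indefinitely, which matches the symptom seen with ping.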
>
> > I figured all this out by inserting printf() calls. LLDB refuses to run
> > ping with 'error: Child exec failed'.
>
> Presumably it needs to be run as root since ping(8) is a setuid
> executable.
>
> > Tue, 5 Jan 2021 at 17:43, Mark Johnston <ma...@freebsd.org>:
> >
> > > On Tue, Jan 05, 2021 at 10:02:37AM +0300, Vasily Postnicov wrote:
> > > > Hello. I wrote a simple daemon called ZeroDNS which provides
> > > > functionality similar to multicast DNS, namely it discovers other
> > > > participating machines over the LAN and stores their hostname and
> > > > IPv4 address pairs.
> > > >
> > > > Here is an NSS module which allows the system to use information
> > > > from that daemon:
> > > > https://github.com/shamazmazum/nss-zero-dns
> > > >
> > > > You need to modify /etc/nsswitch.conf, changing the line
> > > > 'hosts: files dns' to 'hosts: files dns zerodns'.
> > > >
> > > > It all works on FreeBSD 12.2-RELEASE, but sometimes not on
> > > > 13.0-CURRENT. For example, ping(8) just blocks when trying to ping a
> > > > host whose name is resolvable with ZeroDNS. It turns out that
> > > > programs built with casper support (like ping(8) and some others)
> > > > stop working with my NSS module (they just block trying to resolve
> > > > the name).
> > >
> > > Presumably it's the casper process (i.e., cap_dns) that uses your
> > > module? If the main ping process is blocked trying to resolve a name,
> > > it's waiting for the cap_dns process - where exactly is it getting
> > > stuck?
> > >
> > > > Is there some kind of manual on how to write casper-compatible NSS
> > > > modules?
> > >
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"