On Wed, Oct 15, 2014 at 11:41:33AM +0200, el...@sentor.se wrote: > > Hi! > > Today the problem reoccurred. > I've now debugged the problem a little furter. > > I'm starting snort (as root). > > <<<lots of startup logs for pid 22646>>> > Oct 15 08:46:59 snort[22646]: Initializing daemon mode > Oct 15 08:46:59 snort[22648]: Daemon initialized, signaled parent > pid: 22646 > Oct 15 08:46:59 snort[22648]: Reload thread starting... > Oct 15 08:46:59 snort[22648]: Reload thread started, thread > 0x8146e8800 (22648) > End of log. > > Error! Nothing more happens with the snort process! > Normally it should continue and log these lines as well: > > > snort[nnn]: Decoding Ethernet > snort[nnn]: Checking PID path... > snort[nnn]: PID path stat checked out ok, PID path set to /var/run/ > snort[nnn]: Writing PID "7627" to file "/var/run//snort_mon0.pid" > snort[nnn]: Chroot directory = /usr/foobar/log > snort[nnn]: Set gid to 100 > snort[nnn]: Set uid to 100 > snort[nnn]: > snort[nnn]: --== Initialization Complete ==-- > snort[nnn]: Commencing packet processing (pid=nnn) > > > > > > When looking at this half-started snort process with 'ps', it looks > like this: > > ps faxulwwj 22648 > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND > UID PPID CPU PRI NI MWCHAN PGID SID JOBC > root 22648 51.8 1.1 488552 179344 - Rs 8:46AM 53:06.52 > /usr/local/bin/s 0 1 0 88 0 - 22648 22648 0 > > > The process is still owned by root, so just as the missing log lines > are saying, it has not yet performed any change of uid/gid. > > > > > So there seem to be two questions. > > Q1) > What happens between "Reload thread started, thread 0x8146e8800 > (22648)" and "Decoding Ethernet"? > Apparently something goes wrong here on FreeBSD 10.0. > (this problem does not always occur, sometimes snort start just fine) >
Nobody can tell with this data. > Q2) > When the process has frozen in this half-started state, it can't be > killed even with a -9. Why? > Temporarily unkillable processes are standard, you can encounter them in BSDs, Solaris, Linux and I would not be surprised if Windows as well. In short, here and there the kernel does something blocking which is not supposed to fail. For instance in a lot of places FreeBSD kernel just waits for memory to be free when allocating. However, if stuff blocks indefinitely, we may be dealing with a bug. > > > > John-Mark asked me for some debugging info. Here it is: > > I now run 'kill 22648' on the above semi-started process: > > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME > COMMAND UID PPID CPU PRI NI MWCHAN PGID SID JOBC > old root 22648 51.8 1.1 488552 179344 - Rs 8:46AM 53:06.52 > /usr/local/bin/s 0 1 0 88 0 - 22648 22648 0 > new root 22648 52.3 1.1 488552 179344 - Rs 8:46AM 53:36.48 > /usr/local/bin/s 0 1 0 52 0 - 22648 22648 0 > > No change. > > > > kill -9 22648 > > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME > COMMAND UID PPID CPU PRI NI MWCHAN PGID SID JOBC > old root 22648 51.8 1.1 488552 179344 - Rs 8:46AM 53:06.52 > /usr/local/bin/s 0 1 0 88 0 - 22648 22648 0 > new root 22648 37.7 1.1 488552 179344 - Ts 8:46AM 53:50.87 > /usr/local/bin/s 0 1 0 52 0 - 22648 22648 0 > > Less CPU-usage and STAT changed to "Ts". > > > > > kill -CONT 22648 > > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME > COMMAND UID PPID CPU PRI NI MWCHAN PGID SID JOBC > old root 22648 51.8 1.1 488552 179344 - Rs 8:46AM 53:06.52 > /usr/local/bin/s 0 1 0 88 0 - 22648 22648 0 > new root 22648 0.0 1.1 488552 179344 - Ts 8:46AM 53:50.88 > /usr/local/bin/s 0 1 0 52 0 - 22648 22648 0 > > No change except cpu is down to 0. > > > I now start 'kgdb' > info threads > I found two threads for snort, doing a bt for both of them: > 372 Thread 100602 (PID=22648: snort) sched_switch > (td=0xfffff802c061f490, newtd=<value optimized out>, flags=<value > optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > 371 Thread 100598 (PID=22648: snort) sched_switch > (td=0xfffff80221857000, newtd=<value optimized out>, flags=<value > optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > thread 372 > [Switching to thread 372 (Thread 100602)]#0 sched_switch > (td=0xfffff802c061f490, newtd=<value optimized out>, flags=<value > optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > 1962 in /usr/src/sys/kern/sched_ule.c > bt > #0 sched_switch (td=0xfffff802c061f490, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > #1 0xffffffff808b8c1e in mi_switch (flags=266, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff808c04b0 in thread_suspend_switch > (td=0xfffff802c061f490) at /usr/src/sys/kern/kern_thread.c:883 > #3 0xffffffff808c0276 in thread_single (mode=1) at > /usr/src/sys/kern/kern_thread.c:713 > #4 0xffffffff8087c1bb in exit1 (td=0xfffff802c061f490, rv=9) at > /usr/src/sys/kern/kern_exit.c:180 > #5 0xffffffff808b2faf in sigexit (td=<value optimized out>, > sig=<value optimized out>) at /usr/src/sys/kern/kern_sig.c:2935 > #6 0xffffffff808b3669 in postsig (sig=<value optimized out>) at > /usr/src/sys/kern/kern_sig.c:2822 > #7 0xffffffff808f6f57 in ast (framep=<value optimized out>) at > /usr/src/sys/kern/subr_trap.c:271 > #8 0xffffffff80c75870 in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:416 > #9 0x0000000801d6f19a in ?? () > Previous frame inner to this frame (corrupt stack?) > > > thread 371 > [Switching to thread 371 (Thread 100598)]#0 sched_switch > (td=0xfffff80221857000, newtd=<value optimized out>, flags=<value > optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > 1962 in /usr/src/sys/kern/sched_ule.c > bt > #0 sched_switch (td=0xfffff80221857000, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1962 > #1 0xffffffff808b8c1e in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff808f2e3a in sleepq_wait (wchan=0x0, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:620 > #3 0xffffffff80864aad in _cv_wait (cvp=0xffffffff8147a500, > lock=0xffffffff8147a480) at /usr/src/sys/kern/kern_condvar.c:139 Here your thread got blocked trying to acquire a mutex. You can find the owner by inspecting vm (vm=0xffffffff8147a480) -> vm_lock -> mtx_lock. You may need to cast along the way. Roughly speaking (untested): f 4 p (struct vmem *)0xffffffff8147a480)->vm_lock->mtx_lock > #4 0xffffffff808fb05f in vmem_xalloc (vm=0xffffffff8147a480, > size0=<value optimized out>, align=<value optimized out>, phase=0, > nocross=<value optimized out>, minaddr=0, > maxaddr=18446735286768857088, flags=8194, addrp=<value optimized > out>) at /usr/src/sys/kern/subr_vmem.c:1196 > #5 0xffffffff808fae6b in vmem_alloc (vm=0x0, size=0, flags=<value > optimized out>, addrp=0xfffffe0466e1d6e8) at > /usr/src/sys/kern/subr_vmem.c:1082 > #6 0xffffffff80b0fa58 in kmem_malloc (vmem=0xffffffff8147a480, > size=2139729920, flags=2) at /usr/src/sys/vm/vm_kern.c:314 > #7 0xffffffff80b08dfb in uma_large_malloc (size=<value optimized > out>, wait=2) at /usr/src/sys/vm/uma_core.c:1006 > #8 0xffffffff80898cf3 in malloc (size=2139729920, > mtp=0xffffffff813a0450, flags=0) at > /usr/src/sys/kern/kern_malloc.c:520 > #9 0xffffffff8096307b in bpf_buffer_ioctl_sblen > (d=0xfffff80159ea9000, i=<value optimized out>) at > /usr/src/sys/net/bpf_buffer.c:183 > #10 0xffffffff80960a3c in bpfioctl (dev=0x0, cmd=<value optimized > out>, addr=0xfffff801fbd06b40 "", flags=0, td=0xfffff80221857000) at > /usr/src/sys/net/bpf.c:408 > #11 0xffffffff807ac1df in devfs_ioctl_f (fp=0xfffff8002b3d9d20, > com=3221504614, data=0xfffff801fbd06b40, cred=<value optimized out>, > td=0xfffff80221857000) at /usr/src/sys/fs/devfs/devfs_vnops.c:757 > #12 0xffffffff808fdfae in kern_ioctl (td=0xfffff80221857000, > fd=<value optimized out>, com=0) at file.h:319 > #13 0xffffffff808fdd2f in sys_ioctl (td=0xfffff80221857000, > uap=0xfffffe0466e1da40) at /usr/src/sys/kern/sys_generic.c:702 > #14 0xffffffff80c8f117 in amd64_syscall (td=0xfffff80221857000, > traced=0) at subr_syscall.c:134 > #15 0xffffffff80c7580b in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:391 > #16 0x0000000801d8f08a in ?? () > Previous frame inner to this frame (corrupt stack?) > > -- Mateusz Guzik <mjguzik gmail.com> _______________________________________________ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"