Unresponsive jails issues
I have a server running 13 jails for various system services. Recently I added two jails to run simple Go applications for testing. Each opens a network socket, and nginx, which runs in another jail, round-robin balances requests between them. I mention this because it may be related, though not necessarily, since the problem was already happening before I added them.

The problem is that every 2-3 days the jails on my server stop responding. "jexec jailname tcsh" hangs forever, and so does "service jail stop jailname". "top" doesn't show anything suspicious. I can log in to the host through SSH fine. I don't log in to the jails through SSH so I can't check directly, but they seem to stop responding entirely, because the services running in them (e.g. web server, IMAP, ...) stop too. I tried to "kill -9" the hanging "jexec" process but that doesn't work.

My first question: what evidence should I gather when this happens, so that I can investigate the issue later, after the server has been restarted?

My second question: any idea why this might be happening in the first place?

I am running FreeBSD 10.3 AMD64, updated from 10.2 a couple of weeks ago.

Grzegorz
___
freebsd-jail@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "freebsd-jail-unsubscr...@freebsd.org"
Re: Unresponsive jails issues
> On 16 May 2016, at 12:55, Grzegorz Junka wrote:
>
> I have a server running 13 jails for various system services. Recently I
> added two jails to run simple go applications for testing. They open a
> network socket and nginx, which is in another jail, and which round robin
> balances requests to them. I mention that because it may be related, however
> not necessarily because it was happening earlier.
>
> The problem is that every 2-3 days jails in my servers stop responding.
> "jexec jailname tcsh" hangs forever, "service jail stop jailname" hangs
> forever as well. "top" doesn't show anything suspicious. I can login through
> SSH to the main server fine. I don't login to jails through SSH so I can't
> check but it seems that when that happens they stop responding because the
> services that are running in them stop too (e.g. web server, imap, ...). I
> tried to "kill -9" the "jexec" process that hangs but that doesn't work.
>
> My first question is what evidence should I gather when that happens so that
> I can investigate the issue later on after the server is restarted?
>
> And the second question, any idea why that might be happening in the first
> place?
>
> I am running FreeBSD 10.3 AMD64 updated from 10.2 a couple of weeks ago.

If you can log into the base system and issue commands there, try to see what procstat (-k) reports for the various jailed processes. You could also check the WCHAN column of "ps axl" and see if anything suspicious shows up.

/bz
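The suggestion above can be narrowed down with a small filter. The following is only a sketch, not something from the thread: the function name stuck_jailed_procs is hypothetical, and it assumes ps is asked for exactly the columns pid, jid, state, wchan, command (instead of the "ps axl" layout), so that the jail ID is visible. It lists jailed processes whose state starts with D (uninterruptible sleep), which are the ones worth passing to procstat -k.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   ps -ax -o pid,jid,state,wchan,command
# on stdin and prints "pid jid wchan" for every jailed process
# (jid != 0) whose state starts with D (uninterruptible sleep).
stuck_jailed_procs() {
    # NR > 1 skips the header line; only the first four columns are
    # used, so spaces in the command column do not matter.
    awk 'NR > 1 && $2 != 0 && $3 ~ /^D/ { print $1, $2, $4 }'
}
```

On the host this might be used as follows, with the output file name being an arbitrary choice:

    ps -ax -o pid,jid,state,wchan,command | stuck_jailed_procs > /var/tmp/stuck-jailed.txt

Each PID it reports can then be inspected with "procstat -k <pid>" and the result saved before rebooting.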
Jails and unionfs
I have been using unionfs to host jails for quite a while now and in general they work as expected, apart from three issues. The setup is as below (example for one jail, dev2):

jail.conf:

    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    exec.clean;
    mount.devfs;
    mount.fstab = "/usr/local/etc/fstab/$name";
    devfs_ruleset = 4;
    path = "/j/$name";
    host.hostname = "$name.*myhost*.*mydomain*.com";
    exec.consolelog = "/var/log/jail/$name";

    dev2 {
        ip4.addr = 192.168.1.71;
        interface = lagg0;
    }

/usr/local/etc/fstab/dev2:

    /j/_ro3   /j/dev2       nullfs   ro            0 0
    /j/_dev2  /j/dev2       unionfs  rw,noatime    0 0
    devfs     /j/dev2/dev   devfs    rw,ruleset=4  0 0

df gives:

    tank1/j/_dev2  1198584120      131255  1198452864    0%  /j/_dev2
    /j/_ro3        1198722545      269680  1198452864    0%  /j/dev2
    :/j/_dev2      2397306665  1198853800  1198452864   50%  /j/dev2
    devfs                   1           1           0  100%  /j/dev2/dev
    devfs                   1           1           0  100%  /j/dev2/dev

zfs list | grep dev2:

    tank1/j/_dev2  128M  1.12T  128M  /j/_dev2

As can be seen, I need to mount devfs twice, once in jail.conf and once in the jail's fstab, otherwise it isn't mounted at all. That's the first (smaller) issue.

The second issue is that the filesystems are not mounted/unmounted automatically when I start/stop the jail. To make sure that everything is mounted properly after starting a jail I need to run:

    mount -F /usr/local/etc/fstab/dev2 -a

When stopping the jail, sometimes the filesystems are unmounted, but sometimes I have to run:

    umount -F /usr/local/etc/fstab/dev2 -a

The third and most annoying issue is that if I forget to unmount everything after stopping a jail and then start the jail again, the unionfs is mounted twice. Once that has happened and I need to stop the jail, unmounting the filesystems for that jail causes a kernel panic.

Does anyone have experience with this setup? Are these issues known, and are there any possible fixes or workarounds?

Grzegorz
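As a possible guard against the double-mount case described above, stacked mounts can be detected before a jail is started or stopped. This is a sketch under stated assumptions, not a fix for the panic itself: the function name duplicate_mounts is hypothetical, it expects fstab-style lines such as those printed by "mount -p" (device, mount point, type, ...), and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads fstab-format mount information (for
# example the output of `mount -p`) on stdin and prints
# "mountpoint fstype" for every pair that appears more than once,
# i.e. the same filesystem type mounted twice on one mount point.
duplicate_mounts() {
    awk '{ seen[$2 " " $3]++ }
         END { for (k in seen) if (seen[k] > 1) print k }' |
        LC_ALL=C sort   # fixed locale keeps the output order stable
}
```

It might be run as:

    mount -p | duplicate_mounts

Empty output means nothing is stacked. With the setup shown above, the devfs mounted from both jail.conf and the fstab would be reported, as would a unionfs that has been mounted twice on /j/dev2.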
Re: Unresponsive jails issues
On 2016-05-16 08:55, Grzegorz Junka wrote:
> I have a server running 13 jails for various system services. Recently I
> added two jails to run simple go applications for testing. They open a
> network socket and nginx, which is in another jail, and which round
> robin balances requests to them. I mention that because it may be
> related, however not necessarily because it was happening earlier.
>
> The problem is that every 2-3 days jails in my servers stop responding.
> "jexec jailname tcsh" hangs forever, "service jail stop jailname" hangs
> forever as well. "top" doesn't show anything suspicious. I can login
> through SSH to the main server fine. I don't login to jails through SSH
> so I can't check but it seems that when that happens they stop
> responding because the services that are running in them stop too (e.g.
> web server, imap, ...). I tried to "kill -9" the "jexec" process that
> hangs but that doesn't work.
>
> My first question is what evidence should I gather when that happens so
> that I can investigate the issue later on after the server is restarted?
>
> And the second question, any idea why that might be happening in the
> first place?
>
> I am running FreeBSD 10.3 AMD64 updated from 10.2 a couple of weeks ago.
>
> Grzegorz

When you issue the jexec and it hangs, try pressing Ctrl+T to see what the wait channel is. In addition, as Bjoern said, use procstat -k to examine the other processes.

-- 
Allan Jude