Unresponsive jails issues

2016-05-16 Thread Grzegorz Junka
I have a server running 13 jails for various system services. Recently I 
added two jails to run simple Go applications for testing. They open a 
network socket, and nginx, which runs in another jail, round-robin 
balances requests to them. I mention that because it may be related, 
though not necessarily, because the problem was happening earlier too.


The problem is that every 2-3 days the jails on my server stop responding. 
"jexec jailname tcsh" hangs forever, and "service jail stop jailname" hangs 
forever as well. "top" doesn't show anything suspicious. I can log in 
to the main server through SSH fine. I don't log in to the jails through 
SSH, so I can't check directly, but it seems that when this happens they 
stop responding, because the services running in them stop too (e.g. 
web server, IMAP, ...). I tried to "kill -9" the hanging "jexec" process, 
but that doesn't work.


My first question is what evidence should I gather when that happens so 
that I can investigate the issue later on after the server is restarted?


And the second question, any idea why that might be happening in the 
first place?


I am running FreeBSD 10.3 AMD64 updated from 10.2 a couple of weeks ago.

Grzegorz

___
freebsd-jail@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "freebsd-jail-unsubscr...@freebsd.org"


Re: Unresponsive jails issues

2016-05-16 Thread Bjoern A. Zeeb

> On 16 May 2016, at 12:55 , Grzegorz Junka  wrote:

If you can log into the base system and issue commands there, try to see what 
procstat -k reports for the various jailed processes. You could also check the 
WCHAN column of ps axl and see if anything suspicious shows up.
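As a rough illustration of the ps axl check, here is a minimal Python sketch 
(hypothetical helper names, not an existing tool) that parses the output and 
lists processes whose state begins with D (uninterruptible wait) or L (waiting 
to acquire a lock), optionally only jailed ones (J in STAT). It assumes 
FreeBSD-style ps -l columns, with the wait channel headed MWCHAN or WCHAN and 
COMMAND as the last column; that layout should be checked on the actual host.

```python
"""Flag processes stuck in uninterruptible or lock waits in `ps axl` output.

Sketch only: assumes FreeBSD-style `ps -l` columns (wait channel headed
MWCHAN or WCHAN, state headed STAT, COMMAND last). Helper names are
hypothetical.
"""


def parse_ps_l(text):
    """Parse `ps axl` text into a list of dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # COMMAND is the last column and may contain spaces,
        # so split at most len(header) - 1 times.
        fields = ln.split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def stuck_processes(text, jailed_only=False):
    """Return (pid, wchan, command) for processes in D or L state.

    D: disk or other short-term uninterruptible wait.
    L: waiting to acquire a lock.
    A J anywhere in STAT marks a jailed process.
    """
    result = []
    for row in parse_ps_l(text):
        stat = row.get("STAT", "")
        if not stat or stat[0] not in ("D", "L"):
            continue
        if jailed_only and "J" not in stat:
            continue
        wchan = row.get("MWCHAN", row.get("WCHAN", "-"))
        result.append((int(row["PID"]), wchan, row["COMMAND"]))
    return result
```

On the host one could feed it 
subprocess.run(["ps", "axl"], capture_output=True, text=True).stdout 
and save both the raw output and the result to a file, so the evidence 
survives the reboot.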

/bz




Jails and unionfs

2016-05-16 Thread Grzegorz Junka
I have been using unionfs to host jails for quite a while now and in 
general it works as expected, apart from three issues. The setup is as 
follows (example for one jail, dev2):


_*jail.conf*_

exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

mount.devfs;
mount.fstab = "/usr/local/etc/fstab/$name";
devfs_ruleset = 4;

path = "/j/$name";
host.hostname = "$name.myhost.mydomain.com";
exec.consolelog = "/var/log/jail/$name";

dev2 {
  ip4.addr = 192.168.1.71;
  interface = lagg0;
}

_*/usr/local/etc/fstab/dev2*_

/j/_ro3  /j/dev2 nullfs  ro   0 0
/j/_dev2 /j/dev2 unionfs rw,noatime   0 0
devfs    /j/dev2/dev devfs   rw,ruleset=4 0 0

_*df gives*_

tank1/j/_dev2  1198584120     131255  1198452864     0%   /j/_dev2
/j/_ro3        1198722545     269680  1198452864     0%   /j/dev2
:/j/_dev2      2397306665 1198853800  1198452864    50%   /j/dev2
devfs                   1          1           0   100%   /j/dev2/dev
devfs                   1          1           0   100%   /j/dev2/dev

_*zfs list | grep dev2*_

tank1/j/_dev2   128M  1.12T   128M  /j/_dev2

As can be seen, I need to mount devfs twice, once in jail.conf and once 
in the jail's fstab; otherwise it isn't mounted at all. That's the first 
(smaller) issue.


The second issue is that the disks are not mounted/unmounted 
automatically when I start/stop the jail. To make sure that all disks 
are mounted properly after starting a jail, I need to run:


mount -F /usr/local/etc/fstab/dev2 -a

When stopping the jail, sometimes the disks are unmounted, but sometimes I 
have to run:


umount -F /usr/local/etc/fstab/dev2 -a

But the third, most annoying issue is that if I forget to unmount all 
disks after stopping a jail and then start the jail again, the unionfs is 
mounted twice. Once that happens, unmounting the disks for that jail when 
I need to stop it causes a kernel panic.
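One possible guard against the double mount, sketched below in Python under 
stated assumptions, is to check before starting the jail whether any layer 
from its fstab is still mounted and, if so, unmount it first or refuse to 
start. The function names are hypothetical. It assumes the list of current 
mounts is in fstab(5) format, as FreeBSD's mount -p prints it, and matches on 
(mount point, fs type), because /j/dev2 carries both a nullfs and a unionfs 
layer and the unionfs source may be printed differently from the fstab 
spelling (df shows it as ":/j/_dev2"). Whether this avoids the panic is 
untested.

```python
"""Pre-start guard: find jail fstab entries that are already mounted.

Sketch only: function names are hypothetical. `mounted_text` is expected
in fstab(5) format, e.g. the output of FreeBSD `mount -p`. Entries are
matched on (mount point, fs type), ignoring the source spec, because the
unionfs source may not be printed the same way fstab spells it.
"""


def parse_fstab(text):
    """Return (spec, mount point, vfstype) for each non-comment line."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def already_mounted(jail_fstab, mounted_text):
    """Return the jail fstab entries whose layer is already mounted."""
    mounted = {(mp, vfstype) for _spec, mp, vfstype in parse_fstab(mounted_text)}
    return [e for e in parse_fstab(jail_fstab) if (e[1], e[2]) in mounted]
```

For example, a wrapper could pass it the contents of 
/usr/local/etc/fstab/dev2 and 
subprocess.run(["mount", "-p"], capture_output=True, text=True).stdout, 
and only run "service jail start dev2" when the returned list is empty.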


Does anyone have experience with that setup? Are those issues known and 
are there any possible fixes or workarounds?


Grzegorz




Re: Unresponsive jails issues

2016-05-16 Thread Allan Jude
On 2016-05-16 08:55, Grzegorz Junka wrote:

When you issue the jexec and it hangs, try pressing Ctrl+T to see
what the wait channel is. Along with what Bjoern said, use procstat -k
to examine the other processes, etc.
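When several processes hang on the same lock, their kernel stacks often look 
alike. A minimal Python sketch (hypothetical helpers) that groups the threads 
in procstat -kk -a output by identical stack, so a shared wait point stands 
out. It assumes the column layout PID TID COMM TDNAME KSTACK with no spaces 
in COMM or TDNAME; kernel threads whose names contain spaces would be 
mis-split.

```python
"""Group `procstat -kk` threads by identical kernel stack.

Sketch only: helper names are hypothetical. Assumes the columns
PID TID COMM TDNAME KSTACK, with no spaces inside COMM or TDNAME.
"""
from collections import defaultdict


def parse_procstat_kstacks(text):
    """Return (pid, tid, comm, kstack) tuples; skips header and blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, tid, comm, _tdname, kstack = fields
        rows.append((int(pid), int(tid), comm, kstack.strip()))
    return rows


def common_stacks(text, min_count=2):
    """Return (kstack, [(pid, comm), ...]) groups, largest group first."""
    groups = defaultdict(list)
    for pid, _tid, comm, kstack in parse_procstat_kstacks(text):
        groups[kstack].append((pid, comm))
    shared = [(k, v) for k, v in groups.items() if len(v) >= min_count]
    shared.sort(key=lambda kv: len(kv[1]), reverse=True)
    return shared
```

Saving the raw procstat -kk -a output to a file before rebooting keeps the 
evidence available for this kind of analysis later.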

-- 
Allan Jude