On Thu, May 3, 2012 at 9:35 AM, Robert Bonomi <bon...@mail.r-bonomi.com> wrote:
>
> Alejandro Imass <a...@p2ee.org> wrote:
>
> [ megasnip ]
>
>> > Things to investigate:
>> > - When was the last time this box was rebooted normally? Did it go fine?
>>
>> After I moved the jails to the right place I archived the jails with
>> ezjail-admin and rebooted the server several times, and everything
>> worked as expected.
>
> Rephrasing -- when was the last time _before_the_problem_was_discovered_
> that the machine was re-booted?
>
The jails were moved on Friday the 27th, so the last reboot before that was on Apr 4, and the one before that on Feb 29:

Feb 29 10:18:46 nune reboot: rebooted by aimass
Apr  4 19:45:03 nune reboot: rebooted by aimass
Apr 27 19:47:06 nune reboot: rebooted by aimass
Apr 28 02:03:57 nune reboot: rebooted by aimass

>> > Were the jails created at this time?
>>
>> No. Most of these jails have been operational for over a year on this
>> server without any incidents.
>
> Clarifying the question -- were the jails created at the time of the last
> _prior_ reboot? i.e., had the machine been re-booted successfully _after_
> the jails were installed, or was this the _first_ such reboot?
>
No, not at all. Most of these jails were created last year; here are the details. cmm_php52_1 is the problematic jail, the one running MySQL. You will see a recent date on its config file because I recently added a cpuset restriction as a band-aid, to limit the jail's ability to bring down the whole system and to leave at least a couple of CPUs free so we can still ssh in and shut it down. There is, however, one new jail, corcaribe_php53, and it was the reason we rebooted the server on Apr 4th: just to make sure that everything would come up OK after a reboot.
-rw-r--r--  1 root  wheel   917 Feb 16  2011 cat58base
-rw-r--r--  1 root  wheel   917 Apr 29  2011 cm_idvida
-rw-r--r--  1 root  wheel   937 Apr  3  2011 cm_website
-rw-r--r--  1 root  wheel   960 May  2 09:48 cmm_php52_1
-rw-r--r--  1 root  wheel  1037 Apr  4 20:00 corcaribe_php53
-rw-r--r--  1 root  wheel   950 Feb 16  2011 http_proxy
-rw-r--r--  1 root  wheel   917 Aug  3  2011 mcs_cat58
-rw-r--r--  1 root  wheel   917 Feb 10  2011 php52base
-rw-r--r--  1 root  wheel   917 Feb 12  2011 php53base
-rw-r--r--  1 root  wheel   877 Dec 27 20:33 pyugmao
-rw-r--r--  1 root  wheel   877 Mar 21 22:03 testbed
-rw-r--r--  1 root  wheel  1017 May 13  2011 yabarana_cat58
-rw-r--r--  1 root  wheel  1017 Feb 13  2011 yabarana_php52
-rw-r--r--  1 root  wheel  1017 Feb 13  2011 yabarana_php53

> It appears you misunderstood the 'at this time' reference -- it did not
> mean 'at the time of the incident', but 'at the time of the last prior
> reboot'. If English is not your primary language, it is an understandable
> misread.
>
>> As I told you earlier, this server has been running for over a year
>> and we have rebooted it many times.
>
> I don't believe you ever mentioned that particular point (multiple
> successful reboots after installation) before. Repeating a prior
> question, _how_long_ before the problem showed up was the most recent
> re-boot? (Doesn't have to be exact -- an 'order of magnitude' estimate
> [a day, a week, a month, several months] is sufficient.)
>
Apr 4th, about three weeks before the jails moved on the 27th.

>> If there are such problems, they come from using the EzJail commands,
>> and I find that unlikely.
>
> What you 'find unlikely' is irrelevant. The entire situation is 'unlikely',
> yet it happened. So one -has- to look at unlikely things. <wry grin>
>
Funny.

>> Here is the mount output, if that's of any help:
>
> [ first disk, and 'fdescfs', and 'procfs' references removed, for clarity ]
>
>> /dev/ad6s1.journal on /usr/jails (ufs, asynchronous, local, gjournal)
>> /usr/jails/basejail on /usr/jails/yabarana-php53/basejail (nullfs, [...]
>
> Yes, that is a good start at useful detail. It is, presumably, _after_
> the problem, and _after_ you had restored things to their proper places.
>
Yes.

> Is it safe to assume that you do -not- have such a 'mount' output from
> some time 'before' the problem? ( There's no rational reason why you
> -would- have such, but _if_ it existed, and there were any differences
> between 'then' and 'now', it could be very informative.)
>
No, but from what I remember it was very similar. If needed, I can pull similar mount output from other servers where we run similar set-ups and that have never failed.

> Another critical piece of information is what directories -- by full path
> name -- disappeared from 'where they were', and where -- by full path name,
> again -- did you find them, and _with_what_names_? If everything was
> moved from the same source point to the same destination, it's not necessary
> to itemize each one, but the details of _one_ 'typical' migration are needed.
> It is also significant if there was 'anything else' in the 'where they
> belonged' directory that was -not- moved. *OR* if there was anything else
> (something other than the '/' of a jail) there, that was _also_ moved.
>
I took a screenshot because I somehow suspected no one would believe me. I don't know if I can attach it here, but if not I can send it to you privately.

> "Narrative" descriptions, as previously provided, while clear to someone
> familiar with the machine in question, are not sufficiently precise to allow
> an 'outsider' to follow the events without 'logically' replicating the setup,
> and then guessing at the meaning of any shorthands employed.
>
OK. I can provide almost any information required.

>
>
> One comment: for 'defensive' purposes it would be useful to break ad6 up
> into two slices, putting 'basejail' in its own slice.
> Then, for production use, that slice can be mounted RO, with the
> 'system immutable' flag set on everything in that filesystem.
>
Yes. One of your posts made that somewhat clear to me: having all the jails on a single 150GB slice seems like a bad idea.

Thanks! Let me know if I can provide anything else to help determine the root cause.

--
Alejandro Imass
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
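On the suggestion to give basejail its own read-only slice, here is a minimal sketch of the /etc/fstab entry it might use. It assumes the new slice is called /dev/ad6s2; that name is hypothetical, since the mount output above only shows /dev/ad6s1.journal. The system immutable flag has to be set while the filesystem is still mounted read-write (for example `chflags -R schg /usr/jails/basejail`). After that, the entry can be switched to `ro` and the filesystem remounted with `mount -u -o ro /usr/jails/basejail`. Keep in mind that schg only really protects the files once kern.securelevel is above 0; at securelevel 0 root can still clear the flag.

```
# Device      Mountpoint            FStype  Options  Dump  Pass#
# Hypothetical second slice on ad6 holding only basejail, mounted read-only.
/dev/ad6s2    /usr/jails/basejail   ufs     ro       2     2
```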