On Wed, Jun 04, 2008 at 06:07:47PM +0300, Andriy Gapon wrote:
>
> I wouldn't report this if not for one coincidence (which is described
> below). I have too few facts, so this is more of a mystery problem
> tale than a real problem report.
>
> There are two systems:
> 1. old, slow, i386, UP, 7-STABLE
> 2. new, fast, amd64, MP, 6.3-RELEASE
>
> Systems are located at different physical locations.
>
> What is common between them:
> 1. they both have the same backup strategy where dumps of certain levels
> are performed on certain days; there are monthly dumps of level 2 (on
> first day of each month), weekly dumps of level 4 (each Sunday) and
> daily dumps of levels > 5 (each day except for Sunday - but including
> the firsts).
> dumps are done on live filesystems using -L.
> dumps are initially done to the same disk and only later are transferred
> to archive media.
> 2. both kernels are compiled with softupdates support but there are no
> filesystems with it enabled
> 3. both systems have root partition gmirror-ed, it is dumped
> 4. both systems have gjournal support (on 6.X it is added via a
> "non-official" patch), there are gjournaled filesystems on both systems
> and they are dumped.
>
> On June 1 (Sunday) exactly the same thing happened on both systems.
> At 4AM monthly level 2 dump was started and successfully performed.
> At 5AM weekly level 4 dump was started.
> Somewhere in the process of it system locked up.
> When I physically accessed the systems I found the following: keyboard
> didn't respond[*], screen froze, no pings. After reset I found that logs
> stopped being updated at some time shortly after 5AM.
> [*] - although on amd64 system I was able to switch exactly once between
> virtual terminals (actually from X terminal to console terminal). But
> that's all, no led responses, no special combinations (like break to
> debugger - it is compiled in / enabled).
>
> This coincidence in details (and that one successful VT switch) led me
> to believe that this was some lock up in kernel rather than a hardware
> problem. Also, I use that backup scheme for almost a year and never had
> such a problem before. I just checked and this was the first time that
> the 1st of a month fell on Sunday, so this was the first time when level
> 2 dump was followed by level 4 dump. In previous months it was followed
> by level > 6 dumps.
>
> All in all, quite strange.

Do you use snapshots on the gjournaled fs? I believe this is problematic.