Thanks for the detailed instructions. I assembled the machine at issue one and a half years ago on a Supermicro H8QCE with AMD CPUs, mdadm raid1 on WD Raptor 150GB disks, 24GB Kingston ECC, amd64 etch. I used a non-dedicated cage, actually a 4U rack discarded by my institution, besides discarded CPUs and RAM; it was a very cheap machine. After one year of work and an upgrade to lenny, one 1GB memory slot died, and the failure was painful to detect. I succeeded thanks to the generous help on this site. Even the latest memtest gave no straightforward indication.
Recently one WD Raptor died and I replaced both with Seagate Barracuda 500GB disks (I need space rather than fast access). I could recover the previous OS and installations, again thanks to much help here. The faulty boot described below was the only issue afterwards.
Unfortunately, in my experience here in Europe, Supermicro support offers very little, both in the above circumstances and when I requested hardware details about their monitoring chip for use with "sensors". Without details about the resistor values they employ, sensors suffers from unpredictable offsets. The vendor (Supermicro can't be contacted directly; everything goes through the vendor, in this case TWP Computers in Amsterdam) insisted that their Superdoctor should be used, which requires a full window system, while I have only installed the X server. After that, the vendor no longer answered. Another problem with the mainboard is the Intel Boot Agent, which can't be removed. Also, the fan connectors on the mainboard are three-wire plugs, which means no continuous speed regulation (one can only ask the BIOS to drop the voltage to 6V unless overheating occurs). Finally, there is much unused hardware (to provide hardware raid for Microsoft).
I can't say the mainboard is bad. It does the job, and benchmarks are excellent for that hardware. With openmpi it runs the fastest molecular dynamics code, which amounts to having twice the number of processors compared with normal code. I say that to emphasize that memory control must be absolutely in order, otherwise libnuma could not do such an excellent job. However, for my next machine I would be happy to find an alternative brand, just in the hope of getting full support for Debian Linux (I mean "sensors", for example). I would like to assemble a four-socket quad-core motherboard, to get 16 processors, and to join the present machine and another Tyan with two sockets to get 28 logical processors. Probably,
however, without an expensive Infiniband interconnect, parallelization for molecular dynamics will not work (it would be easier with single CPUs). Is there any chance that in the near future quad-core will be superseded by eight-core CPUs, giving 32 processors with four sockets on a single mainboard?

thanks
francesco

On Fri, May 1, 2009 at 3:59 PM, Douglas A. Tutty <[email protected]> wrote:
> On Tue, Apr 28, 2009 at 08:12:14PM +0200, Francesco Pietra wrote:
>> I wonder whether a failure to boot (amd64 lenny, multiprocessor,
>> raid1) requires attention. On resetting, the boot was ok.
>
> Having the follow-on boot OK is good and bad: good that you booted OK,
> bad in that it's an intermittent problem.
>
>> The message was
>>
>> kernel panic - not syncing: attempted to kill the idle task!
>
> I have no idea what _would_ cause this; I would suspect either an
> intermittent (or random) hardware issue or freak of nature (planets not
> aligned correctly, sun spots, whatever). Hope that it's an isolated
> incident but plan for it not being so.
>
>> I was not at the screen during the attempted boot, so that I can't say
>> more to this concern.
>>
>>
>> I have looked at /var/log/syslog not finding a clear trace of the
>> failure. The machine was not used today and all of today in syslog
>> relates to 28 April 19.56-19.57.
>
> Well, during boot, until /var is mounted rw, nothing will appear in
> syslog.
>
>
> If you have a separate machine available (it doesn't have to be
> dedicated to this), and if you plan to reboot this problem machine soon,
> I'd set it up for serial console (boot messages going out the serial
> port instead of to the vga screen), and capture it with the other
> machine.
>
> In /boot/grub/menu.lst, you'd add an altoptions line:
>
> # altoptions=(serial console) console=tty0 console=ttyS1,38400n8
>
> the first console command says to send info to tty0, the second to ttyS1
> (a serial port).
> Check the docs for the order, this is for my server
> when I run it from another box and I need to talk to the boot process
> (for LUKS password), you may need the other order so that you can type
> on the tty0 console but have messages go to ttyS1. Adjust the ttyS1 for
> whatever serial port you use and the speed, parity, and data bits (here
> 38400n8).
>
> Once you have things set up and working, which will involve rebooting
> the suspect machine, you'll see what happens.
>
> If it were me, I'd also schedule some downtime overnight on the box and
> run memtest (the memtest86+ package that installs into grub, or boot a
> live CD such as grml that includes memtest as a boot option).
>
> Good luck.
>
> Doug.
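For reference, here is a minimal sketch of how the console= arguments quoted above are put together. It only builds strings; it does not touch grub or any serial port. The helper name console_arg and the variable names are hypothetical, chosen for illustration. It assumes the usual kernel convention that the serial options are written as speed, parity (n, o or e) and data bits, and that boot messages go to every listed console while the last console= entry becomes /dev/console (the one that takes keyboard input during boot).

```python
def console_arg(device, baud=None, parity="n", bits=8):
    """Build one kernel console= argument (hypothetical helper).

    With no baud rate the device is used as is (e.g. tty0); otherwise the
    options are appended as speed, parity, data bits (e.g. 38400n8).
    """
    if baud is None:
        return "console=" + device
    if parity not in ("n", "o", "e"):
        raise ValueError("parity must be 'n', 'o' or 'e'")
    if bits not in (7, 8):
        raise ValueError("bits must be 7 or 8")
    return "console={},{}{}{}".format(device, baud, parity, bits)


vga = console_arg("tty0")
serial = console_arg("ttyS1", 38400)  # adjust port and speed to your setup

# Order as in the quoted altoptions line: the serial port comes last,
# so input during boot is expected on ttyS1.
input_on_serial = " ".join([vga, serial])

# The other order: messages still reach ttyS1, but typing happens on tty0.
input_on_vga = " ".join([serial, vga])

print(input_on_serial)
print(input_on_vga)
```

For simply capturing a failed boot from a second machine, either order should send the messages out the serial port; the order matters only for where one types.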

