I have some output from the serial port, but it is completely nonsensical, which points to memory corruption or bad DMA or something. I farted around with IPI to see if the error was at that level but could not find anything. I also tried some other things that I suspected were biting the Dell R610, but alas they turned out to be unrelated problems with the same symptoms.
The only thing I know for certain after last week's reboot fest is that I am more frustrated with the darn box... I'll look at it some more in the future.

On Sun, Dec 06, 2009 at 10:26:15PM -0500, Daniel Ouellet wrote:
> Marco Peereboom wrote:
>> I did test i386 on it and that seemed to work ok but I did not run it
>> for more than a few builds. amd64 UP seems fine too.
>
> For the i386.mp or single kernel, it does run fine. I have run it for two
> years so far with no problem. The i386.mp needed to be rebooted twice with
> 4.6 on it in the last few months. I put 4.6 on it on July 4 when it was
> tagged as 4.6 and it has run ever since with no problem other than 2 reboots,
> which don't look to be related to the same issue. Before that, it ran well
> and I have had them for 3 years by now with no problem whatsoever.
> I ran amd64 as well; only the mp has given problems in the last 3 years.
> I just can't get ddb output to get more details.
>
>> These machines are of questionable quality. Theo has one that will
>> crash just sitting at the boot prompt.
>
> With the amd64.mp, yes it will crash at the boot prompt; it simply needs
> to access the drive a little and will go south, but it does run well for
> years if that kernel is not installed. I have used them for pretty heavy
> databases for years, as long as I agree to either use amd64 and
> let go of the extra core on both CPUs or run i386, and I am fine.
>
> Only one time so far did I get a bit more output on the console, but I
> can't say what it was and couldn't get a screen shot at the time. I can
> only recall something in regards to initializing the second CPU or
> something along these lines, but it shouldn't be considered valid feedback
> as I sadly simply can't recall the output well enough to be of any value. I
> only kind of recall that, so take it as such; no more weight should be
> given to that part.
>
> The only way I can get more output on the console is if I let it reboot
> constantly and watch it. Sometimes it will crash and give more details on
> the console and freeze there, and sometimes it will freeze for maybe 5
> or 10 minutes and reboot then. So, if I see it, I can grab it, but most
> of the time it just reboots as soon as it gets to the line
> with
>
> /dev/rsd0a: file system is clean; not checking
>
> but 1 out of maybe 40 times, +- 10 I guess, it will crash a bit later
> and give more on the console and, if you are lucky, you will get more
> output. However, in all cases, it's not possible to get ddb, or a trace, or
> anything out of the console whatsoever. I tried many times without
> success yet. Tried different BIOS, different ILOM, with RAID or not, etc.
> All the same results.
>
> Not much help I know, but that's all I have got so far.
>
> Maybe a one second wait at each step pass that may give more, but
> that's just a stupid idea I guess.