On Wed, 23 Feb 2005, Ammar T. Al-Sayegh wrote:
> ----- Original Message ----- From: "Hugh Dickins" <[EMAIL PROTECTED]>
> > though quite possibly you cannot afford
> > such experiments on this server, and will revert to 2.4 for now.
>
> The problem is that my server is already in production
> mode. I'm running great portion of my business on it,
> where there is very little tolerance for downtime.

I feared as much.

> Because the server is located in a remote datacenter,
> every time it goes down it takes several hours to have
> someone sent up there to manually reboot it for a hefty
> emergency fee. So this bug has already cost me a lot of
> money, and I'm worried that it will cost me a lot of my
> clients as well if it persists.

I'm very sorry for that.

> Remote hands are rather expensive, so it will cost me
> $100/hr to have someone runs memtest86 on my server
> since I can't perform it remotely. I'll do it though
> since that's your recommendation for the time being.
> Hope it will not take more than an hour to run the
> test, and hope it turns out as bad memory modules as
> you expect because I hate to downgrade after all the
> time and money I expended on the upgrade.

One hour will be enough if it does find a problem in that time,
worth a shot; but not enough to give confidence in the memory if
it does not find one, 12 hours better. I actually wonder whether
rmap.c:483 is the best memory tester (serious answer would be,
in some cases yes, but not in all). Do let me know.

If I can find time to rejig the debug patch against your kernel,
it would itself keep your server running, replacing the BUG_ON by
printks and safety. But without knowing what it will report, I can't
judge how satisfactory that would be (and it's unlikely to lead us
to the final answer in one go).

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/