> Hi,
>
> * have you filed a PR?
> * is the crash easily reproducable?
> * are you able to boot some ramdisk-only FreeBSD-8.2 images (eg create
>   a ramdisk image using nanobsd?) and do some stress testing inside
>   that?
>
> It sounds like you've established it's a storage issue, or at least
> interrupt handling for storage issue. So I'd definitely try the
> ramdisk-only boot and thrash it using lighttpd/httperf or something.
> If that survives fine, I'd look at trying to establish whether there's
> something wrong in the disk driver(s) freebsd is using. I'm not that
> cluey on ESXi, but there may be some PIC/APIC/ACPI change between 7.x
> and 8.0 which has caused this to surface.

We've seen this. Or something that seems really like it.

We run dozens of FreeBSD VMs, many of which are 8.mumble. We have a scripted build environment dating back many years, so servers generally come out in a fairly reproducible form. After several months of smooth running, we needed to shuffle some things around, and migrated some servers to a different datastore. Suddenly, one particular VM, our corp Jabber server, started randomly disconnecting people every morning.

Some inspection showed that the machine was running, but disk I/O in the VM was freezing up. Subsequent inspection suggested that it was happening during the periodic daily run, though we never managed to trigger it by forcing periodic daily manually, so that's only a theory. Given that several times one of the find commands appeared to be running, I guessed that something in the thin provisioned disk image for the system had gone bad. But reading the entire disk with dd didn't cause a hang, running the periodic daily by hand didn't cause a hang, etc.

Migrating the VM to a different host and datastore did not fix the issue. Migrating the VM from an Opteron to a Xeon host with all the latest ESXi 4 patches also didn't make any difference. Migrating the disk image from thin to full seemed to fix it, but I only gave it a day or two, then decided there were other good reasons to reload the VM, so I nuked the VM, which, of course, fixed it. In the meantime, a dozen other similar VMs alongside it run just fine.

My conclusion was that something specific had gone awry in the virtual machine, probably in the disk image, but I could not identify it without significant digging that I had no particular reason or inclination to do. Since it appeared to be a VMware problem, "reload it and be done with it" seemed the quickest path to resolution.
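
Since the freeze never showed up on demand, one way to pin down when it starts would be to leave a small latency probe running inside the VM. What follows is only a minimal sketch under stated assumptions, not something from our build environment: the names `timed_write_probe` and `probe_loop`, the 5 second interval, and the 1 second threshold are all hypothetical choices for illustration.

```python
import os
import tempfile
import time


def timed_write_probe(directory, size=4096):
    """Write `size` bytes to a temp file in `directory`, fsync it,
    remove it, and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory, prefix="ioprobe-")
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)  # push the write through to the (virtual) disk
    finally:
        os.close(fd)
        os.unlink(path)
    return time.monotonic() - start


def probe_loop(directory, interval=5.0, slow_threshold=1.0,
               iterations=None, log=print):
    """Run the probe repeatedly and report every probe that takes at
    least `slow_threshold` seconds. Returns the list of slow latencies.
    With iterations=None the loop runs until interrupted."""
    slow = []
    count = 0
    while iterations is None or count < iterations:
        elapsed = timed_write_probe(directory)
        if elapsed >= slow_threshold:
            slow.append(elapsed)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log("%s slow write+fsync: %.3fs" % (stamp, elapsed))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return slow


if __name__ == "__main__":
    # Hypothetical target directory; point it at the suspect filesystem.
    probe_loop("/var/tmp")
```

Two caveats on the design. During a full freeze the probe itself blocks in the write or fsync, so the slow line only appears once I/O resumes; its timestamp marks the end of the stall and the reported latency gives the approximate length. The output also needs to leave the VM (a remote shell session or remote syslog, say), because a log written to the stalled disk would block along with everything else.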

That having been said, if anyone has any brilliant ideas about what would constitute useful further steps to isolate this, I can look at recovering the faulty VM from backup and seeing if it still exhibits the problem.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers