Which VMware August bug do you mean? This one or a different one? http://communities.vmware.com/thread/162377?tstart=0&start=0
On Fri, Mar 5, 2010 at 1:47 AM, Nick Holland <n...@holland-consulting.net> wrote:
> Tomas Bodzar wrote:
>> You just think that it's running perfectly under Linux ;-) See e.g. this post
>> http://marc.info/?l=openbsd-misc&m=125783114503531&w=2
>
> I've been waiting for an excuse to update that story... :)
>
> First of all, I want you to note that was posted in November. It is
> now March, almost four months later, and it had been going on for
> quite some time back in November.
>
> Recap:
> Bad firmware -> locking system.
> New firmware -> rebooting system.
> Newer firmware -> still reboots, now trashes file systems.
> Newer firmware -> still reboots, trashes file systems less often.
> At the time of that posting, new firmware which has diagnostic code in it
> to capture critical info so Adaptec can figure out why their cards are
> crashing my system.
>
> So, for a couple of months, things were going pretty well. We got a few
> crashes out of the system and data to the vendor to pass up to
> Adaptec, but no really big events. Then one weekend, one of the
> machines falls over and can't get back up. I figure "surprise", VPN
> into work, remove it from the cluster, and I'll worry about it Monday.
>
>
> Ok, now look at this from Adaptec's perspective... You have pissed
> off your customer and your customer's customer. You can't find the
> problem, so you have asked them to run special diagnostic firmware to
> have them help you do your job. What can you possibly do to further
> impress them with your incompetence now?
>
>
> So Monday, I go into work, cable up the machine and...it's hung in the
> RAID controller boot (not the system boot, but since HW manufacturers
> think it is so f*ing cool that OSs boot, of course they want their
> RAID controller to have a well-advertised boot process too). And it
> hangs. Not even trying to read an OS off the disks, just hung. Power
> off, back on, still hangs. Reseat card, still hangs.
>
> I call our vendor, tell 'em the symptoms, and they agree that it is the
> RAID controller that failed. I start thinking, well, maybe I was a
> little hard on Adaptec, publicly bashing them like this, and in
> reality, maybe I just had a defective RAID card all along. It might
> explain why a large majority (though certainly not all!) of the
> crashes happened on this one machine...and now the card is totally
> dead. Hm. Maybe just bad hardware. I'm starting to consider how
> I'll word my semi-retraction.
>
> Then the phone rings; it's my regular contact at the system vendor.
> He's telling me there's something really strange going on, as these
> cards are popping all over the country, all at people who have been
> running the diagnostic firmware. They can't believe the conclusion,
> but it seems like there's a time bomb in the diagnostic firmware.
> They have a call in to Adaptec, but the guy responsible for the
> diagnostic firmware is on vacation, and it takes 'em a while to track
> the guy down, "but it is possible". Sure enough, a couple of hours
> later, I get a call back that confirms the firmware is actively
> killing our cards, and thank goodness I upgraded them over a
> period of days and not all in a short period of time, and I do an
> emergency reversion of all the other systems.
>
> How do you top your past levels of incompetence now? Thank your
> victim..er..customers who are helping you debug your product by
> time-bombing the device so that sixty days after install, your adapter
> breaks. Can you top that? Yeah. Don't tell anyone about the time
> bomb -- don't tell the VAR, or the end user, "if you help us debug our
> crappy product, don't let it run this way for 60 days, or your
> computer will start doing space heater imitations".
>
> (One could argue that they topped that one step further by actually
> locking the boot process so one could not even boot up the firmware
> update disk and downgrade the firmware to something that sucks less,
> but I am willing to pass that off as a bug, not deliberate.)
>
>
> Think about this a bit. These people DELIBERATELY put a feature in
> their firmware to STOP me (and a lot of other people) from using this
> card. Legit user, but they felt that I was entitled to help them
> debug their shit for no more than sixty days. They worked hard at
> putting this feature in. This isn't a piece of software that has
> access to the resources of a computer, like real-time clocks and
> writable disks. This is a fucking RAID controller, which they managed
> to build a persistent time bomb into so that after 60 days of
> operation, it destroyed itself!! (And again, note: it didn't just
> crash and need to be power cycled, it DAMAGED THE CARD.) This took
> some effort -- I can't think of any other reason to have an RTC in a
> RAID card. I also somehow doubt that the coder who did this sat down
> and wrote the time bomb AFTER he was charged with coming up with the
> diagnostic firmware. No, I rather suspect he grabbed some
> off-the-shelf code, something they routinely put into their diagnostic
> and troubleshooting systems, but that wasn't intended to get out into the
> general public. They obviously care more about things OTHER than your
> system integrity and reliability. This coder made an error in
> judgment, but they obviously had the tools lying around for some reason.
>
>
> Now, tell me again how horrible it is that OpenBSD doesn't let you
> trust your data (and OpenBSD's reputation) to these incompetent assholes?
>
> (And compare this to the VMware August Surprise...another company that
> was more afraid of you running their software against their will than
> of time bombs escaping into the wild.
> I'm STUNNED that people
> still consider VMware a business-grade product rather than a cute
> development toy after that event exposed their thought process so
> publicly.)
>
>
> Current status: the system is running on non-diagnostic firmware. Adaptec
> and our mail system vendor can reproduce this problem in the lab
> with a couple of days' work, so they are closer to a good solution(?), but
> it's not fixed yet. The vendor has come out with a new version of their mail
> system software which handles corrupted file systems much better than
> the old versions did (and works on a new line of hardware which uses
> a different RAID vendor). But we are still about six months into this
> problem and it still exists.
>
> It does bring one part of the OpenBSD stance on aac(4) into question,
> though. The real reason they aren't giving us the errata for these
> products may not be that they don't wish to or are embarrassed by how
> bad it is, but that they don't even understand the problems in the
> product themselves. That doesn't change the conclusion: Adaptec products
> can't be trusted (though they might be suitable for VMware servers).
>
> Nick.