Which VMware August bug do you mean? This one, or a different one?

> I've been waiting for an excuse to update that story... :)
> First of all, I want you to note that was posted in November. It is
> now March, almost four months later, and it had been going on for
> quite some time back in November.
> Recap:
> Bad firmware -> locking system.
> New firmware -> rebooting system.
> Newer firmware -> still reboots, now trashes file systems
> Newer firmware -> still reboots, trashes file systems less often.
> At time of that posting, new firmware which has diagnostic code in it
> to capture critical info so Adaptec can figure out why their cards are
> crashing my system.
> So, for a couple months, things were going pretty well. We got a few
> crashes out of the system and data to the vendor to pass up to
> Adaptec, but no really big events. Then one weekend, one of the
> machines falls over and can't get back up. I figure "surprise", VPN
> into work, remove it from the cluster, and I'll worry about it Monday.
> Ok, now look at this from Adaptec's perspective... You have pissed
> off your customer and your customer's customer. You can't find the
> problem, so you have asked them to run special diagnostic firmware to
> have them help you do your job. What can you possibly do to further
> impress them with your incompetence now?
> So Monday, I go into work, cable up the machine and...it's hung in the
> RAID controller boot (not the system boot, but since HW manufacturers
> think it is so f*ing cool that OSs boot, of course they want their
> RAID controller to have a well advertised boot process too). And it
> hangs. Not even trying to read an OS off the disks, just hung. Power
> off, back on, still hangs. Reseat card, still hangs.
> I call our vendor, tell 'em the symptoms, they agree that it is the
> RAID controller that failed. I start thinking, well, maybe I was a
> little hard on Adaptec, publicly bashing them like this and in
> reality, maybe I just had a defective RAID card all along. It might
> explain why a large majority (though certainly not all!) of the
> crashes happened on this one machine...and now the card is totally
> dead. Hm. Maybe just bad hardware. I'm starting to consider how
> I'll word my semi-retraction.
> Then the phone rings, it's my regular contact at the system vendor.
> He's telling me there's something really strange going on, as these
> cards are popping all over the country, all at people who have been
> running the diagnostic firmware. They can't believe the conclusion,
> but it seems like there's a time bomb in the diagnostic firmware.
> They have a call in to Adaptec, but the guy responsible for the
> diagnostic firmware is on vacation, and it takes 'em a while to track
> the guy down, "but it is possible". Sure enough, a couple hours
> later, I get a call back that confirms the firmware is actively
> killing our cards, and thank goodness that I upgraded them over a
> period of days and not all in a short period of time, and I do an
> emergency reversion of all the other systems.
> How do you top your past levels of incompetence now? Thank your
> victim..er..customers who are helping you debug your product by
> time-bombing the device so that sixty days after install, your adapter
> breaks. Can you top that? Yeah. Don't tell anyone about the time
> bomb -- don't tell the VAR, or the end user, "if you help us debug our
> crappy product, don't let it run this way for 60 days, or your
> computer will start doing space heater imitations".
> (One could argue that they topped that one step further by actually
> locking the boot process so one could not even boot up the firmware
> update disk and downgrade the firmware to something that sucks less,
> but I am willing to pass that off as a bug, not deliberate).
> Think about this a bit. These people DELIBERATELY put a feature in
> their firmware to STOP me (and a lot of other people) from using this
> card. Legit user, but they felt that I was entitled to help them
> debug their shit for no more than sixty days. They worked hard at
> putting this feature in. This isn't a piece of software that has
> access to the resources of a computer, like real-time clocks and
> writable disks. This is a fucking RAID controller, which they managed
> to build a persistent time bomb into so that after 60 days of
> operation, it destroyed itself!! (and again, note: it didn't just
> crash and need to be power cycled, it DAMAGED THE CARD). This took
> some effort -- I can't think of any other reason to have a RTC in a
> RAID card. I also somehow doubt that the coder who did this sat down
> and wrote the time bomb AFTER he was charged with coming up with the
> diagnostic firmware. No, I rather suspect he grabbed some
> off-the-shelf code, something they put routinely into their diagnostic
> and troubleshooting systems, but wasn't intended to get out into the
> general public. They obviously care more about things OTHER than your
> system integrity and reliability. This coder made an error in
> judgment, but they obviously had the tools laying around for some reason.
> Now, tell me again how horrible it is that OpenBSD doesn't let you
> trust your data (and OpenBSD's reputation) to these incompetent assholes?
> (and compare this to the VMware August Surprise...another company who
> was more afraid of you running their software against their will than
> about time bombs escaping into the wild. I'm STUNNED that people
> still consider VMware a business grade product rather than a cute
> development toy after that event exposed their thought process so
> publicly)
> Current status: system is running on non-diagnostic firmware. Adaptec
> and our mail system vendor can produce this problem in the lab
> with a couple days work, so they are closer to a good solution(?), but
> not fixed yet. Vendor has come out with a new version of their mail
> system software which handles corrupted file systems much better than
> the old versions did (and works on a new line of hardware which uses
> a different RAID vendor). But we are still about six months into this
> problem and it still exists.
> It does bring one part of the OpenBSD stance on aac(4) into question,
> though. The real reason they aren't giving us the errata for these
> products may not be that they don't wish to or are embarrassed by how
> bad it is, but that they don't even understand the problems in the
> product themselves. Doesn't change the conclusion: Adaptec products
> can't be trusted (though might be suitable for vmware servers).
> Nick.

