On Sat, Jul 20, 2024 at 9:46 PM The Wanderer <wande...@fastmail.fm> wrote:
>
> On 2024-07-20 at 09:19, jeremy ardley wrote:
>
> > On 20/7/24 18:35, George at Clug wrote:
> > [...]
> > The problem was not CrowdStrike as such. It happens in the best of
> > operations.
> >
> > The problem is the Windows Systems Administrators who contracted for
> > / allowed unattended remote updates of kernel drivers on live
> > hardware systems. This is the height of folly and there is no
> > recovery if it causes a BSOD.
> [...]
>
> All the sysadmins involved did is agree to let an antivirus-equivalent
> utility update itself, and its definitions. I would be surprised if this
> could not have easily happened with *any* antivirus-type utility which
> has self-update capability; I'm fairly sure all modern broad-spectrum
> antivirus-etc. suites on Windows do kernel-level access in similar
> fashion. CrowdStrike just happens to be the company involved when it
> *did* happen.
I was around when Symantec Antivirus did much the same thing at the
Social Security Administration. A definition-file update blue-screened
about half of the Windows NT 4.0 and Windows 2000 hosts there, roughly
50,000 machines, if I recall correctly.

> That the sysadmins decided to deploy CrowdStrike does not make it
> reasonable to fault them for this consequence, any more than e.g. if a
> gamer decided to install a game, and then the game required a patch to
> let them keep playing, and that patch silently included new/updated DRM
> which installed a driver which broke the system (as I recall some past
> DRM implementations have reportedly done), it would then be reasonable
> to fault the gamer. In neither case was the consequence foreseeable from
> the decision.

Sysadmins don't make that decision in the enterprise. That decision was
made above the lowly sysadmin's pay grade.

> > The situation is recoverable if all the windows machines are virtual
> > with a good backup/restore plan. The situation is not recoverable if
> > the kernel updates are on raw iron running Windows.
>
> The situation is trivially recoverable if you can get access to the
> machine in a way which lets you either boot to safe mode and get
> local-administrator access, or lets you boot an alternative environment
> (e.g. live-boot media) from which you can read and write to the hard
> drive.

I don't think it's trivial for some enterprises, given the sheer number
of machines and the remote workforce. I'm guessing the company I work
for will spend the next week or month sorting things out, and it is a
medium-sized enterprise with about 30,000 employees. Imagine how bad it
is going to be for an enterprise with 100,000 employees.

> I've spent a fair chunk of my workday today going around to affected
> computers and performing a variant of the latter process.
>
> Once you've done that, the fix is simple: delete, or move out of the
> way, a single file whose name claims that it's a driver. With that file
> gone, you can reboot, and Windows will come up normally without the
> bluescreen.

Unfortunately, I don't see this as scalable, even if the per-machine
step is scripted (see the P.S. below for a rough sketch). It works fine
for a small business with 100 employees, but not for an enterprise.

> > Heads should roll but obviously won't
>
> What good would decapitation do, here?

I think it's a figure of speech, not meant literally.

> At most, CrowdStrike's people are
> guilty of rolling out an insufficiently-tested update, or of designing a
> system such that it's too easy for an update to break things in this
> way, or that it's possible to break things in this way not with an
> actual new client version (which goes through a release cascade, with
> each organization deciding which of the most recent three versions each
> of their computers will get) but just with a data-files update (which,
> as we have seen here, appears to go out to all clients regardless of
> version).

At minimum, it is negligence.

> The first would be poor institutional practice; the others would be
> potentially-questionable software design, although it's hard to know
> without seeing the internal architecture of the software in question and
> understanding *why* it's designed that way.
>
> In either case, it's not obvious to me why decapitating a few scapegoats
> would *improve* the situation going forward, unless it can be determined
> that specific people were actually negligent.

The incident affected the company's share price. Shares were down $10
or $15.
If the potential issues were not detailed in the company's literature
and prospectus, the Securities and Exchange Commission might get
involved over misrepresented risks and liabilities. There could be big
fines, and those would cost the shareholders even more money. All of
this points to an incompetent board. If someone's head is going to be
taken (figuratively), it should start with the CEO and the other
executives.

Jeff
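
P.S. For anyone stuck doing the per-machine cleanup The Wanderer
describes, the step itself is easy to script once you can reach the
disk from safe mode or from live-boot media. Below is a rough sketch in
Python; the driver directory, the C-00000291*.sys name pattern, and the
quarantine directory are assumptions based on what has been widely
reported, so check them against the vendor's own guidance before using
anything like this.

#!/usr/bin/env python3
# Rough sketch only: move the suspect CrowdStrike channel file(s) out of
# the way so the machine can boot again. The directory and the
# C-00000291*.sys pattern are the widely reported ones (an assumption,
# not vendor-verified); check the official advisory first.

import pathlib
import shutil
import sys

def main() -> int:
    # First argument: root of the affected Windows system drive.
    # "C:/" when run from safe mode on the machine itself, or the mount
    # point (e.g. "/mnt/windows") when the disk is attached to live media.
    root = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else "C:/")
    driver_dir = root / "Windows" / "System32" / "drivers" / "CrowdStrike"
    quarantine = root / "crowdstrike-quarantine"

    if not driver_dir.is_dir():
        print(f"no CrowdStrike driver directory at {driver_dir}")
        return 1

    quarantine.mkdir(exist_ok=True)
    moved = 0
    # Move rather than delete, so the change is easy to undo if the
    # pattern turns out to be wrong for this machine.
    for channel_file in sorted(driver_dir.glob("C-00000291*.sys")):
        shutil.move(str(channel_file), str(quarantine / channel_file.name))
        print(f"moved {channel_file.name} -> {quarantine}")
        moved += 1

    if moved == 0:
        print("nothing matched C-00000291*.sys; machine left untouched")
    return 0

if __name__ == "__main__":
    sys.exit(main())

Even with something like this, somebody still has to get each box into
safe mode or in front of bootable media, which is exactly the part that
does not scale across tens of thousands of remote machines.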