On 1/6/24 04:36, Michael Kjörling wrote:
On 6 Jan 2024 00:37 -0800, from dpchr...@holgerdanske.com (David Christensen):
I suggest taking an image (backup) with dd(1), Clonezilla, etc., when you're
done.  This will allow you to restore the image later -- to roll back a
change you do not like, to recover from a disaster, to clone the image to
another device, to facilitate experiments (such as doing a secure erase to
see if it resolves the SSD pending sector issue), etc..

If you also keep your system configuration files in a version control
system, restoring an image is faster than wipe/ fresh install/ configure/
restore data.

I would go even farther. Backups should be designed such that
recovering from a catastrophic storage failure, such as getting hit by
ransomware, unintentionally doing a destructive badblocks write test,
or the sudden failure of a storage device, is possible with at most
something very similar to:

* Boot some kind of live environment


I wanted more tools than the Debian installer rescue shell provides (e.g. BusyBox), and I am too lazy to learn yet another live system (e.g. Knoppix), so I installed Debian with Xfce onto two USB drives -- one with BIOS/MBR and the other with UEFI Secure Boot/GPT. They are both complete installs, so they are familiar and I can add whatever I want.


* Set up file systems on the storage device to be restored onto
   (partitioning, setting up LUKS containers, formatting, whatever else
   might be called for)
* Within the live environment, install and configure the software
   needed to access the backup (if any) (this may include things like
   cryptographic keys, access passphrases and the likes)
* Perform the restoration from the most recent backup (this is the
   part that likely will take a significant amount of time)


I keep my Debian instances small, simple, and self-contained (1 GB ext4 boot, 1 GB dm-crypt swap, and 12 GB LUKS ext4 root on one 16+ GB 2.5" SATA SSD). dd(1) meets all of my imaging needs. It's fast and requires minimal storage -- less than 10 minutes using an old-school USB 2.0 HDD; each 100 GB holds 6+ images. (`apt-get autoremove`, `apt-get autoclean`, fstrim(8), and/or gzip(1) can reduce time and storage requirements.)
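
For what it's worth, a minimal sketch of that imaging workflow -- the device name, mount point, and file names below are made up, so adjust to taste:

  # image the whole SSD (here assumed to be /dev/sdX) onto a USB HDD
  # mounted at /mnt/backup, compressing on the fly
  dd if=/dev/sdX bs=1M status=progress | gzip -c > /mnt/backup/host-20240106.img.gz

  # restore later by reversing the pipe
  gunzip -c /mnt/backup/host-20240106.img.gz | dd of=/dev/sdX bs=1M status=progress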


If my OS instances were larger, more complex, shared disk space, etc. -- e.g. multi-boot Windows, Debian, etc., with a shared data partition, which is likely what the OP had -- I would think about a tool such as Clonezilla. Then I would get a big USB 3.0+ HDD/RAID, boot one of my Debian USB instances, look at the partition table, and take dd(1) images in chunks -- block 0 to the last block before the ESP, the ESP, then each partition or contiguous span of related partitions, and finally the secondary GPT header.
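
A sketch of what I mean, with made-up device names and the common assumption that the ESP starts at sector 2048 (the real offsets come from the partition table):

  fdisk -l /dev/sdX                       # note the start/end sectors of each partition

  # block 0 up to (but not including) the ESP -- protective MBR, primary GPT, gap
  dd if=/dev/sdX of=/mnt/backup/sdX-head.img bs=512 count=2048

  # the ESP and each data partition (or contiguous span), compressed
  dd if=/dev/sdX1 bs=1M status=progress | gzip -c > /mnt/backup/sdX1-esp.img.gz
  dd if=/dev/sdX2 bs=1M status=progress | gzip -c > /mnt/backup/sdX2.img.gz

  # secondary GPT header and entries -- normally the last 33 512-byte sectors
  SECTORS=$(blockdev --getsz /dev/sdX)
  dd if=/dev/sdX of=/mnt/backup/sdX-gpt2.img bs=512 skip=$((SECTORS - 33)) count=33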


* Update the restored copies of /etc/fstab, /etc/crypttab and any
   other files that directly reference the partitions or file systems
   by some kind of ID (UUID, /dev/disk/by-*/*, ...)
* Reinstall the boot loader


When I take a dd(1) image of an MBR disk, I copy from block 0 through the end of the root partition (a sketch follows the list below). So:

1.  UUIDs are preserved.

2.  All boot loader stages are preserved.
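
The promised sketch, with a made-up device name and end sector:

  fdisk -l /dev/sdX                  # note the End sector of the root partition
  END_SECTOR=29360127                # made-up example; use the End value fdisk reported

  # copy block 0 through the end of the root partition, inclusive
  dd if=/dev/sdX bs=512 count=$((END_SECTOR + 1)) status=progress \
      | gzip -c > /mnt/backup/sdX-mbr.img.gz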


When I take a dd(1) image of a GPT disk with lots of zeros (fresh wipe and install), I copy the whole thing. Again, UUIDs and boot loader stages are preserved.


Using live media for UUID and/or boot loader surgery is non-trivial, as discussed in more than a few posts to this list. But such surgery may be required after restoring an image onto a different disk and/or hardware arrangement.
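
For reference, one hedged outline of that surgery from a live environment -- the device names and mount points below are assumptions, and the details differ for LUKS/LVM and BIOS vs. UEFI:

  blkid                                  # note the UUIDs of the restored file systems
  mount /dev/sdX3 /mnt                   # restored root
  mount /dev/sdX1 /mnt/boot/efi          # ESP, if UEFI
  for d in /dev /proc /sys; do mount --bind "$d" "/mnt$d"; done
  chroot /mnt
  # inside the chroot: fix /etc/fstab and /etc/crypttab to match the new UUIDs, then
  grub-install /dev/sdX
  update-grub
  update-initramfs -u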


* Reboot
* Reinstall the boot loader again from within the restored environment
   to ensure that everything relating to it is in sync


For the simple case of restoring an image onto the exact same hardware, a restored MBR image just works. Same for GPT. If a GPT disk was zeroed or secure erased, the secondary GPT header will need to be written. I believe GRUB, Linux, or something on Debian did this automagically for me the last time I tried.
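
If it does not happen automatically, sgdisk(8) can verify and rewrite the backup structures by hand -- a hedged sketch with a made-up device name; gdisk(8)'s recovery menu offers the same repairs interactively:

  sgdisk -v /dev/sdX    # verify; complains if the backup header is missing or misplaced
  sgdisk -e /dev/sdX    # put the backup GPT structures at the end of the disk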


Such recovery should _not_ need to involve significant reconfiguration
of anything. Any such requirements will massively increase your time
to recovery, as I think we're seeing an example of here. And yes,
pretty much all of this could be scripted, but I strongly suspect that
few people need to do a bare-metal restore of their most recent backup
often enough for _that_ to be worth the effort to create and maintain.


AIUI the OP accidentally zeroed a Windows/ Debian multi-boot disk in a relatively new computer. Rebuilding from scratch is going to involve more than twice the effort of rebuilding one OS from scratch, but hopefully no live data was lost.


I have a half dozen computers in my SOHO network. I trash my daily driver at least once a year and my workhorse more often than that.


I started with disaster preparedness/ recovery using lowest-common-denominator tools -- tar(1), gzip(1), rsync(1), dd(1), etc.. I am a coder, so I wrapped those with shell and Perl scripts. For better or worse, I have built my own backup, recovery, image, archive, etc., suite and have tailored my work flow to match. The tool chain is Rube Goldberg, but the backup and archive products are identifiable as standard Unix tool outputs and accessible by hand.
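
In the same spirit, the individual pieces can stay as plain as the following (paths and names below are made up):

  # mirror home directories to a backup drive, preserving hard links,
  # staying on one file system
  rsync -aHx --delete /home/ /mnt/backup/home/

  # archive /etc as a dated tarball
  tar -czf /mnt/backup/etc-$(date +%Y%m%d).tar.gz -C / etc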


Which is not to say that keeping configuration files
version-controlled cannot provide benefits anyway; but given a proper,
frequent backup regime, the benefits even of that are reduced.


The goal is defense in depth -- version control, backup, restore, imaging, archive, zfs-auto-snapshot, replication, rotation, RAID, etc..
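
For example, the snapshot and replication layers can be as small as the following (pool, dataset, and host names are made up, and the earlier snapshot is assumed to exist on both sides):

  # take a snapshot, then replicate it incrementally to another machine
  zfs snapshot tank/home@2024-01-06
  zfs send -i tank/home@2024-01-05 tank/home@2024-01-06 | ssh backuphost zfs receive -F backup/home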


David
