Karl Denninger wrote:
On 4/30/2019 03:09, Michelle Sullivan wrote:
Consider..

If one triggers such a fault on a production server, how can one justify
transferring multiple terabytes (or even petabytes now) of data back from
backup to repair an unmountable/faulted array?  Every backup solution I
know of would currently take days, if not weeks, to restore the sort of
store ZFS is touted as supporting.
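
To put rough numbers on "days if not weeks": restore time is simply data
size over sustained restore throughput.  A quick Python sketch; the sizes
and rates below are illustrative assumptions, not measurements of any
particular setup.

# Rough restore-time estimate: size divided by sustained throughput.
# Sizes (TB) and rates (MB/s) are assumed example values.
def restore_hours(size_tb, rate_mb_s):
    """Hours to restore size_tb terabytes at rate_mb_s sustained MB/s."""
    return (size_tb * 1e6) / rate_mb_s / 3600.0

for size_tb in (10, 100, 1000):       # 10 TB, 100 TB, ~1 PB
    for rate in (100, 500, 1000):     # sustained MB/s off the backup store
        print(f"{size_tb:>5} TB at {rate:>4} MB/s: "
              f"{restore_hours(size_tb, rate):7.1f} h")

At 100 MB/s sustained, 100 TB already takes roughly 11-12 days; even at a
steady 1 GB/s, a petabyte is close to two weeks.
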
I had this happen on a production ZFS server a few years back.  The
*hardware* (a disk adapter) went insane and scribbled on *all* of the vdevs.

The machine crashed and would not come back up -- at all.  I insist on
having emergency boot media physically in the box (a USB key) in any
production machine, and this one had it; from that it was quickly obvious
that all of the vdevs were corrupted beyond repair.  There was no rational
option other than to restore.

It was definitely not a pleasant experience, but this is why, once you get
into system and data-store sizes where a restore is a five-alarm pain in
the neck, you must work out a strategy that covers you 99% of the time
without a large amount of downtime, and accept that downtime in the
remaining 1% of cases.  In this particular circumstance the customer had
originally declined to spend on a doubled, transaction-level-protected
on-site (same DC) redundancy setup, so a straight restore -- as opposed to
failing over, promoting the standby, then restoring and building a new
"redundant" box where the old "primary" sat -- was the most viable option.
Time to recover essential functions was ~8 hours, and over 24 hours for
everything to be restored.
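
For context, the kind of doubled on-site setup described above is usually
built by continuously replicating to a second box that can be promoted when
the primary dies.  A minimal sketch of snapshot-based incremental zfs
send/receive driven from Python; the host, pool and dataset names are
invented for illustration, and this only gives crash-consistent
point-in-time copies -- true transaction-level protection needs
application-level replication on top.

import subprocess, time

DATASET = "tank/data"            # assumed dataset name
REMOTE  = "standby.example.net"  # assumed standby host

def snap(name):
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{name}"], check=True)

def send_incremental(prev, cur):
    # zfs send -i <prev> <cur> | ssh <standby> zfs receive -F <dataset>
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{DATASET}@{prev}", f"{DATASET}@{cur}"],
        stdout=subprocess.PIPE)
    subprocess.run(["ssh", REMOTE, "zfs", "receive", "-F", DATASET],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")

prev = "repl-0"
snap(prev)            # the initial full send of this snapshot is not shown
while True:
    cur = f"repl-{int(time.time())}"
    snap(cur)
    send_incremental(prev, cur)
    prev = cur
    time.sleep(300)   # every 5 minutes; this interval bounds potential data loss

With something like this in place the replication interval, not restore
throughput, becomes the limit on how much data and time a fail-over costs.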

How big was the storage area?

--
Michelle Sullivan
http://www.mhix.org/
