I should clarify. I was addressing only the issue of
virtualizing, not the complete set of things to do
to prevent data loss.

> 2010/9/19 R.G. Keen <k...@geofex.com>
> > and last-generation hardware is very, very cheap.
> Yes, of course, it is. But, actually, is that a true
> statement? 
Yes, it is. Last-generation hardware is, in general,
very cheap. But that implies nothing either way
about ECC. In fact, there is a buyer's market for
last-generation *servers* with ECC, and those are
very cheap too. I can get a single-unit rackmount
server setup for under $100 here in Austin that includes
ECC memory.

That may not be the best of all possible things to do
on a number of levels. But for me, the risk of
making a setup or operating mistake on a virtual machine
server far outweighs the hardware cost of putting
another physical machine on the ground.

> I've read that it's *NOT* advisable to run ZFS on systems
> which do NOT have ECC RAM. And those cheapo last-gen
> hardware boxes quite often don't have ECC, do they?
Most of them, the ex-desktop boxes, do not. However, 
as I noted above, removed-from-service servers are also
quite cheap. They *do* have ECC. I say this just to
illustrate the point that a statement about last-generation
hardware says nothing about ECC, either positive or negative.

In fact, the issue goes further. Processor chipsets from both
Intel and AMD used to support ECC on an ad-hoc basis. The
capability may have been in the chipset, but it may or may not
have been supported by the motherboard. Intel's recent chipsets
emphatically do not support ECC. AMD's do, in general. However,
the motherboard must still support ECC reporting in hardware and
BIOS for ECC to actually work, and you have to buy the ECC memory.
The newer the Intel motherboard, the less likely and more
expensive ECC is. As a side note, older Intel motherboards
sometimes did support ECC.

There's about sixteen more pages of typing to cover the issue 
even modestly correctly. The bottom line is this: for 
current-generation hardware, buy an AMD AM3 socket CPU,
ASUS motherboard, and ECC memory. DDR2 and DDR3 ECC
memory is only moderately more expensive than non-ECC.

This year I have built two OpenSolaris servers from scratch.
They use Athlon II processors, 4GB of ECC memory and
ASUS motherboards. This setup runs ECC, and supports ECC
reporting and scrubbing. The cost is about $65 for
the CPU, $110 for memory, and $70-$120 for the motherboard.
$300 more or less gets you new hardware that runs a 64-bit
OS, ECC, and ZFS, and does not give you worries about the
hardware going into wearout. I also bought new, high-quality
power supplies for $40-$60 per machine because the power
supply is a single point of failure, and it wears out. That's a
fact that many people ignore until the machine doesn't come
up one day.
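
For anyone who wants to check the arithmetic, here is a small Python
sketch that adds up the prices quoted above. The part names used as
dictionary keys are just labels chosen for this example; the dollar
figures are the ones from the paragraph.

```python
# Prices in USD as quoted in the post; each entry is a (low, high) range.
parts = {
    "cpu": (65, 65),
    "ecc_memory_4gb": (110, 110),
    "motherboard": (70, 120),
}
power_supply = (40, 60)


def total(ranges):
    """Sum a collection of (low, high) price ranges into one (low, high) pair."""
    lows, highs = zip(*ranges)
    return sum(lows), sum(highs)


core_low, core_high = total(parts.values())
full_low, full_high = total(list(parts.values()) + [power_supply])

print(f"CPU + memory + motherboard: ${core_low}-${core_high}")
print(f"including power supply:     ${full_low}-${full_high}")
```

The core parts come to $245-$295, which is where the "$300 more or
less" figure comes from; the power supply goes on top of that.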

> So, I wonder - what's the recommendation, or rather,
> experience as far as home users are concerned? Is it "safe
> enough" now to use ZFS on non-ECC-RAM systems (if backups
> are around)?
That's more a question about how much you trust your backups
than a question about ECC. 

ZFS is a layer of checking and recovery on disk writes. If your
memory or CPU hands it corrupted data, it will carefully save
and recover that corrupted data. Memory corruption is something
ZFS does not address in any way, positive or negative.
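
To make that distinction concrete, here is a toy Python sketch. It is
not how ZFS is implemented; `write_block`, `read_block` and `flip_bit`
are made-up helpers, and SHA-256 simply stands in for "some checksum
stored alongside the data". It only shows why a checksum taken at
write time catches corruption that happens afterwards, but not
corruption that happened in memory beforehand.

```python
import hashlib


def write_block(data):
    """Toy stand-in for a checksumming write: keep the data and its checksum."""
    return data, hashlib.sha256(data).hexdigest()


def read_block(stored, checksum):
    """Toy stand-in for a verified read: refuse data that fails its checksum."""
    if hashlib.sha256(stored).hexdigest() != checksum:
        raise IOError("checksum mismatch: corruption after write detected")
    return stored


def flip_bit(data, bit):
    """Return a copy of data with a single bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


original = b"account balance: 1000"

# Case 1: the bit flips on disk, after the checksum was computed.
stored, checksum = write_block(original)
try:
    read_block(flip_bit(stored, 3), checksum)
    disk_error_detected = False
except IOError:
    disk_error_detected = True

# Case 2: the bit flips in RAM, before the data reaches the write path.
# The checksum is computed over the already-bad bytes, so it verifies.
stored, checksum = write_block(flip_bit(original, 3))
read_back = read_block(stored, checksum)

print("on-disk flip detected:", disk_error_detected)
print("in-memory flip passed verification:", read_back != original)
```

In the second case the read succeeds and returns wrong data with a
valid checksum, which is the gap ECC memory is meant to cover.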

*The correct question is this: given how much value you put on
not losing your data to hardware or software errors, how much
time and money are you willing to spend to make sure you don't
lose your data?*

ZFS prevents or mitigates many of the issues involved with disk
errors and bit rot. ECC prevents or mitigates many of the issues
involved with memory corruption. 

My recommendation is this: if you are playing around, fine, use
virtual machines for your data backup. If you want some amount
of real data backup security, address the issues of data corruption
on as many levels as you can. "Safe enough" is something only
you can answer. My answer, for me and my data, is a separate
machine which does only data backup. It runs both ECC and
ZFS on new (and burnt-in) hardware, and it runs only the data
management tasks, to simplify the software interactions involved.
That arrangement is two levels deep on different hardware setups,
and finally flushes out to offline DVDs which are themselves
protected by ECC (look up DVDisaster) and periodically scanned
for errors and recopied.
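
As a rough idea of what "periodically scanned for errors" can look
like at the file level, here is a minimal Python sketch that records
a SHA-256 digest per file and later reports files that are missing or
no longer match. The function names and the manifest layout are
invented for this example. It only detects damage; it does not repair
anything the way error-correction data can, and a manifest built from
data that is already corrupt will faithfully record the corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return (relative_path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append((rel, "missing"))
        elif file_digest(p) != expected:
            problems.append((rel, "changed"))
    return problems
```

The manifest would be built once when the archive is written, stored
somewhere other than the archive itself, and `verify` rerun on each
scan; an empty result means every recorded file still matches.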

That probably seems excessive. But I've been burned with subtle
data loss before. It only takes one or two flipped bits in the wrong
places to make a really ugly scenario. Losing an entire file is in 
many ways easier to live with than a quiet error that gets 
propagated silently into your backup stream. When that happens, 
you can't trust *any* file until you have manually checked it, if
that is even possible. Want a really paranoia-inducing situation?
Think about what happens if you find a silent bit corruption in 
a file system that includes encrypted files. 

So - what's your data worth to you?
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
