>>>>> "rt" == Robert Thurlow <[EMAIL PROTECTED]> writes:
>>>>> "dm" == David Magda <[EMAIL PROTECTED]> writes:

    dm> None of which helped Amazon when their S3 service went down due
    dm> to a flipped bit:

OK, I get that S3 went down due to corruption, and that the network
checksums I mentioned failed to prevent it.  The missing piece is
evidence that the corruption occurred on the network rather than
somewhere else.

Their post-mortem sounds to me as though a bit flipped inside the
memory of one server could be spread via this ``gossip'' protocol to
infect the entire cluster.  The replication and spreadability of the
data makes their cluster into a many-terabyte gamma ray detector.

I wonder if they even use a meaningful VPN.

      > Modern NFS runs over a TCP connection, which includes its own
      > data validation.  This surely helps.

Yeah fine, but IP and UDP and Ethernet also have checksums.  The one
in TCP isn't much fancier.

    rt> The TCP checksum isn't very strong, and we've seen corruption
    rt> tied to a broken router, where the Ethernet checksum was
    rt> recomputed on bad data, and the TCP checksum didn't help.  It
    rt> sucked.

That's more like what I was looking for.
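
To make "isn't very strong" concrete, here is a minimal sketch of the
16-bit one's-complement Internet checksum (the RFC 1071 style sum that
IP, UDP and TCP share), compared against CRC-32.  The payload bytes are
made up for illustration, and a real TCP checksum also covers a
pseudo-header that is left out here.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical payloads, chosen only to show the blind spots.
good    = b"\x12\x34\x56\x78\x9a\xbc"
swapped = b"\x56\x78\x12\x34\x9a\xbc"   # two 16-bit words reordered
offset  = b"\x12\x35\x56\x77\x9a\xbc"   # +1 in one word, -1 in another

for name, payload in (("good", good), ("swapped", swapped), ("offset", offset)):
    print(f"{name:8s} inet={internet_checksum(payload):#06x} "
          f"crc32={zlib.crc32(payload):#010x}")
```

All three payloads get the same Internet checksum, because the sum does
not care about word order or about errors that cancel out, while the
CRC-32 values differ.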

The other concept from your first post, ``protection domains,'' is
interesting too (one domain that includes both ZFS and NFS).  Of
course, what do you do when you get an error on an NFS client: throw
``stale NFS file handle''?  Even speaking hypothetically, its value
depends on good exception handling, which has been a big trouble spot
for ZFS so far.

This ``protection domain'' concept is already enshrined in IEEE
802.1D: bridges are not supposed to recalculate the FCS, and if they
need to mangle the packet they're supposed to update the FCS
algorithmically, using fancy math over only the bits they changed,
rather than recalculating it over the whole packet.  They state this
is to protect against bad RAM inside the bridge.  I don't know if
anyone DOES that, but it's written into the spec.
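
The ``fancy math'' can be sketched with the linearity of CRCs: for
equal-length messages, the CRC of the new frame can be derived from the
old CRC and the XOR of the changed bits alone.  This is only an
illustration, not what any particular bridge does.  It uses
`zlib.crc32`, which has the same generator polynomial as the Ethernet
FCS, ignores on-the-wire bit ordering, and runs on a made-up frame
layout; `patch_crc32` is a hypothetical helper name.

```python
import zlib

def patch_crc32(old_crc: int, old_frame: bytes, new_frame: bytes) -> int:
    """Update a CRC-32 from the changed bits only (same-length frames)."""
    if len(old_frame) != len(new_frame):
        raise ValueError("frames must have equal length")
    delta = bytes(a ^ b for a, b in zip(old_frame, new_frame))
    zeros = bytes(len(delta))
    # CRC-32 is affine over XOR: crc(a) ^ crc(b) ^ crc(0...) == crc(a ^ b)
    return old_crc ^ zlib.crc32(delta) ^ zlib.crc32(zeros)

# Hypothetical frame as it arrived, with the sender's FCS.
original = b"dst-mac src-mac vlan=0010 payload: hello world"
fcs = zlib.crc32(original)

# The bridge intends to rewrite one field...
intended = original.replace(b"vlan=0010", b"vlan=0020")

# ...but bad RAM flips a payload bit while the frame sits in the bridge.
damaged = bytearray(intended)
damaged[-1] ^= 0x04
damaged = bytes(damaged)

recomputed_fcs = zlib.crc32(damaged)                  # recalculated over bad data
patched_fcs = patch_crc32(fcs, original, intended)    # derived from the change only

receiver_accepts_recomputed = zlib.crc32(damaged) == recomputed_fcs
receiver_accepts_patched = zlib.crc32(damaged) == patched_fcs
print("recomputed FCS accepted:", receiver_accepts_recomputed)
print("patched FCS accepted:   ", receiver_accepts_patched)
```

With the full recalculation the receiver accepts the damaged frame;
with the patched FCS the flipped bit still shows up as an FCS mismatch,
which is the point of keeping the protection domain intact.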

But if the network is routed at L3, every hop builds a new L2 frame
(new FCS) and decrements the TTL (new IP header checksum), so the
``protection domain'' is partly split, leaving only the UDP/TCP
checksum contiguous end to end.
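
For the TTL case, a router can at least update the IP header checksum
incrementally instead of summing the header again.  The sketch below
follows the RFC 1624 formula HC' = ~(~HC + ~m + m') and checks it
against a full recomputation.  The header bytes are example values, and
the function names are made up for this illustration.

```python
import struct

def fold(total: int) -> int:
    """Fold carries of a one's-complement sum back into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def full_checksum(header: bytes) -> int:
    """One's-complement checksum over every 16-bit word of the header."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    return ~fold(sum(words)) & 0xFFFF

def decrement_ttl_incremental(header: bytes) -> bytes:
    """Decrement TTL and patch the checksum without re-summing the header."""
    old_word, old_csum = struct.unpack_from("!HH", header, 8)  # TTL|proto, checksum
    if old_word >> 8 == 0:
        raise ValueError("TTL already zero")
    new_word = old_word - 0x0100              # TTL is the high byte of this word
    new_csum = ~fold((~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF
    out = bytearray(header)
    struct.pack_into("!HH", out, 8, new_word, new_csum)
    return bytes(out)

# Example 20-byte IPv4 header with the checksum field zeroed, then filled in.
raw = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
struct.pack_into("!H", raw, 10, full_checksum(bytes(raw)))
before = bytes(raw)

after = decrement_ttl_incremental(before)
new_checksum = struct.unpack_from("!H", after, 10)[0]

# A header carrying a valid checksum sums to zero under full_checksum.
print("before valid:", full_checksum(before) == 0)
print("after valid: ", full_checksum(after) == 0)
```

Either way the router is the one vouching for the new header checksum,
so it no longer says anything about what the sender originally put on
the wire.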


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss