> One thing that we're doing for (guaranteed) immutable data is to use MD5
> signatures as keys... this will also prevent duplication, and it will allow
> detection (if not correction) of bitrot at the app level easy.
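
For anyone following along, the scheme quoted above might look roughly like the sketch below. It is only an illustration: a plain dict stands in for the real data store, and the names content_key, put and get_verified are made up for this example.

```python
import hashlib


def content_key(value: bytes) -> str:
    """Derive the row key from the value itself (MD5 hex digest)."""
    return hashlib.md5(value).hexdigest()


def put(store: dict, value: bytes) -> str:
    """Store an immutable value under its own digest and return the key.

    Writing the same value twice lands on the same key, so duplicates
    collapse into a single entry.
    """
    key = content_key(value)
    store[key] = value
    return key


def get_verified(store: dict, key: str) -> bytes:
    """Read a value back and check that it still hashes to its key.

    A mismatch means the key or the value was corrupted. This detects
    bitrot but cannot repair it.
    """
    value = store[key]
    if content_key(value) != key:
        raise ValueError("checksum mismatch for key " + key)
    return value
```

This only works because the data is immutable: changing a value would change its key.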
Yes. Another option is to checksum the keys and/or values themselves, by encoding each in a self-verifying format. But that makes the data a lot more opaque to tools and humans.

Also consider that arbitrary data corruption could have effects other than modifying a key or a value. I'm not sure that skipping rows on deserialization errors is good enough to handle absolutely arbitrary corruption (anyone?).

-- 
/ Peter Schuller
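
To make the "self-verifying format" idea concrete, here is a minimal sketch. The layout (a 4-byte big-endian CRC32 prefix followed by the payload) and the names wrap and unwrap are assumptions chosen for illustration, not an existing format.

```python
import struct
import zlib

_HEADER = struct.Struct(">I")  # 4-byte big-endian CRC32 prefix (assumed layout)


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 so the blob can verify itself."""
    return _HEADER.pack(zlib.crc32(payload)) + payload


def unwrap(blob: bytes) -> bytes:
    """Check the CRC32 prefix and return the payload, or raise ValueError."""
    if len(blob) < _HEADER.size:
        raise ValueError("blob too short to contain a checksum")
    (expected,) = _HEADER.unpack_from(blob)
    payload = blob[_HEADER.size:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload
```

The same wrapping could be applied to keys as well as values. The opacity cost is visible here: every stored blob now carries a binary prefix, so generic tools no longer see the raw data. And a check like this only covers the bytes it wraps, not corruption elsewhere in the storage files.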