As far as I know, Riak does not provide any checksum on your data. I
hope I am wrong.

Riak computes a SHA-1 hash of the bucket/key pair (
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-January/002820.html
), but this is not done for data integrity checking, only to get hash
values that spread objects among the nodes.
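
To illustrate why that hash says nothing about the value, here is a
rough Python sketch of hashing a bucket/key pair onto a ring of
partitions. The function names, the "/" separator and the ring size
of 64 are my assumptions for illustration; Riak encodes the pair
differently internally, so the numbers will not match a real cluster.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def ring_position(bucket: bytes, key: bytes) -> int:
    # Assumption: join bucket and key with "/"; this is not Riak's real encoding.
    digest = hashlib.sha1(bucket + b"/" + key).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket: bytes, key: bytes, ring_size: int = 64) -> int:
    # Split the hash space into ring_size equal slices (ring_size a power of two).
    return ring_position(bucket, key) // (RING_TOP // ring_size)
```

Only the bucket and key go into the hash, so a flipped bit in the
stored value changes nothing here and goes unnoticed.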

You can do checksums yourself (MD5, SHA) on your data, but how do you
handle an error? Can you get data from the other replicas? Do you want
to complicate client code with this?
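
To make the client-side option concrete, here is a minimal Python
sketch, assuming you prepend a SHA-256 digest to every value before
storing it. The names wrap, unwrap, first_valid and ChecksumError are
made up for this example, and how you would actually fetch the
individual replica copies from Riak is left open, since that is
exactly the part I am unsure about.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


class ChecksumError(Exception):
    """Raised when a stored blob fails verification."""


def wrap(value: bytes) -> bytes:
    # Store the digest in front of the value.
    return hashlib.sha256(value).digest() + value


def unwrap(blob: bytes) -> bytes:
    # Split off the digest and recompute it over the remaining bytes.
    if len(blob) < DIGEST_SIZE:
        raise ChecksumError("blob shorter than digest")
    digest, value = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.sha256(value).digest() != digest:
        raise ChecksumError("checksum mismatch")
    return value


def first_valid(copies) -> bytes:
    # Hypothetical fallback: try each replica copy until one verifies.
    for blob in copies:
        try:
            return unwrap(blob)
        except ChecksumError:
            continue
    raise ChecksumError("no copy passed verification")
```

Even in this small form, every read and write path in the client has
to go through wrap and unwrap, which is the complication I mean.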

Wouldn’t it be best if Riak had checksums built in and served only
data whose checksum verifies? Of course this can get complicated,
because objects are not the only thing that can get corrupted: other
data, like indexes, can get corrupted too. Also, for small objects,
checksums would take a lot of additional space relative to the data.

Google mentions the problem with data integrity in various papers:
http://www.odbms.org/download/dean-keynote-ladis2009.pdf
http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006

Runar
blog.epigent.com
