At first glance this looked like an egregious bug, but the effect is actually minimal: since the size of the buffer is deterministic given the size of the data, equal rows will carry equal amounts of excess/junk data. Combined with the fact that 0.6 doesn't reuse these buffers, I don't think we're actually triggering any extra repair.
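A quick self-contained sketch of why that holds (JunkTailDemo and writeRow are hypothetical stand-ins for the real buffer path, not Cassandra code): Java zero-fills fresh arrays, so identical rows written into fresh, equally-sized buffers yield byte-identical backing arrays, and their hashes agree despite the junk tail.

    import java.util.Arrays;

    public class JunkTailDemo
    {
        // Hypothetical stand-in for a fresh row buffer: the backing array
        // is larger than the payload, mimicking the excess capacity that
        // getData() exposes. Java zero-initializes new byte[], so the
        // unused tail is all zeros.
        static byte[] writeRow(byte[] payload)
        {
            byte[] backing = new byte[16]; // capacity > payload length
            System.arraycopy(payload, 0, backing, 0, payload.length);
            return backing; // like getData(): whole array, junk tail included
        }

        public static void main(String[] args)
        {
            byte[] a = writeRow("row-data".getBytes());
            byte[] b = writeRow("row-data".getBytes());
            // Equal rows written into fresh buffers of equal size produce
            // byte-identical arrays, so their hashes agree despite the junk.
            System.out.println(Arrays.equals(a, b)); // prints: true
        }
    }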
The problem is fixed in 0.7, but I've opened CASSANDRA-1729 to fix it in 0.6, in case we start reusing row buffers. Thanks for the report!

Stu

-----Original Message-----
From: "Schubert Zhang" <zson...@gmail.com>
Sent: Thursday, November 11, 2010 2:19am
To: dev@cassandra.apache.org, u...@cassandra.apache.org
Subject: MerkleTree.RowHash maybe a bug.

Hi JE,

0.6.6: org.apache.cassandra.service.AntiEntropyService

I found that the rowHash method uses "row.buffer.getData()" directly. Since row.buffer.getData() returns a byte[], and the buffer may leave some junk bytes at the end of that array, I think we should use the exact length:

    private MerkleTree.RowHash rowHash(CompactedRow row)
    {
        validated++;
        // MerkleTree uses XOR internally, so we want lots of output bits here
        byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), row.buffer.getData());
        return new MerkleTree.RowHash(row.key.token, rowhash);
    }

schubert.zh...@gmail.com
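For reference, a minimal sketch of the fix Schubert suggests, assuming row.buffer is a Hadoop-style DataOutputBuffer that exposes getLength() for the count of valid bytes alongside getData(); the committed CASSANDRA-1729 patch may differ:

    private MerkleTree.RowHash rowHash(CompactedRow row)
    {
        validated++;
        // Trim the backing array to the valid length so junk tail bytes
        // never reach the hash. Assumes getLength() reports the number of
        // valid bytes, as in Hadoop's DataOutputBuffer.
        byte[] contents = java.util.Arrays.copyOf(row.buffer.getData(),
                                                  row.buffer.getLength());
        // MerkleTree uses XOR internally, so we want lots of output bits here
        byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), contents);
        return new MerkleTree.RowHash(row.key.token, rowhash);
    }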