Re: Self-healing data integrity?

DuyHai Doan Mon, 11 Sep 2017 01:13:52 -0700

Agree

A tricky detail about streaming:

1) On the sender side, the node just sends the SSTable data (without any of the
other components such as CRC files, partition index, partition summary, etc.)
2) The sender does not even bother to deserialize the SSTable data; it just
streams the bytes it reads directly from the SSTable files on disk
3) On the receiver side, the node receives the byte stream and needs to
deserialize it in memory and write it out again to rebuild all the SSTable
components (CRC files, partition index, partition summary, ...)
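To make steps 1 and 3 concrete, here is a toy model in Python. Everything in it is hypothetical (the 4-byte chunk size, the function names, the dict standing in for an SSTable and its CRC component); it is not Cassandra's on-disk format or streaming code. It only shows that a receiver which recomputes checksums from the bytes it was sent ends up with checksums that match whatever corruption was on the sender's disk.

```python
import zlib

CHUNK = 4  # hypothetical tiny chunk size so the example stays readable


def build_components(data: bytes) -> dict:
    """Rebuild per-chunk CRC32 values from whatever bytes are at hand."""
    crcs = [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
    return {"data": data, "crc": crcs}


def stream_sstable(data_on_disk: bytes) -> bytes:
    """Sender side: ship the raw data bytes only; stored CRCs stay behind."""
    return bytes(data_on_disk)


def verify(components: dict) -> bool:
    """Check each chunk of the data against the CRC list in the same components."""
    data, crcs = components["data"], components["crc"]
    return all(zlib.crc32(data[i * CHUNK:(i + 1) * CHUNK]) == c
               for i, c in enumerate(crcs))


original = b"row1row2row3"
sender_crcs = build_components(original)["crc"]  # written at flush time

# Simulate bit rot on the sender's disk after its CRCs were written.
rotted = bytearray(original)
rotted[5] ^= 0x01
received = build_components(stream_sstable(bytes(rotted)))

print(verify(received))                # True: fresh CRCs match the corrupted bytes
print(received["crc"] == sender_crcs)  # False: they no longer match the originals
```

The receiver's own checks will pass from now on, so the corruption has effectively been laundered into a "valid" SSTable.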

So the consequences are:

a. there is a bottleneck on the receiving side because of the (de)serialization
b. if there is bit rot in an SSTable, the receiving side has no chance to
detect it, since the CRC files are not sent
c. if we want to include CRC checks in the streaming path, it requires a
review of the whole streaming architecture, not just adding a feature
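The verification loop itself would be the easy part of point c. A minimal sketch, assuming a hypothetical protocol in which the sender streams each data chunk together with the CRC it stored at flush time (the chunk size, `sender`, `receiver` and `CorruptChunk` are all made up for illustration and are not Cassandra's wire format):

```python
import zlib
from typing import Iterator, List, Tuple

CHUNK = 4  # hypothetical chunk size, kept tiny for readability


class CorruptChunk(Exception):
    """Raised by the (hypothetical) receiver when a chunk fails its CRC check."""


def sender(data: bytes, stored_crcs: List[int]) -> Iterator[Tuple[bytes, int]]:
    """Stream each data chunk together with the CRC stored on disk at flush time."""
    for i, crc in enumerate(stored_crcs):
        yield data[i * CHUNK:(i + 1) * CHUNK], crc


def receiver(stream: Iterator[Tuple[bytes, int]]) -> bytes:
    """Check every chunk against the sender's stored CRC before accepting it."""
    out = bytearray()
    for n, (chunk, expected) in enumerate(stream):
        if zlib.crc32(chunk) != expected:
            raise CorruptChunk(f"chunk {n} failed CRC check")
        out += chunk
    return bytes(out)


original = b"row1row2row3"
crcs = [zlib.crc32(original[i:i + CHUNK]) for i in range(0, len(original), CHUNK)]
rotted = bytearray(original)
rotted[5] ^= 0x01  # bit rot on the sender's disk; stored CRCs untouched

assert receiver(sender(original, crcs)) == original
try:
    receiver(sender(bytes(rotted), crcs))
except CorruptChunk as e:
    print(e)  # chunk 1 failed CRC check
```

What makes it an architectural change is everything around this loop: the sender would have to read the checksum component it currently skips, and the receiver would have to decide what a failure means (abort the session, retry from another replica, flag the source for scrubbing).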

On Sat, Sep 9, 2017 at 10:06 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> (Which isn't to say that someone shouldn't implement this; they should,
> and there's probably a JIRA for it already, but it's a project of
> volunteers, and nobody has volunteered to do the work yet.)
>
> --
> Jeff Jirsa
>
>
> On Sep 9, 2017, at 12:59 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> There are, but they aren't consulted on the streaming path (only on normal
> reads).
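On the read path, the check can be pictured roughly as follows. The sketch is loosely modelled on Cassandra's `crc_check_chance` option for compressed tables, but the function, the use of CRC32 over a zlib chunk, and the exception name are assumptions for illustration, not Cassandra's internals. The point is that the streaming sender described at the top of this thread never goes through a step like this.

```python
import random
import zlib


class CorruptSSTableError(Exception):
    """Raised when a chunk's checksum does not match on the read path."""


def read_chunk(compressed_chunk: bytes, stored_crc: int,
               crc_check_chance: float, rng: random.Random) -> bytes:
    """Simplified read path: with probability crc_check_chance, verify the
    chunk's CRC32 before decompressing it."""
    if rng.random() < crc_check_chance and zlib.crc32(compressed_chunk) != stored_crc:
        raise CorruptSSTableError("checksum mismatch on read")
    return zlib.decompress(compressed_chunk)


rng = random.Random(0)
chunk = zlib.compress(b"partition data")
crc = zlib.crc32(chunk)
print(read_chunk(chunk, crc, 1.0, rng))  # b'partition data'

bad = bytearray(chunk)
bad[-1] ^= 0x01  # flip one bit in the stored chunk
try:
    read_chunk(bytes(bad), crc, 1.0, rng)
except CorruptSSTableError as e:
    print(e)  # checksum mismatch on read
```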
>
>
> On Sep 9, 2017, at 12:02 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Jeff,
>
> With default compression enabled on each table, aren't there CRC files
> created alongside SSTables that can help detect bit rot?
>
>
> On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Cassandra doesn't do that automatically: it can guarantee consistency on
>> read or write via ConsistencyLevel on each query, and it can run active
>> (AntiEntropy) repairs. But active repairs must be scheduled (by a human,
>> by cron, or by a third-party tool like http://cassandra-reaper.io/), and to
>> be pedantic, repair only fixes consistency issues; there's some work to be
>> done to properly address/support fixing corrupted replicas (for example,
>> repair COULD send a bit flip from one node to all of the others).
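The parenthetical about repair spreading a bit flip can be shown with a deliberately simplified model. Assumptions: one token range, a single SHA-256 digest standing in for the Merkle tree, and a last-write-wins merge in which equal timestamps are broken by the larger value. All names are hypothetical; this is not Cassandra's repair code.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, bytes]      # (write timestamp, value)
Replica = Dict[str, Row]


def range_digest(replica: Replica) -> str:
    """Stand-in for a Merkle tree: one digest over a single token range."""
    h = hashlib.sha256()
    for key in sorted(replica):
        ts, value = replica[key]
        h.update(key.encode() + ts.to_bytes(8, "big") + value)
    return h.hexdigest()


def reconcile(a: Row, b: Row) -> Row:
    """Last-write-wins; on equal timestamps the larger value wins (assumed tie-break)."""
    return max(a, b)  # tuples compare timestamp first, then value bytes


def repair(replicas: List[Replica]) -> None:
    """If digests disagree, exchange rows and reconcile them.
    Nothing here can tell which replica holds a corrupted value."""
    if len({range_digest(r) for r in replicas}) == 1:
        return
    for key in set().union(*replicas):
        winner: Optional[Row] = None
        for r in replicas:
            if key in r:
                winner = r[key] if winner is None else reconcile(winner, r[key])
        for r in replicas:
            r[key] = winner


good: Row = (1000, b"hello")
replicas: List[Replica] = [{"k": good}, {"k": good}, {"k": good}]
flipped = bytearray(good[1])
flipped[0] ^= 0x01                      # b"hello" -> b"iello" on one disk
replicas[2]["k"] = (good[0], bytes(flipped))

repair(replicas)
print([r["k"][1] for r in replicas])    # [b'iello', b'iello', b'iello']
```

Had the flip made the value smaller, the intact copy would have won instead; either way the outcome depends on byte values, not on which copy is actually intact, because no checksum is consulted.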
>>
>>
>> On Sep 9, 2017, at 1:07 AM, Ralph Soika <ralph.so...@imixs.com> wrote:
>>
>> Hi,
>>
>> I am searching for a big data storage solution for the Imixs-Workflow
>> project. I started with Hadoop until I became aware of the
>> 'small files problem'. So I am now considering Cassandra.
>>
>> But Hadoop has one important feature for me. The replicator continuously
>> examines whether data blocks are consistent across all datanodes. This will
>> detect disk errors and automatically move data from defective blocks to
>> working blocks. I think this is called a 'self-healing mechanism'.
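For contrast, the HDFS behaviour described here can be sketched as a scrubber that checks each replica against the checksum stored with it and replaces bad copies from one that still verifies. This is a simplified, hypothetical model (the `BlockReplica` class, node names and CRC32 are assumptions), not HDFS code; the point is that a per-replica checksum tells the scrubber which copy is wrong, which is the information that anti-entropy repair, as described in this thread, does not use.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class BlockReplica:
    node: str
    data: bytes
    crc: int  # checksum written alongside the block when it was created


def scan_and_heal(replicas: List[BlockReplica]) -> List[str]:
    """Verify every replica against its own stored checksum and overwrite
    corrupt ones with a copy that still verifies. Returns the healed nodes."""
    healthy = [r for r in replicas if zlib.crc32(r.data) == r.crc]
    if not healthy:
        raise RuntimeError("no replica passes its checksum; block is lost")
    source = healthy[0]
    healed = []
    for r in replicas:
        if zlib.crc32(r.data) != r.crc:
            r.data, r.crc = source.data, source.crc
            healed.append(r.node)
    return healed


block = b"workflow-document"
crc = zlib.crc32(block)
replicas = [BlockReplica(f"dn{i}", block, crc) for i in range(3)]
rotted = bytearray(block)
rotted[0] ^= 0x01
replicas[1].data = bytes(rotted)   # disk error on dn1; its stored crc is unchanged

print(scan_and_heal(replicas))     # ['dn1']
```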
>>
>> Is there a similar feature in Cassandra too?
>>
>>
>> Thanks for your help
>>
>> Ralph
>>
>
