Re: Self-healing data integrity?

Carlos Rolo Thu, 14 Sep 2017 04:23:35 -0700

Wouldn't it be easier to:

1) Have the sender check the CRC, and not send the data if it doesn't match?


2) Once the stream ends, compare the two CRCs to see whether anything got
corrupted during transfer?

Also, you could implement this in two pieces instead of reviewing the
streaming architecture as a whole. I am not familiar enough with the
Cassandra code to be sure these assumptions hold, so I just want to
contribute (and actually try to implement at least the first part).
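
A minimal sketch of the two steps above, in Python rather than Cassandra's
Java, just to illustrate the idea. It assumes one CRC32 per file, which is
a simplification. The names send_file and receive_file are hypothetical
and are not part of Cassandra's streaming code.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def crc32_of(data: bytes) -> int:
    """Return the CRC32 of a byte string as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def send_file(data: bytes, stored_crc: int, chunk_size: int = CHUNK_SIZE):
    """Step 1 (hypothetical sender): refuse to stream corrupted data.

    `stored_crc` stands in for the checksum kept on disk next to the file.
    Returns the chunks to stream plus the CRC to send along with them.
    """
    if crc32_of(data) != stored_crc:
        raise ValueError("local CRC mismatch, not streaming this file")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return chunks, stored_crc


def receive_file(chunks, sender_crc: int) -> bytes:
    """Step 2 (hypothetical receiver): compare the two CRCs at stream end."""
    running_crc = 0
    received = bytearray()
    for chunk in chunks:
        # zlib.crc32 accepts a starting value, so the CRC can be
        # accumulated chunk by chunk without buffering the whole stream.
        running_crc = zlib.crc32(chunk, running_crc)
        received.extend(chunk)
    if (running_crc & 0xFFFFFFFF) != sender_crc:
        raise IOError("CRC mismatch after transfer")
    return bytes(received)
```

The sender fails fast on local bit rot, and the receiver only needs the one
extra CRC value at the end of the stream, so the two checks could be added
independently of each other.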

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | LinkedIn:
http://linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Mon, Sep 11, 2017 at 9:12 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Agree
>
>  A tricky detail about streaming is that:
>
> 1) On the sender side, the node just sends the SSTable (without any other
> components like CRC files, partition index, partition summary etc...)
> 2) The sender does not even bother to de-serialize the SSTable data, it
> just sends the stream of bytes by reading the SSTable's content directly
> from disk
> 3) On the receiver side, the node receives the byte stream and needs to
> serialize it in memory to rebuild all the SSTable components (CRC files,
> partition index, partition summary ...)
>
> So the consequences are:
>
> a. there is a bottleneck on the receiving side because of serialization
> b. if there is bit rot in the SSTables, since CRC files are not sent,
> there is no chance to detect it on the receiving side
> c. if we want to include CRC checks in the streaming path, it requires a
> whole review of the streaming architecture, not just adding a feature
>
> On Sat, Sep 9, 2017 at 10:06 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> (Which isn't to say that someone shouldn't implement this; they should,
>> and there's probably a JIRA to do so already written, but it's a project of
>> volunteers, and nobody has volunteered to do the work yet)
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Sep 9, 2017, at 12:59 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> There are, but they aren't consulted on the streaming paths (only on
>> normal reads)
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Sep 9, 2017, at 12:02 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> Jeff,
>>
>>  With default compression enabled on each table, aren't there CRC files
>> created alongside the SSTables that can help detect bit rot?
>>
>>
>> On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Cassandra doesn't do that automatically - it can guarantee consistency
>>> on read or write via ConsistencyLevel on each query, and it can run active
>>> (AntiEntropy) repairs. But active repairs must be scheduled (by a human,
>>> cron, or a third-party script like http://cassandra-reaper.io/), and to
>>> be pedantic, repair only fixes consistency issues; there's some work to be
>>> done to properly address/support fixing corrupted replicas (for example,
>>> repair COULD send a bit flip from one node to all of the others)
>>>
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Sep 9, 2017, at 1:07 AM, Ralph Soika <ralph.so...@imixs.com> wrote:
>>>
>>> Hi,
>>>
>>> I am searching for a big data storage solution for the Imixs-Workflow
>>> project. I started with Hadoop until I became aware of the
>>> 'small-file-problem'. So I am considering using Cassandra now.
>>>
>>> But Hadoop has one important feature for me. The replicator continuously
>>> examines whether data blocks are consistent across all datanodes. This will
>>> detect disk errors and automatically move data from defective blocks to
>>> working blocks. I think this is called 'self-healing mechanism'.
>>>
>>> Is there a similar feature in Cassandra too?
>>>
>>>
>>> Thanks for help
>>>
>>> Ralph
>>>
>>>
>>>
>>> --
>>>
>>>
>>
>
