Re: what's the difference between repair CF separately and repair the entire node?

Sasha Dolgy Wed, 14 Sep 2011 01:16:00 -0700
It was mentioned in another thread that Twitter uses 0.8 in
production. For me, that was a fairly strong testimonial.
On Sep 14, 2011 9:28 AM, "Yan Chunlu" <[email protected]> wrote:
> Is 0.8 ready for production use? As far as I know, many companies,
> including reddit.com, are currently using 0.7. How do they get around
> the repair problem?
>
> On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne <[email protected]>
> wrote:
>
>> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu <[email protected]>
>> wrote:
>> > I don't want to repair one CF at a time either.
>> > The "node repair" ran for a week and was still running; compactionstats
>> > and netstream showed nothing running on any node, and there was no error
>> > message or exception. I really have no idea what it was doing.
>>
>> To the list of things repair does wrong in 0.7, we'll have to add that
>> if one of the nodes participating in the repair (so any node that shares
>> a range with the node on which repair was started) goes down (even for a
>> short time), then the repair will simply hang forever doing nothing. And
>> no specific error message will be logged. That could be what happened.
>> Again, recent releases of 0.8 fix that too.
>>
>> --
>> Sylvain
>>
>> > I stopped it yesterday. Maybe I should run repair again with
>> > compaction disabled on all nodes?
>> > Thanks!
>> >
>> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
>> > <[email protected]> wrote:
>> >>
>> >> > I think it is a serious problem, since I cannot "repair". I am
>> >> > using Cassandra on production servers. Is there some way to fix it
>> >> > without upgrading? I have heard that 0.8.x is still not quite ready
>> >> > for production environments.
>> >>
>> >> It is a serious issue if you really need to repair one CF at a time.
>> >> However, looking at your original post it seems this is not
>> >> necessarily your issue. Do you need to, or was your concern rather the
>> >> overall time repair took?
>> >>
>> >> There are other things that are improved in 0.8 that affect 0.7. In
>> >> particular, (1) in 0.7 compaction, including validating compactions
>> >> that are part of repair, is non-concurrent so if your repair starts
>> >> while there is a long-running compaction going it will have to wait,
>> >> and (2) semi-related is that the merkle tree calculation that is part
>> >> of repair/anti-entropy may happen "out of synch" if one of the nodes
>> >> participating happens to be busy with compaction. This in turn causes
>> >> additional data to be sent as part of repair.
>> >>
>> >> That might be why your immediately following repair took a long time,
>> >> but it's difficult to tell.
>> >>
>> >> If you're having issues with repair and large data sets, I would
>> >> generally say that upgrading to 0.8 is recommended. However, if you're
>> >> on 0.7.4, beware of
>> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
>> >>
>> >> --
>> >> / Peter Schuller (@scode on twitter)
>> >
>> >
>>