Me neither, I don't want to repair one CF at a time. The "node repair" ran for a week and was still running. compactionstats and netstats showed nothing running on any node, and there were no error messages or exceptions, so I really have no idea what it was doing. I stopped it yesterday. Maybe I should run repair again with compaction disabled on all nodes?
Thanks!

On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > I think it is a serious problem since I can not "repair"..... I am
> > using cassandra on production servers. is there some way to fix it
> > without upgrade? I heard of that 0.8.x is still not quite ready in
> > production environment.
>
> It is a serious issue if you really need to repair one CF at the time.
> However, looking at your original post it seems this is not
> necessarily your issue. Do you need to, or was your concern rather the
> overall time repair took?
>
> There are other things that are improved in 0.8 that affect 0.7. In
> particular, (1) in 0.7 compaction, including validating compactions
> that are part of repair, is non-concurrent so if your repair starts
> while there is a long-running compaction going it will have to wait,
> and (2) semi-related is that the merkle tree calculation that is part
> of repair/anti-entropy may happen "out of synch" if one of the nodes
> participating happen to be busy with compaction. This in turns causes
> additional data to be sent as part of repair.
>
> That might be why your immediately following repair took a long time,
> but it's difficult to tell.
>
> If you're having issues with repair and large data sets, I would
> generally say that upgrading to 0.8 is recommended. However, if you're
> on 0.7.4, beware of
> https://issues.apache.org/jira/browse/CASSANDRA-3166
>
> --
> / Peter Schuller (@scode on twitter)