Re: (unofficial) Community Poll for Production Operators : Repair

2013-08-05 Thread Robert Coli
On Fri, May 10, 2013 at 11:24 AM, Robert Coli wrote:
> I have been wondering how Repair is actually used by operators. If
> people operating Cassandra in production could answer the following
> questions, I would greatly appreciate it.
>
https://issues.apache.org/jira/browse/CASSANDRA-5850 File

2013-05-16 Thread Alain RODRIGUEZ
I have indeed seen some of those in the past. But my point is not so much to understand how I can get different counts depending on the node (I consider this a weakness of counters and I am aware of it); what I wonder is rather why those inconsistent counters never converge, even after a repair.

2013-05-16 Thread Janne Jalkanen
Might you be experiencing this? https://issues.apache.org/jira/browse/CASSANDRA-4417

/Janne

On May 16, 2013, at 14:49, Alain RODRIGUEZ wrote:
> @Rob: Thanks about the feedback.
>
> Yet I have a weird behavior still unexplained about repairing. Are counters
> supposed to be "repaired" too ?

2013-05-16 Thread Alain RODRIGUEZ
@Rob: Thanks for the feedback. Yet I still see an unexplained behavior with repair. Are counters supposed to be "repaired" too? I mean, while reading at CL.ONE I can get different values depending on which node answers, even after a read repair or a full repair. Shouldn't a repai

2013-05-15 Thread Edward Capriolo
http://basho.com/introducing-riak-1-3/

Introduced Active Anti-Entropy. Riak now has active anti-entropy. In distributed systems, inconsistencies can arise between replicas due to failure modes, concurrent updates, and physical data loss or corruption. Pre-1.3 Riak already had several features for

2013-05-15 Thread Robert Coli
On Wed, May 15, 2013 at 1:27 AM, Alain RODRIGUEZ wrote:
> Rob, I was wondering something. Are you a committer working on improving the
> repair or something similar?

I am not a committer [1], but I have an active interest in potential improvements to the best practices for repair. The specific ch

2013-05-15 Thread Edward Capriolo
I have actually tested repair in many interesting scenarios. Once I joined a node and forgot autobootstrap=true, so the data looked like this in the ring:

left node: 8 GB
new node: 0 GB
right node: 8 GB

After repair:

left node: 10 GB
new node: 13 GB
right node: 12 GB

We do not run repair at all. It is better

2013-05-15 Thread André Cruz
On May 10, 2013, at 7:24 PM, Robert Coli wrote:
> 1) What version of Cassandra do you run, on what hardware?

1.1.5, 6 nodes, 32GB RAM, 300GB data per node, 900GB 10k RAID1, Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz.

> 2) What consistency level do you write at? Do you do DELETEs?

QUORUM. Yes,

2013-05-15 Thread horschi
Hi Alain, have you had a look at the following tickets?

CASSANDRA-4905 - Repair should exclude gcable tombstones from merkle-tree computation
CASSANDRA-4932 - Agree on a gcbefore/expirebefore value for all replica during validation compaction
CASSANDRA-4917 - Optimize tombstone creation for Expir
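Several of these tickets concern the Merkle trees that repair builds during validation compaction and compares across replicas. As a rough, hypothetical illustration of that comparison (not Cassandra's implementation), the Python sketch below hashes equal-width token ranges into a binary hash tree and walks two replicas' trees from the root, descending only into subtrees whose hashes differ. `TOKEN_SPACE`, `leaf_hashes`, `build_tree` and `diff_ranges` are invented names, and rows are plain dicts of token to value.

```python
import hashlib

TOKEN_SPACE = 1024  # hypothetical tiny token space; real rings are far larger


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(rows: dict, num_ranges: int) -> list:
    """Hash each equal-width token range of one replica's rows.

    rows maps token -> value; num_ranges must be a power of two dividing TOKEN_SPACE.
    """
    width = TOKEN_SPACE // num_ranges
    buckets = [[] for _ in range(num_ranges)]
    for token, value in sorted(rows.items()):
        buckets[token // width].append(f"{token}={value}")
    return [_h("|".join(b).encode()) for b in buckets]


def build_tree(leaves: list) -> list:
    """Return tree levels: levels[0] are the leaves, levels[-1] == [root]."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_ranges(a: list, b: list) -> list:
    """Compare two trees top-down, descending only where hashes differ.

    Returns the indices of leaf ranges that would need to be repaired.
    """
    mismatched = []

    def walk(level: int, index: int) -> None:
        if a[level][index] == b[level][index]:
            return  # identical subtree: nothing to stream
        if level == 0:
            mismatched.append(index)
            return
        walk(level - 1, 2 * index)
        walk(level - 1, 2 * index + 1)

    walk(len(a) - 1, 0)
    return mismatched
```

With 8 ranges of 128 tokens, two replicas that disagree only on token 300 differ in leaf 2 alone, so `diff_ranges` returns `[2]` after comparing a handful of hashes rather than every row. The same mechanism plausibly explains why the tickets above matter: any difference in what each replica feeds into its leaves, such as a gcable tombstone that one replica has already purged, marks the whole range as mismatched even when the live data agrees.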

2013-05-15 Thread Alain RODRIGUEZ
> From: "Dean Hiller"
> To: user@cassandra.apache.org
> Sent: Tuesday, May 14, 2013 4:48:02 AM
> Subject: Re: (unofficial) Community Poll for Production Operators : Repair
>
> We had to roll out a fix in cassandra as a slow node was slowing dow

2013-05-14 Thread Wei Zhu
-z and it works for our test.

-Wei

- Original Message -
From: "Dean Hiller"
To: user@cassandra.apache.org
Sent: Tuesday, May 14, 2013 4:48:02 AM
Subject: Re: (unofficial) Community Poll for Production Operators : Repair

We had to roll out a fix in cassandra as a slow

2013-05-14 Thread Viktor Jevdokimov
> 1) What version of Cassandra do you run, on what hardware?

1.0.12 (upgrade to 1.2.x is planned)
Blade servers with 1x6 CPU cores with HT (12 vcores) (upgradable to 2x CPUs)
96GB RAM (upgrade is planned to 128GB, 256GB max)
1x300GB 15k Data and 1x300GB 10k CommitLog/System SAS HDDs

> 2) Wha

2013-05-14 Thread Hiller, Dean
user@cassandra.apache.org
Subject: Re: (unofficial) Community Poll for Production Operators : Repair

Hi Rob,
1) 1.2.2 on 6 to 12 EC2 m1.xlarge
2) Quorum R&W. Almost no deletes (just some TTL)
3) Yes
4) On each nod

2013-05-14 Thread Alain RODRIGUEZ
Hi Rob,

1) 1.2.2 on 6 to 12 EC2 m1.xlarge
2) Quorum R&W. Almost no deletes (just some TTL)
3) Yes
4) On each node once a week (rolling repairs using crontab)
5) The only behavior that is quite odd or unexplained to me is why a repair doesn't fix a counter mismatch between 2 nodes. I mean when I r
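The answer to question 4 describes weekly rolling repairs driven by crontab. As a hedged illustration (not the actual setup, which the thread doesn't show), the Python sketch below gives each node its own weekly cron slot so that only one node starts `nodetool repair` at a time. The host names, the 02:00 start hour and the 4-hour slot width are assumptions, and the scheme only avoids overlap if a single repair finishes within its slot.

```python
from dataclasses import dataclass


@dataclass
class CronEntry:
    host: str          # hypothetical host name; the line goes into this host's crontab
    minute: int
    hour: int
    day_of_week: int   # standard cron: 0 = Sunday ... 6 = Saturday

    def line(self, command: str = "nodetool repair") -> str:
        # Five cron fields: minute, hour, day-of-month, month, day-of-week.
        return f"{self.minute} {self.hour} * * {self.day_of_week} {command}"


def rolling_schedule(hosts, start_hour=2, slot_hours=4):
    """Give every host a distinct weekly (day, hour) slot.

    Hosts are spread one per day first (Sunday..Saturday), then a second
    slot slot_hours later on each day, and so on, so no two hosts start
    repair at the same time.
    """
    if not 1 <= slot_hours <= 24:
        raise ValueError("slot_hours must be between 1 and 24")
    slots_per_day = 24 // slot_hours
    if len(hosts) > 7 * slots_per_day:
        raise ValueError("more hosts than weekly slots")
    entries = []
    for i, host in enumerate(hosts):
        day, slot = i % 7, i // 7
        hour = (start_hour + slot * slot_hours) % 24
        entries.append(CronEntry(host, minute=0, hour=hour, day_of_week=day))
    return entries


if __name__ == "__main__":
    nodes = [f"cass-{n}" for n in range(1, 13)]  # hypothetical 12-node ring
    for e in rolling_schedule(nodes):
        print(f"{e.host}: {e.line()}")
```

Each node then installs only its own line, e.g. the first entry yields `0 2 * * 0 nodetool repair` (Sunday 02:00). Spreading nodes across days before doubling up within a day keeps at most two repairs per day for a 12-node ring.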