Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
Péter, in our case they get created exclusively during repairs. Compactionstats showed a huge number of sstable build compactions. On Aug 20, 2011 1:23 AM, "Peter Schuller" wrote: >> Is there any chance that the entire file from source node got streamed to >> destination node even though only smal

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
Any suggestions? Thanks! On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu wrote: > the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): > > > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) > Found table data

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

2011-08-20 Thread Anand Somani
0.7.4 / 3-node cluster / RF=3 / quorum read/write. After I re-introduced a corrupted node, I followed the process listed on the operations wiki to handle failures (thanks to folks on the mailing list for helping me). Still doing a cleanup on one node at this point. But I noticed that I am seeing thi

Re: node restart taking too long

2011-08-20 Thread Peter Schuller
> the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): Those should be the indication that the schema is wrong on the node. Reads and writes are being received from other nodes pertaining to column families it does not know abou

Re: node restart taking too long

2011-08-20 Thread Peter Schuller
Can you post the complete Cassandra log starting with the initial start-up of the node after having removed schema/migrations? -- / Peter Schuller (@scode on twitter)

Re: Occasionally getting old data back with ConsistencyLevel.ALL

2011-08-20 Thread Peter Schuller
> Do you mean the cassandra log, or just logging in the script itself? The script itself. I.e., some "independent" verification that the line of code after the insert is in fact running, just in case there's some kind of silent failure. Sounds like you've tried to address it though with the E-Mail

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Peter Schuller
> In our case they get created exclusively during repairs. Compactionstats > showed a huge number of sstable build compactions Do you have an indication that at least the disk space is in fact consistent with the amount of data being streamed between the nodes? I think you had 90 -> ~ 450 gig wit

Re: node restart taking too long

2011-08-20 Thread Jonathan Ellis
This means you should upgrade, because we've fixed bugs about ignoring deleted CFs since 0.7.4. On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu wrote: > the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): > > INFO [main] 2011-08-1

Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-20 Thread Anand Somani
Thanks for the help, this seems to have worked. Except that while adding the new node we added the same token to a different IP (operational script goofup) and brought the node up, so now the other nodes just had the message that a new IP had taken over the token. - So we brought it down and f

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
That could be the reason. I ran nodetool repair (unfinished; the data size grew to about 6 times bigger, 30G vs 170G), and there should be some unclean sstables on that node. However, upgrading is tough work for me right now. Could nodetool scrub help? Or decommissioning the node and joining it again? On Su

Re: node restart taking too long

2011-08-20 Thread Jonathan Ellis
I'm not sure what problem you're trying to solve. The exception you pasted should stop once your clients are no longer trying to use the dropped CF. On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu wrote: > that could be the reason, I did nodetool repair(unfinished, data size > increased 6 times big

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
> > Do you have an indication that at least the disk space is in fact > consistent with the amount of data being streamed between the nodes? I > think you had 90 -> ~ 450 gig with RF=3, right? Still sounds like a > lot assuming repairs are not running concurrently (and compactions are > able to run