Re: Occasionally getting old data back with ConsistencyLevel.ALL

2011-08-19 Thread Kyle Gibson
The cron script doesn't do much. It pulls new IPNs (usually only 1 in a given 5 minute period), inserts a row, and then sends an email. As for failure handling in the script itself, I rely on python exception handling, and whenever an exception occurs I do get an email with the exception details.

Re: memory overhead of vector clocks vs timestamps and running *without* either to save memory?

2011-08-19 Thread Jonathan Ellis
The problem with naive last write wins is that writes don't always arrive at each replica in the same order. So no, that's a non-starter. Vector clocks are a series of (client id, clock) entries, and usually a timestamp so you can prune old entries. Obviously implementations can vary, but to pic
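A minimal sketch of the ordering problem described above, assuming two replicas that receive the same two writes in opposite order. The function names and the (timestamp, value) tuple layout are illustrative only and are not Cassandra's internals.

```python
def apply_arrival_order(writes):
    """Naive last write wins: whichever write arrives last is kept."""
    value = None
    for _timestamp, v in writes:
        value = v
    return value


def apply_timestamp_order(writes):
    """Keep the write with the highest timestamp, regardless of arrival order.

    Ties are broken by comparing values so every replica picks the same winner.
    """
    winner = None
    for timestamp, v in writes:
        if winner is None or (timestamp, v) > winner:
            winner = (timestamp, v)
    return winner[1]


# The same two writes, delivered in a different order to each replica.
writes_seen_by_replica_1 = [(1, "a"), (2, "b")]
writes_seen_by_replica_2 = [(2, "b"), (1, "a")]

naive_1 = apply_arrival_order(writes_seen_by_replica_1)
naive_2 = apply_arrival_order(writes_seen_by_replica_2)
stamped_1 = apply_timestamp_order(writes_seen_by_replica_1)
stamped_2 = apply_timestamp_order(writes_seen_by_replica_2)
```

Without any ordering metadata the two replicas end up holding different values; with a timestamp per write they converge on the same one.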

memory overhead of vector clocks vs timestamps and running *without* either to save memory?

2011-08-19 Thread Kevin Burton
I have a few questions which I can't seem to find answers to... I know that the memory overhead of timestamps is 8 bytes per row/column. What is the memory overhead of vector clocks? Is it possible (at least in theory) to run without timestamps on your values? I'm fine with last writer wins se

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> Is there any chance that the entire file from source node got streamed to > destination node even though only small amount of data in the file from > source node is supposed to be streamed to destination node? Yes, but the thing that's annoying me is that even if so - you should not be seeing a 40

Completely removing a node from the cluster

2011-08-19 Thread Bryce Godfrey
I'm on 0.8.4 I have removed a dead node from the cluster using nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, owning 50% of the token

Different cluster gossiping to each other

2011-08-19 Thread Hefeng Yuan
Symptom is that when we populate data into the non-prod cluster, after a while, we start seeing this warning message from the prod cluster: "WARN [GossipStage:1] 2011-08-19 19:47:35,730 GossipDigestSynVerbHandler.java (line 63) ClusterName mismatch from non-prod-node-ip non-prod-Cluster!=prod-C

Re: How can I patch a single issue

2011-08-19 Thread Jonathan Ellis
I think this is what you want: https://github.com/stuhood/cassandra/tree/file-format-and-promotion On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller wrote: >> https://issues.apache.org/jira/browse/CASSANDRA-674 >> But when I downloaded the patch file I can't find the correct trunk to >> patch... >

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
> > To confirm - are you saying the data directory size is huge, but the > live size as reported by nodetool ring and nodetool info does NOT > reflect this inflated size? > That's correct. > What files *do* you have in the data directory? Any left-over *tmp* > files for example? > > The files th

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> There were few Compacted files.  I thought that might have been the cause, > but it wasn't it.  We have a CF that is 23GB, and while repair is running, > there are multiple instances of that CF created along with other CFs. To confirm - are you saying the data directory size is huge, but the liv

Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Anand Somani
ok I will go with the IP change strategy and keep you posted. Not going to manually copy any data, just bring up the node and let it bootstrap. Thanks On Fri, Aug 19, 2011 at 11:46 AM, Peter Schuller < peter.schul...@infidyne.com> wrote: > > (Yes, this should definitely be easier. Maybe the most

Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller
> (Yes, this should definitely be easier. Maybe the most generally > useful fix would be for Cassandra to support a node joining the ring > in "write-only" mode. This would be useful in other cases, such as > when you're trying to temporarily off-load a node by disabling > gossip). I knew I had

Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller
> From what I understand, Peter's recommendation should work for you. They > have both worked for me. No need to copy anything by hand on the new node. > Bootstrap/repair does that for you. From the Wiki: Right - it's just that the complication comes from the fact that he's using the same machine,

Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller
> I am running read/write at quorum. At this point I have turned off my > clients from talking to this node. So if that is the case I can potentially > just nodetool repair (without changing IP). But would it be better if I No, other nodes in the cluster will still be sending reads to the node. >

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
There were few Compacted files. I thought that might have been the cause, but it wasn't it. We have a CF that is 23GB, and while repair is running, there are multiple instances of that CF created along with other CFs. I checked the stream directory across cluster of four nodes, but it was empty.

Re: Unable to repair a node

2011-08-19 Thread Peter Schuller
> Somewhere I remember discussions about issues with the merkle tree range > splitting or some such that resulted in repair always thinking a little bit > of data was out of sync. https://issues.apache.org/jira/browse/CASSANDRA-2324 - fixed for early 0.8. I don't *think* there's a known open bug t

Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread jonathan . colby
Hi - From what I understand, Peter's recommendation should work for you. They have both worked for me. No need to copy anything by hand on the new node. Bootstrap/repair does that for you. From the Wiki: If a node goes down entirely, then you have two options: (Recommended approach) Bring

Re: Unable to repair a node

2011-08-19 Thread Peter Schuller
> I've now run 7 repairs in a row on this keyspace and every single one has > finished successfully but performed streams between all nodes. This keyspace > was written to over the course of several weeks, sometimes with How much data is streamed, do you know? Mainly interesting is if there is a

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
I wasn't clear on that. What I meant was: could scrub have put the data in a state that caused the repair to consume a lot of disk space? On Thu, Aug 18, 2011 at 6:44 PM, aaron morton wrote: > No scrub is a local operation only. > > Cheers > > - > Aaron Morton > Freelance Cassa

Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Anand Somani
Let me be specific on lost data -> lost a replica , the other 2 nodes have replicas I am running read/write at quorum. At this point I have turned off my clients from talking to this node. So if that is the case I can potentially just nodetool repair (without changing IP). But would it be better

Re: Occasionally getting old data back with ConsistencyLevel.ALL

2011-08-19 Thread Peter Schuller
> Is it possible for instance that sometimes your cron job takes longer > than five minutes? Or just a lack of failure handling in the cron job for that matter. Are you *SURE* the "processed" flag truly got set? Do you have a log statement (written *AFTER* successful write to Cassandra) that i
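A rough sketch of the "log only after a successful write" idea raised above. It assumes a hypothetical `write_row` callable that raises an exception on failure; that callable, the logger name, and the column layout are placeholders, not the original poster's script or any particular client library's API.

```python
import logging

logger = logging.getLogger("ipn_cron")  # hypothetical logger name


def process_ipn(ipn_id, write_row):
    """Mark one IPN as processed and log only after the write returned.

    write_row is a hypothetical callable (row key, columns) that raises
    an exception if the write did not succeed.
    """
    write_row(ipn_id, {"processed": "1"})
    # Reached only if write_row returned without raising, so this log line
    # is evidence that the write call completed.
    logger.info("marked IPN %s as processed", ipn_id)
    return True
```

If the write raises, the exception propagates to the caller and no "processed" line appears in the log, which makes it possible to tell a failed write apart from a stale read afterwards.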

Re: How can I patch a single issue

2011-08-19 Thread Peter Schuller
> https://issues.apache.org/jira/browse/CASSANDRA-674 > But when I downloaded the patch file I can't find the correct trunk to > patch... Check it out from git (or svn) and apply to trunk. I'm not sure whether it still applies cleanly; given the size of the patch I wouldn't be surprised if some re

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> After upgrading to cass 0.8.4 from cass 0.6.11.  I ran scrub.  That worked > fine.  Then I ran nodetool repair on one of the nodes.  The disk usage on > data directory increased from 40GB to 480GB, and it's still growing. If you check your data directory, does it contain a lot of "*Compacted" fi

Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller
> ok, so we just lost the data on that node. are building the raid on it, but > once it is up what is the best way to bring it back in the cluster You're saying the raid failed and data is gone? > just let it come up and run nodetool repair > copy data from another node and then run nodetool repa

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
> Is it normal that the repair takes 4+ hours for every node, with only about > 10G data? If this is not expected, do we have any hint what could be causing > this? It does not seem entirely crazy, depending on the nature of your data and how CPU-intensive it is "per byte" to compact. Assuming

Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Anand Somani
ok, so we just lost the data on that node. are building the raid on it, but once it is up what is the best way to bring it back in the cluster - just let it come up and run nodetool repair - copy data from another node and then run nodetool repair, - do I still need to run repair imme

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
> The compaction settings do not affect repair. (Thinking out loud, or does it > ? Validation compactions and table builds.) It does. -- / Peter Schuller (@scode on twitter)

Re: RF=1

2011-08-19 Thread Jonathan Ellis
(a) this really isn't the right forum to review patches; I've pointed out the relevant jira ticket (b) ignoring unavailable ranges is a misfeature, imo On Fri, Aug 19, 2011 at 8:11 AM, Patrik Modesto wrote: > Is there really no interest in the patch? > > P. > > On Thu, Aug 18, 2011 at 08:54, Pat

Fwd: Cassandra Memory Trend - increased memory usage when node idles.

2011-08-19 Thread Renato Bacelar da Silveira
Hello All I have let a node run for a period of 2 hours, untouched, with something like 10 Column families, and just 30 columns in total. I see a memory trend that is continually increasing. There are no operations against that node. I started the node at 14:05, at 15:05 I did a manual GC.

Re: node restart taking too long

2011-08-19 Thread Yan Chunlu
The log file shows the following; I'm not sure what 'Couldn't find cfId=1000' means (Google just returned useless results): INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageServ

Re: RF=1

2011-08-19 Thread Patrik Modesto
Is there really no interest in the patch? P. On Thu, Aug 18, 2011 at 08:54, Patrik Modesto wrote: > On Wed, Aug 17, 2011 at 17:08, Jonathan Ellis wrote: >> See https://issues.apache.org/jira/browse/CASSANDRA-2388 > > Ok, thanks for the JIRA ticket. I've found that very same problem > during my

Cluster key distribution wrong after upgrading to 0.8.4

2011-08-19 Thread Thibaut Britz
Hi, we were using apache-cassandra-2011-06-28_08-04-46.jar so far in production and wanted to upgrade to 0.8.4. Our cluster was well balanced and we only saved keys with a lower case md5 prefix. (Orderpreserving partitioner). Each node owned 20% of the tokens, which was also displayed on each nod
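To illustrate why lowercase md5-prefixed keys can balance well under an order-preserving partitioner, here is a small sketch. The key format (hex digest, underscore, raw key) and the four split points are assumptions made for the example; they are not the poster's actual keys or tokens.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Hypothetical split points dividing the lowercase hex key space into five
# roughly equal ranges (one per node). These are illustrative, not real tokens.
BOUNDARIES = ["3333", "6666", "9999", "cccc"]


def prefixed_key(raw):
    """Prefix a key with its lowercase md5 hex digest (assumed key format)."""
    return hashlib.md5(raw.encode("utf-8")).hexdigest() + "_" + raw


def owner(key):
    """Index (0..4) of the range a key falls in under plain string ordering."""
    return bisect_right(BOUNDARIES, key)


def distribution(n):
    """Fraction of n sample keys that lands in each of the five ranges."""
    counts = Counter(owner(prefixed_key("user%d" % i)) for i in range(n))
    return [counts[i] / n for i in range(5)]


shares = distribution(10000)
```

Because md5 digests are close to uniform over the hex space, each range should receive about a fifth of the keys, which matches the 20% ownership per node described in the message.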

Re: Suggested settings for number crunching

2011-08-19 Thread Paul Loy
Nice one thanks. We're now up to 500k a second on one box which is pretty good (well good enough until our data grows 5 fold). So maybe (un)durable_writes may speed us up some more!! Cheers, Paul. On Thu, Aug 18, 2011 at 11:40 PM, aaron morton wrote: > couple of thoughts, 400 row mutations in