Re: Heavy writes ok for single node, but failed for cluster

2011-04-28 Thread Sheng Chen
Thank you for your advice. Rf>=2 is a good workaround. I was using 0.7.4 and have updated to the latest 0.7 branch, which includes the 2554 patch. But it doesn't help. I still get lots of UnavailableException after the following logs: INFO [GossipTasks:1] 2011-04-28 16:12:17,661 Gossiper.java (line

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

2011-04-28 Thread aaron morton
I *think* that code is used when one node tells others via gossip it is removing a token that is not its own. The node that receives the information in gossip does some work and then replies to the first node with a REPLICATION_FINISHED message, which is the node I assume the error is happening on.

Re: Dropping a built in secondary index on a CF

2011-04-28 Thread aaron morton
I can confirm that's the way the code works when processing the CF update. Everything is tidied up. A On 28 Apr 2011, at 18:50, Roshan Dawrani wrote: > On Thu, Apr 28, 2011 at 9:56 AM, Xaero S wrote: > > You just need to use the update column family command on the cassandra-cli > and specify

Re: advice for EC2 deployment

2011-04-28 Thread aaron morton
If you are not going to be multi-region straight away, but wish to be in the near future, I would consider: - 1 region - 2 AZs, with the same number of nodes - Using the EC2Snitch as is, this will map to 1 Cassandra DC and 2 Cassandra racks - Using the NetworkTopology strategy For background s
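
The region/AZ to DC/rack mapping described above can be sketched as follows. This is only an illustration, assuming the snitch takes everything before the last hyphen of the availability zone name as the datacenter and the remainder as the rack; the node names and zones are hypothetical.

```python
from collections import defaultdict


def split_availability_zone(az):
    """Split an AZ name such as 'us-east-1a' into (datacenter, rack).

    Assumption: the datacenter is the text before the last hyphen
    and the rack is the text after it.
    """
    region, _, zone = az.rpartition("-")
    if not region or not zone:
        raise ValueError("unexpected availability zone name: %r" % az)
    return region, zone


def topology(node_azs):
    """Group node names by datacenter, then by rack."""
    dcs = defaultdict(lambda: defaultdict(list))
    for node, az in sorted(node_azs.items()):
        dc, rack = split_availability_zone(az)
        dcs[dc][rack].append(node)
    return {dc: dict(racks) for dc, racks in dcs.items()}


# Hypothetical cluster: one region, two AZs, two nodes in each.
NODES = {
    "node1": "us-east-1a",
    "node2": "us-east-1a",
    "node3": "us-east-1b",
    "node4": "us-east-1b",
}

if __name__ == "__main__":
    print(topology(NODES))
```

With equal node counts per AZ, each rack holds the same number of nodes, which is the layout suggested in the message.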

Re: OOM on heavy write load

2011-04-28 Thread Thibaut Britz
Could this be related as well to https://issues.apache.org/jira/browse/CASSANDRA-2463? Thibaut On Wed, Apr 27, 2011 at 11:35 PM, Aaron Morton wrote: > I'm a bit confused by the two different cases you described, so cannot > comment specifically on your case. > > In general if Cassandra is slowing

Re: Cassandra node throws NPE on startup

2011-04-28 Thread Subscriber
Hi Aaron, what exactly do you mean? I restarted the cluster by calling > bin/cassandra -p pid.file on all three nodes. The first node is the (only) seed. Udo On 27.04.2011, at 23:28, Aaron Morton wrote: > What approach did you take to restarting the cluster? > > It looks like the

Re: Heavy writes ok for single node, but failed for cluster

2011-04-28 Thread Jonathan Ellis
This means a node was too busy with something else to send out its heartbeat. Sometimes this is STW GC. Other times it is a bug (one was fixed for 0.7.6 in https://issues.apache.org/jira/browse/CASSANDRA-2554). On Thu, Apr 28, 2011 at 3:57 AM, Sheng Chen wrote: > Thank you for your advice. Rf>=2

Re: Performance tests using stress testing tool

2011-04-28 Thread Baskar Duraikannu
Thanks Peter. When I looked at the benchmark client machine, it was not under any stress in terms of disk or CPU. But the test machines are connected through a 10/100 Mbps switch port (not gigabit). Can this be a bottleneck? Thanks Baskar - Original Message - From: Peter Schuller To:

Strange corrupt sstable

2011-04-28 Thread Daniel Doubleday
Hi all, on one of our dev machines we ran into this: INFO [CompactionExecutor:1] 2011-04-28 15:07:35,174 SSTableWriter.java (line 108) Last written key : DecoratedKey(12707736894140473154801792860916528374, 74657374) INFO [CompactionExecutor:1] 2011-04-28 15:07:35,174 SSTableWriter.java (line

Re: Strange corrupt sstable

2011-04-28 Thread mcasandra
Can someone please help me understand the reason for corrupt SSTables? I am just worried about what the worst case is. Do we lose data in these cases? How do we protect against data loss if that's the case? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Strange-co

Re: Performance tests using stress testing tool

2011-04-28 Thread Peter Schuller
> When I looked at the benchmark client machine, it was not under any stress > in terms of disk or CPU. Are you running with the Python multiprocessing module available? stress should print a warning if it's not. If it's not, you'd end up with a threaded mode and due to Python's GIL you'd be bottle

Re: Strange corrupt sstable

2011-04-28 Thread Jonathan Ellis
When I have seen this in the past it has been bad memory on the server. On Thu, Apr 28, 2011 at 11:58 AM, Daniel Doubleday wrote: > Hi all > on one of our dev machines we ran into this: > INFO [CompactionExecutor:1] 2011-04-28 15:07:35,174 SSTableWriter.java (line > 108) Last written key : Decora

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller
> Could this be related as well to > https://issues.apache.org/jira/browse/CASSANDRA-2463? My gut feel: Maybe, if the slowness/timeouts reported by the OP are intermixed with periods of activity to indicate compacting full gc. OP: Check if cassandra is going into 100% (not less, not more) CPU usa

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller
> My gut feel: Maybe, if the slowness/timeouts reported by the OP are > intermixed with periods of activity to indicate compacting full gc. But even then, after taking a single full GC the behavior should disappear since there should be no left-overs from the smaller columns causing fragmentation

Re: Strange corrupt sstable

2011-04-28 Thread mcasandra
What do you mean by bad memory? Is it too small a heap size, OOM issues or something else? What happens in such a scenario; is there data loss? Sorry for the many questions, just trying to understand since data is critical after all :) -- View this message in context: http://cassandra-user-incubator-apache-o

best way to backup

2011-04-28 Thread William Oberman
Even with N nodes for redundancy, I still want to have backups. I'm an Amazon person, so naturally I'm thinking S3. Reading over the docs, and messing with nodetool, it looks like each new snapshot contains the previous snapshot as a subset (and I've read how Cassandra uses hard links to avoid ex

Re: best way to backup

2011-04-28 Thread Sasha Dolgy
You could take a snapshot to an EBS volume, then take a snapshot of that via AWS. Of course, this is OK when they aren't having outages and issues ... On Apr 28, 2011 9:54 PM, "William Oberman" wrote: > Even with N-nodes for redundancy, I still want to have backups. I'm an > amazon person, so

Re: best way to backup

2011-04-28 Thread William Oberman
Interesting. Both use cases seem easy to code. Compress to S3 = cassandra snapshot, tar, s3 put. EBS = cassandra snapshot, rsync snapshot dir -> ebs, ebs snapshot. I think the former is cheaper in terms of costs, as my gut says keeping around an EBS drive is more money than the lack of deltas in S3
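
The "snapshot, tar, s3 put" route could be sketched roughly like this. It assumes the snapshot has already been taken and that `snapshot_dir` points at it; `upload` is a hypothetical caller-supplied callable standing in for whatever S3 client is used, and no real S3 API is shown.

```python
import os
import tarfile


def archive_snapshot(snapshot_dir, archive_path):
    """Write a gzip-compressed tarball of snapshot_dir.

    Returns the sorted member names so the caller can check what was stored.
    """
    base = os.path.basename(os.path.normpath(snapshot_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the snapshot folder.
        tar.add(snapshot_dir, arcname=base)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


def backup_snapshot(snapshot_dir, archive_path, upload):
    """Archive the snapshot, then hand the tarball path to `upload`.

    `upload` is a placeholder (an assumption of this sketch): any callable
    that takes a local file path and stores it somewhere, for example in S3.
    """
    names = archive_snapshot(snapshot_dir, archive_path)
    upload(archive_path)
    return names
```

Since each tarball is a full copy, this trades storage for simplicity, which matches the "lack of deltas in S3" concern in the message.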

Re: best way to backup

2011-04-28 Thread Jeremy Hanna
One thing we're looking at doing is watching the Cassandra data directory and backing up the sstables to S3 when they are created. Some guys at SimpleGeo started tablesnap that does this: https://github.com/simplegeo/tablesnap What it does is for every sstable that is pushed to S3, it also copi

07x apt repo signature

2011-04-28 Thread Luke Biddell
On the wiki it suggests installing the key F758CE318D77295D However, when I do an apt-get update I get the following: - W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error

Re: best way to backup

2011-04-28 Thread William Oberman
My newbie mistake (always good to test things): my script wasn't storing/restoring system, only my keyspace. So, if you want to be able to restore from backup, make sure you save the keyspace and system! will On Thu, Apr 28, 2011 at 4:35 PM, Jeremy Hanna wrote: > one thing we're looking at doin
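
A small guard against that mistake might look like the following. It is a sketch that assumes the data directory holds one subdirectory per keyspace; the path and keyspace name in the usage note are hypothetical.

```python
import os


def keyspace_dirs_to_back_up(data_dir, keyspaces):
    """Return the keyspace directories to back up, always including 'system'.

    Assumption: each keyspace lives in its own subdirectory of data_dir.
    """
    wanted = sorted(set(keyspaces) | {"system"})
    return [os.path.join(data_dir, ks) for ks in wanted]
```

For example, `keyspace_dirs_to_back_up("/var/lib/cassandra/data", ["MyKeyspace"])` returns the directory for `MyKeyspace` followed by the one for `system`.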

Re: best way to backup

2011-04-28 Thread Adrian Cockcroft
Netflix has also gone down this path; we run a regular full backup to S3 of a compressed tar, and we have scripts that restore everything into the right place on a different cluster (it needs the same node count). We also pick up the SSTables as they are created, and drop them in S3. Whatever you

Re: 07x apt repo signature

2011-04-28 Thread Eric Evans
On Thu, 2011-04-28 at 21:48 +0100, Luke Biddell wrote: > On the wiki it suggests installing the key F758CE318D77295D I suspect you need 2B5C1B00 (Sylvain's key). We probably need to update the wiki to include any of the keys in http://www.apache.org/dist/cassandra/KEYS -- Eric Evans eev...@rack

Re: Cassandra node throws NPE on startup

2011-04-28 Thread aaron morton
Thought you may have re-created the schema. Killing the process like that should be OK; let us know if you get the error again. Aaron On 29 Apr 2011, at 02:56, Subscriber wrote: > Hi Aaron, > > what exactly do you mean? > I restarted the cluster by calling > > > bin/cassandra -p pid.fi

Re: Heavy writes ok for single node, but failed for cluster

2011-04-28 Thread Sheng Chen
Thank you for your patch. I believe the later version I used (the latest 0.7 branch) includes the patch, but the problem remains. Is there anything else that may block this heartbeat, like GC? Here are some logs during heartbeat failure. INFO [GossipTasks:1] 2011-04-29 07:25:09,716 Gossiper.jav