Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread aaron morton
This looks like an artifact of the way ownership is calculated for the OPP. See https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177 It was changed in this ticket: https://issues.apache.org/jira/browse/CASSANDRA-2800 The c

Re: Cassandra Memory Trend - increased memory usage when node idles.

2011-08-21 Thread aaron morton
Using memory allocated to the JVM is not really a problem unless it's OOM'ing or running into performance issues due to excessive GC. One scenario I could imagine is a timeout triggered on a dirty memtable: this resulted in a flush, the flush resulted in a minor compaction, the minor compacti

Re: Different cluster gossiping to each other

2011-08-21 Thread aaron morton
Did you clear the LocationInfo from the non-prod cluster? When you gave prod seeds from non-prod, non-prod would have discovered all the nodes in prod. Unless you have cleared the location info they will still have that knowledge. Does nodetool ring in non-prod list any prod machines? if

Re: Completely removing a node from the cluster

2011-08-21 Thread aaron morton
Unreachable nodes either did not respond to the message or were known to be down and were not sent a message. The node lists for the ring command and describe cluster are obtained the same way, so it's a bit odd. Can you connect to JMX and have a look at the o.a.c.db.StorageService

how to know if nodetool cleanup is safe?

2011-08-21 Thread Yan Chunlu
Since "nodetool cleanup" could remove hinted handoffs, will it cause data loss?

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

2011-08-21 Thread aaron morton
There is some confusion in the ring about nodes leaving. Check nodetool ring from every node and see if they agree. Check the logs to see if there is any information about which node is sending the wrong message. Without knowing much more you could try a rolling restart, but you may need a full res

Questions about TTL and batch_mutate

2011-08-21 Thread Joris van der Wel
Hello, I have a ColumnFamily in which all columns are always set with a TTL, this would be one of the hottest column families (rows_cached is set to 1.0). I am wondering if TTL values also follow gc_grace? If they do, am I correct in thinking it would be best to set gc_grace really low in this cas

Re: Questions about TTL and batch_mutate

2011-08-21 Thread aaron morton
> I am wondering if TTL values also follow gc_grace? They are purged by the first compaction that processes them after TTL has expired. The TTL expiry is used the same way as the expire on a Tombstone. Thinking out loud, is this possible…. t0 - write col to all 3 replicas. t1 - overwrite col

Re: Questions about TTL and batch_mutate

2011-08-21 Thread Joris van der Wel
On Sun, Aug 21, 2011 at 2:21 PM, aaron morton <*@thelastpickle.com> wrote: >>  I am wondering if TTL values also follow gc_grace? > They are purged by the first compaction that processes them after TTL has > expired. The TTL expiry is used the same way as the expire on a Tombstone. > > Thinkin

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread Thibaut Britz
Hi, I will wait until this is fixed before I upgrade, just to be sure. Shall I open a new ticket for this issue? Thanks, Thibaut On Sun, Aug 21, 2011 at 11:57 AM, aaron morton wrote: > This looks like an artifact of the way ownership is calculated for the OPP. > See https://github.com/apache/ca

Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
Hi, 7.4, 3 node cluster, RF=3 Load has not changed much, on 2 of the 3 nodes the commit log filled up in less than a minute (did not give a chance to recover). We have been running this cluster for about 2-3 months without any problem. At this point I do not see any unusual load (continue to inves

Re: Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
So no, it did not fill in a minute, but tons of header files were written in a minute (is that normal? I assume these are marker files which get written when memtables are flushed). The actual data files have been around for the last 24 hours? Somehow this all seems connected to "reintroduce node" e

Re: Commit log fills up in less than a minute

2011-08-21 Thread Peter Schuller
> When does the actual commit-data file get deleted. > > The flush interval on all my memtables is 60 minutes They *should* be getting deleted when they no longer contain any data that has not been flushed to disk. Are flushes definitely still happening? Is it possible flushing has started failing

Re: Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
We have a lot of space on /data, and from the file timestamps it looks like it was flushing data fine. We did have a bit of a goof-up with IPs when bringing up a down node (and the commit files have been around since then). Wonder if that is what triggered it and we have a bunch of hinted handoffs being b

RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
Both .2 and .3 list the same from the mbean: Unreachable is an empty collection, and Live node lists all 3 nodes still: 192.168.20.2 192.168.20.3 192.168.20.1 The removetoken was done a few days ago, and I believe the remove was done from .2 Here is what the ring output looks like, not sure why I

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread aaron morton
I'm not sure what the fix is. When using an order preserving partitioner it's up to you to ensure the ring is correctly balanced. Say you have the following setup… node : token 1 : a 2 : h 3 : p If keys are always 1 character we can say each node owns roughly 33% of the ring. Because we kn
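
The a / h / p example above can be checked with a small Python sketch. This is an illustration only, not the calculation in OrderPreservingPartitioner: `ownership` is a hypothetical helper, and it assumes keys are single lowercase letters, uniformly distributed, with each token owning the range (previous token, token] and the first token wrapping around the ring.

```python
import string
from bisect import bisect_left


def ownership(tokens, keys):
    """Return the fraction of keys owned by each token.

    Assumption for illustration: a token owns every key in the range
    (previous token, token], and keys above the last token wrap around
    to the first token.
    """
    ordered = sorted(tokens)
    counts = {token: 0 for token in ordered}
    for key in keys:
        # Index of the first token that is >= key; wrap if there is none.
        idx = bisect_left(ordered, key)
        owner = ordered[idx % len(ordered)]
        counts[owner] += 1
    return {token: counts[token] / len(keys) for token in ordered}


# Hypothetical key space: the 26 lowercase letters, one key each.
shares = ownership(["a", "h", "p"], list(string.ascii_lowercase))
for token, share in shares.items():
    print(token, round(share * 100, 1))
```

Under these assumptions the split is 11, 7 and 8 keys out of 26, which is only roughly a third each; with real keys that are longer or skewed the balance depends entirely on the key distribution.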

Re: Commit log fills up in less than a minute

2011-08-21 Thread aaron morton
Yup, you can check what HH is doing via JMX. There is a bug in 0.7 that can result in log files not being deleted https://issues.apache.org/jira/browse/CASSANDRA-2829 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22/08/2011

Re: Completely removing a node from the cluster

2011-08-21 Thread aaron morton
I see the mistake I made about ring: it gets the endpoint list from the same place but uses the tokens to drive the whole process. I'm guessing here, don't have time to check all the code. But there is a 3 day timeout in the gossip system. Not sure if it applies in this case. Anyone know ? Che

would it possible for this kind of data loss?

2011-08-21 Thread Yan Chunlu
I am aware that deleted items might come back alive without proper node repair. How about modified items, for example 'A'=>{1,2,3}, then 'A'=>{4,5}? Is it possible for 'A' to change back to {1,2,3}? I have encountered this mysterious problem after going through a messy procedure with cassandra nodes

Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread SebWajam
Hi, I've been working on this project for a few months now and I think it's mature enough to post it here: https://github.com/sebgiroux/Cassandra-Cluster-Admin Cassandra Cluster Admin on GitHub Basically, it's a GUI for Cassandra. If you're like me and used MySQL for a while (and still using it!),

The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
I have encountered this problem while updating the key cache and row cache. I once updated them to "0" (disabled) while node2 was not available; when it came back they eventually had the same schema version. [default@prjspace] describe cluster; Cluster Information: Snitch: org.apache.cassandra.loc

Re: The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Edward Capriolo
On Sun, Aug 21, 2011 at 10:09 PM, Yan Chunlu wrote: > I have encountered this problem while update the key cache and row cache. > I once updated them to "0"(disable) while node2 was not available, when it > comeback they eventually have the same schema version. > > [default@prjspace] describe cl

Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread Yan Chunlu
Just tried it and it works like a charm! Thanks a lot for the great work! On Mon, Aug 22, 2011 at 9:47 AM, SebWajam wrote: > Hi, > > I'm working on this project for a few months now and I think it's mature > enough to post it here: > Cassandra Cluster Admin on > GitHub

Re: The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
thanks for the migration tip, but the schema is in agreement. [default@prjspace] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.SimpleSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 79d072cc-cc62-11e0-a753-5525ca993302:

Re: would it possible for this kind of data loss?

2011-08-21 Thread Stephane Legay
Ok, will look into it, thx for the heads up. Sent from a mobile device, please forgive typos. On Aug 21, 2011 6:45 PM, "Yan Chunlu" wrote: > I was aware of the deleted items might be come back alive without proper > node repair. > > how about modified items, for example 'A'=>{1,2,3}. then 'A'=>{4

RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
It's been at least 4 days now. -Original Message- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, August 21, 2011 3:16 PM To: user@cassandra.apache.org Subject: Re: Completely removing a node from the cluster I see the mistake I made about ring, gets the endpoint list f

get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-21 Thread Yan Chunlu
I connected to cassandra-cli, issued list mycf, and got RowKey: comments_62559 => (column=76616c7565, value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,

Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-21 Thread Jonathan Ellis
My guess: you're using an old version of the cli that isn't dealing with BytesType column names correctly On Mon, Aug 22, 2011 at 12:08 AM, Yan Chunlu wrote: > connect to cassandra-cli and issue the list my cf I got > RowKey: comments_62559 > => (column=76616c7565, > value=28286c70310a4c323637363
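
For reference, the column name in the quoted listing (76616c7565) is a hex rendering of raw bytes. A minimal Python sketch, not part of the original thread, shows what it decodes to; `decode_hex_name` is a hypothetical helper and it assumes the bytes are UTF-8 text.

```python
def decode_hex_name(hex_name: str) -> str:
    """Decode a hex string, as printed for a bytes column name, into text.

    Assumption for illustration: the underlying bytes are valid UTF-8.
    """
    return bytes.fromhex(hex_name).decode("utf-8")


# The column name from the listing quoted above.
name = decode_hex_name("76616c7565")
print(name)  # prints "value"
```

So the column stored in the row is named "value"; whether a lookup by that name succeeds depends on how the client encodes the name it sends.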

Avoid Simultaneous Minor Compactions?

2011-08-21 Thread Hefeng Yuan
We just noticed that at one time, 4 nodes were doing minor compaction together, and each of them took 20~60 minutes. We're on 0.8.1, 6 nodes, RF5. These simultaneous compactions slowed down the whole cluster; we use local_quorum consistency level, so dynamic_snitch is not helping us. Aside

Re: Avoid Simultaneous Minor Compactions?

2011-08-21 Thread Ryan King
You should throttle your compactions to a sustainable level. -ryan On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan wrote: > We just noticed that at one time, 4 nodes were doing minor compaction > together, each of them took 20~60 minutes. > We're on 0.8.1, 6 nodes, RF5. > This simultaneous compac

Recover from startup problems

2011-08-21 Thread Dave Brosius
Greetings, I'm running head from source, and now when I try to start up the database, I get the following exception, which causes client connection failures. I'm fine with blowing away the database, just playing, but wanted to know if there is a safe way to do this. Exception encountered during