Re: zookeeper, how do you feed the pets?

2010-05-17 Thread Patrick Hunt
Hi, ZK uses a quorum protocol (similar to, but not the same as, Paxos) for writes; as a result it's sensitive to inter-server latency. (However, reads are always local and therefore not affected.) Running a cluster fully within a colo you can achieve 15k writes/second, with a cluster distributed across

Re: list of columns

2010-05-17 Thread Bill de hOra
Agree with David, it's not there and thinking about how the data is laid out on disk, it can't be done without changing core code or harming something else. > if this is a performance concern It's not, it was to supply an administrative function on SuperColumns, but it would be good to not cr

Several CFs and partitioning : which key rabge is used

2010-05-17 Thread Miriam Allalouf
Hi, I have a basic question regarding key ranges and partitions. Assume we have two CFs (column families), each associated with a different key range and compare order. Now, Cassandra supports only one "range" of token values and key-to-node assignment, so how is each key range (that belongs to a dif

RE: Nodes Levels of Hierarchy in Cassandra.

2010-05-17 Thread xmanach.ext
Hi Benjamin. OK. Thanks for your answer. -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Monday, May 17, 2010 00:08 To: user@cassandra.apache.org Subject: Re: Nodes Levels of Hierarchy in Cassandra. Not in Cassandra. Your description of the levels is not quite ac

Re: list of columns

2010-05-17 Thread Bill de hOra
> But then you'd need > re-ahead to check if the column is new, that should be 'read-ahead', sorry. Bill Bill de hOra wrote: Agree with David, it's not there and thinking about how the data is laid out on disk, it can't be done without changing core code or harming something else. > if thi

Re: Load Balancing Mapper Tasks

2010-05-17 Thread Joost Ouwerkerk
At any given moment at least half of those threads are in the following state; what does it represent? Name: ROW-READ-STAGE:6 State: WAITING on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@fea6030 Total blocked: 44 Total waited: 479 Stack trace: sun.misc.Unsafe.park(Nati

Re: Several CFs and partitioning : which key rabge is used

2010-05-17 Thread Jonathan Ellis
There is only one partitioner, and that alone is what determines key -> token mapping. CF has nothing to do with it. On Mon, May 17, 2010 at 4:55 AM, Miriam Allalouf wrote: > Hi, > I have a basic question regarding key ranges and partitions. > Assuming we have two CF column families, each is asso
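The point that the column family plays no part in key -> token mapping can be sketched as follows. This is a minimal illustration, assuming an MD5-based scheme in the style of the random partitioner (the digest read as a signed big-endian integer, then made non-negative). The function names `random_partitioner_token` and `token_for` are hypothetical and do not come from the thread or from Cassandra's code.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Assumed MD5-style token: hash the key, take the absolute value."""
    digest = hashlib.md5(key).digest()
    # Interpret the 16-byte digest as a signed big-endian integer, then
    # drop the sign so the token is non-negative (at most 2**127).
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def token_for(column_family: str, key: bytes) -> int:
    """Hypothetical helper: the column family is deliberately ignored."""
    return random_partitioner_token(key)


if __name__ == "__main__":
    # The same key lands on the same token whichever CF it is stored in.
    print(token_for("Users", b"row-1") == token_for("Events", b"row-1"))
```

Because the token depends only on the row key, two column families cannot be given different key ranges or different node assignments under this scheme.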

Re: Load Balancing Mapper Tasks

2010-05-17 Thread Jonathan Ellis
That means they are blocked waiting for something to be added to the task queue. On Mon, May 17, 2010 at 9:42 AM, Joost Ouwerkerk wrote: > At any given moment at least half of those threads are in the following > state; what does it represent? > Name: ROW-READ-STAGE:6 > State: WAITING on > java.util.conc
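The idle-worker state described in this reply can be reproduced in miniature. The sketch below is an analogy in Python, not Cassandra's Java stage code: a worker thread parks on an empty queue until a task arrives. The names `worker` and `run_demo` are made up for illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Pull tasks forever; the thread waits inside get() while the queue is empty."""
    while True:
        task = tasks.get()  # blocks here when there is nothing to do
        if task is None:    # sentinel value tells the worker to exit
            break
        results.append(task * 2)


def run_demo():
    tasks = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(tasks, results))
    thread.start()
    # Until something is queued, the worker is idle and waiting on the queue.
    tasks.put(21)
    tasks.put(None)
    thread.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

A thread dump taken while the queue is empty would show the worker waiting, which indicates spare capacity rather than a stuck thread.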

Re: Avro Example Code

2010-05-17 Thread Eric Evans
On Fri, 2010-05-14 at 14:52 -0600, David Wellman wrote: > Does anyone have a good link or example code that we can use to spike on Avro > with Cassandra? If you're using Python, the best place to look is the functional tests (see test/system), otherwise, Patrick's quick start (http://bit.ly/32T6M

Re: Overfull node

2010-05-17 Thread Anthony Molinaro
I had this happen when I changed the seed node in a running cluster, and then started and stopped various nodes. I "fixed" it by restarting the seed node(s) (and waiting for it to be fully up), then restarting all the other nodes. -Anthony On Fri, May 14, 2010 at 05:11:40PM -0700, David Koblas w

Re: Avro Example Code

2010-05-17 Thread Wellman, David
I spent the weekend working with Avro and some Java JUnit tests. I still have a lot of learning to do, but if others would like to use, add to or improve upon the tests then I would appreciate the feedback and help. David Wellman On May 17, 2010, at 10:16 AM, "Eric Evans" wrote: On Fri,

nodetool causing OOM?

2010-05-17 Thread Ronald Park
Hello, We are getting our feet wet with Cassandra and have a test environment set up to do some heavy data insertion. [Heavy is relative: we are talking about 1M inserts in a 3-hour test.] Twice while running these tests, when we've tried to use 'nodetool' about an hour or so into the test,

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
Moving to the user@ list. http://wiki.apache.org/cassandra/HadoopSupport should be useful. On Mon, May 17, 2010 at 2:41 PM, Yan Virin wrote: > Hi, > Can someone explain how this works? As far as I know, there is no execution > engine in Cassandra alone, so I assume that Hadoop gives the MapRedu

Re: cannot loadbalance node

2010-05-17 Thread Jonathan Ellis
Sounds like a bug. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA ? As a workaround, it should recover if you shut down the node and restart it. If restarting the single node doesn't work, restarting the entire cluster will. 2010/5/14 Maxim Kramarenko : > Hello! > >

Re: cassandra cluster (2 node): one node is OK, another has this error: java.lang.AssertionError: invalid response count 2

2010-05-17 Thread Jonathan Ellis
Do both nodes see each other in nodetool ring? On Fri, May 14, 2010 at 4:47 PM, li wei wrote: > Hi, Guys, > > I have a lasted cassandra -0.6.0.-rc1. > Connect one node1 from java. Node 1 is OK, node 2 has this error: > (if Connect one node2 from java. Node 2 is OK, node 1 has same error): > I hav

Re: nodetool causing OOM?

2010-05-17 Thread Brandon Williams
On Mon, May 17, 2010 at 2:44 PM, Ronald Park wrote: > Hello, > > We are getting our feet wet with Cassandra and have a test environment set > up to do some heavy data insertion. [Heavy is relative: we are talking about > 1M inserts in a 3 hours test. > > Twice while running these tests, when we'v

Re: nodetool causing OOM?

2010-05-17 Thread Ronald Park
Brandon Williams wrote: On Mon, May 17, 2010 at 2:44 PM, Ronald Park > wrote: Hello, We are getting our feet wet with Cassandra and have a test environment set up to do some heavy data insertion. [Heavy is relative: we are talking about 1M inserts in

Re: Cassandra training on May 21 in Palo Alto

2010-05-17 Thread S Ahmed
Jonathan, Curious how many people have signed up? I hope you will do another one soon! On Tue, May 11, 2010 at 12:42 PM, Vick Khera wrote: > On Fri, May 7, 2010 at 6:56 AM, Matt Revelle wrote: > > Reston, VA is a good spot in the DC metro area for tech events. > > +1 >

Re: Hadoop over Cassandra

2010-05-17 Thread Vick Khera
On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis wrote: > Moving to the user@ list. > > http://wiki.apache.org/cassandra/HadoopSupport should be useful. That document doesn't really answer the "is data locality preserved" when running the map phase, but my hunch is "no". > > On Mon, May 17, 2010

JMX metrics for monitoring

2010-05-17 Thread Maxim Kramarenko
Hi! Which JMX metrics do you use for Cassandra monitoring? Which values can be used for alerts?

Re: nodetool causing OOM?

2010-05-17 Thread Nahor
On 2010-05-17 12:51, Brandon Williams wrote: On Mon, May 17, 2010 at 2:44 PM, Ronald Park > wrote: Hello, We are getting our feet wet with Cassandra and have a test environment set up to do some heavy data insertion. [Heavy is relative: we are talking

Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Hello Cassandra users+experts, Hopefully someone will be able to point me in the correct direction. We have cassandra 0.6.1 working on our test servers and we *thought* everything was great and ready to move to production. We are currently running a ring of 4 large instance EC2 (http://aws.amazon.

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Mark Greene
Can you provide us with the current JVM args? Also, what type of workload are you giving the ring (ops/s)? On Mon, May 17, 2010 at 6:39 PM, Curt Bererton wrote: > Hello Cassandra users+experts, > > Hopefully someone will be able to point me in the correct direction. We > have cassandra 0.6.1 wor

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Here are the current jvm args and java version: # Arguments to pass to the JVM JVM_OPTS=" \ -ea \ -Xms128M \ -Xmx7G \ -XX:TargetSurvivorRatio=90 \ -XX:+AggressiveOpts \ -XX:+UseParNewGC \ -XX:+UseConcMarkSweepGC \ -XX:+CMSParallelRem

unbalanced token assignment with random partioner

2010-05-17 Thread Chris Shorrock
I have a feeling this issue may be more misunderstanding than anything else, but after searching for an explanation in the wiki and elsewhere my understanding of token assignments leads me to believe that unbalancing is bound to occur. Given a relatively simple example if we take a 2 node cassandr
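The balance question can be checked with a little arithmetic. The sketch below assumes a token ring of size 2**127 (an assumption about the random partitioner's token space, not a figure from the thread) and assumes each node owns the range from the previous token up to its own. The helper names `evenly_spaced_tokens` and `ownership` are hypothetical.

```python
from fractions import Fraction
from typing import List

RING_SIZE = 2**127  # assumed size of the random partitioner's token space


def evenly_spaced_tokens(node_count: int, ring_size: int = RING_SIZE) -> List[int]:
    """Tokens that split the ring into equal slices, one per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


def ownership(tokens: List[int], ring_size: int = RING_SIZE) -> List[Fraction]:
    """Fraction of the ring owned by each node, listed in sorted token order."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [Fraction(1)]
    # Each node owns the range from the previous token up to its own token;
    # the modulo handles the wrap-around for the first node.
    return [
        Fraction((tok - ordered[i - 1]) % ring_size, ring_size)
        for i, tok in enumerate(ordered)
    ]


if __name__ == "__main__":
    print(ownership(evenly_spaced_tokens(2)))  # equal halves
    print(ownership([0, 2**125]))              # uneven split
```

Under these assumptions, a two-node ring is balanced only when the tokens sit half a ring apart; tokens placed anywhere else give one node a larger share.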

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Brandon Williams
On Mon, May 17, 2010 at 6:02 PM, Curt Bererton wrote: > So pretty much the defaults aside from the 7Gig max heap. CPU is totally > hammered right now, and it is receiving 0 ops/sec from me since I > disconnected it from our application right now until I can figure out what's > going on. > > runni

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Lee Parker
What are your storage-conf settings for Memtable thresholds? One thing that could cause lots of CPU usage is dumping the memtables too frequently and then having to do lots of compaction. With that much available heap space you could definitely go larger than the default thresholds. Also, do you

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Lee Parker
Also, I am using batch_mutate for all of my writes. Lee Parker On Mon, May 17, 2010 at 7:11 PM, Lee Parker wrote: > What are your storage-conf settings for Memtable thresholds? One thing > that could cause lots of CPU usage is dumping the memtables too frequently > and then having to do lots of

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis wrote: >> Moving to the user@ list. >> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful. > > That document doesn't really answer the "is data locality preserved" > when running t

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Thanks for the help guys: First answering the first question: both cores are pegged: Cpu0 : 43.8%us, 34.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 22.1%st Cpu1 : 40.5%us, 36.2%sy, 0.0%ni, 0.4%id, 0.0%wa, 0.0%hi, 0.2%si, 22.6%st Mem: 7872040k total, 3620180k used, 4251860k free,

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Mark Greene
Since you only have 7.5GB of memory, it's a really bad idea to set your heap space to a max of 7GB. Remember, the Java process's total memory footprint will be larger than the limit Xmx places on the heap. If you reach this level, you can start swapping which is very very bad. As Brandon pointed out, you haven't exhaust
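The headroom in question is easy to quantify from figures already posted in this thread: 7872040k of total memory and -Xmx7G. The sketch below is plain arithmetic and assumes the "k" in the memory figure means KiB. The helper name `headroom_kib` is made up for illustration.

```python
def headroom_kib(total_ram_kib: int, max_heap_gib: int) -> int:
    """Memory left for everything other than the Java heap, in KiB."""
    return total_ram_kib - max_heap_gib * 1024 * 1024


if __name__ == "__main__":
    left = headroom_kib(total_ram_kib=7872040, max_heap_gib=7)
    print(left, "KiB, about", round(left / 1024), "MiB")
```

With those inputs roughly 520 MiB remains for the JVM's non-heap memory, the operating system and the file cache combined, which is why a fully grown heap would push the machine toward swapping.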

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Agreed, and I just saw in storage-conf that a higher value for MemtableFlushAfterMinutes is suggested, otherwise you might get a "flush storm" of all your memtables flushing at once. I've changed that as well. -- Curt, ZipZapPlay Inc., www.PlayCrafter.com, http://apps.facebook.com/happyha

IO errors after upgrading from 0.5.1 to 0.6

2010-05-17 Thread Stephen Hamer
After upgrading my cluster from 0.5.1 to the 0.6 branch (commit 1206bcf in git), I am seeing lots of IO errors in the log output. Two questions: 1. Is this a sign that I have corrupt data? Is there some way for me to recover it or at the very least remove the bad data? 2. If this is an i

Re: JMX metrics for monitoring

2010-05-17 Thread Ran Tavory
There are many, but here's what I found useful so far: Per CF you have: - Recent read/write latency - PendingTasks - Read/Write count Globally you have, for each of the stages (e.g. org.apache.cassandra.concurrent:type=ROW-READ-STAGE): - PendingTasks - ActiveCount ... and as you go you'll find mo

Re: IO errors after upgrading from 0.5.1 to 0.6

2010-05-17 Thread Stephen Hamer
I found out what was wrong. The schema file had gotten changed but not deployed to the cluster recently. During the migration the new schema was used. A column family got switched from a normal column family to a super column family. Stephen Hamer On Mon, May 17, 2010 at 6:16 PM, Stephen Hamer w

Multiple hard disks configuration

2010-05-17 Thread Ma Xiao
Hi all, We recently set up 5 nodes running Cassandra, each with 4 x 1.5TB drives. I installed the OS (Ubuntu 9.10 Server Edition) on one of the drives and made each of the others a single whole-disk partition, then I configured 4 paths with DataFileDirectory. My question is what's going to happen when one of the disks fails,

Disk usage doubled after nodetool compact

2010-05-17 Thread Arie Keren
After running the "nodetool compact" command, disk usage nearly doubled. nodetool info reports that the load is 74G (same as before compaction) while the size of the data folder on disk is 133GB (was about 74G before compaction).

Data migration from mysql to cassandra

2010-05-17 Thread Beier Cai
I'm currently moving my existing MySQL database to Cassandra. One particular problem I have is migrating all those integer auto-increment ids to Cassandra's code-generated keys (like UUID). One way to do this is to dump all the existing records into Cassandra and start with UUID for new records, but
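One possible way to give existing rows UUID keys, offered as a sketch rather than as the poster's approach, is to derive a name-based (version 5) UUID from the table name and the old integer id. The mapping is deterministic, so foreign-key references can be converted without a lookup table. The namespace string and the helper name `legacy_id_to_uuid` are invented for illustration.

```python
import uuid

# Hypothetical namespace for this migration; any fixed UUID works as long as
# it never changes once rows have been migrated.
LEGACY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "legacy-ids.example.invalid")


def legacy_id_to_uuid(table: str, legacy_id: int) -> uuid.UUID:
    """Map an old auto-increment id to a stable, name-based UUID."""
    return uuid.uuid5(LEGACY_NAMESPACE, f"{table}:{legacy_id}")


if __name__ == "__main__":
    # Re-running the migration yields the same key for the same old row.
    print(legacy_id_to_uuid("users", 42) == legacy_id_to_uuid("users", 42))
```

New records could still use random or time-based UUIDs; only migrated rows need the derived form.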