Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Wed, May 26, 2010 at 8:00 PM, Sean Bridges wrote: > So after CASSANDRA-579, anti compaction won't be done on the source node, > and we can use more than 50% of the disk space if we use multiple column > families? Sorry if I misunderstand, but #579 seems to only solve half of your question, I b

Batch_Mutate throws Uncaught exception

2010-05-27 Thread Moses Dinakaran
Hi, I am trying to use batch_mutate() with PHP Thrift. I was getting the following error. *Fatal error*: Uncaught exception 'cassandra_InvalidRequestException' in CORE/php/phpcassa/thrift/packages/cassandra/Cassandra.php:4869 Stack trace: #0 CORE/php/phpcassa/thrift/packages/cassandra/Ca

Re: Batch_Mutate throws Uncaught exception

2010-05-27 Thread Mishail
Hi, Just to clarify. Are you trying to insert a couple of columns with key "cache_pages" in the ColumnFamily "Page"? Moses Dinakaran wrote: > Hi, > > I am trying to use batch_mutate() with PHP Thrift. I was getting the > following error. >

Re: Continuously increasing RAM usage

2010-05-27 Thread Ian Soboroff
A lot of folks have reported this issue, and there are a few JIRAs related to it. Post the output of nodetool tpstats. Also, are there lots of GCs in the system.log? If so, are they something besides ParNew? Ian On Thu, May 27, 2010 at 2:32 AM, James Golick wrote: > We're seeing RAM usage co

Re: Questions regarding batch mutates and transactions

2010-05-27 Thread Gary Dusbabek
On Wed, May 26, 2010 at 04:45, Todd Nine wrote: > > Now, here is where I can't find what I need in the doc.  In case 1, if my > mutation from biz op 2 were to fail during a batch mutate operation > encapsulating all mutations, does the batch mutation as a whole not get > executed, or would I still

Remove and BytesType

2010-05-27 Thread Bill de hOra
Saw some behaviour today on Cassandra 0.6.1 - After running a remove command on a row in a CF whose CompareWith was BytesType the row was still there, and still there after bouncing the server. This was the case for hector/cli. When I changed the CompareWith to UTF8Type, new rows added could b

Re: Remove and BytesType

2010-05-27 Thread Philip Stanhope
Could you clarify what you mean by "remove command"? Remove all columns leaving a row key? Did you use nodetool to force a flush and then compact after GCGraceSeconds? On May 27, 2010, at 9:27 AM, Bill de hOra wrote: > Saw some behaviour today on Cassandra 0.6.1 - > > After running a remove co

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jonathan Ellis
There definitely seems to be demand for something like this. Maybe for 0.8? On Wed, May 26, 2010 at 4:31 PM, Jeremy Davis wrote: > > Are there any thoughts on adding a more complex query to Cassandra? > > At a high level what I'm wondering is: Would it be possible/desirable/in > keeping with the

Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-27 Thread Jonathan Ellis
Essentially, yes. On Wed, May 26, 2010 at 11:25 PM, Ran Tavory wrote: > so the row cache contains both rows and keys and if I have large enough row > cache (in particular if row cache size equals key cache size) then it's just > wasteful to keep another key cache and I should eliminate the key-ca

Re: Cassandra's 2GB row limit and indexing

2010-05-27 Thread Jonathan Ellis
Yes, #16 (which is almost done for 0.7) will make this possible. On Wed, May 26, 2010 at 7:52 PM, Richard West wrote: > Hi all, > > I'm currently looking at new database options for a URL shortener in order > to scale well with increased traffic as we add new features. Cassandra seems > to be a g

Re: Batch_Mutate throws Uncaught exception

2010-05-27 Thread Jonathan Ellis
you need to pull out the exception "why" field, which explains what was invalid about the request On Thu, May 27, 2010 at 2:45 AM, Moses Dinakaran wrote: > Hi, > > > > I am trying to use batch_mutate() with PHP Thrift. I was getting the > following error. > > > > > > Fatal error:  Uncaught except
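A minimal sketch of the advice above, written in Python rather than the PHP used in the thread. The InvalidRequestException class here is a local stand-in so the example is self-contained; the only thing taken from the thread is that the exception carries a "why" field. The batch_mutate function and its message text are hypothetical.

```python
class InvalidRequestException(Exception):
    """Local stand-in for the client library's exception (assumption)."""

    def __init__(self, why):
        super().__init__(why)
        self.why = why  # human-readable reason the request was rejected


def batch_mutate(mutations):
    """Hypothetical client call that rejects an empty mutation map."""
    if not mutations:
        raise InvalidRequestException("mutation map is empty (made-up message)")
    return len(mutations)


def safe_batch_mutate(mutations):
    """Return (ok, detail); on failure, detail is the exception's 'why' text."""
    try:
        return True, batch_mutate(mutations)
    except InvalidRequestException as e:
        # Surface the reason instead of letting the exception go uncaught.
        return False, e.why
```

Catching the exception and logging its "why" text is usually enough to see which part of the request was rejected.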

Re: Continuously increasing RAM usage

2010-05-27 Thread Jonathan Ellis
What else are you seeing that correlates with "unresponsive?" lots of pending tasks in stage queues? high cpu in a single thread? swapping? (I sent a github pull request, but I've updated http://github.com/jbellis/cassandra-munin-plugins with a lot more metrics to monitor.) On Thu, May 27, 201

Re: Remove and BytesType

2010-05-27 Thread Jonathan Ellis
remove to a full row doesn't touch comparewith at all. I think that's a red herring. More likely data in that row was created with a higher-res timestamp than the delete was issued at. On Thu, May 27, 2010 at 7:27 AM, Bill de hOra wrote: > Saw some behaviour today on Cassandra 0.6.1 - > > After

Re: GMFD messages

2010-05-27 Thread Jonathan Ellis
This is a relic of when Gossip was over UDP and had to worry about packet size. I created https://issues.apache.org/jira/browse/CASSANDRA-1138 to remove those notifications. I think the correlation with MessageDeserializer is a red herring. Gossip only happens once per second so I don't see how t

Re: Two threads inserting columns into same key followed by read gets unexpected results

2010-05-27 Thread Jonathan Ellis
On Wed, May 26, 2010 at 11:11 AM, Scott McCarty wrote: > Am I wrong in thinking that an insert on a column with consistency level ALL > followed immediately by a get_slice should include that column? You are not wrong. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA wi

Re: Cassandra-0.6.1 Crash Error: out of memory

2010-05-27 Thread Jonathan Ellis
It looks like you simply don't have a large enough heap for all the in-flight data. Low-hanging fruit includes - upgrade to 0.6.2 (available from http://people.apache.org/~eevans/ until release is official later today) - when you get a TimeoutException on the client, sleep 100ms or so before re
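The client-side pause suggested above could be sketched roughly as follows. This is an illustration only: TimeoutException is a local stand-in for whatever timeout error the client library raises, and call_with_pause, retries, and pause_s are hypothetical names. The 0.1 s default mirrors the "100ms or so" in the message.

```python
import time


class TimeoutException(Exception):
    """Local stand-in for the client's timeout error (assumption)."""


def call_with_pause(op, retries=3, pause_s=0.1, sleep=time.sleep):
    """Run op(); after each timeout, pause before retrying.

    Re-raises the timeout once the retry budget is exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return op()
        except TimeoutException:
            if attempt == retries:
                raise
            # Give the overloaded node time to drain in-flight work.
            sleep(pause_s)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.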

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Vick Khera
On Thu, May 27, 2010 at 9:50 AM, Jonathan Ellis wrote: > There definitely seems to be demand for something like this.  Maybe for 0.8? > The Riak data store has something like this: you can submit queries (and map reduce jobs) written in javascript that run on the data nodes using data local to th

Re: Continuously increasing RAM usage

2010-05-27 Thread Robert Coli
On 5/26/10 11:32 PM, James Golick wrote: We're seeing RAM usage continually climb until eventually, cassandra becomes unresponsive. Given the handful of bugs related to memory bloat in specific versions of Cassandra combined with specific versions of JVMs, that information may be relevant to yo

Re: Continuously increasing RAM usage

2010-05-27 Thread James Golick
When I say unresponsive, I mean that latency becomes very high. Swap is turned off, but before I turned it off, it used to swap heavily at this point. Cassandra version is 0.6.0 Beta1 [cassandra1 ~]# java -version java version "1.6.0" OpenJDK Runtime Environment (build 1.6.0-b09) OpenJDK 64-Bit S

RE: Continuously increasing RAM usage

2010-05-27 Thread Daniel Kluesing
0.6.0 had some GC issues (I think https://issues.apache.org/jira/browse/CASSANDRA-1014). If you see lots of GCs in the logs, I'd give 0.6.1 a try; I found it much better. Anecdotally, the Sun JVM performs better than OpenJDK, and the u19 drop fixes some JVM bugs that can cause memory

Re: Continuously increasing RAM usage

2010-05-27 Thread Philip Stanhope
I've seen numerous anecdotal references that the Sun JVM performs better. Is there a reason why the Debian packaging for Cassandra installs the OpenJDK version? What would it take to create an alternative apt-get package that pulls the Sun JVM rather than OpenJDK? -phil On May 27, 2010, at 12:33

Re: Continuously increasing RAM usage

2010-05-27 Thread James Golick
Just upgraded to Sun JVM 1.6.0_20 and cassandra 0.6.2. Will report back when I have data. On Thu, May 27, 2010 at 9:39 AM, Philip Stanhope wrote: > I've seen numerous anecdotal references that the Sun JVM performs better. > > Is there a reason why the debian packaging for Cassandra installs the >

Re: GMFD messages

2010-05-27 Thread Anthony Molinaro
On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote: > This is a relic of when Gossip was over UDP and had to worry about > packet size. I created > https://issues.apache.org/jira/browse/CASSANDRA-1138 to remove those > notifications. Ahh, okay, well it's odd that a limit was set even

Hector client usage

2010-05-27 Thread Atul Gosain
Hi, I'm trying to use the Hector client to insert and then read data from Cassandra. While I'm able to write the data and see it through the cassandra-client CLI, I'm not able to read it from the program. I'm getting the following error. What am I doing wrong in my program? Can someone help me here?

Re: Hector client usage

2010-05-27 Thread Atul Gosain
Forgot to attach the class. On Thu, May 27, 2010 at 11:17 PM, Atul Gosain wrote: > Hi > > Im trying to use Hector client to insert and then read the data from > cassandra. While im able to write the data and able to see that thru > cassandra-client cli, im not able to read that from the progr

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-27 Thread Jeremy Hanna
>> Is there anything holding you back from using it (if you would like to use >> it but currently cannot)? > > It would be nice if the output of the mapreduce job was a > MutationOutputFormat in which we could write insert/delete, but I > recall there is something on jira already albeit not sure

Re: using more than 50% of disk space

2010-05-27 Thread Sean Bridges
But doesn't having multiple similarly sized column families mean in-node compaction does not require 50% of disk? Looking at compaction manager, only 1 thread is doing a compaction, so we only need enough free disk space to compact the largest column family. Sean On Thu, May 27, 2010 at 12:00 AM
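The arithmetic behind this argument can be sketched as below. It rests entirely on the assumptions stated in the message (one compaction at a time, and a compaction needing at most as much free space as the column family it rewrites) and ignores anticompaction and any other overhead, so it is an illustration of the reasoning, not a sizing rule. The function names and the sizes are made up.

```python
def free_space_needed_gb(cf_sizes_gb):
    """Worst-case scratch space if only the largest CF is compacted at once."""
    return max(cf_sizes_gb)


def fits_on_disk(cf_sizes_gb, disk_gb):
    """True if the data plus worst-case compaction scratch space fits."""
    return sum(cf_sizes_gb) + free_space_needed_gb(cf_sizes_gb) <= disk_gb


# One 400 GB column family on a 500 GB disk: needs 400 GB more, does not fit.
single_cf = fits_on_disk([400], 500)

# Four 100 GB column families on the same disk: needs 100 GB more, fits.
four_cfs = fits_on_disk([100, 100, 100, 100], 500)
```

Under these assumptions the same 400 GB of data occupies 80% of the disk once it is split into four equal column families, whereas a single column family of that size would not leave enough room to compact.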

Re: Continuously increasing RAM usage

2010-05-27 Thread Kyusik Chung
Hi Philip, I think they chose to go with OpenJDK because Sun's is not open source. Here's what we did on Ubuntu 10.04 (if you're using a different Debian distro, you can probably do something very similar): # this install gives us the convenient add-apt-repository command sudo apt-get install python-soft

Re: Remove and BytesType

2010-05-27 Thread Bill de hOra
> More likely data in that row was created with a > higher-res timestamp than the delete was issued at. Indeed - the problem was nanos v millis with a bit of clock skew thrown in :) Bill Jonathan Ellis wrote: remove to a full row doesn't touch comparewith at all. I think that's a red herrin
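A small sketch of why mixed timestamp units make a delete appear to do nothing. It assumes the usual rule that a delete only hides data written with a timestamp no newer than the delete's own; the function name and the fixed clock value are chosen for illustration.

```python
def live_after_delete(write_ts, delete_ts):
    """A column survives a delete if it was written with a newer timestamp."""
    return write_ts > delete_ts


now_s = 1274968800  # fixed wall-clock second, chosen for illustration

# Column written by a client that uses nanoseconds since the epoch.
write_nanos = now_s * 1_000_000_000

# Delete issued a minute later by a client that uses milliseconds.
delete_millis = (now_s + 60) * 1_000

# The same delete, had it also used nanoseconds.
delete_nanos = (now_s + 60) * 1_000_000_000

survives_mixed_units = live_after_delete(write_nanos, delete_millis)
survives_same_units = live_after_delete(write_nanos, delete_nanos)
```

With mixed units the nanosecond write timestamp is numerically far larger than the later millisecond delete timestamp, so the row stays visible; with matching units the delete wins.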

cluster locks up from high MESSAGE-DESERIALIZER-POOL counts

2010-05-27 Thread Edmond Lau
Occasionally, one of my six nodes gets a very high MESSAGE-DESERIALIZER-POOL pending count (over 100K). When that happens, it usually also has a decently high ROW-READ-STAGE pending count around 4K. All other nodes have very low load and no pending tasks. From reading other threads, this is usua

Continuously increasing RAM usage

2010-05-27 Thread Kyusik Chung
> I tried setting the IO mode to standard, but it seemed to be a little slower > and couldn't get the machine to come back online with adequate read > performance, so I set it back. I'll have to write a solid cache warming > script if I'm going to try that again. What cache are you talking about?

Large column/row inserts

2010-05-27 Thread Jones, Nick
Hi everyone, I'm using the Cassandra gem and have been trying to optimize inserting 400k-1M columns per row. I'm currently batching 1k column inserts and see about 130ms response times; however, every 19th insert takes 3.3s. Can anyone think of a reason for this? Thanks. Nick Jones

Re: GMFD messages

2010-05-27 Thread Jonathan Ellis
Yes, Gossip goes through MD too. On Thu, May 27, 2010 at 11:03 AM, Anthony Molinaro wrote: > > On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote: >> This is a relic of when Gossip was over UDP and had to worry about >> packet size.  I created >> https://issues.apache.org/jira/browse/

Re: Hector client usage

2010-05-27 Thread Jonathan Ellis
UnavailableException means "all the nodes that should have this data, are down." On Thu, May 27, 2010 at 12:01 PM, Atul Gosain wrote: > Forgot to attach the class . > > On Thu, May 27, 2010 at 11:17 PM, Atul Gosain wrote: >> >> Hi >> >>   Im trying to use Hector client to insert and then read th

Re: Large column/row inserts

2010-05-27 Thread Jonathan Ellis
JVM GC pause? If so the improved JVM options in 0.6.2 should help some. Increasing heap size is also a good candidate to help. On Thu, May 27, 2010 at 2:01 PM, Jones, Nick wrote: > Hi everyone, > I'm using the Cassandra gem and have been trying to optimize inserting > 400k-1M columns per row.

Re: Cassandra training on May 21 in Palo Alto

2010-05-27 Thread S Ahmed
So how did the event turn out? On Mon, May 17, 2010 at 4:07 PM, S Ahmed wrote: > Jonathan, > > Curious how many people have signed up? > > I hope you will do another one soon! > > > On Tue, May 11, 2010 at 12:42 PM, Vick Khera wrote: > >> On Fri, May 7, 2010 at 6:56 AM, Matt Revelle wrote: >>

Cassandra CF sharding

2010-05-27 Thread Maxim Kramarenko
Hello! We have a mail archive with one large CF for mail bodies. In our case, it's easy to shard the data into 5-10 CFs by customer id. We would like to do this because: 1) We get more manageable instances, because we have many small CFs instead of one multi-TB CF on each node. 2) Better disk space usage (ne

Re: cluster locks up from high MESSAGE-DESERIALIZER-POOL counts

2010-05-27 Thread Cagatay Kavukcuoglu
I think this is because as an optimization Cassandra sends a read request only to the closest replica and sends digest requests to other replicas for read repair. The same replica is probably getting chosen as the closest for all of your read requests. Maybe it would be a useful improvement to choo

Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Thu, May 27, 2010 at 9:23 PM, Sean Bridges wrote: > But doesn't having multiple similarly sized column families mean in-node > compaction does not require 50% of disk?  Looking at compaction manager, > only 1 thread is doing a compaction, so we only need enough free disk space > to compact the

Re: cluster locks up from high MESSAGE-DESERIALIZER-POOL counts

2010-05-27 Thread Edmond Lau
If this description is accurate, then it sounds like my only available workaround would be to not use multiget() and instead issue multiple get() calls to random nodes so that I can hit the other replicas. Edmond On Thu, May 27, 2010 at 2:36 PM, Cagatay Kavukcuoglu wrote: > I think this is beca

Re: Questions regarding batch mutates and transactions

2010-05-27 Thread Todd Nine
Correct, Ran. It seems like the only way I'm going to get true mutations in a single op is to use Cages. Thankfully a majority of our application won't require it, just a few specialized components. On Wed, 2010-05-26 at 12:57 +0300, Ran Tavory wrote: > The summary of your question is: is batch_

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Steve Lihn
Mongo has it too. It could save a lot of development time if one can figure out porting Mongo's query API and stored javascript to Cassandra. It would be great if scala's list comprehension can be facilitated to write query-like code against Cassandra schema. On Thu, May 27, 2010 at 11:05 AM, Vick

Re: Cassandra-0.6.1 Crash Error: out of memory

2010-05-27 Thread Peng Guo
Thanks for your help :) On Thu, May 27, 2010 at 10:38 PM, Jonathan Ellis wrote: > It looks like you simply don't have a large enough heap for all the > in-flight data. > > Low-hanging fruit includes > > - upgrade to 0.6.2 (available from > http://people.apache.org/~eevans/

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jake Luciani
I've secretly started working on this but nothing to show yet :( I'm calling it SliceDiceReduce or SliceReduce. The plan is to use the js thrift bindings I've added for 0.3 release of thrift (out very soon?) This will allow the supplied js to access the results like any other thrift clie

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jeremy Davis
I agree, I had more than filtering results in mind. Though I had envisioned the results continuing to use the List (and not JSON). You could still create new result columns that do not in any way exist in Cassandra, and you could still stuff JSON into any of the result columns. I had envisioned: list g

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jake Luciani
I had this: string slice_dice_reduce(1:required list key, 2:required ColumnParent column_parent, 3:required SlicePredicate predicate, 4:required ConsistencyLevel consistency_level=ONE

ec2 tests

2010-05-27 Thread Chris Dean
I'm interested in performing some simple performance tests on EC2. I was thinking of using py_stress and Cassandra deployed on 3 servers with one separate machine to run py_stress. Are there any particular configuration settings I should use? I was planning on changing the JVM heap size to refle

Re: Cassandra CF sharding

2010-05-27 Thread Jonathan Ellis
2) is correct, but for 1) I'm not sure what manageability improvements you anticipate from dealing with multiple entities instead of one. I'm not sure what you're thinking of for 3) but routing is done by key only. 2010/5/27 Maxim Kramarenko : > Hello! > > We have mail archive with one large CF fo

Re: ec2 tests

2010-05-27 Thread Mark Greene
If you give us the objective of the test, that will help. Trying to get max write throughput? Read throughput? Weak consistency? On Thu, May 27, 2010 at 8:48 PM, Chris Dean wrote: > I'm interested in performing some simple performance tests on EC2. I > was thinking of using py_stress and Cassandr