Re: Anyone using hadoop/MapReduce integration currently?

2010-05-26 Thread gabriele renzi
On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna wrote: > What is the use case? we end up with messed up data in the database, we run a mapreduce job to find irregular data from time to time. > Why are you using Cassandra versus using data stored in HDFS or HBase? as of now our mapreduce task i

Continuously increasing RAM usage

2010-05-26 Thread James Golick
We're seeing RAM usage continually climb until eventually, cassandra becomes unresponsive. The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am assuming that the memory usage is related to mmap'd IO. Fair assumption? I tried setting the IO mode to standard, but it seemed to be

Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Ran Tavory
so the row cache contains both rows and keys and if I have large enough row cache (in particular if row cache size equals key cache size) then it's just wasteful to keep another key cache and I should eliminate the key-cache, correct? On Thu, May 27, 2010 at 1:21 AM, Jonathan Ellis wrote: > It s

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-26 Thread 朱蓝天
2010/5/26 Utku Can Topçu > Hi Jeremy, > > > > Why are you using Cassandra versus using data stored in HDFS or HBase? > - I'm thinking of using it for realtime streaming of user data. While > streaming the requests, I'm also using Lucandra for indexing the data in > realtime. It's a better option

Re: Cassandra's 2GB row limit and indexing

2010-05-26 Thread Jonathan Shook
The example is a little confusing. .. but .. 1) "sharding" You can square the capacity by having a 2-level map. CF1->row->value->CF2->row->value This means finding some natural subgrouping or hash that provides a good distribution. 2) "hashing" You can also use some additional key hashing to sp
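
A rough Python sketch of the key-hashing idea, assuming the goal is to spread one oversized logical row across a fixed number of smaller physical rows. The names `bucket_row_key` and `all_bucket_keys`, the `:NN` key suffix, and the bucket count of 16 are illustrative assumptions, not anything from Cassandra or from this thread.

```python
import hashlib


def bucket_row_key(logical_key, column_name, buckets=16):
    """Pick the physical row key that should hold `column_name`.

    The column name is hashed so that columns spread roughly evenly
    over `buckets` physical rows instead of piling into one huge row.
    """
    digest = hashlib.md5(column_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{logical_key}:{bucket:02d}"


def all_bucket_keys(logical_key, buckets=16):
    """List every physical row key, for reading the whole logical row back."""
    return [f"{logical_key}:{b:02d}" for b in range(buckets)]
```

A write goes to `bucket_row_key(key, column)`; a full read has to fetch every key from `all_bucket_keys(key)` and merge the columns on the client, which is the price of this layout.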

Cassandra's 2GB row limit and indexing

2010-05-26 Thread Richard West
Hi all, I'm currently looking at new database options for a URL shortener in order to scale well with increased traffic as we add new features. Cassandra seems to be a good fit for many of our requirements, but I'm struggling a bit to find ways of designing certain indexes in Cassandra due to its

RE: Thoughts on adding complex queries to Cassandra

2010-05-26 Thread Nicholas Sun
I'm very curious on this topic as well. Mainly, I'd like to know is this functionality handled through Map/Reduce HADOOP operations? Nick From: Jeremy Davis [mailto:jerdavis.cassan...@gmail.com] Sent: Wednesday, May 26, 2010 3:31 PM To: user@cassandra.apache.org Subject: Thoughts on addi

Thoughts on adding complex queries to Cassandra

2010-05-26 Thread Jeremy Davis
Are there any thoughts on adding a more complex query to Cassandra? At a high level what I'm wondering is: Would it be possible/desirable/in keeping with the Cassandra plan, to add something like a javascript blob on to a get range slice etc, that does some further filtering on the results before

Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Jonathan Ellis
It sure sounds like you're seeing the "my row cache contains the entire hot data set, so the key cache only gets the cold reads" effect. On Wed, May 26, 2010 at 2:54 PM, Ran Tavory wrote: > If I disable row cache the numbers look good - key cache hit rate is > 0, so > it seems to be related to ro

Re: Best Timestamp?

2010-05-26 Thread Mark Robson
On 26 May 2010 22:56, Miguel Verde wrote: > Right, in C# this would be (not the most efficient way, but you get the > idea): > long timestamp = (DateTime.Now.Ticks - new DateTime(1970, 1, 1).Ticks)/10; > > > Yeah, you're fine provided: a) All your client applications (which perform writes) are c

Re: Best Timestamp?

2010-05-26 Thread Miguel Verde
Right, in C# this would be (not the most efficient way, but you get the idea): long timestamp = (DateTime.Now.Ticks - new DateTime(1970, 1, 1).Ticks)/10; On Wed, May 26, 2010 at 4:50 PM, Mark Robson wrote: > On 26 May 2010 22:42, Steven Haar wrote: > >> What is the best timestamp to use while

Re: Best Timestamp?

2010-05-26 Thread Mark Robson
On 26 May 2010 22:42, Steven Haar wrote: > What is the best timestamp to use while using Cassandra with C#? I have > been using DateTime.Now.Ticks, but I have seen others using different > things. > The standard that most clients seem to use is epoch-microseconds, or microseconds since midnight
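
For comparison, a minimal Python sketch of the epoch-microseconds convention (microseconds since 1970-01-01 00:00:00 UTC). The function name `epoch_micros` is illustrative, and the sketch assumes UTC is the intended reference; a naive datetime passed in is interpreted as local time by `astimezone`.

```python
import time
from datetime import datetime, timezone

# Reference point: the Unix epoch, expressed in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_micros(now=None):
    """Return microseconds since the Unix epoch as an integer.

    With no argument the current wall-clock time is used. Integer
    arithmetic is used throughout to avoid floating-point rounding.
    """
    if now is None:
        return time.time_ns() // 1_000
    delta = now.astimezone(timezone.utc) - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

Whatever unit is chosen, every writer has to use the same one, since Cassandra only compares the numbers to decide which write wins.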

Best Timestamp?

2010-05-26 Thread Steven Haar
What is the best timestamp to use while using Cassandra with C#? I have been using DateTime.Now.Ticks, but I have seen others using different things. Thanks.

Re: Doing joins between column families

2010-05-26 Thread Jonathan Shook
I wrote some Iterable<*> methods to do this for column families that share key structure with OPP. It is on the hector examples page. Caveat emptor. It does iterative chunking of the working set for each column family, so that you can set the nominal transfer size when you construct the Iterator/I

Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Ran Tavory
If I disable row cache the numbers look good - key cache hit rate is > 0, so it seems to be related to row cache. Interestingly, after running for a really long time and with both row and keys caches I do start to see Key cache hit rate > 0 but the numbers are so small that it doesn't make sense.

Re: Doing joins between column families

2010-05-26 Thread Charlie Mason
On Wed, May 26, 2010 at 7:45 PM, Dodong Juan wrote: > > So I am not sure if you guys are familiar with OCM . Basically it is an ORM > for Cassandra. Been testing it > In case anyone is interested I have posted a reply on the OCM issue tracker where this was also raised. http://github.com/charlie

Re: Order Preserving Partitioner

2010-05-26 Thread Jonathan Shook
I don't think that queries on a key range are valid unless you are using OPP. As far as hashing the key for OPP goes, I take it to be the same as not using OPP. It's really a matter of where it gets done, but it has much the same effect. (I think) Jonathan On Wed, May 26, 2010 at 12:51 PM, Peter H

Re: nodetool move looks stuck

2010-05-26 Thread Jonathan Ellis
Are there any exceptions in the log like the one in https://issues.apache.org/jira/browse/CASSANDRA-1019 ? If so you'll need to restart the moving node and try again. On Wed, May 26, 2010 at 3:54 AM, Ran Tavory wrote: > I ran nodetool move on one of the nodes and it seems stuck for a few hours >

Doing joins between column families

2010-05-26 Thread Dodong Juan
So I am not sure if you guys are familiar with OCM. Basically it is an ORM for Cassandra. Been testing it So I have created a model that has the following object relationship. OCM generates the code from this that allows me to do easy programmatic query from Java to Cassandra. Object1-(M

Subscribe

2010-05-26 Thread Nazario Parsacala
Sent from my iPhone

Re: using more than 50% of disk space

2010-05-26 Thread Sean Bridges
So after CASSANDRA-579, anti compaction won't be done on the source node, and we can use more than 50% of the disk space if we use multiple column families? Thanks, Sean On Wed, May 26, 2010 at 10:01 AM, Stu Hood wrote: > See https://issues.apache.org/jira/browse/CASSANDRA-579 for some > backg

Re: Order Preserving Partitioner

2010-05-26 Thread Peter Hsu
Correct me if I'm wrong here. Even though you can get your results with Random Partitioner, it's a lot less efficient if you're going across different machines to get your results. If you're doing a lot of range queries, it makes sense to have things ordered sequentially so that if you do need

Two threads inserting columns into same key followed by read gets unexpected results

2010-05-26 Thread Scott McCarty
Hi, I'm seeing a problem with inserting columns into one key using multiple threads and I'm not sure if it's a bug or if it's my misunderstanding of how insert/get_slice should work. My setup is that I have two separate client processes, each with a single thread, writing concurrently to Cassandr

RE: using more than 50% of disk space

2010-05-26 Thread Stu Hood
See https://issues.apache.org/jira/browse/CASSANDRA-579 for some background here: I was just about to start working on this one, but it won't make it in until 0.7. -Original Message- From: "Sean Bridges" Sent: Wednesday, May 26, 2010 11:50am To: user@cassandra.apache.org Subject: using

using more than 50% of disk space

2010-05-26 Thread Sean Bridges
We're investigating Cassandra, and we are looking for a way to get Cassandra to use more than 50% of its data disks. Is this possible? For major compactions, it looks like we can use more than 50% of the disk if we use multiple similarly sized column families. If we had 10 column families of the s

Re: Avro Example Code

2010-05-26 Thread David Wellman
Fantastic! Thank you. On May 26, 2010, at 8:38 AM, Jeff Hammerbacher wrote: > I've got a mostly working Avro server and client for HBase at > http://github.com/hammer/hbase-trunk-with-avro and > http://github.com/hammer/pyhbase. If you replace "scan" with "slice", it > shouldn't be too much di

Re: Avro Example Code

2010-05-26 Thread Jeff Hammerbacher
I've got a mostly working Avro server and client for HBase at http://github.com/hammer/hbase-trunk-with-avro and http://github.com/hammer/pyhbase. If you replace "scan" with "slice", it shouldn't be too much different for Cassandra... On Mon, May 17, 2010 at 10:31 AM, Wellman, David wrote: > I s

RE: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Dop Sun
In Thrift API, I guess you need to use read/ insert and then delete to implement the move action. If you can shut the Cassandra down, maybe you can try to sstable2json to export data out, and json2sstable to import back to different column family file? I did not do it before, but I guess it

Re: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
Sorry I now realized that I used the wrong terminology. What I really meant was, moving or copying the ROWS defined by a KeyRange in between ColumnFamilies. Do you think it's doable in an efficient way? On Wed, May 26, 2010 at 3:14 PM, Dop Sun wrote: > There is no single API call to achieve

RE: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Dop Sun
There is no single API call to achieve this. It’s read and write, plus a delete (if move) API calls I guess. From: Utku Can Topçu [mailto:u...@topcu.gen.tr] Sent: Wednesday, May 26, 2010 9:09 PM To: user@cassandra.apache.org Subject: Moving/copying columns in between ColumnFamilies He
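
The read, write, then optional delete sequence can be sketched as follows. This is only an in-memory illustration: plain Python dicts (row key -> {column name: value}) stand in for the two column families, and `transfer_rows` is a hypothetical helper, not a Thrift or Hector call. A real client would replace each dict operation with the corresponding read, insert, and remove request.

```python
def transfer_rows(source, target, keys, move=False):
    """Copy (or move) the rows named by `keys` from `source` to `target`.

    `source` and `target` are dicts mapping row key -> {column: value},
    standing in for two column families in the same keyspace.
    Returns the number of rows transferred.
    """
    transferred = 0
    for key in list(keys):
        columns = source.get(key)               # 1) read the row
        if columns is None:
            continue                            # nothing stored under this key
        target.setdefault(key, {}).update(columns)  # 2) write it to the target
        if move:
            del source[key]                     # 3) delete from the source
        transferred += 1
    return transferred
```

Because the three steps are separate calls, a failure between the write and the delete leaves the row in both places, so a move done this way is not atomic.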

Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
Hey All, Assume I have two ColumnFamilies in the same keyspace and I want to move or copy a range of columns (defined by a keyrange) into another columnfamily. Do you think it's somehow possible and doable with the current support of the API, if so how? Best Regards, Utku

Re: Questions regarding batch mutates and transactions

2010-05-26 Thread Ran Tavory
The summary of your question is: is batch_mutate atomic in the general sense, meaning when used with multiple keys, multiple column families etc, correct? On Wed, May 26, 2010 at 12:45 PM, Todd Nine wrote: > Hey guys, > I originally asked this on the Hector group, but no one was sure of the >

nodetool move looks stuck

2010-05-26 Thread Ran Tavory
I ran nodetool move on one of the nodes and it seems stuck for a few hours now. I've been able to run it successfully in the past, but this time it looks stuck. Streams shows as if there's work in progress, but the same files have been at the same position for a few hours. I've also checked the c

Questions regarding batch mutates and transactions

2010-05-26 Thread Todd Nine
Hey guys, I originally asked this on the Hector group, but no one was sure of the answer. Can I get some feedback on this. I'd prefer to avoid having to use something like Cages if I can for most of our use cases. Long term I can see we'll need to use something like Cages, especially when it c

Re: Order Preserving Partitioner

2010-05-26 Thread David Boxenhorn
Just in case you don't know: You can do range searches on keys even with Random Partitioner, you just won't get the results in order. If this is good enough for you (e.g. if you can order the results on the client, or if you just need to get the right answer, but not the right order), then you shou

Re: batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
On Wed, May 26, 2010 at 9:54 AM, Mishail wrote: > You could either use 1 remove(keyspace, key, column_path, timestamp, > consistency_level) per each key, or wait till > https://issues.apache.org/jira/browse/CASSANDRA-494 is fixed (to use > SliceRange in the Deletion) thanks, I'm already doing that b

Re: batch mutation : how to delete whole row?

2010-05-26 Thread Mishail
You could either use 1 remove(keyspace, key, column_path, timestamp, consistency_level) per each key, or wait till https://issues.apache.org/jira/browse/CASSANDRA-494 is fixed (to use SliceRange in the Deletion) gabriele renzi wrote: > > Is it correct that I cannot perform a row delete via batchMuta

Re: batch mutation : how to delete whole row?

2010-05-26 Thread Sylvain Lebresne
This has been fixed in 0.7 (https://issues.apache.org/jira/browse/CASSANDRA-1027). Not sure this has been merged in 0.6 though. On Wed, May 26, 2010 at 9:05 AM, gabriele renzi wrote: > Hi everyone, > > in our test code we perform a dummy "clear" by reading all the rows > and deleting them (while

batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
Hi everyone, in our test code we perform a dummy "clear" by reading all the rows and deleting them (while waiting for cassandra 0.7 & CASSANDRA-531). A couple of days ago I updated our code to perform this operation using batchMutate, but there seems to be no way to perform a deletion of the whole