Re: Cassandra vs MongoDB

2010-07-29 Thread Jeff Hammerbacher
Having participated in the design of a few of these systems being mentioned, I'll chime in here and point out that the combination of Flume and Hive makes CDH3 very useful for log processing and that use case is directly in the wheelhouse of the system, especially for large collections of log files

Re: Please need help with Munin: Cassandra Munin plugin problem

2010-07-29 Thread Miriam Allalouf
Hi, Please, can someone help us with Munin?? Thanks, Miriam On Mon, Jul 26, 2010 at 1:58 PM, osishkin osishkin wrote: > Hi, > > I'm trying to use Munin to monitor cassandra. > I've seen other people using munin here ,so I hope someone ran into > this problem. > The default plugins are working,

Re: Please need help with Munin: Cassandra Munin plugin problem

2010-07-29 Thread Dave Viner
Is your code posted somewhere such that others could try it? On Thu, Jul 29, 2010 at 5:57 AM, Miriam Allalouf wrote: > Hi, > Please, can someone help us with Munin?? > Thanks, > Miriam > > > On Mon, Jul 26, 2010 at 1:58 PM, osishkin osishkin > wrote: > > Hi, > > > > I'm trying to use Munin to m

0.6.4 tag

2010-07-29 Thread B. Todd Burruss
I see a 0.6.4 tag in SVN, but not on Cassandra's download page. Is this ready for use if building from SVN?

Re: 0.6.4 tag

2010-07-29 Thread Gary Dusbabek
The vote is in process. http://permalink.gmane.org/gmane.comp.db.cassandra.devel/2010 Gary. On Thu, Jul 29, 2010 at 11:34, B. Todd Burruss wrote: > i see a 0.6.4 tag in SVN, but not on cassandra's download page.  is this > ready for use if building from SVN? > >

Date loading pattern with OrderPreservingPartiton

2010-07-29 Thread Rana Aich
Hi All, We are working with a Cassandra cluster consisting of 3 nodes, each with a storage capacity of 0.5 terabytes. We are loading the data into the cluster with the OrderPreservingPartitioner and a replication factor of 2. The data that has been loaded so far looks as follows: Address Status

Re: Evaluating Cassandra for our use case

2010-07-29 Thread Russ Brown
On Wed, Jul 28, 2010 at 9:13 PM, Aaron Morton wrote: > Have you considered Redis http://code.google.com/p/redis/? > > It may be more suited to the master-slave configuration you are after. > > - You can have a master to write to, then slave to a slave master, then your > web heads run a local redi

Re: any better way to retrieve data than using get_range_slices

2010-07-29 Thread Ken Matsumoto
Thank you, Aaron. Yes, we're now thinking Hadoop would be one of the choices, too. So far, it doesn't matter whether we use "SQL" or not, as long as Cassandra can process millions of rows at a time in a practical amount of time. So, in what kinds of patterns would Cassandra be more powerful than MySQL from t

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-29 Thread Oren Benjamin
Just wanted to follow up on this. We were never able to achieve throughput scaling in the cloud. We were able to verify that many of our cluster nodes and test servers were collocated on the same physical hardware (thanks Stu for the tip on the Rackspace REST API), and that performance on coll

Re: Consequences of Cassandra key NOT unique

2010-07-29 Thread Benjamin Black
You are both confusing columns with rows. Columns have timestamps, row keys do not. On Wed, Jul 28, 2010 at 11:37 PM, Thorvaldsson Justus wrote: > You insert 500 rows with key “x” > > And 1000 rows with key “y” > > You make a query getting all rows. > > It will only show two rows, the ones with
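Benjamin's point can be illustrated with a toy model. The sketch below is a hypothetical in-memory stand-in, not Cassandra's client API: a column family maps each row key to its columns, every column carries its own timestamp, and a write to an existing row key merges into that same row, with the newer timestamp winning per column. Writing 500 "rows" with key "x" therefore still leaves exactly one row "x".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    value: str
    timestamp: int  # timestamps belong to columns, not to row keys


@dataclass
class ColumnFamily:
    # Hypothetical in-memory model: row key -> {column name -> Column}
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, key: str, name: str, value: str, timestamp: int) -> None:
        row = self.rows.setdefault(key, {})  # same key -> same row
        current = row.get(name)
        # Last write wins, decided per column by timestamp.
        # (Equal timestamps simply keep the existing column in this model.)
        if current is None or timestamp > current.timestamp:
            row[name] = Column(value, timestamp)


cf = ColumnFamily()
for i in range(500):
    cf.insert("x", "payload", f"v{i}", timestamp=i)
for i in range(1000):
    cf.insert("y", "payload", f"w{i}", timestamp=i)

print(len(cf.rows))                    # 2
print(cf.rows["x"]["payload"].value)   # v499
```

The 1,500 inserts collapse into two rows, each holding only the column value with the highest timestamp.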

Columns limit

2010-07-29 Thread Mark
Are there any limitations on the number of columns a row can have? Does all the data for a single key need to reside on a single host? If so, wouldn't that mean there is an implicit limit on the number of columns one can have, i.e. the disk size of that machine? What is the proper way to handle

Avro Runtime Exception Bad Index

2010-07-29 Thread Arya Goudarzi
Just wanted to toss this out there in case this is an issue, or the format really changed and I have to start from a clean slate. I was running yesterday's trunk and had some keyspaces with data. With today's trunk, the server failed to start, giving this exception: ERROR [main] 2010-07-29 14:05:21,489

Re: Date loading pattern with OrderPreservingPartiton

2010-07-29 Thread Aaron Morton
Yes, the OPP could give you a distribution like that. Given that only two nodes have data, and they seem to have the same amount of data, I wonder if all your keys are falling into the key range of the last node? With RF 2 they would then go to the last and first nodes only. As an experiment you could try r
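Aaron's hypothesis can be checked with a simplified model of order-preserving placement. This is a sketch under stated assumptions, not Cassandra's code: keys and tokens are compared as plain strings, the primary replica is the first node whose token is greater than or equal to the key (wrapping to the first node past the largest token), and the second replica is the next node clockwise, as with a rack-unaware strategy. The tokens and keys below are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import List


def replicas_for(key: str, tokens: List[str], rf: int) -> List[int]:
    """Return ring indices of the nodes holding `key`.

    `tokens` must be sorted. The primary is the first node whose token
    is >= key (wrapping to node 0); the rest follow clockwise.
    """
    start = bisect_left(tokens, key) % len(tokens)
    return [(start + i) % len(tokens) for i in range(rf)]


tokens = ["g", "n", "t"]                 # hypothetical tokens for 3 nodes
keys = ["p100", "q200", "r300", "s400"]  # all sort between "n" and "t"

load = Counter()
for k in keys:
    for node in replicas_for(k, tokens, rf=2):
        load[node] += 1

print(dict(sorted(load.items())))  # {0: 4, 2: 4}
```

If every key sorts into the last node's range, the first node receives all the second replicas and the middle node stays empty, which matches the distribution Rana describes.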

Re: Index/Count/Order by syntax

2010-07-29 Thread Aaron Morton
One method would be to use a Super Column Family. Have one row; in that row, create a super column for each count value you have, and then in each super column create a column for each word. Set the CompareWith for the super columns to LongType and the CompareSubcolumnsWith to AsciiType or UTF8Type. Yo
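The layout Aaron describes can be sketched with plain dictionaries. This hypothetical model only mimics the ordering Cassandra would apply: super column names are counts compared numerically (as LongType would), subcolumn names are words, and reading the super columns from highest to lowest yields the most frequent words first. The function names and sample data are illustrative, not part of any Cassandra client.

```python
from collections import Counter
from typing import Dict, List


def build_count_index(words: List[str]) -> Dict[int, Dict[str, str]]:
    """Model one super-CF row: super column = count, subcolumn = word."""
    row: Dict[int, Dict[str, str]] = {}
    for word, count in Counter(words).items():
        row.setdefault(count, {})[word] = word
    return row


def top_words(row: Dict[int, Dict[str, str]], limit: int) -> List[str]:
    # Numeric ordering of super column names, walked from the highest count down.
    result: List[str] = []
    for count in sorted(row, reverse=True):
        for word in sorted(row[count]):  # subcolumns ordered by name
            result.append(word)
            if len(result) == limit:
                return result
    return result


searches = ["cassandra", "hadoop", "cassandra", "redis", "cassandra", "hadoop"]
row = build_count_index(searches)
print(top_words(row, 2))  # ['cassandra', 'hadoop']
```

When a word's count changes, it has to be deleted from its old super column and written under the new one; the sketch only builds the index once.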

RE: Avro Runtime Exception Bad Index

2010-07-29 Thread Stu Hood
Can you determine approximately what revisions you were running before and after? -Original Message- From: "Arya Goudarzi" Sent: Thursday, July 29, 2010 4:42pm To: user@cassandra.apache.org Subject: Avro Runtime Exception Bad Index Just wanted to toss this out there in case if this is a

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-29 Thread Peter Schuller
> Just wanted to follow up on this. > > We were never able to achieve throughput scaling in the cloud.  We were able > to verify that many of our cluster nodes and test servers were collocated on > the same physical hardware (thanks Stu for the tip on the Rackspace REST > API), and that performa

Re: Evaluating Cassandra for our use case

2010-07-29 Thread Aaron Morton
Thanks for this, Aaron. It does actually look like Redis may be better suited to our needs. I had originally discounted Redis because I had the impression that it had volatile storage only, but now I see that is not the case. Thanks again! Yup, you've got Append Only, foreground Snap Shot and

Re: cassandra 0.6.1 read returns wrong data?

2010-07-29 Thread Aaron Morton
I noticed this once when accidentally sharing connections around. Could that be the case? What sort of commands are you running? Could you be seeing this problem? http://www.mail-archive.com/user@cassandra.apache.org/msg04831.html Aaron On 29 Jul, 2010, at 12:47 PM, Jianing Hu wrote: We recently mig

Re: importance of key cache vs row cache

2010-07-29 Thread Aaron Morton
Which type of cache is appropriate to your particular case depends on a variety of factors, including the hotness and other access characteristics of your data set, the relationship of data set size to heap size, row size to key size, and so forth. =Rob A little off topic, but I remember re

Re: Index/Count/Order by syntax

2010-07-29 Thread Mark
Ok so basically an "array" of words grouped by their count? Something like this? { SearchLogs : { ALL : { 999: { word1:word1, word2:word2, word3:word3 } 998: { word1:word1, word2:word2, word3:word3 } } } } On 7/29/10 2:50 PM, Aaron Morton wrote: One metho

Re: Index/Count/Order by syntax

2010-07-29 Thread Aaron Morton
Yes, but as I said, it may not be the optimal design. You may end up with a single, very big row. - you could use multiple rows, each holding a range of counts. - you could use a standard CF and store the count in the row key, then use get_range_slices. Using the random partitioner you will need to
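The first alternative, several rows that each hold a range of counts, can be sketched the same way. In this hypothetical model the bucket size of 100 and the "counts:&lt;bucket&gt;" row-key scheme are assumptions for illustration. Because the reader computes the bucket row keys itself and fetches them from the highest bucket down, it never relies on the partitioner keeping row keys in order.

```python
from typing import Dict, List, Tuple

BUCKET_SIZE = 100  # hypothetical: each row covers 100 consecutive counts

# row key -> {count -> {word: word}}
Rows = Dict[str, Dict[int, Dict[str, str]]]


def bucket_key(count: int) -> str:
    # Hypothetical row-key scheme: "counts:<bucket number>"
    return f"counts:{count // BUCKET_SIZE}"


def add(rows: Rows, word: str, count: int) -> None:
    rows.setdefault(bucket_key(count), {}).setdefault(count, {})[word] = word


def top_words_bucketed(rows: Rows, max_count: int, limit: int) -> List[Tuple[int, str]]:
    # Walk buckets from the highest down; each row key is computed,
    # so no ordered range scan over row keys is needed.
    result: List[Tuple[int, str]] = []
    for bucket in range(max_count // BUCKET_SIZE, -1, -1):
        row = rows.get(f"counts:{bucket}", {})
        for count in sorted(row, reverse=True):
            for word in sorted(row[count]):
                result.append((count, word))
                if len(result) == limit:
                    return result
    return result


rows: Rows = {}
for word, count in [("cassandra", 250), ("hadoop", 120), ("redis", 99), ("mysql", 5)]:
    add(rows, word, count)

print(sorted(rows))                                       # ['counts:0', 'counts:1', 'counts:2']
print(top_words_bucketed(rows, max_count=250, limit=3))  # [(250, 'cassandra'), (120, 'hadoop'), (99, 'redis')]
```

Smaller buckets keep each row small at the cost of more reads per query; the right size depends on how counts are distributed.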

Re: cassandra 0.6.1 read returns wrong data?

2010-07-29 Thread Jianing Hu
Hi Aaron, Thanks for the reply. Can you explain what you mean by "sharing connections around"? I'm just calling a simple "get", and the data returned is for a completely different key. It's intermittent and hard to reproduce in my test environment, but can be observed in our production environment

Re: cassandra 0.6.1 read returns wrong data?

2010-07-29 Thread Aaron Morton
I was accidentally sharing connections between threads, and getting strange results. Is your client multi-threaded? Can you provide some more information, such as the client library, how the data is written, and how you're deciding that the returned results are the wrong ones. Is the read inconsiste
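One common way to avoid the problem Aaron hit is to give each thread its own connection. The sketch below is a generic Python illustration, not the Perl FCGI setup Jianing describes and not part of any Cassandra client library: `PerThreadConnection` is a hypothetical helper, and `factory` stands in for whatever call opens a real client (for example, a Thrift connection).

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThreadConnection(Generic[T]):
    """Hand each thread its own connection instead of sharing one.

    `factory` is a hypothetical callable that opens a new connection;
    it is called at most once per thread.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> T:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._factory()
            self._local.conn = conn
        return conn


pool = PerThreadConnection(object)  # object() stands in for a real client
seen = {}
lock = threading.Lock()


def worker(name: str) -> None:
    first, second = pool.get(), pool.get()
    with lock:
        seen[name] = (first, first is second)  # keep refs so ids stay unique


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(same for _, same in seen.values()))      # True
print(len({id(conn) for conn, _ in seen.values()}))  # 4
```

Each thread reuses its own client across calls, and no two threads ever interleave requests on the same socket.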

Re: Unreliable transport layer

2010-07-29 Thread ChingShen
Why? For what reasons did you choose TCP? Shen On Sat, Mar 6, 2010 at 9:15 AM, Jonathan Ellis wrote: > In 0.6 gossip is over TCP. > > On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash > wrote: > > Hey guys! I have a simple question. I'm a casual observer, not a real > > Cassandra user yet. So, ex

Re: cassandra 0.6.1 read returns wrong data?

2010-07-29 Thread Jianing Hu
That's an interesting thought. My code runs in FCGI and although the cassandra connection is used to serve multiple requests, those requests are supposedly processed sequentially, in a while ($request->Accept() >= 0) loop. However, we do call FCGI::finish to close the request (so the HTTP request w

Re: non blocking Cassandra with Tornado

2010-07-29 Thread Ryan Daum
An asynchronous thrift client in Java would be something that we could really use; I'm trying to get a sense of whether this async client is usable with Cassandra at this point -- given that Cassandra typically bundles a specific older Thrift version, would the technique described here work at all

Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-07-29 Thread Chris Goffinet
When you can't get the number of threads, that means you have way too many running (8,000+) usually. Try running `ps -eLf | grep cassandra`. How many threads? -Chris On Jul 29, 2010, at 8:40 PM, Dathan Pattishall wrote: > > To Follow up on this thread. I blew away the data for my entire clust
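Chris's check relies on `ps -eLf` printing one line per thread (LWP), so counting the lines that mention the Cassandra process approximates its thread count. The sketch below does that count in Python; the function names are hypothetical, and `live_thread_count` assumes a Linux procps `ps` that supports `-L`. Unlike the `grep` pipeline, it does not also count the `grep` process's own line, although it would match any other process whose command line contains the pattern.

```python
import subprocess


def count_threads(ps_output: str, pattern: str = "cassandra") -> int:
    """Count lines of `ps -eLf` output that mention `pattern`.

    `ps -eLf` prints one line per thread, so for a single JVM the
    number of matching lines approximates its thread count.
    """
    return sum(1 for line in ps_output.splitlines() if pattern in line)


def live_thread_count(pattern: str = "cassandra") -> int:
    # Assumes a procps-style `ps` that supports -L (e.g. on Linux).
    out = subprocess.run(
        ["ps", "-eLf"], capture_output=True, text=True, check=True
    ).stdout
    return count_threads(out, pattern)
```

By Chris's rule of thumb, a count in the thousands (8,000+) points to far too many threads.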