Re: newbie question: how do I know the total number of rows of a cf?

2011-03-29 Thread Sheng Chen
Thanks all. 2011/3/28 Stephen Connolly > for #2 you could pipe through wc -l to get the answer > > sort -n keys.txt | uniq | wc -l > > but both examples are just refinements of iterate. > > #1 is just a distributed iterate > #2 is just an optimized iterate based on knowledge of the on-disk > for
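Stephen's pipeline counts distinct keys in a dump that has one key per line. `uniq` only collapses adjacent duplicates, which is why `sort` comes first. A minimal Python equivalent, assuming a plain-text dump named `keys.txt` as in the thread (producing the dump still means iterating over the CF):

```python
from pathlib import Path


def count_distinct_keys(path: str) -> int:
    """Count distinct non-empty lines in a key dump with one key per line.

    Roughly what `sort -n keys.txt | uniq | wc -l` reports, except that
    blank lines are skipped instead of being counted once.
    """
    seen = set()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            key = line.rstrip("\r\n")
            if key:
                seen.add(key)
    return len(seen)
```

Usage: `count_distinct_keys("keys.txt")`. A set keeps every key in memory, so for a very large dump the external `sort | uniq` pipeline is the safer tool. Either way, as Stephen says, it is still an iteration over all keys.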

[ANN] Mojo's Cassandra Maven Plugin 0.7.4-1 released

2011-03-29 Thread Stephen Connolly
Hi, The Mojo team is pleased to announce the release of Mojo's Cassandra Maven Plugin version 0.7.4-1. Mojo's Cassandra Plugin is used when you want to install and control a test instance of Apache Cassandra from within your Apache Maven build. The plugin has the following goals. * cassandra:

Compaction doubles disk space

2011-03-29 Thread Sheng Chen
I use the 'nodetool compact' command to start a compaction. I can understand that extra disk space is required during the compaction, but after the compaction the extra space is not released. Before compaction: SSTable count: 10, space used (live): 19G, space used (total): 21G. After compaction: ss

Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
From a previous thread on the same topic, I forced a GC and the extra space was released. What about my second question? 2011/3/29 Sheng Chen > I use 'nodetool compact' command to start a compaction. > I can understand that extra disk spaces are required during the compaction, > but af

Re: Compaction doubles disk space

2011-03-29 Thread Sylvain Lebresne
> BTW, given that compaction requires double disk spaces, does it mean that I > should never reach half of my total disk space? > e.g. if I have 505GB data on 1TB disk, I cannot even delete any data at all. It is not so black and white. What is true is that in practice reaching half the disk shoul
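Sylvain's point can be made concrete with a back-of-the-envelope check. Assuming the worst case for a major compaction, where nothing is purged and the merged output is as large as all input SSTables together, the disk must hold both until the inputs are removed. A minimal sketch of that pessimistic bound (the function name and GB units are illustrative):

```python
def fits_worst_case_major_compaction(sstable_sizes_gb: list[float], disk_gb: float) -> bool:
    """Pessimistic check for a major compaction.

    Assumes nothing is purged, so the merged output is as large as all inputs
    combined, and the inputs are only removed after the output is written.
    Other files on the disk (commit log, snapshots) are ignored.
    """
    live = sum(sstable_sizes_gb)
    free = disk_gb - live
    return free >= live


# The example from the question: 505 GB of data on a (roughly) 1 TB disk.
print(fits_worst_case_major_compaction([505], 1000))  # False: 495 GB free < 505 GB needed
```

In practice overwritten columns and purgeable tombstones shrink the output, which is why it is not so black and white; the check only gives the guaranteed-safe bound.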

Re: balance between concurrent_[reads|writes] and feeding/reading threads i clients

2011-03-29 Thread aaron morton
The concurrent_reads and concurrent_writes settings set the number of threads in the relevant thread pools. You can view the number of active and queued tasks using nodetool tpstats. The thread pool uses a blocking linked list for its work queue with a max size of Integer.MAX_VALUE. So its size is es
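Aaron's description (a fixed number of worker threads draining an effectively unbounded FIFO queue) can be illustrated with a small Python analogue. This is not Cassandra's code: `queue.Queue()` with the default `maxsize=0` stands in for the blocking linked-list queue, and `num_workers` plays the role of concurrent_reads or concurrent_writes. Tasks are held open so the split between active and queued work can be observed:

```python
import queue
import threading


def snapshot_pool(num_workers: int, num_tasks: int) -> tuple[int, int]:
    """Return (active, pending) for a fixed-size pool fed by an unbounded queue."""
    work: queue.Queue = queue.Queue()  # maxsize=0: unbounded, so put() never blocks
    gate = threading.Event()           # keeps running tasks busy until released
    started = threading.Semaphore(0)   # counts tasks a worker has picked up

    def task() -> None:
        started.release()
        gate.wait()

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:           # sentinel: shut this worker down
                return
            item()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for _ in range(num_tasks):
        work.put(task)                 # producers are never back-pressured

    active = min(num_workers, num_tasks)
    for _ in range(active):            # wait until every worker that can be busy is busy
        started.acquire()
    pending = work.qsize()             # roughly what tpstats reports as pending

    gate.set()
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return active, pending
```

For example, `snapshot_pool(2, 5)` gives `(2, 3)`: two tasks run, three wait. Because `put()` never blocks, a client that feeds requests faster than the pool drains them just grows the queue, so any throttling has to come from the client side.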

Re: Help on how to configure an off-site DR node.

2011-03-29 Thread aaron morton
Be aware that at RF 2 the Quorum is 2, so you cannot afford to lose a replica when working at Quorum. 3 is really the starting point if you want some redundancy. If you want to get your data offsite, how about taking snapshots and moving them off site? http://wiki.apache.org/cassandra/Operations#
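The arithmetic behind Aaron's warning: QUORUM is a strict majority of the replicas, floor(RF/2) + 1, so at RF 2 both replicas must answer and none can be lost. A small sketch:

```python
def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM request: a strict majority, floor(RF/2) + 1."""
    return rf // 2 + 1


def replicas_you_can_lose(rf: int) -> int:
    """Replica failures that still leave enough live replicas for QUORUM."""
    return rf - quorum(rf)


for rf in (2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}, can lose {replicas_you_can_lose(rf)}")
# RF=2: QUORUM=2, can lose 0
# RF=3: QUORUM=2, can lose 1
# RF=5: QUORUM=3, can lose 2
```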

Re: Compaction doubles disk space

2011-03-29 Thread Karl Hiramoto
Would it be possible to improve the current compaction disk space issue by compacting only a few SSTables at a time and then immediately deleting the old ones? Looking at the logs, it seems like deletions of old SSTables are taking longer than necessary. -- Karl

improving speed/space for repair/compact Big O Notation of

2011-03-29 Thread Karl Hiramoto
Can someone give a rough Big O() in terms of the number of keys in a CF? Is it advisable to partition data into more Column Families and Keyspaces to improve repair and compact performance? Thanks -- Karl

Re: Problem about freeing space after a major compaction

2011-03-29 Thread aaron morton
Cassandra will request a GC to free compacted SSTables if there is not sufficient space to write an SSTable or perform a compaction. Aaron On 29 Mar 2011, at 02:15, Roberto Bentivoglio wrote: > Thanks you again, we're going to update our enviroment. > > Regards, > Roberto > > On 28 March 201
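The behaviour described here (compacted SSTable files are removed only once garbage collection notices they are no longer referenced) can be mimicked in Python with `weakref.finalize`, which runs a cleanup callback when the owning object is collected. This is an analogy under that assumption, not Cassandra's implementation; `SSTableHandle` is a hypothetical class:

```python
import gc
import os
import tempfile
import weakref


class SSTableHandle:
    """Hypothetical stand-in for a reader on an obsolete, already-compacted SSTable."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Delete the file only when this object is garbage collected.
        weakref.finalize(self, os.remove, path)


def demo() -> tuple[bool, bool]:
    fd, path = tempfile.mkstemp(suffix="-Data.db")
    os.close(fd)
    handle = SSTableHandle(path)
    before = os.path.exists(path)   # file stays on disk while a reference exists
    del handle
    gc.collect()                    # the "forced GC" from the thread
    after = os.path.exists(path)
    return before, after
```

`demo()` returns `(True, False)`. In CPython the callback fires as soon as the last reference disappears; a JVM only discovers unreachable objects when a collection actually runs, which matches the reports that the space came back after forcing a GC.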

Re: design cassandra issue client when moving from version 0.6.* to 0.7.3

2011-03-29 Thread aaron morton
There should only be one active request on the socket at a time. Otherwise things could get confused on the server side. Also, is there a reason you are not calling CassandraClient::multiget_slice? Aaron On 29 Mar 2011, at 10:59, Anurag Gujral wrote: > Hi All, > I am currently portin
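One way to honour the "one active request on the socket at a time" rule when several threads share a connection is to serialize access with a lock. A minimal Python sketch; `SerializedConnection` is a hypothetical wrapper, not part of any Cassandra client, and `conn` can be any client object:

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
C = TypeVar("C")


class SerializedConnection(Generic[C]):
    """Share one client connection between threads, one request at a time.

    Requests are passed in as callables so the lock is held for the whole
    send-and-receive cycle, not just the write.
    """

    def __init__(self, conn: C) -> None:
        self._conn = conn
        self._lock = threading.Lock()

    def call(self, request: Callable[[C], T]) -> T:
        with self._lock:  # a second caller blocks here instead of interleaving bytes
            return request(self._conn)
```

A lock trades throughput for safety; the other common design is to give each thread its own connection (or use a connection pool) rather than share one.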

Re: New committer Sylvain Lebresne

2011-03-29 Thread aaron morton
Congratulations Sylvain On 29 Mar 2011, at 11:47, Jake Luciani wrote: > Great job, well deserved Sylvain! > > On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis wrote: > The Cassandra PMC has voted to add Sylvain as a committer. > > Welcome, Sylvain, and thanks for the hard work! > > -- > Jonath

Re: Problem about freeing space after a major compaction

2011-03-29 Thread Roberto Bentivoglio
Hi Aaron, we have tried invoking a full GC on Cassandra without any success. The space is still used. Regards, Roberto On 29 March 2011 13:12, aaron morton wrote: > Cassandra will request a GC to free compacted SSTables if there is not > sufficient space to write an SSTable or perform a compacti

Re: Help on how to configure an off-site DR node.

2011-03-29 Thread Brian Lycett
Hi. Cheers for your reply. Unfortunately there's too much data for snapshots to be practical. The data set will be at least 400GB initially, and the offsite node will be on a 20Mbit leased line. However I don't need the consistency level to be quorum for read/writes in the production cluster, s

Two column families or One super column family?

2011-03-29 Thread T Akhayo
Good afternoon, I'm designing my data model for Cassandra from scratch, which means I can tune and fine-tune it for performance. At the moment I'm having trouble choosing between 2 column families or 1 super column family. I will illustrate with an example. Sector, this defines a place, this is one

International language implementations

2011-03-29 Thread A J
Can someone list some of the current international language implementations of cassandra ? Thanks.

NegativeArraySizeException during upgrade from 0.7.0 to 0.7.4

2011-03-29 Thread Wenjun Che
even after I ran compact on all keyspaces. Cassandra exits after the exception so I can't try "nodetool scrub". There is just one node. java.lang.NegativeArraySizeException at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:280) at org.apache.cassandra.db.C

How to determine if repair need to be run

2011-03-29 Thread mcasandra
Is there a way to monitor and tell if one of the nodes requires repair? For example: Node was down and came back up but in the meantime HH were dropped. Now unless we are really careful in all the scenarios we wouldn't have any problems :) but in general when things are going awry you might forget about r

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Is there a way to monitor and tell if one of the node require repair? For eg: > Node was down and came back up but in the meantime HH were dropped. Now > unless we are really careful in all the scenarios we wouldn't have any > problems :) but in general when things are going awry you might forget

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Yes, but that doesn't really provide the monitoring that would be helpful. If I don't realize it for 2 days, then we could potentially be returning inconsistent results, or have data out of sync, for 2 days until repair is run. It would be best to be able to monitor these things so that it can be r

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Yes but that doesn't really provide the monitoring that will really be > helpful. If I don't realize it until 2 days then we potentially could be > returning inconsistent results or not have data sync for 2 days until repair > is run. It will be best to be able to monitor these things so that it

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Thanks! I was keeping the discussion simple. But you make my case stronger that we need such monitoring since it looks like it should always be run but we want to run it as soon as it is required. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Ho

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Thanks! I was keeping the discussion simple. But you make my case stronger > that we need such monitoring since it looks like it should always be run but > we want to run it as soon as it is required. The way to deal with individual requests timing out or transient flapping, is to use a consiste

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
I think my problem is that I don't want to remember to run read repair. I want to know from Cassandra that I "need" to run repair "now". This seems like an important piece of functionality that needs to be there. I don't really want to find out the hard way that I forgot to run "repair" :) Say Node A, B, C. Now

Re: NegativeArraySizeException during upgrade from 0.7.0 to 0.7.4

2011-03-29 Thread Jonathan Ellis
Remove the cache file. On Tue, Mar 29, 2011 at 11:44 AM, Wenjun Che wrote: > > even after I ran compact on all keyspaces.  Cassandra exits after the > exception so I can't try "nodetool scrub". > There is just one node. > > java.lang.NegativeArraySizeException >     at > org.apache.cassandra.db.C

client connection timeouts vs. thrift timeouts

2011-03-29 Thread David Hawthorne
I've been scratching my head on this one for a day now and I'm hoping someone can help clear it up. The initial question was: does it make sense to have a configurable connection timeout (for a client connecting to a cassandra server) separate from the thrift socket timeout (which governs *all*

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
First some specifics: > I think my problem is that I don't want to remember to run read repair. I You are not expected to remember to do so manually. Typically periodic repairs would be automated in some fashion, such as by having a cron job on each node that starts the repair. Typically some kin

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Looks like you didn't get to see my updated post :) This is the scenario I was referring to: Say Node A, B, C. Now A is inconsistent and needs repair. Now after a day Node B goes down and comes up. Now both nodes are inconsistent. Even with Quorum this will fail read and write by returning inconsi

OOM in compaction - cassandra 0.7.4

2011-03-29 Thread Marek Żebrowski
Hi, I am getting repeatable OOM during compaction: ERROR [CompactionExecutor:1] 2011-03-29 14:52:29,193 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2

Re: client connection timeouts vs. thrift timeouts

2011-03-29 Thread Narendra Sharma
I think it makes sense to have two different timeouts. The client timeout clearly affects the user's experience with the application. The timeout could be due to a number of factors not directly related to the thrift connection. The client timeout could trigger retry of operation on a different instance of th
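At the socket level the two timeouts discussed here are distinct knobs. A minimal Python sketch using plain sockets (not any particular Cassandra or Thrift client; host, port and timeout values are placeholders) that applies one timeout to establishing the TCP connection and another to every later read and write:

```python
import socket


def open_connection(host: str, port: int,
                    connect_timeout: float, read_timeout: float) -> socket.socket:
    """Open a TCP connection with one timeout for connecting and another for I/O."""
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    sock.settimeout(read_timeout)   # governs every subsequent recv()/send()
    return sock
```

A short connect timeout lets a client give up quickly on a dead node and retry elsewhere, while the I/O timeout can stay long enough for slow but legitimate requests.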

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Looks like you didn't get to see my updated post :) This is the scenario I > was referring to: I don't see what's different. If you write at QUORUM and read at QUORUM, your read is guaranteed to see a previous write, period. If that cannot be satisfied, the read will fail due to not being able to
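Peter's guarantee follows from a counting argument: if a read contacts R replicas and a write was acknowledged by W replicas, the two sets must share at least one node whenever R + W > RF, and two QUORUM sets always satisfy that. A brute-force sketch that checks the claim for small replication factors:

```python
from itertools import combinations


def always_overlap(rf: int, r: int, w: int) -> bool:
    """Does every set of r replicas share a node with every set of w replicas?"""
    replicas = range(rf)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


rf = 3
q = rf // 2 + 1
print(always_overlap(rf, q, q))  # True: any 2 of 3 replicas meet any other 2
print(always_overlap(rf, 1, q))  # False: ONE + QUORUM is 1 + 2 = 3, not > 3
```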

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
So what I understand is that there is no need to monitor this and no need to remember to run repair? If that's the case, then manual repair would never be needed, correct? But if manual repair is needed, then shouldn't there be an ability to monitor it? Having dealt with production problems

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> So from what I am understanding is that there is no need to monitor this and > no need to remember running repair? If that's the case then manual repair > wouldn't be needed ever, correct? No. See my next-to-last e-mail where I go through two reasons to run nodetool repair, of which (a) is absol

Re: International language implementations

2011-03-29 Thread Peter Schuller
> Can someone list some of the current international language > implementations of cassandra ? What is an "international language implementation of Cassandra"? -- / Peter Schuller

Re: International language implementations

2011-03-29 Thread Sasha Dolgy
I store multiple languages in a cf if this is what you are on about... On Mar 29, 2011 5:42 PM, "Peter Schuller" wrote: >> Can someone list some of the current international language >> implementations of cassandra ? > > What is an "international language implementation of Cassandra"? > > -- > / P

Re: International language implementations

2011-03-29 Thread A J
For example, taobao.com is a Chinese online bidding site. All its data is Chinese and they use MongoDB successfully. Are there similar installations of Cassandra where the data is non-Latin? I know that in theory it should all work, as Cassandra has full UTF-8 support. But unless there are real implementations, you c
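The "in theory" part can be checked without a cluster: text stored in Cassandra travels as UTF-8 encoded bytes, and UTF-8 has the property that byte-wise ordering matches Unicode code point ordering, so a comparator that orders raw bytes (as BytesType does) sorts such values in code point order. A small Python sketch (the sample strings are arbitrary):

```python
def utf8_roundtrip(text: str) -> str:
    """Encode to UTF-8 bytes, as a client would send them, then decode again."""
    return text.encode("utf-8").decode("utf-8")


samples = ["淘宝", "cassandra", "éclair", "数据", "Zürich"]

# Every sample survives the byte round trip unchanged.
assert all(utf8_roundtrip(s) == s for s in samples)

# Sorting by raw UTF-8 bytes gives the same order as sorting by code points.
by_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
by_code_points = sorted(samples)
print(by_bytes == by_code_points)  # True
```

This says nothing about production experience, which is what the question is really after; it only confirms the storage layer is byte-oriented.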

Re: Help on how to configure an off-site DR node.

2011-03-29 Thread aaron morton
Snapshots use hard links and do not take additional disk space http://www.mail-archive.com/user@cassandra.apache.org/msg11028.html WRT losing a node, it's not the total number of nodes that's important, it's the number of replicas. If you have 3 nodes with RF2 and you lose one of the replicas
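A hard link is a second directory entry for the same inode, so creating one copies no data, and the bytes stay on disk as long as any link remains. A minimal, POSIX-oriented Python sketch of the idea (the function is hypothetical and is not how nodetool snapshot is implemented or invoked):

```python
import os


def snapshot_by_hard_link(data_file: str, snapshot_dir: str) -> str:
    """Link an existing data file into a snapshot directory without copying bytes."""
    os.makedirs(snapshot_dir, exist_ok=True)
    target = os.path.join(snapshot_dir, os.path.basename(data_file))
    os.link(data_file, target)      # new directory entry, same inode
    return target
```

One consequence worth keeping in mind: once compaction removes the original SSTable, the snapshot's link keeps those bytes on disk, so old snapshots do start to cost space over time.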

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
I feel there is a need for a "repair is required" flag in order for the team to manage the cluster. At a minimum, is there a flag somewhere that tells if repair was run within GCGracePeriod?

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> I think what I feel is that there is a need to know if repair is required > flag in order for team to manage the cluster. And again, repair is always required essentially. You should *always* run it within the necessary period as determined by GCGraceSeconds. > Atleast at minimum, Is there a fl
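Following Peter's advice, the practical check is whether every node has completed a repair within GCGraceSeconds. No such flag is assumed to exist in Cassandra here; the sketch instead assumes your own cron wrapper records a timestamp per node after each successful `nodetool repair` (the `last_repair` mapping and the one-day safety margin are hypothetical):

```python
from datetime import datetime, timedelta

GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days); use your CF's value


def overdue_nodes(last_repair: dict[str, datetime], now: datetime,
                  gc_grace_seconds: int = GC_GRACE_SECONDS,
                  safety_margin: timedelta = timedelta(days=1)) -> list[str]:
    """Nodes whose last completed repair is too old to be safely inside GCGraceSeconds."""
    deadline = timedelta(seconds=gc_grace_seconds) - safety_margin
    return sorted(node for node, ts in last_repair.items() if now - ts >= deadline)
```

Feeding the result into whatever alerting you already run gives the "repair needed now" signal asked for above, driven by the schedule rather than by guessing which node missed writes.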

Re: Two column families or One super column family?

2011-03-29 Thread aaron morton
I would go with the solution that means you only have to make one request to serve your reads, so consider the super CF approach. There are some downsides to super columns see http://wiki.apache.org/cassandra/CassandraLimitations and they tend to have a love-them-hate-them reputation. One thi

Re: OOM in compaction - cassandra 0.7.4

2011-03-29 Thread Tyler Hobbs
You might want to lower your in-memory compaction limit, but I would also recommend checking your heap size and monitoring (with some

Ditching Cassandra

2011-03-29 Thread Gregori Schmidt
hi, After using Cassandra during development for the past 8 months my team and I made the decision to switch from Cassandra to MongoDB this morning. I thought I'd share some thoughts on why we did this and where Cassandra might benefit from improvement. - The API is horrible and it produces p

Re: Ditching Cassandra

2011-03-29 Thread Drew Kutcharian
Hi Gregori, I'm about to start a new project and I was considering using MongoDB too, but I just couldn't find a nice way to scale it. Seems like for scaling you need to use the same style as MySQL, having master/slaves and replicas, which for us was a deal breaker. We just couldn't see how you

Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Wed, 2011-03-30 at 02:11 +0200, Gregori Schmidt wrote: >- The API is horrible and it produces pointlessly verbose code in >addition to being utterly confusing. EVERYTHING takes a lot of > time to implement with Cassandra, and to be frank, it is incredibly > tiring. For this reason alon

Re: Ditching Cassandra

2011-03-29 Thread Jake Luciani
Hi Gregori, What language *were* you using to interact with cassandra? were you unable to find a wrapper API that you found We have discussed adopting the "best of" client api's in cassandra but we decided it's better for the community to naturally develop them. I think this has also motivated E

Re: Ditching Cassandra

2011-03-29 Thread Colin
Eric, Seems like the answer to everything is 8. 8 has been very painful. Are you saying that 8 will or not be compatible with 7? If not, would you recommend waiting until 8? We have done an awful lot of work, have an awful lot of work left, and have become very frustrated. Any idea on when

Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
Yes. I think at least we can remove the tombstones for each sstable first, and then do the merge. 2011/3/29 Karl Hiramoto > Would it be possible to improve the current compaction disk space issue by > compacting one only a few SSTables at a time then imediately deleting the > old one? Looking

Re: Ditching Cassandra

2011-03-29 Thread Edward Capriolo
On Tue, Mar 29, 2011 at 9:56 PM, Colin wrote: > Eric, > > Seems like the answer to everything is 8. > > 8 has been very painful. > > Are you saying that 8 will or not be compatible with 7? > > If not, would you recommend waiting until 8?  We have done an awful lot of > work, have an awful  lot of

Re: Ditching Cassandra

2011-03-29 Thread mcasandra
I am also interested in knowing when 8 will be released. Also, is there someplace where we can read about features that will be released in 8? Looks like some major changes are going to come out.

Re: International language implementations

2011-03-29 Thread Edward Capriolo
On Tue, Mar 29, 2011 at 5:54 PM, A J wrote: > Example, taobao.com is a chinese online bid site. All data is chinese > and they use Mongodb successfully. > Are there similar installations of cassandra where data is non-latin ? > > I know in theory, it should all work as cassandra has full utf-8 > s

Data Modeling advice for Cassandra 0.8

2011-03-29 Thread Drew Kutcharian
I'm pretty new to Cassandra and I would like to get your advice on modeling. The object model of the project that I'm working on will be pretty close to Blogger, Tumblr, etc. (or any other blogging website). Where you have Users, that each can have many Blogs and each Blog can have many comments

Revised: Data Modeling advice for Cassandra 0.8 (added #8)

2011-03-29 Thread Drew Kutcharian
I'm pretty new to Cassandra and I would like to get your advice on modeling. The object model of the project that I'm working on will be pretty close to Blogger, Tumblr, etc. (or any other blogging website). Where you have Users, that each can have many Blogs and each Blog can have many comments

RE: Ditching Cassandra

2011-03-29 Thread Colin
Edward, My issue isn't in doing the work, I just don't want to do a lot of work if 8 is going to be out in a month or two. That's just common sense. Especially if I can't upgrade an existing implementation without incurring undue risk. -Original Message- From: Edward Capriolo [mailto:

Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Tue, 2011-03-29 at 20:56 -0500, Colin wrote: > Are you saying that 8 will or not be compatible with 7? You will be able to perform a rolling upgrade from 0.7.x to 0.8. That is to say, you'll be able to upgrade each node one at a time, mixing 0.7 and 0.8 nodes until the upgrade is complete. >

Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Tue, 2011-03-29 at 19:58 -0700, mcasandra wrote: > I am also interested in knowing when 8 will be released. We're targeting the week of May 9th. > Also, is there someplace where we can read about features that will be > relased in 8? Looks like some major changes are going to come out. The

RE: Ditching Cassandra

2011-03-29 Thread Colin
Thank you Eric. I appreciate it. -Original Message- From: Eric Evans [mailto:eev...@rackspace.com] Sent: Tuesday, March 29, 2011 11:47 PM To: user@cassandra.apache.org Subject: Re: Ditching Cassandra On Tue, 2011-03-29 at 20:56 -0500, Colin wrote: > Are you saying that 8 will or not be