Re: Brisk with standard C* cluster

2012-01-18 Thread aaron morton
Yes, you can add nodes in a second DC that have cassandra and brisk. This will keep the analytics load off the original nodes. There is some documentation here: http://www.datastax.com/docs/0.8/brisk/index You may have better luck with the user group http://groups.google.com/group/brisk-users or the

Incremental backups

2012-01-18 Thread Michael Vaknine
Hi, I have configured incremental backups on all the nodes in my cluster, but it is not working. In cassandra.yaml: incremental_backups: true When I check the data folder, some keyspaces have a backups folder, but it is empty, and I suspect this folder was created in the past when I had 0.7.6
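One way to check whether incremental backups are actually being written is to count the files in every backups folder under the data directory. The following is only a rough sketch: the helper name count_backup_files is made up for illustration, and the path /var/lib/cassandra/data is an assumption that should be replaced with the data_file_directories value from cassandra.yaml.

```python
import os


def count_backup_files(data_dir):
    """Return {path of each 'backups' folder: number of files directly in it}."""
    counts = {}
    for root, dirs, _files in os.walk(data_dir):
        if "backups" in dirs:
            backups_path = os.path.join(root, "backups")
            counts[backups_path] = sum(
                1
                for name in os.listdir(backups_path)
                if os.path.isfile(os.path.join(backups_path, name))
            )
    return counts


if __name__ == "__main__":
    # Assumed location; use the data_file_directories value from cassandra.yaml.
    for path, n in sorted(count_backup_files("/var/lib/cassandra/data").items()):
        print(f"{path}: {n} file(s)")
```

A backups folder that stays at zero files while writes and flushes are happening would suggest the setting has not taken effect on that node.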

Re: Hector + Range query problem

2012-01-18 Thread aaron morton
Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/01/2012, at 10:58 AM, Philippe wrote: > Hello, > I've been trying to retrieve rows based on key range but every single time I

Re: specifying initial cassandra schema

2012-01-18 Thread aaron morton
Check the command line help for cassandra-cli, you can pass it a file name. e.g. cassandra-cli --host localhost --file schema.txt Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/01/2012, at 9:35 AM, Carlos Pérez Miguel wrote: > Hi Ramesh >

Re: Incremental backups

2012-01-18 Thread Alain RODRIGUEZ
As this option is in the cassandra.yaml file, you might need to perform a restart of your entire cluster (a rolling restart should work). Hope this will help. Alain 2012/1/18 Michael Vaknine > Hi, > > I am configured to do incremental backups on all my node on the cluster > but it is not w

Re: nodetool ring question

2012-01-18 Thread aaron morton
Good idea Jeremiah, are you using compression Michael? Scanning through the CF stats this jumps out… Column Family: Attractions SSTable count: 3 Space used (live): 27542876685 Space used (total): 1213220387 That's 25GB of live data
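The size figures quoted from cfstats are in bytes, so the comparison is easier to read after conversion. A minimal sketch of the arithmetic, using only the two numbers from the message (the helper name to_gib is made up for this example):

```python
def to_gib(n_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


live = 27542876685   # "Space used (live)" as quoted in the message
total = 1213220387   # "Space used (total)" as quoted in the message

print(f"live:  {to_gib(live):.2f} GiB")
print(f"total: {to_gib(total):.2f} GiB")
print(f"live / total: {live / total:.1f}x")
```

The live figure being larger than the total figure is the oddity being pointed out, since live space would normally be a subset of total space.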

Re: poor Memtable performance on column slices?

2012-01-18 Thread Sylvain Lebresne
On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer wrote: > Hi, > >  I've been doing some tests using wide rows recently, and I've seen some odd > performance problems that I'd like to understand. > > In particular, I've seen that the time it takes for Cassandra to perform a > column slice of a singl

Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Sylvain Lebresne
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken wrote: > I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: > > ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[CompactionExecutor:29

Re: nodetool ring question

2012-01-18 Thread R. Verlangen
I also have this problem. My data on nodes grows to roughly 30GB. After a restart only 5GB remains. Is a factor of 6 common for Cassandra? 2012/1/18 aaron morton > Good idea Jeremiah, are you using compression Michael ? > > Scanning through the CF stats this jumps out… > > Column Fa

Re: Hector + Range query problem

2012-01-18 Thread Philippe
Hi aaron Nope: I'm using BOP...forgot to mention it in my original message. I changed it to a multiget and it works but I think the range would be more efficient so I'd really like to solve this. Thanks On 18 Jan 2012 09:18, "aaron morton" wrote: > Does this help ? > http://wiki.apache.org

RE: nodetool ring question

2012-01-18 Thread Michael Vaknine
I did restart the cluster and now it is normal 5GB. From: R. Verlangen [mailto:ro...@us2.nl] Sent: Wednesday, January 18, 2012 11:32 AM To: user@cassandra.apache.org Subject: Re: nodetool ring question I also have this problem. My data on nodes grows to roughly 30GB. After a restart only 5

Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Janne Jalkanen
1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason? https://issues.apache.org/jira/browse/CASSANDRA-3616 /Janne On Jan 18, 2012, at 03:52 , dir dir wrote: > Very Interesting Why you open so many file? Actually what kind of > system that is built by you until open so

RE: Incremental backups

2012-01-18 Thread Michael Vaknine
Hi, Thank you for the response. I restarted all the nodes and now I can see files in the backup folders, so it seems to be working. During this process I noticed something very strange: in the data/City folder there are files that are not created in the snapshot folder (it looks like ol

Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Andrei Savu
Hi guys, I just want to let you know that Apache Whirr trunk (the upcoming 0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 & Rackspace Cloud. You can give it a try by running the following commands: https://gist.github.com/1632893 One last thing: we would appreciate any suggestions

RE: JMX BulkLoad weirdness

2012-01-18 Thread Scott Fines
I'm running 1.0.6 on both clusters. After running a nodetool repair on all machines, everything seems to be behaving correctly, and AFAIK, no data has been lost. If what you say is true and the exception was preventing a file from being used, then I imagine that the nodetool repair corrected th

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Jake Luciani
Thanks Andrei! On Wed, Jan 18, 2012 at 8:00 AM, Andrei Savu wrote: > Hi guys, > > I just want to let you know that Apache Whirr trunk (the upcoming > 0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 & Rackspace Cloud. > > You can give it a try by running the following commands: > https:

How to store unique visitors in cassandra

2012-01-18 Thread Alain RODRIGUEZ
I'm wondering how to model my CFs to store the number of unique visitors in a time period so that I can query it quickly. I thought of sharding them by day (row = 20120118, column = visitor_id, value = '') and performing a getcount. This would work to get unique visitors pe

Re: How to store unique visitors in cassandra

2012-01-18 Thread Lucas de Souza Santos
sharding them by day (row = 20120118, column = visitor_id, > value = '') and perform a getcount. This would work to get unique visitors > per day, per week or per month but it wouldn't work if I want to get unique > visitors between 2 specific dates because 2 rows can share
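The problem with arbitrary date ranges can be illustrated with a small in-memory model. This is only a sketch: the dictionary below stands in for the column family (row key = day, column names = visitor ids), the visitor names are invented sample data, and no Cassandra client is involved.

```python
from datetime import date, timedelta

# In-memory stand-in for the column family: row key = day, columns = visitor ids.
visits = {
    "20120116": {"alice", "bob"},
    "20120117": {"bob", "carol"},
    "20120118": {"alice", "dave"},
}


def day_keys(start, end):
    """Yield row keys (YYYYMMDD) for every day from start to end inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def sum_of_daily_counts(start, end):
    """Add up per-row counts; a visitor seen on several days is counted each time."""
    return sum(len(visits.get(key, set())) for key in day_keys(start, end))


def unique_visitors(start, end):
    """Union the column names of every row in the range so each visitor counts once."""
    seen = set()
    for key in day_keys(start, end):
        seen |= visits.get(key, set())
    return len(seen)


start, end = date(2012, 1, 16), date(2012, 1, 18)
print(sum_of_daily_counts(start, end), unique_visitors(start, end))
```

Summing per-day counts overstates the result whenever rows share a visitor, so a correct answer for an arbitrary range needs the column names themselves (or a mergeable summary of them), not just the per-row counts.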

Max records per node for a given secondary index value

2012-01-18 Thread Kamal Bahadur
Hi All, It is great to know that a Cassandra column family can accommodate 2 billion columns per row! I was reading about how Cassandra stores the secondary index info internally. I now understand that the index-related data is stored in a hidden CF and each node is responsible for storing the keys of d

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Rustam Aliyev
Hi Andrei, As you know, we are using Whirr for ElasticInbox (https://github.com/elasticinbox/whirr-elasticinbox). While testing we encountered a few minor problems which I think could be improved. Note that we were using 0.6 (there was a strange bug in 0.7, maybe fixed already). Althoug

Re: specifying initial cassandra schema

2012-01-18 Thread Ramesh Natarajan
Thanks and appreciate the responses. Will look into this. thanks Ramesh On Wed, Jan 18, 2012 at 2:27 AM, aaron morton wrote: > check the command line help for cassandra-cli, you can pass it a file name. > > e.g. cassandra-cli --host localhost --file schema.txt > > Cheers > > - > Aaro

Re: poor Memtable performance on column slices?

2012-01-18 Thread Josep Blanquer
Excellent Sylvain! Yes, that seems to remove the linear scan component of slice read times. FYI, I still see some interesting difference in some aspects though. If I do a slice without a start (i.e., get me the first column)...it seems to fly. GET("K", :count => 1 ) -- 4.832877 -->> very fast, a

Re: poor Memtable performance on column slices?

2012-01-18 Thread Jonathan Ellis
On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer wrote: > If I do a slice without a start (i.e., get me the first column)...it seems > to fly. GET("K", :count => 1 ) Yep, that's a totally different code path (SimpleSliceReader instead of IndexedSliceReader) that we've done to optimize this common

Re: Unbalanced cluster with RandomPartitioner

2012-01-18 Thread aaron morton
If you have performed any token moves the data will not be deleted until you run nodetool cleanup. To get a baseline I would run nodetool compact to do a major compaction and purge any tombstones as others have said. Cheers - Aaron Morton Freelance Developer @aaronmorton http:

Re: nodetool ring question

2012-01-18 Thread aaron morton
Michael, Robin: let us know if the reported live load is increasing and diverging from the on-disk size. If it is, can you check nodetool cfstats and find an example of a particular CF where Space Used Live has diverged from the on-disk size? Then provide the schema for the CF and

Re: Incremental backups

2012-01-18 Thread aaron morton
Looks like you are on a 0.7.X release, which one exactly? It would be a really good idea to at least be on 0.8.X, preferably 1.0. Pre 1.0, compacted SSTables were removed during JVM GC, but compacted SSTables have a .Compacted file created so we know they are no longer needed. These SSTables loo

Re: Max records per node for a given secondary index value

2012-01-18 Thread Kamal Bahadur
Anyone? On Wed, Jan 18, 2012 at 9:53 AM, Kamal Bahadur wrote: > Hi All, > > It is great to know that Cassandra column family can accommodate 2 billion > columns per row! I was reading about how Cassandra stores the secondary > index info internally. I now understand that the index related data ar

Re: Max records per node for a given secondary index value

2012-01-18 Thread Mohit Anchlia
You need to shard your rows On Wed, Jan 18, 2012 at 5:46 PM, Kamal Bahadur wrote: > Anyone? > > > On Wed, Jan 18, 2012 at 9:53 AM, Kamal Bahadur > wrote: >> >> Hi All, >> >> It is great to know that Cassandra column family can accommodate 2 billion >> columns per row! I was reading about how Cas

Re: poor Memtable performance on column slices?

2012-01-18 Thread Josep Blanquer
On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis wrote: > On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer > wrote: > > If I do a slice without a start (i.e., get me the first column)...it > seems > > to fly. GET("K", :count => 1 ) > > Yep, that's a totally different code path (SimpleSliceReader

RE: Incremental backups

2012-01-18 Thread Michael Vaknine
I am on the 1.0.3 release, and these look like very old files left over from the upgrade process. How can I verify that? Michael From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Thursday, January 19, 2012 2:22 AM To: user@cassandra.apache.org Subject: Re: Incremental backups Loo

Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Ertio Lew
I believe the timestamps *on a per-column basis* are only required until compaction time; after that it may also work if the timestamp range could be specified globally on a per-SSTable basis, and thus the timestamps until compaction are only required to measure the time from the initialization

Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Maxim Potekhin
I must have accidentally deleted all messages in this thread save this one. At face value, we are talking about saving 2 bytes per column. I know it can add up with many columns, but relative to the size of the column -- is it THAT significant? I made an effort to minimize my CF footprint
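To put rough numbers on the trade-off discussed in this thread, the sketch below computes how much time a 5- or 6-byte counter can span and how many bytes would be saved in total. It assumes microsecond resolution (the usual client convention for Cassandra timestamps, but an assumption here) and an unsigned counter; both function names are made up for illustration.

```python
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def range_in_days(n_bytes):
    """Days of microsecond-resolution time an unsigned counter of n_bytes can span."""
    return 2 ** (8 * n_bytes) / MICROS_PER_DAY


def bytes_saved(columns, old_width=8, new_width=6):
    """Total bytes saved across `columns` columns by narrowing the timestamp field."""
    return columns * (old_width - new_width)


for width in (5, 6, 8):
    print(f"{width} bytes: about {range_in_days(width):,.1f} days")

print(f"saved over 1e9 columns: {bytes_saved(1_000_000_000):,} bytes")
```

Under these assumptions a 5-byte counter covers less than two weeks and a 6-byte counter several years, so narrower timestamps only make sense as offsets from some recent base time, which is the per-SSTable idea raised earlier in the thread.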