Re: UnavailableException when data grows

2010-09-27 Thread Oleg Anastasyev
Rana Aich writes: > > Yet my nodetool shows the following: > > 192.168.202.202 Down       319.94 GB     7200044730783885730400843868815072654      |<--| > 192.168.202.4 Up         382.39 GB     23719654286404067863958492664769598669     |   ^ > 192.168.202.2 Up         106.81 GB     3

Re: Best strategy for adding new nodes to the cluster

2010-09-27 Thread Michael Dürgner
Sent from my iPhone On 27.09.2010, at 19:30, Marc Canaleta wrote: > What do you mean by "running live"? I am also planning to use cassandra on > EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost, > but I/O is more than 1/4 (amazon does not give explicit I/O numbers.

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 3:48 PM, Alaa Zubaidi wrote: >  RF=2 With RF=2, QUORUM and ALL are the same. Again, your logs show you are attempting to insert about 180,000 columns/sec. The only way that is possible with your hardware is if you are using CL.ZERO. The available information does not ad
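Benjamin's point that QUORUM and ALL coincide at RF=2 follows from quorum being a strict majority of the replicas, floor(RF/2) + 1. A minimal Java sketch of that arithmetic (the class and method names are illustrative only, not part of any Cassandra API):

```java
// Sketch: how many replicas a QUORUM operation must reach, assuming the
// majority rule quorum = floor(RF / 2) + 1.
public class QuorumMath {

    // Strict majority of the replicas for a given replication factor.
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1; // integer division floors for positive ints
    }

    // True when QUORUM has to reach every replica, i.e. it behaves like ALL.
    public static boolean quorumEqualsAll(int replicationFactor) {
        return quorum(replicationFactor) == replicationFactor;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.printf("RF=%d -> QUORUM needs %d replica(s)%s%n",
                    rf, quorum(rf), quorumEqualsAll(rf) ? " (same as ALL)" : "");
        }
    }
}
```

At RF=2 a single down replica therefore fails QUORUM writes just as it fails ALL; RF=3 is the smallest factor where QUORUM tolerates one replica being unavailable.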

Re: 0.7 memory usage problem

2010-09-27 Thread Alaa Zubaidi
RF=2. Each process is processing 75 "rows". So, do you think the cause of my problems is the high rate of inserts I am doing (coupled with the reads), taking into consideration that the first errors were heap overflow and, after I disabled swapping, stack overflow? I will try anothe

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
Does that mean you are doing 600 rows/sec per process or 600/sec total across all processes? On Mon, Sep 27, 2010 at 3:14 PM, Alaa Zubaidi wrote: >  Its actually split to 8 different processes that are doing the insertion. > > Thanks > > On 9/27/2010 2:03 PM, Peter Schuller wrote: >> >> [note: i

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
What is your RF? On Mon, Sep 27, 2010 at 3:13 PM, Alaa Zubaidi wrote: >  Sorry 3 means QUORUM. > > > On 9/27/2010 2:55 PM, Benjamin Black wrote: >> >> On Mon, Sep 27, 2010 at 2:51 PM, Benjamin Black  wrote: >>> >>> On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi >>>  wrote: Thanks for th

Re: 0.7 memory usage problem

2010-09-27 Thread Alaa Zubaidi
I can test the single node on Windows now. On 9/27/2010 2:02 PM, Jonathan Ellis wrote: How reproducible is this stack overflow? If you can reproduce it at will then I would like to see if you can also reproduce against (a) a single node Windows machine (b) a single node Linux machine On

Re: 0.7 memory usage problem

2010-09-27 Thread Alaa Zubaidi
It's actually split across 8 different processes that are doing the insertion. Thanks On 9/27/2010 2:03 PM, Peter Schuller wrote: [note: i put user@ back on CC but I'm not quoting the source code] Here is the code I am using (this is only for testing Cassandra it is not going the be used in produ

Re: 0.7 memory usage problem

2010-09-27 Thread Alaa Zubaidi
Sorry 3 means QUORUM. On 9/27/2010 2:55 PM, Benjamin Black wrote: On Mon, Sep 27, 2010 at 2:51 PM, Benjamin Black wrote: On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi wrote: Thanks for the help. we have 2 drives using basic configurations, commitlog on one drive and data on another. and Y

Re: UnavailableException when data grows

2010-09-27 Thread Benjamin Black
Your ring is wildly unbalanced and you are almost certainly out of I/O on one or more nodes. You should be monitoring via JMX and common systems tools to know when you are starting to have issues. It is going to take you some effort to get out of this situation now. b On Mon, Sep 27, 2010 at 2
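One common way to rebalance a RandomPartitioner ring is to give node i of N the token i * 2^127 / N, spacing the nodes evenly over the token space [0, 2^127). The Java sketch below computes those tokens with BigInteger. It assumes every node should carry an equal share, which does not hold for a mix of ~350 GB and 1.5 TB machines like the one described in this thread, so treat the output as a starting point; moving existing nodes onto the new tokens (for example with `nodetool move`) is not shown.

```java
import java.math.BigInteger;

// Sketch: evenly spaced initial tokens for a RandomPartitioner ring,
// assuming equal-capacity nodes.
public class BalancedTokens {

    // RandomPartitioner tokens lie in [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for node i of nodeCount is i * 2^127 / nodeCount.
    public static BigInteger[] tokens(int nodeCount) {
        BigInteger[] result = new BigInteger[nodeCount];
        BigInteger n = BigInteger.valueOf(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            result[i] = RING_SIZE.multiply(BigInteger.valueOf(i)).divide(n);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger[] t = tokens(8); // 8 nodes, as in the cluster discussed here
        for (int i = 0; i < t.length; i++) {
            System.out.println("node " + i + ": " + t[i]);
        }
    }
}
```

Comparing these evenly spaced values with the tokens nodetool reports makes the imbalance visible: each node owns the range from its predecessor's token up to its own.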

Re: Is there a debian 0.6.1 install package archived anywhere?

2010-09-27 Thread Peter Schuller
> We're running a cassandra cluster using 0.6.1 and we need to add a node.  Id > like to add another node with the same version so that I don't have to test a > mixed version cluster or test (and conduct) a whole-scale upgrade to > 0.6.5and I'd like to use the debian package to install.  Doe

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 2:51 PM, Benjamin Black wrote: > On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi wrote: >> Thanks for the help. >> we have 2 drives using basic configurations, commitlog on one drive and data >> on another. >> and Yes the CL for writes is 3, however, the CL for reads is 1.

Re: UnavailableException when data grows

2010-09-27 Thread Rana Aich
Hi Peter, Thanks for your detailed query... I have an 8-machine cluster: KVSHIGH1,2,3,4 and KVSLOW1,2,3,4. As the names suggest, the KVSLOWs have low disk space (~350 GB), whereas the KVSHIGHs have 1.5 terabytes. Yet my nodetool shows the following: 192.168.202.202 Down 319.94 GB 72000447307838857304008438688

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi wrote: > Thanks for the help. > we have 2 drives using basic configurations, commitlog on one drive and data > on another. > and Yes the CL for writes is 3, however, the CL for reads is 1. > It is simply not possible that you are inserting at CL.ALL

Is there a debian 0.6.1 install package archived anywhere?

2010-09-27 Thread Kyusik Chung
We're running a Cassandra cluster using 0.6.1 and we need to add a node. I'd like to add another node with the same version so that I don't have to test a mixed-version cluster or test (and conduct) a whole-scale upgrade to 0.6.5, and I'd like to use the Debian package to install. Does anyone

Re: UnavailableException when data grows

2010-09-27 Thread Peter Schuller
> How can I handle this kind of situation? In terms of surviving the problem, a re-try on the client side might help assuming the problem is temporary. However, certainly the fact that you're seeing an issue to begin with is interesting, and the way to avoid it would depend on what the problem i
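Peter's suggestion of a client-side retry can be sketched as a loop with exponential backoff. This is a minimal illustration, not a particular client library's API: `TransientException` is a hypothetical stand-in for a "try again later" error such as Thrift's UnavailableException, and the operation is any `Callable`. As Peter notes, retrying only helps if the problem is temporary; it does not fix an overloaded or unbalanced ring.

```java
import java.util.concurrent.Callable;

// Sketch: retry a write on a transient error, doubling the wait each time.
public class RetryingWriter {

    // Hypothetical stand-in for a retryable error (e.g. UnavailableException);
    // a real client would catch its own exception type here.
    public static class TransientException extends Exception {
        public TransientException(String message) { super(message); }
    }

    public static <T> T withRetries(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the error to the caller
                }
                Thread.sleep(delay);
                delay *= 2; // back off so struggling nodes get time to recover
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new TransientException("replicas unavailable");
            return "written";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Other exception types propagate immediately, so only errors the caller has classified as temporary are retried.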

UnavailableException when data grows

2010-09-27 Thread Rana Aich
Hi, I'm having great difficulty inserting data into an 8-server Cassandra cluster (RandomPartitioner with RF 2). For the first one billion, data insertion was smooth, but gradually I started getting UnavailableException from the cluster. And now I can't insert more than 30 million records at one stretch be

Re: 0.7 memory usage problem

2010-09-27 Thread Peter Schuller
[note: i put user@ back on CC but I'm not quoting the source code] > Here is the code I am using (this is only for testing Cassandra it is not > going the be used in production) I am new to Java, but I tested this and it > seems to work fine when running for short amount of time: If you mean to a

Re: 0.7 memory usage problem

2010-09-27 Thread Jonathan Ellis
How reproducible is this stack overflow? If you can reproduce it at will then I would like to see if you can also reproduce against (a) a single node Windows machine (b) a single node Linux machine On Fri, Sep 24, 2010 at 3:03 PM, Alaa Zubaidi wrote: >  Nothing is working, after disabling swap

Re: 0.7 memory usage problem

2010-09-27 Thread Peter Schuller
> You are saying I am doing 36000 inserts per second, when I am inserting 600 > rows, I thought that every row goes into one Node, so the work is done for a > row not a column, so my assumption is NOT true, the work is done on a column > level? so if I reduce the number of columns I will get a "sub
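The point Peter is making is that Cassandra's write work scales with columns, not rows, and that every column write is sent to each of the RF replicas. The worked arithmetic below is a sketch: the 60 columns per row is only inferred from the 36,000 and 600 figures quoted here, and RF=2 comes from Alaa's message.

```java
// Sketch: rows/sec -> column inserts/sec -> replica writes/sec.
public class WriteRate {

    public static long columnInsertsPerSecond(long rowsPerSecond, long columnsPerRow) {
        return rowsPerSecond * columnsPerRow;
    }

    // Each column is written to every replica, whatever the consistency level;
    // the CL only controls how many acknowledgements the client waits for.
    public static long replicaWritesPerSecond(long columnInsertsPerSecond, int replicationFactor) {
        return columnInsertsPerSecond * replicationFactor;
    }

    public static void main(String[] args) {
        long columns = columnInsertsPerSecond(600, 60);      // 60 columns/row is an inferred assumption
        long replicaWrites = replicaWritesPerSecond(columns, 2); // RF=2
        System.out.println(columns + " column inserts/s, " + replicaWrites + " replica writes/s");
    }
}
```

If the 600 rows/sec were per process rather than total (the question Benjamin asks elsewhere in the thread), the 8 processes would multiply both figures by 8, e.g. 288,000 column inserts/s.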

Re: 0.7 memory usage problem

2010-09-27 Thread Alaa Zubaidi
Thanks for the help. We have 2 drives using basic configurations: commitlog on one drive and data on another. And yes, the CL for writes is 3; however, the CL for reads is 1. You are saying I am doing 36000 inserts per second when I am inserting 600 rows; I thought that every _row_ goes into

Re: Best strategy for adding new nodes to the cluster

2010-09-27 Thread Peter Schuller
> What do you mean by "running live"? I am also planning to use cassandra on I believe live as in "in production". > EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost, > but I/O is more than 1/4 (amazon does not give explicit I/O numbers...), so > I think 4 small instanc

Re: Best strategy for adding new nodes to the cluster

2010-09-27 Thread Marc Canaleta
What do you mean by "running live"? I am also planning to use cassandra on EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost, but I/O is more than 1/4 (amazon does not give explicit I/O numbers...), so I think 4 small instances should perform better than 1 large one (and t

Re: How to Load Records From ColumnFamily by Condition Based on a Column Values.

2010-09-27 Thread Tyler Hobbs
Of course, those slice based on the column name, not value. So, you may want to consider using the date for the column name instead (in a more usable form, like "/MM/DD", a timestamp, or a TimeUUID). Alternatively, use a second ColumnFamily with those dates as column names and use it as an ind
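Tyler's suggestion works because a slice selects a contiguous run of column names, so dates must be encoded in a form whose sort order is chronological. The Java sketch below assumes a lexical (UTF8/bytes-style) column comparator and a zero-padded "yyyy/MM/dd" name format chosen for illustration; a `TreeMap` stands in for one row's sorted columns, and `subMap` plays the role of a slice with start and finish names. The date range is the one from the original question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: date-named columns so that a name-range slice is a date-range query.
public class DateNamedColumns {

    // Zero-padded year/month/day: lexical order matches chronological order.
    static final DateTimeFormatter NAME_FORMAT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static String columnName(LocalDate date) {
        return date.format(NAME_FORMAT);
    }

    // Columns whose names fall in [start, finish], both ends inclusive,
    // mirroring a slice with start and finish column names.
    public static NavigableMap<String, String> slice(
            NavigableMap<String, String> row, LocalDate start, LocalDate finish) {
        return row.subMap(columnName(start), true, columnName(finish), true);
    }

    public static void main(String[] args) {
        // Stand-in for an index row: column name = date, value = key of the record.
        NavigableMap<String, String> row = new TreeMap<>();
        row.put(columnName(LocalDate.of(2009, 8, 15)), "key-a");
        row.put(columnName(LocalDate.of(2009, 9, 1)), "key-b");
        row.put(columnName(LocalDate.of(2010, 1, 20)), "key-c");
        row.put(columnName(LocalDate.of(2010, 7, 4)), "key-d");
        System.out.println(slice(row, LocalDate.of(2009, 9, 1), LocalDate.of(2010, 7, 3)).keySet());
        // prints [2009/09/01, 2010/01/20]
    }
}
```

One design caveat: a plain date name allows only one column per day, so an index holding several records per date would need a more specific name, such as a full timestamp or TimeUUID, as Tyler mentions.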

Re: Best strategy for adding new nodes to the cluster

2010-09-27 Thread Jonathan Ellis
I strongly recommend not running live on Small nodes. So in your case I would recommend starting up Large instances with raid0'd disks, shut down cassandra on the Small ones, rsync to the Large, and start up on Large. On Mon, Sep 27, 2010 at 6:46 AM, Utku Can Topçu wrote: > Hi All, > > We're cur

Best strategy for adding new nodes to the cluster

2010-09-27 Thread Utku Can Topçu
Hi All, We're currently running a Cassandra cluster with Replication Factor 3, consisting of 4 nodes. The current situation is: - The nodes are all identical (AWS small instances) - Data directory is in the partition (/mnt), which has 150 GB capacity, and each node has around 90 GB load, so 60 G fre
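With RF 3 on 4 nodes, each node of a balanced ring holds 3/4 of the data, so the figures in this post imply about 4 × 90 / 3 = 120 GB of unique data. The Java sketch below turns that into an estimated per-node load for larger clusters. It assumes a perfectly balanced ring and ignores compaction headroom and not-yet-removed SSTables, so the real disk footprint will be higher.

```java
// Sketch: per-node load estimates on a balanced ring, where each node
// holds rf / nodes of the unique data.
public class NodeLoad {

    // Unique (pre-replication) data implied by the current per-node load.
    public static double uniqueDataGb(double loadPerNodeGb, int nodes, int rf) {
        return loadPerNodeGb * nodes / rf;
    }

    // Expected per-node load for a different cluster size.
    public static double loadPerNodeGb(double uniqueGb, int nodes, int rf) {
        if (nodes < rf) {
            throw new IllegalArgumentException("need at least rf nodes");
        }
        return uniqueGb * rf / nodes;
    }

    public static void main(String[] args) {
        double unique = uniqueDataGb(90, 4, 3); // 4 nodes, RF 3, ~90 GB each
        for (int n = 4; n <= 8; n++) {
            System.out.printf("%d nodes -> ~%.0f GB per node%n", n, loadPerNodeGb(unique, n, 3));
        }
    }
}
```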

Re: How to Retrieve all the rows from a ColumnFamily

2010-09-27 Thread Lucas Nodine
I've commented on your questions below. HTH - Lucas Nodine On Mon, Sep 27, 2010 at 7:35 AM, sekhar kosuru wrote: > Hi Lucas, > > Thanks for the reply. > > Please clarify me if i am wrong, this kind of filtering will work on Key > values in ColumnFamily or any other column available in the ColumnFamily.

Re: How to Retrieve all the rows from a ColumnFamily

2010-09-27 Thread sekhar kosuru
Hi Lucas, Thanks for the reply. Please correct me if I am wrong: does this kind of filtering work on key values in the ColumnFamily, or on any other column available in the ColumnFamily? One more thing: here we are restricting the outcome to 100 records; if we don't, is the API also restricting fo

Re: How to Retrieve all the rows from a ColumnFamily

2010-09-27 Thread Lucas Nodine
Example code using C# below: Collections.Generic.List results; SlicePredicate predicate; ColumnParent cp; // Create Slice predicate = new SlicePredicate() { Slice_range = new SliceRange() { Start = _utf8.GetBytes(""), Finish = _utf8.GetBytes(""), Count = 100, Reversed = false

Re: How to Load Records From ColumnFamily by Condition Based on a Column Values.

2010-09-27 Thread Lucas Nodine
Have you looked at get_slice or multiget_slice ( http://wiki.apache.org/cassandra/API) to see if those will fit your needs? I would expect the slice_range property of the SlicePredicate might get you started. - Lucas Nodine On Mon, Sep 27, 2010 at 4:43 AM, sekhar kosuru wrote: > Hi > > I have a

How to Load Records From ColumnFamily by Condition Based on a Column Values.

2010-09-27 Thread sekhar kosuru
Hi, I have a ColumnFamily with 50k rows. How can I load only, say, some 2k or so that satisfy some criteria on a single column's values? For example, the ColumnFamily has a normal column named CreatedDate, and I need to load the rows that satisfy a criterion like this: 1-sep-2009 to 3-july-2010. /Regards Sek

Re: How to Retrieve all the rows from a ColumnFamily

2010-09-27 Thread Benjamin Black
http://wiki.apache.org/cassandra/FAQ#iter_world On Sun, Sep 26, 2010 at 11:51 PM, sekhar kosuru wrote: > Hi > I am new to Cassandra Database. > I want to know how to Retrieve all the records from a column family, is this > is different in the clustered servers vs single servers. > Please suggest
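The usual way to read every row is to page through range slices: request a fixed number of rows, then start the next request at the last key returned. Because the start key is inclusive, each later page repeats that key, so it is skipped. In the Java sketch below a `TreeMap` stands in for the cluster and `fetchPage` is a hypothetical helper in place of a real range-slice call; with RandomPartitioner a real cluster returns rows in token order rather than key order, but the paging loop has the same shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: iterate all rows by paging, restarting each page at the last key seen.
public class PageAllRows {

    // Hypothetical stand-in for one range-slice request: up to `count` row keys,
    // starting at `startKey` inclusive.
    static List<String> fetchPage(NavigableMap<String, String> rows, String startKey, int count) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (page.size() == count) break;
            page.add(key);
        }
        return page;
    }

    public static List<String> allKeys(NavigableMap<String, String> rows, int pageSize) {
        // A page size of 1 would return only the repeated key and never advance.
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be >= 2");
        List<String> keys = new ArrayList<>();
        String start = "";          // empty start key = beginning of the range
        boolean firstPage = true;
        while (true) {
            List<String> page = fetchPage(rows, start, pageSize);
            // Later pages begin with the previous page's last key; skip it.
            for (int i = firstPage ? 0 : 1; i < page.size(); i++) {
                keys.add(page.get(i));
            }
            if (page.size() < pageSize) {
                return keys;        // a short page means the end was reached
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int i = 0; i < 7; i++) rows.put("row" + i, "value" + i);
        System.out.println(allKeys(rows, 3)); // every key exactly once, in order
    }
}
```

The page size bounds how much each request returns, which is also why a single call with a count of 100 never returns more than 100 rows.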