Re: script to modify cassandra.yaml file
I use grep / awk / sed from within a bash script ... this works quite well. -sd

On Mon, Mar 21, 2011 at 12:39 AM, Anurag Gujral wrote:
> Hi All,
> I want to modify the values in the cassandra.yaml which comes with the cassandra-0.7 package based on values of hostnames, colo etc.
> Does someone know of a script I can use which reads in the default cassandra.yaml and writes out a new cassandra.yaml with values based on the number of nodes in the cluster, hostname, colo name etc.?
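A minimal Python sketch of the same sed-style approach follows. It assumes that only top-level scalar keys need to change, for example listen_address, rpc_address, cluster_name or initial_token. The helper names `render_yaml` and `main` and the override values are hypothetical. List-valued keys such as seeds are deliberately left alone. Because the rewrite works line by line, the comments and layout of the default file survive.

```python
import re
import socket

# An unindented "key: value" line; group 1 is the key, group 2 the original line ending.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):.*?(\r?\n?)$")


def render_yaml(template: str, overrides: dict) -> str:
    """Return template with the values of the given top-level keys replaced.

    Works line by line (like a sed substitution), so comments, ordering and
    indentation elsewhere are preserved. Indented lines, list items and
    commented-out keys are never touched.
    """
    out = []
    for line in template.splitlines(keepends=True):
        m = TOP_LEVEL_KEY.match(line)
        if m and m.group(1) in overrides:
            line = f"{m.group(1)}: {overrides[m.group(1)]}{m.group(2)}"
        out.append(line)
    return "".join(out)


def main(src: str, dst: str) -> None:
    """Hypothetical driver: point listen/rpc addresses at this host's IP."""
    ip = socket.gethostbyname(socket.gethostname())
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(render_yaml(text, {"listen_address": ip, "rpc_address": ip}))
```

You would call it as `main("conf/cassandra.yaml", "conf/cassandra.yaml.new")`. Both paths are placeholders. On some Debian/Ubuntu hosts `gethostbyname(gethostname())` resolves to 127.0.1.1, so in practice you may prefer to pass the IP in explicitly. The same goes for per-colo values such as cluster_name.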
Re: Active / Active Data Center and RF
I'll take another crack at it; here's how I think it works.

When using the NetworkTopologyStrategy you can specify how the RF is distributed between the DCs you have. This is done as part of the schema definition. When using a CLI script, use the strategy_options clause of the create keyspace statement; it is also available via the yaml configuration and the RPC. For example, you can split an RF of 6 evenly over two DCs, or put 4 replicas in one and 2 in the other. You can slice it up any way you want.

When using the NetworkTopologyStrategy you will want to use the PropertyFileSnitch (set in the yaml config). This reads the conf/cassandra-topology.properties file to find out which DC and which rack a node is in. The old way is the RackInferringSnitch.

So no matter which DC the coordinator is in, the cluster will try to place replicas according to these configuration settings. The LOCAL_QUORUM and EACH_QUORUM CLs tell the coordinator to block on either just the local mutations, or the local and remote mutations from each DC.

The settings will also be used when a read is performed, to determine where the replicas are. In the read case the PropertyFileSnitch will sort the (live) endpoints by proximity to the coordinator node; this takes into account both the rack and the datacentre. The request will only be sent to the number of nodes we are going to block on, with the closest nodes chosen first. If you read at LOCAL_QUORUM and everything is working, your read will only use nodes in the local DC, using the DC's local RF. If you read at QUORUM you would use the full cluster's RF, and the read would potentially cross DCs. For both LOCAL_QUORUM and EACH_QUORUM the read blocks until RF nodes for the local DC have returned. (The remote DC RF settings are ignored; anyone know why?)

Hope that helps.
Aaron

On 21 Mar 2011, at 16:43, mcasandra wrote:
> CL is just a way to satisfy consistency but you still want majority of your
> reads (preferrably) occurring in the same DC.
>
> I don't think that answers my question at all. I understand the CL but I think I have more basic and important question about active/active data center and the replicas in that very specific scenario which to me looks like a issue somehow. Can someone please look at my question specifically again?
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Active-Active-Data-Center-and-RF-tp6185528p6191120.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
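Here is a small sketch of the quorum arithmetic behind the RF splits Aaron mentions. It assumes the usual majority rule, floor(RF/2) + 1, and uses hypothetical DC names DC1 and DC2. With an even 3/3 split, QUORUM needs 4 of 6 replicas, which can never be met inside a single DC. LOCAL_QUORUM needs only 2 local replicas. The EACH_QUORUM entry shows the per-DC counts as they apply to writes; per Aaron's description, reads at EACH_QUORUM only wait on the local DC.

```python
def quorum(rf: int) -> int:
    """Majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def block_for(strategy_options: dict, local_dc: str) -> dict:
    """Replica acknowledgements each consistency level waits for, given
    NetworkTopologyStrategy options such as {"DC1": 4, "DC2": 2}."""
    total_rf = sum(strategy_options.values())
    return {
        # majority of all replicas, wherever they live
        "QUORUM": quorum(total_rf),
        # majority of the replicas in the coordinator's own DC
        "LOCAL_QUORUM": quorum(strategy_options[local_dc]),
        # a separate majority in every DC
        "EACH_QUORUM": {dc: quorum(rf) for dc, rf in strategy_options.items()},
    }


if __name__ == "__main__":
    for opts in ({"DC1": 3, "DC2": 3}, {"DC1": 4, "DC2": 2}):
        print(opts, block_for(opts, "DC1"))
```

With the 4/2 split, a coordinator in DC1 waits for 3 local replicas at LOCAL_QUORUM. A coordinator in DC2 waits for 2 local replicas, which means every replica it has.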
Re: Pauses of GC
After some investigation I think my problem is similar to this one: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/reduced-cached-mem-resident-set-size-growth-td5967110.html

I have now disabled mmap for data files by setting disk_access_mode to mmap_index_only.
stress.py bug?
Hi,

I'm inserting data from a client node with stress.py into a cluster of 6 nodes. They are all on a 1Gbps network; the maximum real throughput of the network is 930Mbps (measured).

python stress.py -c 1 -S 17 -d{6nodes} -l3 -e QUORUM --operation=insert -i 1 -n 50 -t100

The problem is that stress.py reports an average of ~750 ops/sec, which works out to 127MB/s, but the real throughput of the network is only ~116MB/s. Any idea?

Thanks
Best, Peter
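A quick sanity check of these numbers follows. It assumes that MB means 10^6 bytes and ignores TCP/Thrift overhead. On those terms, a 930 Mbps link carries at most 116.25 MB/s of payload, so a client-side figure above that cannot all be leaving one client through that link. The column size can't be reliably read from the -S flag above, so the sketch doesn't guess it. Instead it works backwards from the reported 750 ops/sec and 127 MB/s to the per-operation payload those two numbers imply. The helper names are hypothetical.

```python
def link_ceiling_mb_s(link_mbps: float) -> float:
    """Payload ceiling of a link in MB/s (10**6 bytes), ignoring protocol overhead."""
    return link_mbps * 1e6 / 8 / 1e6


def implied_bytes_per_op(rate_mb_s: float, ops_per_s: float) -> float:
    """Bytes each operation must carry for ops_per_s to produce rate_mb_s."""
    return rate_mb_s * 1e6 / ops_per_s


if __name__ == "__main__":
    ceiling = link_ceiling_mb_s(930)          # measured link speed from the post
    per_op = implied_bytes_per_op(127, 750)   # reported rate and ops/sec
    print(f"link ceiling: {ceiling:.2f} MB/s")
    print(f"implied payload: {per_op:,.0f} bytes/op")
    print(f"max ops/sec at that payload: {ceiling * 1e6 / per_op:.0f}")
```

If the computed ceiling is below the reported rate, the next things to compare are how the 127MB/s figure was derived (MB vs MiB, payload vs wire bytes) and the exact column size used.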
Re: Column family cannot be removed
Jonathan Ellis writes:
>
> drop and truncate both snapshot first, which requires forking to run
> ln if you don't have JNA installed.
>
> best solution: install JNA so it can do in-process link calls.
>

Could you please tell me exactly what needs to be done, and whether it is done in the client or somewhere else? I have installed JNA, yet truncate still doesn't work for me. What should I do next? Thank you!
RE: Column family cannot be removed
Hi Nikolay,

JNA has to be installed on the server box(es). On Ubuntu you can do the following:

wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb
sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
ln -s /usr/share/java/jna.jar [path_to_cassandra]/lib

... and then restart the Cassandra server.

George

-Original Message-
From: Nikolay [mailto:nkovs...@yandex.ru]
Sent: 21 March 2011 12:20
To: user@cassandra.apache.org
Subject: Re: Column family cannot be removed

Jonathan Ellis writes:
>
> drop and truncate both snapshot first, which requires forking to run
> ln if you don't have JNA installed.
>
> best solution: install JNA so it can do in-process link calls.
>
Could you please tell exact actions in client (or not in client ?) that should be done ? I have installed JNA, yes truncate still doesn't work for me. What to do after ? Thank you!
Apache Cassandra Hangout in Mumbai-Pune area (India)
Hi,

Is anyone interested in joining an Apache Cassandra hangout/meetup in the Mumbai-Pune area?

- Share/teach your experience with Apache Cassandra and the problems/issues you faced during deployment.
- Excited and heard about the buzz? Come and learn more about NoSQL and Cassandra.

Regards,
GeekTalks
Re: 0.6.5 OOM during high read load
Beautiful, thanks.

On Sun, Mar 20, 2011 at 4:36 PM, Jonathan Ellis wrote:
> 0.7.1+ uses zero-copy reads in mmap'd mode so having 80k references to
> the same column is essentially just the reference overhead.
>
> On Fri, Mar 18, 2011 at 7:11 PM, Dan Retzlaff wrote:
> > Dear experts, :)
> > Our application triggered an OOM error in Cassandra 0.6.5 by reading the
> > same 1.7MB column repeatedly (~80k reads). I analyzed the heap dump, and it
> > looks like the column value was queued 5400 times in an
> > OutboundTcpConnection destined for the Cassandra instance that received the
> > client request. Unfortunately, this intra-node connection goes across a
> > 100Mb data center interconnect, so it was only a matter of time before the
> > heap was exhausted.
> > Is there something I can do (other than change the application behavior) to
> > avoid this failure mode? I'm not the first to run into this, am I?
> > Thanks,
> > Dan
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
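Rough arithmetic on Dan's numbers shows why the heap could not keep up. It assumes that 1.7MB means 1.7×10^6 bytes and that the full 100Mb/s interconnect is available to this one connection. Dan's heap dump suggests each of the 5400 queued entries held the column value, which together pins roughly 9 GB of heap. At 100Mb/s the link drains only about seven such copies per second. The helper names are hypothetical.

```python
MB = 1_000_000  # assumption: the post's "MB" is decimal megabytes


def backlog_bytes(queued_copies: int, column_bytes: int) -> int:
    """Heap held by copies of one column waiting in an outbound queue."""
    return queued_copies * column_bytes


def drain_seconds(nbytes: int, link_mbps: float) -> float:
    """Time to push nbytes over a link of link_mbps megabits/s at full speed."""
    return nbytes * 8 / (link_mbps * 1_000_000)


backlog = backlog_bytes(5400, 1_700_000)  # 5400 queued copies of a 1.7MB column
print(f"queued: {backlog / MB:,.0f} MB")
print(f"drain time at 100Mb/s: {drain_seconds(backlog, 100):.0f} s")
print(f"copies sent per second: {100 * 1_000_000 / 8 / 1_700_000:.1f}")
```

Any read rate above a handful per second for this column therefore grows the queue faster than the interconnect can empty it. That fits Jonathan's point that the zero-copy reads in 0.7.1+ avoid materialising each copy.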
Re: Pauses of GC
I mean Linux process heap fragmentation caused by malloc. At some critical moment, could all the memory end up held by the Java process in RSS, so that the OS kernel can't allocate any system resources and, as a result, the node hangs? Is that possible?
Re: EC2 - 2 regions
Thanks for sharing this. What mechanisms secure the data (streams?) in transit between nodes? This isn't clear to me.

On Mon, Mar 21, 2011 at 10:01 AM, Milind Parikh wrote:
> Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is
> work in progress but wanted to share what I have. PDF is the working
> copy.
>
> https://docs.google.com/document/d/175duUNIx7m5mCDa2sjXVI04ekyMa5bdiWdu-AFgisaY/edit?hl=en
Re: Pauses of GC
No. We do zero allocation by malloc (so far). It's all managed by GC in heap. On Mon, Mar 21, 2011 at 10:25 AM, ruslan usifov wrote: > I mean a linux process heap fragmentation by malloc, so at one critical > moment all memory holden by java process in RSS, and OS core cant allocate > any system resource an as result hung? Is it possble? > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: stress.py bug?
On Mon, Mar 21, 2011 at 4:02 AM, pob wrote: > Hi, > I'm inserting data from client node with stress.py to cluster of 6 nodes. > They are all on 1Gbps network, max real throughput of network is 930Mbps > (after measurement). > python stress.py -c 1 -S 17 -d{6nodes} -l3 -e QUORUM > --operation=insert -i 1 -n 50 -t100 > The problem is stress.py show up it does avg ~750ops/sec what is 127MB/s, > but the real throughput of network is ~116MB/s. You may need more concurrency in order to saturate your network. -ryan
Re: Active / Active Data Center and RF
I think what I am trying to ask is this: what happens if it's RF=3 with network topology (RackInferringSnitch), and 2 copies are stored in the Site A data center and 1 copy in the Site B data center? Now suppose a client is, for some reason, directed to the Site B data center and does a write/update on an existing column. Would Site B then have 2 copies too, because of the network topology (RackInferringSnitch)?
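To make the answer to this question concrete: under NetworkTopologyStrategy the per-DC replica counts come from strategy_options, and the replica set is a function of the key's token alone. A write coordinated in Site B should therefore still end up with 2 copies in Site A and 1 in Site B, which matches Aaron's "no matter which DC the co-ordinator is in". The sketch below is a simplified model, not Cassandra's code:

- it ignores rack diversity;
- it uses hypothetical small integer tokens and 10.1.x.x (Site A) / 10.2.x.x (Site B) addresses;
- it mimics RackInferringSnitch's rule of taking the DC from the second octet of the IP and the rack from the third.

```python
import ipaddress
from bisect import bisect_left


def rack_inferring(ip: str) -> tuple:
    """(datacenter, rack) as RackInferringSnitch infers them:
    second IPv4 octet = DC, third octet = rack."""
    octets = ipaddress.IPv4Address(ip).packed
    return str(octets[1]), str(octets[2])


def replicas(token: int, ring: dict, rf_per_dc: dict) -> dict:
    """Simplified NetworkTopologyStrategy placement (rack diversity ignored).

    ring maps node token -> node IP. Starting at the first node whose token is
    >= the key's token (wrapping around), walk clockwise and take the first
    rf_per_dc[dc] nodes found in each DC. The coordinator is not an input:
    placement depends only on the key's token.
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)
    chosen = {dc: [] for dc in rf_per_dc}
    for i in range(len(tokens)):
        ip = ring[tokens[(start + i) % len(tokens)]]
        dc, _rack = rack_inferring(ip)
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(ip)
    return chosen
```

With a hypothetical ring `{0: "10.1.0.1", 25: "10.2.0.1", 50: "10.1.0.2", 75: "10.2.0.2"}` and `{"1": 2, "2": 1}`, every key gets two Site A replicas and one Site B replica, whichever node the client talks to.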
Re: stress.py bug?
You mean more threads in stress.py? The purpose was to figure out the maximum bandwidth that C* can use.
Peter

2011/3/21 Ryan King
> On Mon, Mar 21, 2011 at 4:02 AM, pob wrote:
> > Hi,
> > I'm inserting data from client node with stress.py to cluster of 6 nodes.
> > They are all on 1Gbps network, max real throughput of network is 930Mbps
> > (after measurement).
> > python stress.py -c 1 -S 17 -d{6nodes} -l3 -e QUORUM
> > --operation=insert -i 1 -n 50 -t100
> > The problem is stress.py show up it does avg ~750ops/sec what is 127MB/s,
> > but the real throughput of network is ~116MB/s.
>
> You may need more concurrency in order to saturate your network.
>
> -ryan
Re: stress.py bug?
On Mon, Mar 21, 2011 at 9:34 AM, pob wrote:
> You mean,
> more threads in stress.py? The purpose was figure out whats the
> biggest bandwidth that C* can use.

You should try more threads, but at some point you'll hit diminishing returns there. You may need to drive load from more than one host. Either way, you need to find out what the bottleneck is.

-ryan
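Why more concurrency matters can be sketched with Little's law. In a closed-loop client where each thread keeps one request in flight, throughput is at most threads divided by average latency. The -t100 in the command above appears to request 100 threads. The latencies used below are hypothetical, since none were reported. Rising latency under load is one reason adding threads eventually stops helping. Past that point, extra load has to come from more client hosts.

```python
import math


def max_ops_per_s(threads: int, latency_ms: float) -> float:
    """Upper bound for a closed-loop client where each thread keeps one
    request in flight: throughput <= concurrency / latency (Little's law)."""
    return threads * 1000 / latency_ms


def threads_needed(target_ops_per_s: float, latency_ms: float) -> int:
    """Concurrency required to sustain target_ops_per_s at latency_ms."""
    return math.ceil(target_ops_per_s * latency_ms / 1000)
```

For example, at a hypothetical 100 ms average latency, 100 threads cap out at `max_ops_per_s(100, 100)`, which is 1000 ops/sec, however fast the network is. Doubling throughput at that latency needs `threads_needed(2000, 100)` threads.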
Re: EC2 - 2 regions
Thanks for sharing the document, Milind! Followed the instructions and it worked for me.

On Mon, Mar 21, 2011 at 5:01 AM, Milind Parikh wrote:
> Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is
> work in progress but wanted to share what I have. PDF is the working
> copy.
>
> https://docs.google.com/document/d/175duUNIx7m5mCDa2sjXVI04ekyMa5bdiWdu-AFgisaY/edit?hl=en
>
> On Sun, Mar 20, 2011 at 7:49 PM, aaron morton wrote:
>>
>> Recent discussion on the dev list
>> http://www.mail-archive.com/dev@cassandra.apache.org/msg01832.html
>> Aaron
>> On 19 Mar 2011, at 06:46, A J wrote:
>>
>> Just to add, all the telnet (port 7000) and cassandra-cli (port 9160)
>> connections are done using the public DNS (that goes like
>> ec2-.compute.amazonaws.com)
>>
>> On Fri, Mar 18, 2011 at 1:37 PM, A J wrote:
>> I am able to telnet from one region to another on 7000 port without
>> issues. (I get the expected Connected to .Escape character is
>> '^]'.)
>> Also I am able to execute cassandra client on 9160 port from one
>> region to another without issues (this is when I run cassandra
>> separately on each region without forming a cluster).
>> So I think the ports 7000 and 9160 are not the issue.
>>
>> On Fri, Mar 18, 2011 at 1:26 PM, Dave Viner wrote:
>> From the us-west instance, are you able to connect to the us-east instance
>> using telnet on port 7000 and 9160?
>> If not, then you need to open those ports for communication (via your
>> Security Group)
>> Dave Viner
>>
>> On Fri, Mar 18, 2011 at 10:20 AM, A J wrote:
>> Thats exactly what I am doing.
>> I was able to do the first two scenarios without any issues (i.e. 2
>> nodes in same availability zone. Followed by an additional node in a
>> different zone but same region)
>> I am stuck at the third scenario of separate regions.
>> (I did read the "Cassandra nodes on EC2 in two different regions not
>> communicating" thread but it did not seem to end with resolution)
>>
>> On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote:
>> Hi AJ,
>> I'd suggest getting to a multi-region cluster step-by-step. First, get 2
>> nodes running in the same availability zone. Make sure that works properly.
>> Second, add a node in a separate availability zone, but in the same region.
>> Make sure that's working properly. Third, add a node that's in a separate
>> region.
>> Taking it step-by-step will ensure that any issues are specific to the
>> region-to-region communication, rather than intra-zone connectivity or
>> cassandra cluster configuration.
>> Dave Viner
>>
>> On Fri, Mar 18, 2011 at 8:34 AM, A J wrote:
>> Hello,
>> I am trying to setup a cassandra cluster across regions.
>> For testing I am keeping it simple and just having one node in US-EAST
>> (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say
>> ec2-2-2-3-4.us-west-1.compute.amazonaws.com).
>> Using Cassandra 0.7.4
>>
>> The one in east region is the seed node and has the values as:
>> auto_bootstrap: false
>> seeds: ec2-1-2-3-4.compute-1.amazonaws.com
>> listen_address: ec2-1-2-3-4.compute-1.amazonaws.com
>> rpc_address: 0.0.0.0
>> The one in west region is non seed and has the values as:
>> auto_bootstrap: true
>> seeds: ec2-1-2-3-4.compute-1.amazonaws.com
>> listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com
>> rpc_address: 0.0.0.0
>> I first fire the seed node (east region instance) and it comes up
>> without issues.
>> When I fire the non-seed node (west region instance) it fails after
>> sometime with the error:
>> DEBUG 15:09:08,844 Created HHOM instance, registered MBean.
>> INFO 15:09:08,844 Joining: getting load information
>> INFO 15:09:08,845 Sleeping 9 ms to wait for load information...
>> DEBUG 15:09:09,822 attempting to connect to
>> ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4
>> DEBUG 15:09:10,825 Disseminating load info ...
>> DEBUG 15:10:10,826 Disseminating load info ...
>> DEBUG 15:10:38,845 ... got load info
>> INFO 15:10:38,845 Joining: getting bootstrap token
>> ERROR 15:10:38,847 Exception encountered during startup.
>> java.lang.RuntimeException: No other nodes seen! Unable to bootstrap
>> at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:164)
>> at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:146)
>> at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:141)
>> at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:450)
>> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:404)
>> at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandr
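The telnet test Dave suggests can be scripted so that it is easy to repeat from every instance. Here is a minimal sketch using only Python's standard socket module; the function names are hypothetical. It checks port 7000 (inter-node storage/gossip) and port 9160 (Thrift RPC). A successful connect only shows that the Security Group lets TCP traffic through. As A J's report shows, it does not by itself guarantee that the nodes will see each other via gossip.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established (like telnet)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def check_cassandra_ports(host: str, ports=(7000, 9160)) -> dict:
    """Check the storage/gossip port (7000) and Thrift RPC port (9160) on host."""
    return {port: port_open(host, port) for port in ports}
```

For example, run `check_cassandra_ports("ec2-1-2-3-4.compute-1.amazonaws.com")` from the us-west node, using the placeholder hostname from the thread. It should return `True` for both ports before you look further at listen_address or seeds.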
Re: stress.py bug?
Not completely related, just FYI.

I prefer to see the start time, end time and duration of each execution in each thread, and then do the aggregation (avg, max, min) myself. I modified the last few lines of the Inserter function as follows:

    endtime = time.time()
    self.latencies[self.idx] += endtime - start
    self.opcounts[self.idx] += 1
    self.keycounts[self.idx] += 1
    # one log file per thread; each line: duration idx key asctime start end
    with open('log' + str(self.idx) + '.txt', 'a') as log:
        log.write(str(endtime - start) + ' ' + str(self.idx) + ' ' + str(i) + ' '
                  + str(time.asctime()) + ' ' + str(start) + ' ' + str(endtime) + '\n')

You need to understand a little bit of Python to plug this properly into stress.py. The above creates a lot of log*.txt files, one for each thread. Each line in these log files has the duration, thread#, key, timestamp, start time and end time, separated by spaces (note that the asctime timestamp itself contains spaces). I then load these log files into a database and do aggregations as I need. Remember to remove the old log files before a rerun, because the code above appends to existing log files.

Just FYI; most will not need this.

On Mon, Mar 21, 2011 at 12:40 PM, Ryan King wrote:
> On Mon, Mar 21, 2011 at 9:34 AM, pob wrote:
>> You mean,
>> more threads in stress.py? The purpose was figure out whats the
>> biggest bandwidth that C* can use.
>
> You should try more threads, but at some point you'll hit diminishing
> returns there. You many need to drive load from more than one host.
> Either way, you need to find out what the bottleneck is.
>
> -ryan
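For readers who would rather not load the files into a database, here is a small Python sketch that computes the avg/max/min directly from the per-thread log files. It assumes the line layout written by the modified Inserter above. The function names are hypothetical and not part of stress.py. Because `time.asctime()` output contains spaces, the sketch reads the duration and thread index by position from the front of each line instead of relying on a fixed column count.

```python
import glob
import statistics
from collections import defaultdict


def load_durations(pattern: str = "log*.txt") -> dict:
    """Map thread index -> list of operation durations (seconds).

    Each line is: duration idx key <time.asctime() output> start end.
    Only the first two whitespace-separated fields are used.
    """
    per_thread = defaultdict(list)
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 2:
                    continue  # need at least duration and thread index
                per_thread[int(parts[1])].append(float(parts[0]))
    return dict(per_thread)


def summarize(durations: list) -> dict:
    """count / avg / min / max of a list of durations."""
    return {
        "count": len(durations),
        "avg": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


def report(pattern: str = "log*.txt") -> None:
    """Print a summary per thread, then one across all threads."""
    per_thread = load_durations(pattern)
    for idx in sorted(per_thread):
        print(idx, summarize(per_thread[idx]))
    print("all", summarize([d for ds in per_thread.values() for d in ds]))
```

Run `report()` from the directory where stress.py wrote the log files.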
Re: EC2 - 2 regions
Hi Milind,

Great work here. Can you provide the patch against the 2 files?

Perhaps there's some way to incorporate it into the trunk of cassandra so that this is feasible (in a future release) without patching the source code.

Dave Viner

On Mon, Mar 21, 2011 at 9:41 AM, A J wrote:
> Thanks for sharing the document, Milind !
> Followed the instructions and it worked for me.
Re: EC2 - 2 regions
I talked to Matt Dennis in the channel about it and I think everyone would like to make sure that cassandra works great across multiple regions. He sounded like he didn't know why it wouldn't work after having looked at the patches. I would like to try it both ways - with and without the patches later today if I can and I'd like to help out with getting it working out of the box.

Thanks for the investigative work and documentation Milind!

Jeremy

On Mar 21, 2011, at 12:12 PM, Dave Viner wrote:
> Hi Milind,
>
> Great work here. Can you provide the patch against the 2 files?
>
> Perhaps there's some way to incorporate it into the trunk of cassandra so that this is feasible (in a future release) without patching the source code.
>
> Dave Viner
ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster
Hi guys, we are currently benchmarking various configurations of an EC2-based Cassandra cluster. This is our current setup: 1) 8 nodes where each node is an m1.xlarge EC2 instance 2) Cassandra version 0.6.5 3) Replication Factor = 3 4) this delivers ~7K to 10K ops/sec with 50% GET and 50% INSERT depending on the consistency level We have been benchmarking the cluster with YCSB, while altering the consistency levels ONE, QUORUM, and ALL, ceteris paribus. This works fine if all nodes are alive. Then, we wanted to benchmark the cluster performance behavior when one node goes down. So, we killed one node and tested the cluster with consistency level ONE, which delivered reasonable throughput of multiple thousand ops/sec. Then, we wanted to test QUORUM and ALL. However, when one node is down, the cluster throughput sharply drops to a few operations and then stops responding to the YCSB client if the consistency level of operations in the benchmark is set to QUORUM or ALL. For ALL, this behavior would (kind of) make sense for read requests but we are puzzled that even QUORUM won't work. And for 100% write operations in consistency level ALL it won't work either. Any ideas why the cluster stops responding for QUORUM and ALL? Thanks, Markus
Re: ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster
I suggest upgrading to either 0.6.12 or 0.7.4 and re-testing.

On Mon, Mar 21, 2011 at 12:52 PM, Markus Klems wrote:
> Any ideas why the cluster stops responding for QUORUM and ALL?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
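Markus's setup can be checked against a simplified ring model: 8 nodes, RF=3, one node killed. The model assumes evenly spaced tokens and SimpleStrategy-style placement (named RackUnawareStrategy in the 0.6 line), where each range lives on its owner plus the next RF-1 nodes clockwise. In this model every range still has at least two live replicas, so QUORUM should remain satisfiable. ALL can only be served for the ranges the dead node does not replicate. That makes the QUORUM stall look more like a bug than expected behaviour, which fits the advice to upgrade. The helper names are hypothetical.

```python
def replica_nodes(range_owner: int, n_nodes: int, rf: int) -> list:
    """SimpleStrategy-style placement on an evenly spaced ring: the range
    owner plus the next rf - 1 nodes clockwise."""
    return [(range_owner + i) % n_nodes for i in range(rf)]


def required(cl: str, rf: int) -> int:
    """Live replicas a consistency level needs."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]


def servable_ranges(n_nodes: int, rf: int, dead: set, cl: str) -> int:
    """How many of the n_nodes token ranges still have enough live replicas."""
    return sum(
        1
        for owner in range(n_nodes)
        if sum(n not in dead for n in replica_nodes(owner, n_nodes, rf)) >= required(cl, rf)
    )


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, servable_ranges(8, 3, {0}, cl), "of 8 ranges")
```

With node 0 down, the three ranges it replicates (owned by nodes 6, 7 and 0) drop to two live replicas. That is still enough for QUORUM (2 of 3), but not for ALL.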
Re: Can Cassandra be hosted, with all its features and performance, on Microsoft Azure?
Is there any benchmark I can run after installing Cassandra on Azure to check for performance/scalability issues?

[]'s
FernandoVM

On Sun, Mar 13, 2011 at 10:16 PM, aaron morton wrote:
> If it works like all the other virtual machine hosts then yes it can be
> hosted.
> Performance can always be less on a virtual machine though.
> See http://wiki.apache.org/cassandra/CloudConfig
> Aaron
> On 14 Mar 2011, at 13:09, FernandoVM wrote:
>
> Hello friends,
>
> Anyone know if the Cassandra can be hosted, with all your
> features and performance, on Microsoft Azure?
>
> []'s
> FernandoVM

--
[]'s
FernandoVM
Clearsnapshot Problem
I'm running a 3-way Cassandra (CA) cluster (0.62) on Windows 2008 64-bit (JRE 1.6.24). Things are running fine except when trying to remove old snapshot files. When running clearsnapshot I get an error message like the one below, and I can't remove any daily snapshot files. When I try to delete the actual snapshot file from the OS command line, I get "access denied. File used by another process". It seems CA or the JRE is holding on to the file? Feedback much appreciated.

C:\Cassandra\
Exception in thread "main" java.io.IOException: Failed to delete c:\cassandra\data\data\ks_SnapshotTest\snapshots\1300731301822-ks_SnapshotTest\cf_SnapshotTest-135-Data.db
        at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:47)
        at org.apache.cassandra.io.util.FileUtils.deleteDir(FileUtils.java:189)
        at org.apache.cassandra.io.util.FileUtils.deleteDir(FileUtils.java:184)
        at org.apache.cassandra.io.util.FileUtils.deleteDir(FileUtils.java:184)
        at org.apache.cassandra.db.Table.clearSnapshot(Table.java:274)
        at org.apache.cassandra.service.StorageService.clearSnapshot(StorageService.java:1023)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
C:\Cassandra\bin>
RE: Nodes frozen in GC
> With the large new-gen, you were actually seeing fallbacks to full GC? > You weren't just still experiencing problems because at 10 gig, the new-gen > will be so slow to compact to effectively be similar to a full gc in terms of > affecting latency? Yes, we were seeing fallbacks to full GC with a large young generation. Surprisingly, the young generation was still collecting quickly (under 0.25s). Perhaps that means the young gen was full of fewer, larger objects? Doing some experimentation now: with a 250MB young gen, ParNew collections take around 0.01-0.02s of wall time. With a 2GB young gen, they take around 0.02-0.07s. Finally, with ~10GB new (9.5GB young), they take around 0.05-0.15s, with most under 0.1s. If no compactions have occurred, we occasionally see a 0.2s ParNew collection.

109.126: [GC 109.126: [ParNew
Desired survivor size 298254336 bytes, new threshold 3 (max 3)
- age 1: 34897008 bytes, 34897008 total
- age 2: 17763192 bytes, 52660200 total
- age 3: 135342264 bytes, 188002464 total
: 9524607K->204678K(9903232K), 0.0565180 secs] 9527767K->229803K(15146112K), 0.0567610 secs] [Times: user=0.31 sys=0.01, real=0.06 secs]
128.997: [GC 128.998: [ParNew
Desired survivor size 298254336 bytes, new threshold 2 (max 3)
- age 1: 294294360 bytes, 294294360 total
- age 2: 16201608 bytes, 310495968 total
- age 3: 17755920 bytes, 328251888 total
: 9500071K->349162K(9903232K), 0.1070140 secs] 9525196K->400440K(15146112K), 0.1072430 secs] [Times: user=0.60 sys=0.00, real=0.11 secs]

These were taken during compaction of a non-troublesome CF. We are still seeing new generation allocations in the 1 GB/s range, while process read I/O, as reported by /proc//io, is about 135MB/s. A 7.5:1 ratio of memory allocation to read I/O does seem pretty high. Granted, the host was servicing some Thrift requests at the time, and these obviously contribute to object allocations.
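Allocation rates like these can be estimated directly from -XX:+PrintGCDetails output: take the young-generation occupancy just before one ParNew collection, subtract what survived the previous one, and divide by the time between the two. The following is a minimal Python sketch. It assumes each GC entry has been captured as a single string and ignores pause time. The regexes and helper names are illustrative only and are not part of any JVM tool.

```python
import re

# Young-gen part of a ParNew entry, e.g. ": 9524607K->204678K(9903232K), 0.0565180 secs]"
YOUNG_RE = re.compile(r":\s*(\d+)K->(\d+)K\((\d+)K\),\s*([\d.]+) secs\]")
# Leading JVM-uptime timestamp, e.g. "109.126: [GC"
STAMP_RE = re.compile(r"^\s*([\d.]+):\s*\[GC")


def parse_parnew(entry):
    """Return (uptime_s, young_before_kb, young_after_kb) for one ParNew entry."""
    stamp = STAMP_RE.search(entry)
    young = YOUNG_RE.search(entry)
    if stamp is None or young is None:
        raise ValueError("not a ParNew entry: %r" % entry[:40])
    return float(stamp.group(1)), int(young.group(1)), int(young.group(2))


def allocation_rate_mb_s(earlier, later):
    """Approximate young-gen allocation rate (MB/s) between two consecutive ParNew entries."""
    t0, _, after0 = parse_parnew(earlier)
    t1, before1, _ = parse_parnew(later)
    allocated_kb = before1 - after0  # eden refilled starting from the post-GC occupancy
    return allocated_kb / 1024.0 / (t1 - t0)


# The two entries from the log excerpt, with the survivor-age detail trimmed.
GC1 = ("109.126: [GC 109.126: [ParNew : 9524607K->204678K(9903232K), 0.0565180 secs] "
       "9527767K->229803K(15146112K), 0.0567610 secs]")
GC2 = ("128.997: [GC 128.998: [ParNew : 9500071K->349162K(9903232K), 0.1070140 secs] "
       "9525196K->400440K(15146112K), 0.1072430 secs]")

rate = allocation_rate_mb_s(GC1, GC2)
print("young-gen allocation rate: %.0f MB/s" % rate)
```

Because the sketch measures from collection start to collection start, the interval includes the earlier pause, during which nothing is allocated. The result therefore slightly underestimates the true rate.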
Without any compactions, we blow through the ~9.5GB young generation in 50-60s on average, which means our baseline allocation rate (from servicing thrift requests and other misc background work) is a (more reasonable) 200MB/s. Assuming there isn't much else going on in Cassandra, that means that ~80% of the allocated space during compactions is coming from the compaction and that the memory allocation overhead to process read bytes for compactions is pretty high (around 6:1). I understand there will be an overhead for the programming language, but a ratio in the 6:1 range of JVM allocations to I/O reads seems a bit high. Since we are talking about many gigabytes of memory, I would expect the JVM allocation size to be dominated by the column values. This leads me to believe that the column value buffers are being excessively copied or the sstables aren't being read as efficiently as possible. Whatever the root cause, there definitely seems to be room for improvement. But, I'm not a Java expert and don't know the compaction algorithm too well, so maybe a ratio of 6:1 is pretty good. > If there is any suspicion that the above is happening, maybe try decreasing > in_memory_compaction_limit_in_mb (preparing to see lots of stuff logged > to console, assuming that's still happening in the 0.6. > version you're running). I don't believe an in-memory compaction limit config option exists in 0.6. > I"m not sure on what you're basing that, but unless I have fatally failed to > grok something fundamental about the interaction between new-gen and > old-gen with CMS, object's aren't being allocated *period* while the "young > generation is being collected" as that is a stop-the-world pause. (This is > also > why I said before that at 10 gig new-gen size, the observed behavior on > young gen collections may be similar to fallback-to-full-gc cases, but not > quite > since it would be parallel rather than serial) The grok fail is probably on my end. 
I couldn't find any documentation to back either of our claims, so I'll defer to your experience. Greg
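The estimates in this message are simple ratios. Restating them as code makes the key assumption explicit: everything allocated above the no-compaction baseline is attributed to compaction. This Python sketch only re-derives the figures Greg quotes, and the function name is made up for illustration.

```python
def compaction_allocation_breakdown(total_alloc_mb_s, baseline_alloc_mb_s, read_io_mb_s):
    """Attribute allocation above the no-compaction baseline to compaction."""
    compaction_alloc = total_alloc_mb_s - baseline_alloc_mb_s
    return {
        "overall_ratio": total_alloc_mb_s / read_io_mb_s,        # allocation : read I/O
        "compaction_share": compaction_alloc / total_alloc_mb_s,
        "compaction_ratio": compaction_alloc / read_io_mb_s,     # compaction allocation : read I/O
    }


# Baseline: a ~9.5 GB young gen filled every 50-60 s with no compaction running.
young_gen_mb = 9.5 * 1024
baseline_range = (young_gen_mb / 60, young_gen_mb / 50)  # roughly 162 to 195 MB/s

# Figures quoted in the message: ~1 GB/s allocation, ~200 MB/s baseline, ~135 MB/s reads.
breakdown = compaction_allocation_breakdown(1000.0, 200.0, 135.0)
for name, value in breakdown.items():
    print(f"{name}: {value:.2f}")
```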
cassandra nodes with mixed hard disk sizes
This is a two-part question ... 1. If you have cassandra nodes with different sized hard disks, how do you deal with assigning the token ring such that the nodes with larger disks get more data? In other words, given equally distributed token ranges, when the smaller disk nodes run out of space, the larger disk nodes will still have unused capacity. Or is installing a mixed-hardware cluster a no-no? 2. What happens when a cassandra node runs out of disk space for its data files? Does it continue serving the data while not accepting new data? Or does the node break and require manual intervention? This info has eluded me elsewhere. Jon
Re: script to modify cassandra.yaml file
We use Puppet to manage the cassandra.yaml in a different location from the installation. Ours is in /etc/cassandra/cassandra.yaml. You can set the environment variable CASSANDRA_CONF to that directory (I believe that's the name; check cassandra.in.sh) and the startup script will pick up the configuration from there. With Puppet you can manage the list of seeds, set the IP addresses, etc. dynamically. I even use it to set the initial tokens. It makes life a lot easier. On Mar 21, 2011, at 9:14 AM, Sasha Dolgy wrote: > I use grep / awk / sed from within a bash script ... this works quite well. > -sd > > On Mon, Mar 21, 2011 at 12:39 AM, Anurag Gujral > wrote: >> Hi All, >> I want to modify the values in the cassandra.yaml which comes with >> the cassandra-0.7 package based on values of hostnames, >> colo etc. >> Does someone knows of some script which I can use which reads in default >> cassandra.yaml and write outs new cassandra.yaml >> with values based on number of nodes in the cluster ,hostname,colo name etc.
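If Puppet or another template tool sets the initial tokens, the usual approach for a RandomPartitioner ring is to space them evenly, so that node i of N gets i * 2**127 / N. The following is a minimal Python sketch. It assumes RandomPartitioner, one token per node, and a single ring. `initial_tokens` is a hypothetical helper and is not part of Cassandra or Puppet.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space


def initial_tokens(node_count):
    """Evenly spaced initial_token values for `node_count` nodes."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return [i * RING_SIZE // node_count for i in range(node_count)]


def initial_token_line(index, node_count):
    """The cassandra.yaml line for the node at position `index` (0-based)."""
    return "initial_token: %d" % initial_tokens(node_count)[index]


for i in range(4):
    print(initial_token_line(i, 4))
```

The sketch uses integer division so the tokens stay exact. Floating point would lose precision at this magnitude.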
Re: script to modify cassandra.yaml file
to elaborate:

our_temp_yaml=/tmp/$$.cassandra.yaml
cp cassandra.yaml $our_temp_yaml
for instance in $instances
do
    # do some more work to get the hostname from the instance
    sed -i "s/^seeds:/seeds: \n - $hostname/" $our_temp_yaml
done

-- the above inserts a new line for each $hostname into the temporary yaml. -sd On Mon, Mar 21, 2011 at 9:14 AM, Sasha Dolgy wrote: > I use grep / awk / sed from within a bash script ... this works quite well. > -sd > > On Mon, Mar 21, 2011 at 12:39 AM, Anurag Gujral > wrote: >> Hi All, >> I want to modify the values in the cassandra.yaml which comes with >> the cassandra-0.7 package based on values of hostnames, >> colo etc. >> Does someone knows of some script which I can use which reads in default >> cassandra.yaml and write outs new cassandra.yaml >> with values based on number of nodes in the cluster ,hostname,colo name etc.
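One thing to note about the sed loop: each iteration inserts the new host directly after `seeds:`. As a result, the hosts end up in reverse order and any existing entry, such as the default seed, is kept. If that matters, the list can be replaced instead. The following is a minimal Python sketch that does this with plain string handling, so comments in the file survive. It assumes the 0.7-style block list under a top-level `seeds:` key. `set_seeds` is a hypothetical helper.

```python
def set_seeds(yaml_text, hosts, indent="    "):
    """Replace the "- host" entries under a top-level 'seeds:' key with `hosts`.

    Assumes the block-list layout
        seeds:
            - 127.0.0.1
    and copies every other line (including comments) through unchanged.
    """
    lines = yaml_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        out.append(lines[i])
        if lines[i].rstrip() == "seeds:":
            i += 1
            # Skip the existing list entries that belong to 'seeds:'.
            while i < len(lines) and lines[i].lstrip().startswith("- "):
                i += 1
            out.extend(indent + "- " + host for host in hosts)
        else:
            i += 1
    return "\n".join(out) + "\n"
```

To use it, read the default cassandra.yaml, call `set_seeds(text, hosts)` with the hostnames gathered in the loop, and write the result to the temporary copy.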
Re: Active / Active Data Center and RF
No, replicas will always be directed to the same nodes. Otherwise we would not know where to find them. The OldNetworkTopologyStrategy alternated replicas between DCs, but it would still always put them on the same nodes. Aaron On 22 Mar 2011, at 05:31, mcasandra wrote: > I think what I am trying to ask is this: > > what happens if it's RF=3 with network toplogy (RackInferringSnitch) and 2 > copies are stored in Site A and 1 copy in Site B data center. Now client for > some reason is directed to Site B data center and does a write/update on > existing column, now would Site B have 2 copies too because of network > topology (RackInferringSnitch)? > > > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Active-Active-Data-Center-and-RF-tp6185528p6192916.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.
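The following Python sketch shows why the client's choice of data center doesn't matter. It is a deliberately simplified model of per-DC placement in the spirit of NetworkTopologyStrategy: it ignores racks, uses one token per node, and only approximates RandomPartitioner's token function. All names are illustrative. Replicas depend only on the key's token and the ring/strategy configuration, so a write coordinated from Site B lands on the same nodes as one coordinated from Site A.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127


def token_for(key):
    """Approximation of a RandomPartitioner token: MD5 of the key reduced into the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


def replicas(key, ring, rf_per_dc):
    """Pick replicas for `key` by walking the ring clockwise from the key's token.

    ring:       list of (token, node, dc) tuples sorted by token, one token per node
    rf_per_dc:  e.g. {"A": 2, "B": 1}
    A node is chosen if its DC still needs replicas (racks are ignored).
    The coordinator is not an input, so every node computes the same answer.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    needed = dict(rf_per_dc)
    chosen = []
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if needed.get(dc, 0) > 0:
            chosen.append(node)
            needed[dc] -= 1
    return chosen


ring = sorted([(0, "a1", "A"), (2**125, "b1", "B"), (2**126, "a2", "A"), (3 * 2**125, "b2", "B")])
print(replicas("some-row-key", ring, {"A": 2, "B": 1}))
```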
Re: Can the Cassandra to be hosted, with all your features and performance, on Microsoft Azure ?
contrib/py_stress is the easiest way to shake out any issues with your install and get a benchmark. There is also https://github.com/brianfrankcooper/YCSB, but I would go with py_stress until it stops being useful. Note: These are abstract benchmarks to be used for entertainment purposes only, the performance and scaling of your application may vary. :) Aaron On 22 Mar 2011, at 07:24, FernandoVM wrote: > There are any benchmark that I can apply after install Cassandra on > Azure to check performance/scalability issues? > > > []'s > FernandoVM > > On Sun, Mar 13, 2011 at 10:16 PM, aaron morton > wrote: >> If it works like all the other virtual machine hosts then yes it can be >> hosted. >> Performance can always be less on a virtual machine though. >> See http://wiki.apache.org/cassandra/CloudConfig >> Aaron >> On 14 Mar 2011, at 13:09, FernandoVM wrote: >> >> Hello friends, >> >> >> Anyone know if the Cassandra can be hosted, with all your >> features and performance, on Microsoft Azure? >> >> >> []'s >> FernandoVM >> >> > > > > -- > > []'s > FernandoVM
nodetool repair takes forever
I am trying to estimate the time it will take to rebuild a node. After loading a reasonable amount of data, I brought down a node and manually removed all its data files for a given keyspace (Keyspace1). I then restarted the node and got it back in the ring. At this point, I want to run nodetool repair (bin/nodetool -h 127.0.0.1 repair Keyspace1) and estimate the time to rebuild from the time it takes to repair. For some reason, the repair command runs forever. I only have 3G of data per node, but the repair has been running for more than an hour! Can someone tell me whether this is normal or whether I am doing something wrong? Thanks.
Re: EC2 - 2 regions
Sorry if I was presumptuous earlier. I created a ticket so that the patch could be submitted and reviewed - that is if it can be generalized so that it works across regions and doesn't adversely affect the common case. https://issues.apache.org/jira/browse/CASSANDRA-2362 On Mar 21, 2011, at 10:41 PM, Jeremy Hanna wrote: > Sorry if I was presumptuous earlier. I created a ticket so that the patch > could be submitted and reviewed - that is if it can be generalized so that it > works across regions and doesn't adversely affect the common case. > https://issues.apache.org/jira/browse/CASSANDRA-2362 > > On Mar 21, 2011, at 12:20 PM, Jeremy Hanna wrote: > >> I talked to Matt Dennis in the channel about it and I think everyone would >> like to make sure that cassandra works great across multiple regions. He >> sounded like he didn't know why it wouldn't work after having looked at the >> patches. I would like to try it both ways - with and without the patches >> later today if I can and I'd like to help out with getting it working out of >> the box. >> >> Thanks for the investigative work and documentation Milind! >> >> Jeremy >> >> On Mar 21, 2011, at 12:12 PM, Dave Viner wrote: >> >>> Hi Milind, >>> >>> Great work here. Can you provide the patch against the 2 files? >>> >>> Perhaps there's some way to incorporate it into the trunk of cassandra so >>> that this is feasible (in a future release) without patching the source >>> code. >>> >>> Dave Viner >>> >>> >>> On Mon, Mar 21, 2011 at 9:41 AM, A J wrote: >>> Thanks for sharing the document, Milind ! >>> Followed the instructions and it worked for me. >>> >>> On Mon, Mar 21, 2011 at 5:01 AM, Milind Parikh >>> wrote: Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is work in progress but wanted to share what I have. PDF is the working copy. 
https://docs.google.com/document/d/175duUNIx7m5mCDa2sjXVI04ekyMa5bdiWdu-AFgisaY/edit?hl=en On Sun, Mar 20, 2011 at 7:49 PM, aaron morton wrote: > > Recent discussion on the dev list > http://www.mail-archive.com/dev@cassandra.apache.org/msg01832.html > Aaron > On 19 Mar 2011, at 06:46, A J wrote: > > Just to add, all the telnet (port 7000) and cassandra-cli (port 9160) > connections are done using the public DNS (that goes like > ec2-.compute.amazonaws.com) > > On Fri, Mar 18, 2011 at 1:37 PM, A J wrote: > > I am able to telnet from one region to another on 7000 port without > > issues. (I get the expected Connected to .Escape character is > > '^]'.) > > Also I am able to execute cassandra client on 9160 port from one > > region to another without issues (this is when I run cassandra > > separately on each region without forming a cluster). > > So I think the ports 7000 and 9160 are not the issue. > > > > On Fri, Mar 18, 2011 at 1:26 PM, Dave Viner wrote: > > From the us-west instance, are you able to connect to the us-east instance > > using telnet on port 7000 and 9160? > > If not, then you need to open those ports for communication (via your > > Security Group) > > Dave Viner > > On Fri, Mar 18, 2011 at 10:20 AM, A J wrote: > > Thats exactly what I am doing. > > I was able to do the first two scenarios without any issues (i.e. 2 > > nodes in same availability zone. Followed by an additional node in a > > different zone but same region) > > I am stuck at the third scenario of separate regions. > > (I did read the "Cassandra nodes on EC2 in two different regions not > > communicating" thread but it did not seem to end with resolution) > > > On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote: > > Hi AJ, > > I'd suggest getting to a multi-region cluster step-by-step. First, get > > 2 > > nodes running in the same availability zone. Make sure that works > > properly. > > Second, add a node in a separate availability zone, but in the same > > region. 
> > Make sure that's working properly. Third, add a node that's in a > > separate > > region. > > Taking it step-by-step will ensure that any issues are specific to the > > region-to-region communication, rather than intra-zone connectivity or > > cassandra cluster configuration. > > Dave Viner > > On Fri, Mar 18, 2011 at 8:34 AM, A J wrote: > > Hello, > > I am trying to setup a cassandra cluster across regions. > > For testing I am keeping it simple and just having one node in US-EAST > > (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say > > ec2-2-2-3-4.us-west-1.compute.amazonaws.com). > >
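Dave's telnet check can also be scripted, so it can be rerun from each region. The following is a minimal Python sketch that uses only the standard library. The host in the usage line is the placeholder from A J's message. Ports 7000 (storage/gossip) and 9160 (Thrift) are the defaults already discussed in the thread. A successful connect only proves TCP reachability, which A J had already confirmed. It says nothing about whether gossip between the nodes actually works.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or the name did not resolve
        return False


def check_cassandra_ports(host, ports=(7000, 9160)):
    """Map each port (storage/gossip 7000 and Thrift 9160 by default) to reachability."""
    return {port: port_open(host, port) for port in ports}
```

For example, run `check_cassandra_ports("ec2-2-2-3-4.us-west-1.compute.amazonaws.com")` from the us-east node, then repeat in the other direction.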
Re: Clearsnapshot Problem
There have been some issues with deleting files on Windows, but I cannot find a reference to it happening for snapshots. If you restart the node, can you delete the snapshot? Longer term, can you upgrade to 0.6.12 and let us know if it happens again? Any fix will be against that version. Hope that helps. Aaron On 22 Mar 2011, at 08:11, s p wrote: > I'm running 3-way CA cluster (0.62 ) on a windows 2008 (jre 1.6.24) 64-bit. > Things are running fine except when trying to remove old snaphsot files. When > running clearsnapshot I get an error msg like below. I can't remove any > daily snapshot files. When trying to delete the actual snapshot file os cmd > file I get "access denied. File used by another process". Seems CA or JRE is > sitting on file? > > Feedback much appreciated.
Re: cassandra nodes with mixed hard disk sizes
1) You should use nodes with the same capacity (CPU, RAM, HDD); cassandra assumes they are all equal. 2) Not sure exactly what would happen. I am guessing either the node would shut down or writes would eventually block, probably the former. If the node stayed up, read performance may suffer (if more writes were being sent in). If you really want to know more let me know and I may find time to dig into it. Also, a node is responsible for storing its own token range and acting as a replica for other token ranges, so reducing its token range may not have a dramatic effect on its storage requirements. Hope that helps. Aaron On 22 Mar 2011, at 09:50, Jonathan Colby wrote: > > This is a two part question ... > > 1. If you have cassandra nodes with different sized hard disks, how do you > deal with assigning the token ring such that the nodes with larger disks get > more data? In other words, given equally distributed token ranges, when the > smaller disk nodes run out of space, the larger disk nodes with still have > unused capacity.Or is installing a mixed hardware cluster a no-no? > > 2. What happens when a cassandra node runs out of disk space for its data > files? Does it continue serving the data while not accepting new data? Or > does the node break and require manual intervention? > > This info has alluded me elsewhere. > Jon
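Aaron's last point can be made concrete with a small model. The following is a minimal Python sketch. It assumes SimpleStrategy, one token per node on a RandomPartitioner ring, and data spread evenly over the token space. All names are illustrative. With four nodes and RF 3, halving one node's primary range only cuts its share of stored data from 75% to 62.5%, because it still holds replicas of its neighbours' ranges.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space


def primary_fractions(tokens):
    """Share of the ring each node owns as primary; `tokens` sorted, at least 2 nodes.

    A node with token T owns (previous token, T], wrapping around the ring.
    """
    return [((tokens[i] - tokens[i - 1]) % RING_SIZE) / RING_SIZE for i in range(len(tokens))]


def stored_fractions(tokens, rf):
    """Share of all distinct data each node stores under SimpleStrategy (rf <= node count).

    Replicas go to the primary node and the next rf - 1 nodes clockwise, so a node
    stores its own range plus the primary ranges of the rf - 1 nodes before it.
    """
    primary = primary_fractions(tokens)
    n = len(tokens)
    return [sum(primary[(i - k) % n] for k in range(rf)) for i in range(n)]


even = [0, 2**125, 2**126, 3 * 2**125]
shrunk = [0, 2**124, 2**126, 3 * 2**125]  # node 1 moved so it owns half its previous range
print(stored_fractions(even, 3))
print(stored_fractions(shrunk, 3))
```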
Re: nodetool repair takes forever
Are you monitoring the progress? See http://wiki.apache.org/cassandra/Streaming, or use nodetool netstats. Aaron On 22 Mar 2011, at 16:33, A J wrote: > I am trying to estimate the time it will take to rebuild a node. After > loading reasonable data, I brought down a node and manually removed > all its datafiles for a given keyspace (Keyspace1) > I then restarted the node and got i back in the ring. At this point, I > wish to run nodetool repair (bin/nodetool -h 127.0.0.1 repair > Keyspace1) and estimate the time the time to rebuild from the time it > takes to repair. > > For some reason, the repair command runs forever. I just have 3G of > data per node but still the repair is running for more than an hour ! > Can someone tell if it is normal or I am doing something wrong. > > Thanks.
Re: EC2 - 2 regions
Patch is attached... I don't have access to Jira. A cautionary note: this is NOT a general solution and is not intended as such. It could be included as part of a larger patch. As I find time, I will explain in the limitations section why it is not a general solution. Regards Milind On Mon, Mar 21, 2011 at 11:42 PM, Jeremy Hanna wrote: > Sorry if I was presumptuous earlier. I created a ticket so that the patch > could be submitted and reviewed - that is if it can be generalized so that > it works across regions and doesn't adversely affect the common case. > https://issues.apache.org/jira/browse/CASSANDRA-2362
How to use join_ring=false?
I set join_ring=false in my java opts: -Djoin_ring=false However, when the node started up, it joined the ring. Is there something I am missing? Using 0.7.4 Thanks, Jason
Re: How to use join_ring=false?
-Dcassandra.join_ring=false -Chris On Mar 21, 2011, at 10:32 PM, Jason Harvey wrote: > I set join_ring=false in my java opts: > -Djoin_ring=false > > However, when the node started up, it joined the ring. Is there > something I am missing? Using 0.7.4 > > Thanks, > Jason
Re: How to use join_ring=false?
Gah! Thx :) Jason On Mar 21, 10:34 pm, Chris Goffinet wrote: > -Dcassandra.join_ring=false > > -Chris > > On Mar 21, 2011, at 10:32 PM, Jason Harvey wrote: > > > I set join_ring=false in my java opts: > > -Djoin_ring=false > > > However, when the node started up, it joined the ring. Is there > > something I am missing? Using 0.7.4 > > > Thanks, > > Jason