Re: script to modify cassandra.yaml file

2011-03-21 Thread Sasha Dolgy
I use grep / awk / sed from within a bash script ... this works quite well. -sd On Mon, Mar 21, 2011 at 12:39 AM, Anurag Gujral wrote: > Hi All, >   I want to modify the values in the cassandra.yaml which comes with > the cassandra-0.7 package based on values of hostnames, > colo etc. > D

Re: Active / Active Data Center and RF

2011-03-21 Thread aaron morton
I'll take another crack at it, here's how I think it works. When using the NetworkTopologyStrategy you can specify how the RF is distributed between the DC's you have. This is done as part of the schema definition. When using a CLI script use the strategy_options clause of the create keyspace s

Re: Pauses of GC

2011-03-21 Thread ruslan usifov
After some investigation, I think my problem is similar to this one: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/reduced-cached-mem-resident-set-size-growth-td5967110.html I have now disabled full mmap and set disk_access_mode to mmap_index_only

stress.py bug?

2011-03-21 Thread pob
Hi, I'm inserting data from client node with stress.py to cluster of 6 nodes. They are all on 1Gbps network, max real throughput of network is 930Mbps (after measurement). python stress.py -c 1 -S 17 -d{6nodes} -l3 -e QUORUM --operation=insert -i 1 -n 50 -t100 The problem is stress.py

Re: Column family cannot be removed

2011-03-21 Thread Nikolay
Jonathan Ellis writes: > > drop and truncate both snapshot first, which requires forking to run > ln if you don't have JNA installed. > > best solution: install JNA so it can do in-process link calls. > Could you please tell me the exact actions in the client (or not in the client?) that should b

RE: Column family cannot be removed

2011-03-21 Thread George Ciubotaru
Hi Nikolay, JNA has to be installed on the service box(es). On Ubuntu you can do the following: wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb ln -s /usr/share/java/jna.jar [path_to_cassandra]/lib ... and the

Apache Cassandra Hangout in Mumbai-Pune area (India)

2011-03-21 Thread Geek Talks
Hi, Is anyone interested in joining an Apache Cassandra hangout/meetup in the Mumbai-Pune area? - Share/teach your experience with Apache Cassandra and the problems/issues you faced during deployment. - Excited by the buzz and want to learn more about NoSQL and Cassandra? Regards, GeekTalks

Re: 0.6.5 OOM during high read load

2011-03-21 Thread Dan Retzlaff
Beautiful, thanks. On Sun, Mar 20, 2011 at 4:36 PM, Jonathan Ellis wrote: > 0.7.1+ uses zero-copy reads in mmap'd mode so having 80k references to > the same column is essentially just the reference overhead. > > On Fri, Mar 18, 2011 at 7:11 PM, Dan Retzlaff wrote: > > Dear experts, :) > > Our

Re: Pauses of GC

2011-03-21 Thread ruslan usifov
I mean Linux process heap fragmentation caused by malloc: at some critical moment all memory is held by the Java process in RSS, the OS kernel can't allocate any system resources, and as a result it hangs. Is that possible?

Re: EC2 - 2 regions

2011-03-21 Thread Sasha Dolgy
Thanks for sharing this. What mechanisms secure the data (streams?) in transit between nodes? This isn't clear to me. On Mon, Mar 21, 2011 at 10:01 AM, Milind Parikh wrote: > Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is > work in progress but wanted to share

Re: Pauses of GC

2011-03-21 Thread Jonathan Ellis
No. We do zero allocation by malloc (so far). It's all managed by GC in heap. On Mon, Mar 21, 2011 at 10:25 AM, ruslan usifov wrote: > I mean a linux process heap fragmentation by malloc, so at one critical > moment all memory holden by java process in RSS, and OS core cant allocate > any system

Re: stress.py bug?

2011-03-21 Thread Ryan King
On Mon, Mar 21, 2011 at 4:02 AM, pob wrote: > Hi, > I'm inserting data from client node with stress.py to cluster of 6 nodes. > They are all on 1Gbps network, max real throughput of network is 930Mbps > (after measurement). > python stress.py -c 1 -S 17  -d{6nodes}  -l3 -e QUORUM >  --operatio

Re: Active / Active Data Center and RF

2011-03-21 Thread mcasandra
I think what I am trying to ask is this: what happens if RF=3 with network topology (RackInferringSnitch), and 2 copies are stored in the Site A data center and 1 copy in the Site B data center? Now the client is for some reason directed to the Site B data center and does a write/update on an existing column; now would Site

Re: stress.py bug?

2011-03-21 Thread pob
You mean more threads in stress.py? The purpose was to figure out the maximum bandwidth that C* can use. Peter 2011/3/21 Ryan King > On Mon, Mar 21, 2011 at 4:02 AM, pob wrote: > > Hi, > > I'm inserting data from client node with stress.py to cluster of 6 nodes. > > They are all on 1Gbps

Re: stress.py bug?

2011-03-21 Thread Ryan King
On Mon, Mar 21, 2011 at 9:34 AM, pob wrote: > You mean, > more threads in stress.py? The purpose was figure out whats the > biggest bandwidth that C* can use. You should try more threads, but at some point you'll hit diminishing returns there. You may need to drive load from more than one host.

Re: EC2 - 2 regions

2011-03-21 Thread A J
Thanks for sharing the document, Milind ! Followed the instructions and it worked for me. On Mon, Mar 21, 2011 at 5:01 AM, Milind Parikh wrote: > Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is > work in progress but wanted to share what I have. PDF is the working

Re: stress.py bug?

2011-03-21 Thread A J
Not completely related. just fyi. I like it better to see the start time, end time, duration of each execution in each thread. And then do the aggregation (avg,max,min) myself. I modified last few lines of the Inserter function as follows: endtime = time.time() self.latencies[self.idx
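
The aggregation described above (avg, max, min over per-execution durations) can be sketched as follows. This is a standalone illustration, not the actual stress.py patch: the function name `summarize` and the list of `(start, end)` tuples are assumptions made for the example.

```python
def summarize(timings):
    """Aggregate (start_time, end_time) pairs into basic latency statistics.

    `timings` is a hypothetical list of (start, end) timestamps in seconds,
    one pair per executed operation, e.g. collected with time.time().
    """
    durations = [end - start for start, end in timings]
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "avg": sum(durations) / len(durations),
    }


if __name__ == "__main__":
    # Three made-up operations lasting 1, 2 and 3 seconds.
    print(summarize([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))
```

Keeping the raw start and end times, rather than only a running average, makes it possible to compute percentiles or per-thread breakdowns later.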

Re: EC2 - 2 regions

2011-03-21 Thread Dave Viner
Hi Milind, Great work here. Can you provide the patch against the 2 files? Perhaps there's some way to incorporate it into the trunk of cassandra so that this is feasible (in a future release) without patching the source code. Dave Viner On Mon, Mar 21, 2011 at 9:41 AM, A J wrote: > Thanks

Re: EC2 - 2 regions

2011-03-21 Thread Jeremy Hanna
I talked to Matt Dennis in the channel about it and I think everyone would like to make sure that cassandra works great across multiple regions. He sounded like he didn't know why it wouldn't work after having looked at the patches. I would like to try it both ways - with and without the patch

ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster

2011-03-21 Thread Markus Klems
Hi guys, we are currently benchmarking various configurations of an EC2-based Cassandra cluster. This is our current setup: 1) 8 nodes where each node is an m1.xlarge EC2 instance 2) Cassandra version 0.6.5 3) Replication Factor = 3 4) this delivers ~7K to 10K ops/sec with 50% GET and 50% INSERT
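
As background for the RF=3 setup above: a QUORUM operation needs floor(RF / 2) + 1 replicas to respond, so with RF=3 one replica per key can be down and QUORUM can still be satisfied. The sketch below only shows that arithmetic; the function names are made up for the example, and it does not explain the non-responsiveness reported in this thread.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas per key that can be down while QUORUM is still reachable."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(rf, quorum(rf), tolerated_failures(rf))
```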

Re: ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster

2011-03-21 Thread Jonathan Ellis
I suggest upgrading to either 0.6.12 or 0.7.4 and re-testing. On Mon, Mar 21, 2011 at 12:52 PM, Markus Klems wrote: > Hi guys, > > we are currently benchmarking various configurations of an EC2-based > Cassandra cluster. This is our current setup: > > 1) 8 nodes where each node is an m1.xlarge EC

Re: Can the Cassandra to be hosted, with all your features and performance, on Microsoft Azure ?

2011-03-21 Thread FernandoVM
Is there any benchmark that I can run after installing Cassandra on Azure to check for performance/scalability issues? []'s FernandoVM On Sun, Mar 13, 2011 at 10:16 PM, aaron morton wrote: > If it works like all the other virtual machine hosts then yes it can be > hosted. > Performance can always b

Clearsnapshot Problem

2011-03-21 Thread s p
I'm running a 3-way CA cluster (0.62) on Windows 2008 (JRE 1.6.24) 64-bit. Things are running fine except when trying to remove old snapshot files. When running clearsnapshot I get an error msg like below. I can't remove any daily snapshot files. When trying to delete the actual snapshot file os

RE: Nodes frozen in GC

2011-03-21 Thread Gregory Szorc
> With the large new-gen, you were actually seeing fallbacks to full GC? > You weren't just still experiencing problems because at 10 gig, the new-gen > will be so slow to compact to effectively be similar to a full gc in terms of > affecting latency? Yes, we were seeing fallbacks to full GC with

cassandra nodes with mixed hard disk sizes

2011-03-21 Thread Jonathan Colby
This is a two part question ... 1. If you have cassandra nodes with different sized hard disks, how do you deal with assigning the token ring such that the nodes with larger disks get more data? In other words, given equally distributed token ranges, when the smaller disk nodes run out of s

Re: script to modify cassandra.yaml file

2011-03-21 Thread Jonathan Colby
We use Puppet to manage the cassandra.yaml in a different location from the installation. Ours is in /etc/cassandra/cassandra.yaml. You can set the environment variable CASSANDRA_CONF (I believe that is the name; check cassandra.in.sh) and the startup script will pick up this as the configuration file to u

Re: script to modify cassandra.yaml file

2011-03-21 Thread Sasha Dolgy
to elaborate:

our_temp_yaml=/tmp/$$.cassandra.yaml
cp cassandra.yaml $our_temp_yaml
for instance in $instances; do
  # do some more work to get the hostname from the instance
  sed -i "s/^seeds:/seeds: \n - $hostname/" $our_temp_yaml
done

-- the above inserts a new line for each $hostname into the t
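
For comparison, the same idea (adding one list entry per hostname under the `seeds:` key of a copied cassandra.yaml) can be sketched in Python. This is a text-based illustration, not a YAML-aware editor: the function name `add_seeds`, the four-space indent, and the sample hostnames are assumptions, and unlike the sed loop it keeps the hostnames in the order given.

```python
def add_seeds(yaml_text, hostnames, indent="    "):
    """Return yaml_text with one '- host' line per hostname after 'seeds:'.

    Works on plain text, line by line; it assumes 'seeds:' starts a line
    and that list entries under it use the given indent.
    """
    out = []
    for line in yaml_text.splitlines():
        out.append(line)
        if line.startswith("seeds:"):
            out.extend(f"{indent}- {host}" for host in hostnames)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "cluster_name: 'Test Cluster'\nseeds:\n    - 127.0.0.1\n"
    print(add_seeds(sample, ["node1.example.com", "node2.example.com"]))
```

Working on a temporary copy, as in the shell version, keeps the packaged cassandra.yaml untouched until the result has been checked.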

Re: Active / Active Data Center and RF

2011-03-21 Thread aaron morton
No, replicas will always be directed to the same nodes. Otherwise we would not know where to find it. The OldNetworkTopologyStrategy alternated replicas between DC's , but it would still always put them on the same nodes. Aaron On 22 Mar 2011, at 05:31, mcasandra wrote: > I think what I a
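
The point that replicas always land on the same nodes can be illustrated with a deliberately simplified ring walk: find the first node whose token is at or after the key's token, then take the next nodes clockwise. This is a sketch of simple, rack-unaware placement only, not of NetworkTopologyStrategy or OldNetworkTopologyStrategy; the ring contents and the name `replicas_for` are invented for the example.

```python
from bisect import bisect_left


def replicas_for(key_token, ring, rf):
    """Return the nodes holding a key, given a token-sorted ring.

    `ring` is a list of (token, node_name) pairs sorted by token. The
    result depends only on the key token and the ring, so repeated calls
    always name the same nodes.
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key token, wrapping past the end.
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
    print(replicas_for(150, ring, 3))
    print(replicas_for(350, ring, 2))
```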

Re: Can the Cassandra to be hosted, with all your features and performance, on Microsoft Azure ?

2011-03-21 Thread aaron morton
contrib/py_stress is the easiest way to shake out any issues with your install and get a benchmark. There is also https://github.com/brianfrankcooper/YCSB but I would go with py_stress until it stops being useful. Note: These are abstract benchmarks to be used for entertainment purposes only,

nodetool repair takes forever

2011-03-21 Thread A J
I am trying to estimate the time it will take to rebuild a node. After loading reasonable data, I brought down a node and manually removed all its datafiles for a given keyspace (Keyspace1). I then restarted the node and got it back in the ring. At this point, I wish to run nodetool repair (bin/nodet

Re: EC2 - 2 regions

2011-03-21 Thread Jeremy Hanna
Sorry if I was presumptuous earlier. I created a ticket so that the patch could be submitted and reviewed - that is if it can be generalized so that it works across regions and doesn't adversely affect the common case. https://issues.apache.org/jira/browse/CASSANDRA-2362 On Mar 21, 2011, at 10:

Re: Clearsnapshot Problem

2011-03-21 Thread aaron morton
There have been some issues to do with deleting files on Windows, but I cannot find a reference to it happening for snapshots. If you restart the node can you delete the snapshot? Longer term can you upgrade to 0.6.12 and let us know if it happens again? Any fix will be against that version. Hope th

Re: cassandra nodes with mixed hard disk sizes

2011-03-21 Thread aaron morton
1) You should use nodes with the same capacity (CPU, RAM, HDD), cassandra assumes they are all equal. 2) Not sure what exactly would happen. Am guessing either the node would shutdown or writes would eventually block, probably the former. If the node was up read performance may suffer (if ther
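
Since the nodes are assumed to be equal, the usual starting point is evenly spaced initial tokens. A minimal sketch, assuming RandomPartitioner and its 0 to 2**127 token space (the function name `balanced_tokens` is made up for the example):

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Evenly spaced initial tokens for `node_count` nodes.

    Assumes RandomPartitioner, whose token space runs from 0 to 2**127.
    """
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(4)):
        print(index, token)
```

Giving a node with a larger disk a wider token range would mean departing from this even spacing by hand, which also sends that node proportionally more read and write traffic.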

Re: nodetool repair takes forever

2011-03-21 Thread aaron morton
Are you monitoring the progress (see http://wiki.apache.org/cassandra/Streaming), or with nodetool netstats? Aaron On 22 Mar 2011, at 16:33, A J wrote: > I am trying to estimate the time it will take to rebuild a node. After > loading reasonable data, I brought down a node and manually removed > all

Re: EC2 - 2 regions

2011-03-21 Thread Milind Parikh
Patch is attached... I don't have access to Jira. A cautionary note: This is NOT a general solution and is not intended as such. It could be included as part of a larger patch. I will explain in the limitations section why it is not a general solution, as I find time. Regards Milind On Mon

How to use join_ring=false?

2011-03-21 Thread Jason Harvey
I set join_ring=false in my java opts: -Djoin_ring=false However, when the node started up, it joined the ring. Is there something I am missing? Using 0.7.4 Thanks, Jason

Re: How to use join_ring=false?

2011-03-21 Thread Chris Goffinet
-Dcassandra.join_ring=false -Chris On Mar 21, 2011, at 10:32 PM, Jason Harvey wrote: > I set join_ring=false in my java opts: > -Djoin_ring=false > > However, when the node started up, it joined the ring. Is there > something I am missing? Using 0.7.4 > > Thanks, > Jason

Re: How to use join_ring=false?

2011-03-21 Thread Jason Harvey
Gah! Thx :) Jason On Mar 21, 10:34 pm, Chris Goffinet wrote: > -Dcassandra.join_ring=false > > -Chris > > On Mar 21, 2011, at 10:32 PM, Jason Harvey wrote: > > > I set join_ring=false in my java opts: > > -Djoin_ring=false > > > However, when the node started up, it joined the ring. Is there > >