Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-07-30 Thread tsi
Well, the new Cassandra cluster is already set up with the different partitioner settings and there are already other applications running on it. So the task is to migrate our application data to this new cluster to avoid setting up a dedicated Cassandra cluster just for our application. -- Vie

Cassandra - Pig integration

2014-07-30 Thread Akshay Ballarpure
Hello, I am trying to integrate cassandra into Hadoop and PIG and trying to load CSV file into Cassandra using PIG Script. Can someone help ? root@hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# cat pigCasandra.pig data = LOAD 'example.csv' using PigStorage(',') AS (row_id: chararray,

Question about Vnodes

2014-07-30 Thread Rahul Neelakantan
Given the issue with repairs and Vnodes (currently expected to be fixed with the 3.0 release) I am considering reducing the number of tokens per node. One of my clusters has 8 nodes in it with 256 tokens per node. The main keyspace on it has 40+ column families and nodetool repair takes extremel

Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-07-30 Thread Robert Coli
On Tue, Jul 29, 2014 at 6:33 AM, DuyHai Doan wrote: > No, I meant why don't you set the partitioner of the 2.0.8 cluster to > RandomPartitioner instead of Murmur3? > +1 Murmur3 is a semi-marginal win, OP probably does not need it on this particular cluster. =Rob

Re: Measuring WAN replication latency

2014-07-30 Thread Rahul Neelakantan
Agreed... This is what we are trying right now. Rahul Neelakantan > On Jul 30, 2014, at 1:43 PM, Jeremy Jongsma wrote: > > Yes, the results should definitely not be relied on as a future performance > indicator for key app functionality, but knowing roughly what your current > replication lat

Re: Measuring WAN replication latency

2014-07-30 Thread Jeremy Jongsma
Yes, the results should definitely not be relied on as a future performance indicator for key app functionality, but knowing roughly what your current replication latency is (and whether it's outside of the normal average) can inform client failover policies, debug data consistency issues, warn of

Re: Measuring WAN replication latency

2014-07-30 Thread Robert Coli
On Wed, Jul 30, 2014 at 6:59 AM, Rahul Neelakantan wrote: > Any ideas you can provide on how to do this will be appreciated, we would > like to build a latency monitoring tool/dashboard that shows how long it > takes for data to get sent across various DCs. > The brute force method described dow

Re: Cassandra 2 Upgrade

2014-07-30 Thread Langston, Jim
Hi Rob, Did you ever create this blog post? Jim From: Robert Coli mailto:rc...@eventbrite.com>> Reply-To: mailto:user@cassandra.apache.org>> Date: Wed, 11 Sep 2013 10:38:58 -0700 To: "user@cassandra.apache.org" mailto:user@cassandra.apache.org>> Subject: Re:

RE: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Parag Patel
My understanding of a 9 second GC seems to be very off based on the gossip logs. Correct me if I'm wrong, but the “handshaking version” is just a log for it to attempt to connect to the other nodes? Manual FGC 2:01:02 - Node1 full GC 2:01:25 - Node2 detects node1 DOWN 2:01:27 - Node2 handsha

Re: Error while converting data from sstable to json with sstable2json

2014-07-30 Thread Chris Lohfink
It's stored as bytes, depending completely on what is given to it. If I were to guess, I would say this looks like a composite partition key of utf8 values separated with a control character (0) and a length of the next key, i.e. PRIMARY KEY ((uid, vendor, x), timestamp, y) Chris Lohfink On Jul 3
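As a rough illustration of the guess above, here is a minimal Python sketch that splits a hex key into components. It assumes the commonly described composite layout (a 2-byte big-endian length, the component bytes, then one end-of-component byte per component); that layout, the function names, and the sample values are assumptions for illustration and are not taken from the thread. The key posted in the thread may not follow this layout exactly.

```python
import struct
from typing import List


def encode_composite(components: List[bytes]) -> bytes:
    """Build a composite key under the assumed layout (for testing only)."""
    out = bytearray()
    for component in components:
        # 2-byte big-endian length, the raw bytes, then a 0x00 end-of-component byte
        out += struct.pack(">H", len(component)) + component + b"\x00"
    return bytes(out)


def decode_composite(raw: bytes) -> List[bytes]:
    """Split a composite key into its raw components under the assumed layout."""
    parts = []
    pos = 0
    while pos < len(raw):
        if pos + 2 > len(raw):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        end = pos + length
        if end + 1 > len(raw):
            raise ValueError("truncated component")
        parts.append(raw[pos:end])
        pos = end + 1  # skip the end-of-component byte
    return parts


# Hypothetical key: sstable2json prints keys as hex, so convert with bytes.fromhex first.
example_hex = encode_composite([b"uid-1", b"acme", b"x"]).hex()
example_parts = [p.decode("utf-8") for p in decode_composite(bytes.fromhex(example_hex))]
```

If the components are not UTF-8 text, keep them as raw bytes and interpret them according to the column types in the table definition.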

Re: Measuring WAN replication latency

2014-07-30 Thread Jeremy Jongsma
The brute force way would be: 1) Make client connections to a node in each datacenter from your monitoring tool. 2) Periodically write a row to one datacenter (at whatever consistency level your application typically uses.) 3) Immediately query the other datacenter nodes for the same row key with
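One possible reading of the brute-force steps above, sketched in Python. The message is cut off at step 3, so the polling loop is an assumption about how "immediately query" would be turned into a measurement. `write_row` and `read_row` are hypothetical callables that the monitoring tool would wire to its own driver sessions (one per datacenter); no real driver API is used here.

```python
import time
import uuid
from typing import Callable, Optional


def measure_replication_latency(
    write_row: Callable[[str], None],   # hypothetical: writes a probe row in the source DC
    read_row: Callable[[str], bool],    # hypothetical: True once the row is visible in the remote DC
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.01,
) -> Optional[float]:
    """Return seconds from the write until the row is readable remotely, or None on timeout."""
    key = f"latency-probe-{uuid.uuid4()}"  # unique key so probes never collide
    start = time.monotonic()
    write_row(key)
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        if read_row(key):
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None  # row never became visible within the timeout
```

The result includes the write's own latency and is only as precise as the polling interval, so it is better suited to spotting trends and outliers than to exact measurement. The remote read would presumably need a consistency level local to that datacenter; otherwise the read itself could reach across the WAN and hide the delay.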

Re: Measuring WAN replication latency

2014-07-30 Thread Rahul Neelakantan
Rob, Any ideas you can provide on how to do this will be appreciated, we would like to build a latency monitoring tool/dashboard that shows how long it takes for data to get sent across various DCs. Rahul Neelakantan > On Jul 29, 2014, at 8:53 PM, Robert Coli wrote: > >> On Tue, Jul 29, 2014

Re: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Mark Reddy
> > Our Full GC’s take about 9 seconds. If we were to increase the > phi_convict_threshold to not take a node offline for a 9 second > unavailability, what negative side effects can there be? When you observe these GC's do you also see the node being marked down and then back up ~9 seconds later

RE: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Parag Patel
As to why we do it, we need to reevaluate because the GC optimizations we’ve made recently probably don’t require it anymore. However, prior to our optimizations we observed a benefit at our peak time. When we force a GC, we don’t remove it from the ring. This seems like a fundamental flaw in

Re: Authentication exception

2014-07-30 Thread Jeremy Jongsma
Yes, and all nodes have had at least two more scheduled repairs since then. On Jul 30, 2014 1:47 AM, "Or Sher" wrote: > Did you run a repair after changing replication factor for system_auth? > > > On Tue, Jul 29, 2014 at 5:48 PM, Jeremy Jongsma > wrote: > >> This is still happening to me; is t

Re: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Mark Reddy
Hi Parag, I see this output in my log many times over for 2 nodes. We have a cron entry > across all clusters that forces a full GC at 2 AM. node1 is due to Full > GC that was scheduled (I can disable this). Node2 was due to a Full GC > that occurred during our peak operation (these happen occasion

Error while converting data from sstable to json with sstable2json

2014-07-30 Thread ankit tyagi
Hi, I am using sstable2json to convert data from an sstable into json. It gives me data in the format below. {"key": *"000d55494430303030303037383530063932376561640a524541445355444f303100*","columns": [["1406126067358:8:","",1406126067369000], ["1406126067358:8:errormessage","53cfc7f3",140612606

RE: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Parag Patel
Mark, I see this output in my log many times over for 2 nodes. We have a cron entry across all clusters that forces a full GC at 2 AM. node1 is due to Full GC that was scheduled (I can disable this). Node2 was due to a Full GC that occurred during our peak operation (these happen occasionally, w

Re: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Mark Reddy
> > Thanks for the detailed response. I checked ‘nodetool netstats’ and I see > there are pending streams, all of which are stuck at 0%. I was expecting > to see at least one output that was more than 0%. Have you seen this > before? This could indicate that the bootstrap process is hung due t

RE: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Parag Patel
Thanks for the detailed response. I checked ‘nodetool netstats’ and I see there are pending streams, all of which are stuck at 0%. I was expecting to see at least one output that was more than 0%. Have you seen this before? Side question – does a new node stream from other nodes in any partic

dropping secondary indexes

2014-07-30 Thread Parag Patel
Hi, I've noticed that our data model has many unnecessary secondary indexes. Is there a recommended procedure to drop a secondary index on a very large table? Is there any sort of repair/cleanup that should be done after calling the DROP command? Thanks, Parag

Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-07-30 Thread Kais Ahmed
Sorry, my advice does not apply to you. If you are moving to another platform with a different partitioner, I think sstableloader is the right tool for you. http://www.datastax.com/docs/1.1/references/bulkloader http://www.datastax.com/dev/blog/bulk-loading 2014-07-30 10:51 GMT+02:00 Hao Cheng : >

Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-07-30 Thread Hao Cheng
No idea if this will help, but have you tried the sstable2json and json2sstable utilities to output json from your old cluster and import it into the new one? On Wed, Jul 30, 2014 at 1:40 AM, Kais Ahmed wrote: > hi tsi, > > You have to upgrade to 1.2.9 first. > > 2.0.0 > = > > Upgrading >

Re: bootstrapping new nodes on 1.2.12

2014-07-30 Thread Mark Reddy
Hi Parag, 1) Would anyone be able to help me interpret this information from > OpsCenter? At a high level bootstrapping a new node has two phases, streaming and secondary index builds. I believe OpsCenter will only report active streams, the pending stream will be listed as such in OpsCente

Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-07-30 Thread Kais Ahmed
hi tsi, You have to upgrade to 1.2.9 first. 2.0.0 = Upgrading - - Java 7 is now *required*! - Upgrading is ONLY supported from Cassandra 1.2.9 or later. This goes for sstable compatibility as well as network. When upgrading from an earlier release, upgrade to 1.