Expanding single node to 2 node cluster

2011-04-27 Thread maneela a
Hi, I had a 2 node cassandra cluster with replication factor 2 and OrderPreservingPartitioner but we did not provide InitialToken in the configuration files. One of the nodes was affected in the recent AWS EBS outage and had been partitioned from the cluster. However, I continued to allow all writ

Re: Manual Conflict Resolution in Cassandra

2011-04-27 Thread Oleg Anastasyev
David Strauss davidstrauss.net> writes: > > You can actually already perform "manual conflict resolution" in > Cassandra by naming your columns so that they don't squash each other in > Cassandra's internal replication. Then, ensure the code that accesses > Cassandra reads all columns with data

Heavy writes ok for single node, but failed for cluster

2011-04-27 Thread Sheng Chen
I succeeded in inserting 1 billion records into a single node cassandra, >> bin/stress -d cas01 -o insert -n 10 -c 5 -S 34 -C5 -t 20 Inserts finished in about 14 hours at a speed of 20k/sec. But when I added another node, tests always failed with UnavailableException within an hour. >> bin/stress
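As a quick sanity check on the figures quoted above (1 billion inserts in about 14 hours), the average rate can be worked out as follows. This is only back-of-the-envelope arithmetic; the function name is made up for illustration and is not part of the stress tool.

```python
def average_rate(records: int, hours: float) -> float:
    """Average inserts per second for a run of the given length."""
    return records / (hours * 3600)


if __name__ == "__main__":
    # Figures taken from the message: 1 billion records, about 14 hours.
    rate = average_rate(1_000_000_000, 14)
    print(f"{rate:,.0f} inserts/sec")
```

The result is a little under 20,000 inserts per second, consistent with the "20k/sec" reported for the single node.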

Re: encryption_options & 0.8

2011-04-27 Thread Sasha Dolgy
Although it's crude, websphere for example provides a simple, internal hashing algorithm to encrypt the clear text passwords. it's quite easy to decrypt the passwords ... however, it's an extra step that takes a bit more time ... as opposed to saying, "hi, here are my cleartext passwords. have fu

Re: Heavy writes ok for single node, but failed for cluster

2011-04-27 Thread Sylvain Lebresne
On Wed, Apr 27, 2011 at 10:32 AM, Sheng Chen wrote: > I succeeded in inserting 1 billion records into a single node cassandra, >>> bin/stress -d cas01 -o insert -n 10 -c 5 -S 34 -C5 -t 20 > Inserts finished in about 14 hours at a speed of 20k/sec. > But when I added another node, tests always

read failed?

2011-04-27 Thread pob
Hello, I'm experiencing this problem. With c-cli: get messagesContent['558a512f30a46f55e75e63f2f816f7435283269f92070618ba9213c0bfac730f']; Returned 33 results. Within pycassa code: server_list=['SERVER:9160',], prefill=False, pool_size=15, max_overflow=10, max_retries=-1, timeout=5, pool_

Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-27 Thread Pierre-Yves Ritschard
Thanks Jonathan, Should I repackage myself or do you think updated Debian packages will be made available shortly? Regards, - pyr On Tue., 2011-04-26 at 11:47 -0500, Jonathan Ellis wrote: > https://issues.apache.org/jira/browse/CASSANDRA-2549 is open to fix this >

Performance tests using stress testing tool

2011-04-27 Thread Baskar Duraikannu
I have set up a 4 node cluster for testing and all of the nodes have around 25gb of data. I ran read and write tests using 100 and 200 threads, with each thread reading or writing 50 columns with quorum consistency, using the stress tool against 4 nodes. Test servers have 4 cores and 16gb of ram. While runni

Re: encryption_options & 0.8

2011-04-27 Thread Sasha Dolgy
"IBM WebSphere applies a hardcoded XOR. Each caracter is XOR'd with the caracter ‘_’, and the resulting string is encoded in base64. This is not cryptography, it is just enough encoding so that a casual glance at the file will not reveal the password." I'm sure there are many different options. K

Re: advice for EC2 deployment

2011-04-27 Thread aaron morton
Using the Ec2Snitch you could have one AZ in us-east-1 and one AZ in us-west-1, treat each AZ as a single rack and each region as a DC. The network topology is rack aware so will prefer requests that go to the same rack (not much of an issue when you have only one rack). If possible I would use

Re: encryption_options & 0.8

2011-04-27 Thread David Boxenhorn
How about a more general (and encrypted!) solution: Add a password decryption class to the YAML. If it is not defined, that means the passwords are not encrypted, if it is defined, use it to decrypt the passwords. That way, you need to steal both the YAML and the decryption class if you want to st

Cassandra node throws NPE on startup

2011-04-27 Thread Subscriber
Hi, I'm using Cassandra 0.7.4 on a three node cluster. The cluster was set up yesterday as a fresh installation (no upgrade). The cluster is installed beside a hadoop cluster (I want to discover how cassandra works together with hadoop's map/reduce feature). After loading some test data into th

Re: advice for EC2 deployment

2011-04-27 Thread pankajsoni0126
I have been trying to deploy a Cassandra cluster across regions and for that I posted this "IP address resolution in MultiDC setup". But when it comes to getting nodes in different regions, say us-east and us-west, talking to each other over the private IPs of EC2 nodes, I am facing problems. I am assuming if

Re: Cassandra node throws NPE on startup

2011-04-27 Thread Subscriber
Hi again, some more remarks. I renamed the commitlog directory on the third node so that cassandra cannot see it on startup. Now the node starts fine. The problem seems to have something to do with the commitlogs... Best Regards Udo On 27.04.2011 at 13:22, Subscriber wrote: > Hi, > > I

Re: advice for EC2 deployment

2011-04-27 Thread Sasha Dolgy
Hi, If I understand you correctly, you are trying to get a private IP in us-east speaking to the private IP in us-west. To make your life easier, configure your nodes to use the hostname of the server. If it's in a different region, it will use the public IP (EC2 DNS will handle this for you) and if

Re: OOM on heavy write load

2011-04-27 Thread Nikolay Kоvshov
I have set quite low memory consumption (see my configuration in the first message) and give Cassandra 2.7 GB of memory. I cache 1M of 64-byte keys + 64 MB memtables. I believe overhead can't be 500% or so? memtable operations in millions = default 0.3 I now see very strange behaviour. If I fil

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
While I haven't configured it for multi-region yet, Sasha is exactly right about how Amazon's DNS works (returning private vs. public IP depending on if the machine is local to the region or not). For extra fun, now that Route53 exists you can (somewhat trivially) map and dynamically maintain all EC2

Re: Expanding single node to 2 node cluster

2011-04-27 Thread Maki Watanabe
Why don't you just add a new node to the ring and run removetoken on the bad one? 2011/4/27 maneela a > > Hi, > I had a 2 node cassandra cluster with replication factor 2 and > OrderPreservingPartitioner but we did not provide InitialToken in > the configuration files. One of the nodes was affected in the

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
It's great advice, but I'm still torn. I've never done multi-region work before, and I'd prefer to wait for 0.8 with built-in inter-node security, but I'm otherwise ready to roll (and need to roll) cassandra out sooner than that. Given how well my system held up with a total single AZ failure, I'

Re: advice for EC2 deployment

2011-04-27 Thread Sasha Dolgy
Hi William, The default behavior of Ec2Snitch is outlined below: http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/locator/Ec2Snitch.java // Split "us-east-1a" or "asia-1a" into "us-east"/"1a" and "asia"/"1a". String azone = new String(b ,"UTF-8");
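The split described in the quoted comment ("us-east-1a" into "us-east"/"1a") can be sketched in a few lines. This is an illustrative Python sketch of the idea only, not the actual Ec2Snitch code; the function name is hypothetical, and it assumes the zone is whatever follows the last hyphen.

```python
def split_availability_zone(az: str) -> tuple[str, str]:
    """Split an EC2 availability-zone name at its last hyphen.

    Hypothetical helper: returns (region part, zone part), e.g.
    "us-east-1a" becomes ("us-east", "1a").
    """
    region, zone = az.rsplit("-", 1)
    return region, zone


if __name__ == "__main__":
    # The two examples given in the quoted source comment.
    for name in ("us-east-1a", "asia-1a"):
        region, zone = split_availability_zone(name)
        print(f"{name}: region={region} zone={zone}")
```

Elsewhere in this thread the region part is described as being treated as the datacenter and the zone part as the rack.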

Re: advice for EC2 deployment

2011-04-27 Thread Sasha Dolgy
if you migrate the instance, does Route53 automatically re-map all the information to the new ec2 instance? another issue is that cassandra only maintains the IP of the other nodes, and not the hostname (assumed based on output of the nodetool ring) ... which means, if you migrate the instance a

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
Thanks Sasha. Fortunately/unfortunately I did realize the default & current behavior of the Ec2Snitch, but my application isn't multi-region capable (yet), so I need to get intra-region redundancy. And having a SingleRegionEc2Snitch that did DC=ec2zone and RACK=??? would be much better for me (fo

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
I don't think of it as migrating an instance, it's more of a destroy/start with EC2. But, I still think it would be very useful to spin up a set of instances with known hostnames (cassandra1, 2, 3... N) and be able to quickly SSH to them by doing "ssh ec2u...@cassandra1.random.ec2.mydomain.com ".

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
Oh, and Route53 doesn't do anything automatically, but there is an API to manage the DNS. It's up to you to run a task on instance boot/terminate, or a cron job if you want to do this trick (for now, seems like a solid future feature of Route53). Though, I hear geographically aware Route53 is alrea

Re: advice for EC2 deployment

2011-04-27 Thread Sasha Dolgy
so can you not simply leverage a strategy that replicates data between "racks" and at some point in the future when you move to multi-dc upgrade the replication strategy to maintain the current replication and add in some replication between DC's ... ? i'll go re-read your posts to see if you've a

nodes reference by hostname and not IP

2011-04-27 Thread Sasha Dolgy
Hi, Silly question maybe ... but it came to me in the EC2 thread. Is there a design reason why cassandra stores nodes as IP addresses and not hostnames? -- Sasha Dolgy sasha.do...@gmail.com

Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-27 Thread Stephen Connolly
Similar issue with the RPMs from riptano On 27 April 2011 11:01, Pierre-Yves Ritschard wrote: > Thanks Jonathan, > > Should I repackage myself or do you think updated Debian packages will > be made available shortly ? > > Regards, > - pyr > > On mar., 2011-04-26 at 11:47 -0500, Jonathan Ellis wro

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
I think you're right about changing NetworkTopologyStrategy, but the timing isn't working in my favor at this point. I wonder how bad that will really be. On Wed, Apr 27, 2011 at 9:35 AM, Sasha Dolgy wrote: > so can you not simply leverage a strategy that replicates data between > "racks" and

suggestion: sstable2json to ignore TTL

2011-04-27 Thread Timo Nentwig
Hi! What about a simple option for sstable2json to not print out expiration TTL+LocalDeletionTime (maybe even ignore isMarkedForDelete)? I want to move old data from a live cluster (with TTL) to an archive cluster (->data does not expire there). BTW is there a smarter way to do this? Actually

Thrift client thread is locked. (TSocket is initialized with _timeout)

2011-04-27 Thread 박용욱
Hello. I have a problem with thrift client socket. Server (0.7.4) - 6 nodes cluster - reboot 1 node(EC2 instance) suddenly. Client (hector-core-0.7.0-22, libthrift-0.5) - hector's cassandraThriftSocketTimeout option is set to 3ms and *It initiated TSocket with same timeout(Socket.setSoTimeo

Re: suggestion: sstable2json to ignore TTL

2011-04-27 Thread Edward Capriolo
On Wed, Apr 27, 2011 at 9:40 AM, Timo Nentwig wrote: > Hi! > > What about a simple option for sstable2json to not print out expiration > TTL+LocalDeletionTime (maybe even ignore isMarkedForDelete)? I want to move > old data from a live cluster (with TTL) to an archive cluster (->data does > not

Seeking "Cassandra in production" speaker volunteers! (free beer on offer)

2011-04-27 Thread Dave Gardner
Hi all, Influenced by the upcoming "Redis in production" meetup in London, I'm on the lookout for volunteers to speak at a "Cassandra in production" meetup (again, in London). You will get the satisfaction of becoming "Internet famous", plus I will personally buy you a beer. Links: http://www

Re: suggestion: sstable2json to ignore TTL

2011-04-27 Thread Timo Nentwig
On Apr 27, 2011, at 15:58, Edward Capriolo wrote: > Hacking a separate copy of SSTable2json is trivial. Just look for the > section of the code that writes the data and change what it writes. If I did. The method's private... > you can make it a knob --nottl then it could be included in Cassand

JDBC Driver issue in 0.8beta1

2011-04-27 Thread David McNelis
I have a feeling that I'm likely doing something dumb. I have the following code compiling without any issues: String url = null; try { Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver"); url = "jdbc:cassandra:username/password@localhost:9160/keyspace"; Connection co

Re: suggestion: sstable2json to ignore TTL

2011-04-27 Thread Edward Capriolo
On Wed, Apr 27, 2011 at 10:16 AM, Timo Nentwig wrote: > > On Apr 27, 2011, at 15:58, Edward Capriolo wrote: > >> Hacking a separate copy of SSTable2json is trivial. Just look for the >> section of the code that writes the data and change what it writes. If > > I did. The method's private... > >> y

Re: suggestion: sstable2json to ignore TTL

2011-04-27 Thread Timo Nentwig
On Apr 27, 2011, at 16:52, Edward Capriolo wrote: > The method being private is not a deal-breaker. While not good software > engineering practice you can copy and paste the code and rename the > class SSTable2MyJson or whatever. Sure I can do this but I'd like to have it just available in the d

Re: suggestion: sstable2json to ignore TTL

2011-04-27 Thread Edward Capriolo
On Wed, Apr 27, 2011 at 10:59 AM, Timo Nentwig wrote: > > On Apr 27, 2011, at 16:52, Edward Capriolo wrote: > >> The method being private is not a deal-breaker. While not good software >> engineering practice you can copy and paste the code and rename the >> class SSTable2MyJson or whatever. > > S

[RELEASE] Apache Cassandra 0.7.5 released

2011-04-27 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.7.5. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassan

memtablePostFlusher blocking writes?

2011-04-27 Thread Terje Marthinussen
0.8 trunk: When playing back a fairly large chunk of hints, things basically lock up under load. The hints are never processed successfully. Lots of Mutations dropped. One thing is that maybe the default 10k columns per send with 50ms delays is a bit on the aggressive side (10k*20 = 200,000 colum
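The "10k*20" figure above follows from the two defaults mentioned: a 50 ms delay allows at most 20 sends per second, each carrying 10k columns. A small sketch of that arithmetic, with a hypothetical function name, and assuming each send itself takes no time (so this is an upper bound):

```python
def max_columns_per_second(columns_per_send: int, delay_ms: int) -> int:
    """Upper bound on hint columns sent per second.

    Assumes the only pacing is the fixed delay between sends.
    """
    sends_per_second = 1000 // delay_ms
    return columns_per_send * sends_per_second


if __name__ == "__main__":
    # Defaults quoted in the message: 10k columns per send, 50 ms delay.
    print(max_columns_per_second(10_000, 50))
```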

nodetool hanging

2011-04-27 Thread William Oberman
I've figured this out, but to help those out there who don't want to waste an hour like me debugging a hung "nodetool ring" command: JMX opens a second random port, so you either have to disable any firewalls between the machine running nodetool and the cassandra instance (or there are complicated

Re: nodes reference by hostname and not IP

2011-04-27 Thread Milind Parikh
Most likely because in the wild, you can't assume a reliable DNS. Just as an aside... This question comes up often in context of managing Cassandra clusters; especially in elastic situations. Most CMDBs assume a static name (host names/static IPs) for nodes. However this often proves to be mismatche

Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-27 Thread Nate McCall
Indeed. This has been fixed and redeployed. Thanks Stephen. On Wed, Apr 27, 2011 at 8:38 AM, Stephen Connolly wrote: > Similar issue with the RPMs from riptano > > On 27 April 2011 11:01, Pierre-Yves Ritschard wrote: >> Thanks Jonathan, >> >> Should I repackage myself or do you think updated Deb

Re: memtablePostFlusher blocking writes?

2011-04-27 Thread Jonathan Ellis
MPF is indeed pretty lightweight, but since its job is to mark the commitlog replay position after a flush -- which has to be done in flush order to preserve correctness in failure scenarios -- you'll see the pending op count go up when you have multiple flushes happening. This is expected. Your r

Re: encryption_options & 0.8

2011-04-27 Thread David Strauss
On Wed, 2011-04-27 at 12:56 +0200, Sasha Dolgy wrote: > "IBM WebSphere applies a hardcoded XOR. Each caracter is XOR'd with > the caracter ‘_’, and the resulting string is encoded in base64. This > is not cryptography, it is just enough encoding so that a casual > glance at the file will not reveal
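The scheme described in the quoted text (XOR each character with '_', then base64-encode the result) takes only a few lines, which illustrates why it is encoding rather than cryptography. This is a minimal sketch based solely on that description; the function names are hypothetical and it is not taken from WebSphere itself.

```python
import base64

XOR_KEY = ord("_")  # the fixed character named in the quoted description


def xor_encode(plaintext: str) -> str:
    """XOR every byte with '_' and base64-encode the result."""
    masked = bytes(b ^ XOR_KEY for b in plaintext.encode("utf-8"))
    return base64.b64encode(masked).decode("ascii")


def xor_decode(encoded: str) -> str:
    """Reverse xor_encode: base64-decode, then XOR with '_' again."""
    masked = base64.b64decode(encoded)
    return bytes(b ^ XOR_KEY for b in masked).decode("utf-8")


if __name__ == "__main__":
    token = xor_encode("password")
    print(token)              # no longer readable at a glance
    print(xor_decode(token))  # but trivially recovered
```

Because the key is fixed and public, anyone holding the encoded string can recover the password, which is the point made in the quoted text.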

Re: Apt repositories

2011-04-27 Thread David Strauss
On Tue, 2011-04-26 at 19:03 -0500, Eric Evans wrote: > There is one for each version now (06x, 07x, and 08x). The unstable > suite continues to point to latest-and-greatest. The wiki has been > updated. Where, exactly, is this on the wiki? I had been using the CloudConfig page [1], which still on

Re: Apt repositories

2011-04-27 Thread Jonathan Ellis
On Wed, Apr 27, 2011 at 1:46 PM, David Strauss wrote: > On Tue, 2011-04-26 at 19:03 -0500, Eric Evans wrote: >> There is one for each version now (06x, 07x, and 08x). The unstable >> suite continues to point to latest-and-greatest.  The wiki has been >> updated. > > Where, exactly, is this on the

Pygmalion - a github project for pig + cassandra

2011-04-27 Thread Jeremy Hanna
Hi all, A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like: FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (nam

Re: Apt repositories

2011-04-27 Thread Jeremy Hanna
Thanks Eric! On Apr 26, 2011, at 7:03 PM, Eric Evans wrote: > On Sat, 2011-04-23 at 16:49 -0700, David Strauss wrote: >> I just noticed that, following the Cassandra 0.8 beta release, the Apt >> repository is encouraging servers in my clusters to upgrade. Beta >> releases should probably be on di

Re: Pygmalion - a github project for pig + cassandra

2011-04-27 Thread Jonathan Ellis
Nice! On Wed, Apr 27, 2011 at 1:57 PM, Jeremy Hanna wrote: > Hi all, > > A little while back, I started a project called pygmalion for example scripts > and UDFs for people using Pig with Cassandra.  Currently there are a few > handy UDFs in there like: > > FromCassandraBag: a way to convert fr

0.7.4: Replication assertion error after removetoken, removetoken force and a restart

2011-04-27 Thread Alexis Lê-Quôc
Hi, I've been getting the following lately, every few seconds. 2011-04-27T20:21:18.299885+00:00 10.202.61.193 [MiscStage: 97] Error in ThreadPoolExecutor 2011-04-27T20:21:18.299885+00:00 10.202.61.193 java.lang.AssertionError 2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193 at org.a

Re: Compacting single file forever

2011-04-27 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2575 On Thu, Apr 21, 2011 at 11:56 PM, Jonathan Ellis wrote: > I suggest as a workaround making the forceUserDefinedCompaction method > ignore disk space estimates and attempt the requested compaction even > if it guesses it will not have enough spa

Re: JDBC Driver issue in 0.8beta1

2011-04-27 Thread Jonathan Ellis
What's the stacktrace? On Wed, Apr 27, 2011 at 9:45 AM, David McNelis wrote: > I have a feeling that I'm likely doing something dumb.  I have  the > following code compiling without any issues: > String url = null; > try { >      Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver"); >  

Re: Cassandra node throws NPE on startup

2011-04-27 Thread Aaron Morton
What approach did you take to restarting the cluster? It looks like the keyspace name was changed and the log replay tried to write to the old one. Aaron On 28/04/2011, at 12:03 AM, Subscriber wrote: > Hi again, > > some more remarks. > I renamed the commitlog directory on the third node

Re: JDBC Driver issue in 0.8beta1

2011-04-27 Thread David McNelis
Attached: 21 [main] INFO org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160 Exception in thread "main" org.apache.cassandra.cql.jdbc.DriverResolverException: Required field 'replication_factor' was not found in serialized data! Struct: KsDef(name:system, strategy_class:org.apac

Re: JDBC Driver issue in 0.8beta1

2011-04-27 Thread Jonathan Ellis
That looks to me like it's using the thrift definitions from the 0.7 jar, rather than the 0.8. Are you sure the old Cassandra jar is no longer on your classpath? On Wed, Apr 27, 2011 at 4:29 PM, David McNelis wrote: > Attached: > 21 [main] INFO org.apache.cassandra.cql.jdbc.Connection - Connecte

Re: OOM on heavy write load

2011-04-27 Thread Aaron Morton
I'm a bit confused by the two different cases you described, so cannot comment specifically on your case. In general if Cassandra is slowing down take a look at the thread pool stats, using nodetool tpstats to see where it is backing up, and take a look at the logs to check for excessive GC. If no

Re: JDBC Driver issue in 0.8beta1

2011-04-27 Thread David McNelis
That was my issue. As suspected, falls into the "I must be doing something dumb" category. Thank you, Jonathan. On Wed, Apr 27, 2011 at 4:32 PM, Jonathan Ellis wrote: > That looks to me like it's using the thrift definitions from the 0.7 > jar, rather than the 0.8. Are you sure the old Cassan

Re: Expanding single node to 2 node cluster

2011-04-27 Thread Aaron Morton
You could try...
- delete / move the system data directory
- set the initial_token for each node to what they were before
- restart and recreate the schema
- run repair and then clean
It would have been a good idea to drain the nodes, this would checkpoint the logs and clear them. If you do not

Re: nodes reference by hostname and not IP

2011-04-27 Thread Aaron Morton
It stores them, but they are not as important as the token, i.e. you can shut down the node and bring it back on another IP and gossip will sort it out. Aaron On 28/04/2011, at 4:52 AM, Milind Parikh wrote: > > Most likely because in the wild, you can't assume a reliable DNS. > > Just as a

Re: memtablePostFlusher blocking writes?

2011-04-27 Thread Terje Marthinussen
It is a good question what the problem is here. I don't think it is the pending mutations and flushes; the real problem is what causes them, and it is not me! There was maybe a misleading comment in my original mail. It is not the hinted handoffs sent from this node that is the problem, but the 1.6

Re: memtablePostFlusher blocking writes?

2011-04-27 Thread Jonathan Ellis
On Wed, Apr 27, 2011 at 5:23 PM, Terje Marthinussen wrote: > I have two issues here. > - The massive amount of mutation caused by the hints playback I'm not sure how one node playing back hints could cause this. The intent of the code in HintedHandoffManager is to send a single mutation, wait fo

Re: Apt repositories

2011-04-27 Thread David Strauss
On Tue, 2011-04-26 at 19:03 -0500, Eric Evans wrote: > There was already a repo for cassandra-0.6 (called 06x), it just fell > through the cracks with the last release. > > There is one for each version now (06x, 07x, and 08x). The unstable > suite continues to point to latest-and-greatest. The w

Dropping a built in secondary index on a CF

2011-04-27 Thread Roshan Dawrani
Hi, Can someone please tell me how I can drop a built in secondary index on a column family attribute? I don't see any direct command to do that in the CLI help. -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani Skype: roshandawrani

Re: Dropping a built in secondary index on a CF

2011-04-27 Thread Xaero S
Hi, You just need to use the update column family command on the cassandra-cli and specify the columns and their metadata. To get the metadata of the columns in the CF, you can do describe keyspace . Keep in mind that, in your update CF command, the other columns that must continue to have the sec

Re: Performance tests using stress testing tool

2011-04-27 Thread Peter Schuller
> a) I am not seeing cpu usage more  than 10pct. Sounds like the benchmarking client is bottlenecking. > In some of the forums, i see > that 8 cpu 32 gb is considered as good sweet spot for cassandra. Is this > true? Seems reasonable in a very general sense, but of course varies with use-case.

Re: Dropping a built in secondary index on a CF

2011-04-27 Thread Roshan Dawrani
On Thu, Apr 28, 2011 at 9:56 AM, Xaero S wrote: > > You just need to use the update column family command on the cassandra-cli > and specify the columns and their metadata. To get the metadata of the > columns in the CF, you can do describe keyspace . Keep in mind > that, in your update CF comman