Cassandra counter column family performance

2014-05-13 Thread Batranut Bogdan
Hello all, I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter. After inserting a few million keys (about 55 million), performance goes down the drain: 2-3 nodes in the cluster are on medium load, and when inserting batches of the same length, writes take longer

Re: Schema disagreement errors

2014-05-13 Thread Duncan Sands
Hi Gaurav, a schema versioning bug was fixed in 2.0.7. Best wishes, Duncan. On 12/05/14 21:31, Gaurav Sehgal wrote: We have recently started seeing a lot of Schema Disagreement errors. We are using Cassandra 2.0.6 with Oracle Java 1.7. I went through the Cassandra FAQ and followed the below ste

Re: How to rebalance a cluster?

2014-05-13 Thread Jeremiah D Jordan
Unless the issue is "I have some giant partitions mixed in with non-giant ones" the usual reason for "data size imbalance" is STCS is being used. You can look at nodetool cfhistograms and cfstats to get info about partition sizes. If you copy the data off to a test node, and run "nodetool compa

Re: Cassandra & MapReduce/Storm/ etc

2014-05-13 Thread Shamim
Hi, check out these following links: 1) http://frommyworkshop.blogspot.ru/search/label/Cassandra 2) http://frommyworkshop.blogspot.ru/2012/07/single-node-hadoop-cassandra-pig-setup.html -- Best regards Shamim A. 11.05.2014, 22:17, "Manoj Khangaonkar" : > Hi, > > Searching for Cassandra with

Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device

2014-05-13 Thread Yatong Zhang
Well, I finally resolved this issue by modifying Cassandra to ignore sstables larger than a threshold. Leveled compaction falls back to size-tiered compaction in some situations, and that's why I always got some old huge sstables compacted. More details can be found in 'Levele

Schema errors when bootstrapping / restarting node

2014-05-13 Thread Adam Cramer
Hi All, I'm having some major issues bootstrapping a new node to my cluster. We are running 1.2.16, with vnodes enabled. When a new node starts up (with auto_bootstrap), it selects a host ID and finds the ring successfully: INFO 18:42:29,559 JOINING: waiting for ring information It successfull

Re: Can Cassandra client programs use hostnames instead of IPs?

2014-05-13 Thread Ben Bromhead
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname. On AWS you don't have to assign an elastic IP, all

Re: Disable reads during node rebuild

2014-05-13 Thread Aaron Morton
> I'm not able to replace a dead node using the ordinary procedure > (bootstrap+join), and would like to rebuild the replacement node from another > DC. Normally when you want to add a new DC to the cluster the command to use is nodetool rebuild $DC_NAME (with auto_bootstrap: false). That will ge

Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-13 Thread Chris Lohfink
What does your data model look like? > I think it would be best to just disable compactions. Why? Are you never doing reads? There is also a cost to repairs/bootstrapping when you have a ton of sstables. This might be a premature optimization. If the data is read from a slice of a partition that

Re: Schema disagreement errors

2014-05-13 Thread Vincent Mallet
Hey Gaurav, You should consider moving to 2.0.7 which fixes a bunch of these schema disagreement problems. You could also play around with nodetool resetlocalschema on the nodes that are behind, but be careful with that one. I'd go with 2.0.7 first for sure. Thanks, Vince. On Mon, May 12, 2

How to balance this cluster out ?

2014-05-13 Thread Oleg Dulin
I have a cluster that looks like this: Datacenter: us-east == Replicas: 2 Address RackStatus State LoadOwns Token 113427455640312821154458202477256070484 *.*.*.1 1

Re: Schema disagreement errors

2014-05-13 Thread Robert Coli
On Tue, May 13, 2014 at 5:11 PM, Donald Smith < donald.sm...@audiencescience.com> wrote: > I too have noticed that after doing “nodetool flush” (or “nodetool > drain”), the commit logs are still there. I think they’re NEW (empty) > commit logs, but I may be wrong. Anyone know? > Assuming they ar

Counter column family performance problems

2014-05-13 Thread Batranut Bogdan
Hello all, I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter. After inserting a few million keys (about 55 million), performance goes down the drain: 2-3 nodes in the cluster are on medium load, and when inserting batches of the same length, writes take longer

Datacenter understanding question

2014-05-13 Thread ng
If I have a configuration of two data centers with one node each, and the replication factor is also 1, will these 2 nodes be mirrored/replicated?

Re: Avoiding email duplicates when registering users

2014-05-13 Thread Nikolay Mihaylov
The real question is: if you want the email to be unique, why use a "surrogate" primary key such as a UUID? I wonder what the UUID gives you at all. If you want a non-email primary key, why not use md5(email)? On Wed, May 7, 2014 at 2:19 AM, Tyler Hobbs wrote: > > On Mon, May 5, 2014 at 10:27 AM
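A minimal sketch of the md5(email) idea suggested above, in Python. The helper name email_key and the strip/lowercase normalization are assumptions made for illustration; the thread does not specify either.

```python
import hashlib


def email_key(email: str) -> str:
    """Derive a deterministic key from an email address (hypothetical helper)."""
    # Assumption: normalize first so differently cased or padded copies
    # of the same address map to the same key.
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Unlike a random UUID, the same address always maps to the same 32-character hex key, so a second registration with that address lands on the same row.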

Really need some advice on large data considerations

2014-05-13 Thread Yatong Zhang
Hi, We're going to deploy a large Cassandra cluster at PB level. Our scenario would be: 1. Lots of writes, about 150 writes/second on average, and about 300K size per write. 2. Relatively few reads 3. Our data will never be updated 4. But we will delete old data periodically to free space
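For scale, the write figures quoted above work out as follows. This is only back-of-the-envelope arithmetic, assuming "300K" means 300 kilobytes in decimal units and ignoring replication, compression and deletes.

```python
# Assumptions: "300K" = 300 kilobytes (decimal), replication not counted.
WRITES_PER_SECOND = 150
BYTES_PER_WRITE = 300 * 1000
SECONDS_PER_DAY = 86_400

bytes_per_second = WRITES_PER_SECOND * BYTES_PER_WRITE
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

mb_per_second = bytes_per_second / 1e6
tb_per_day = bytes_per_day / 1e12
print(f"{mb_per_second:.1f} MB/s, {tb_per_day:.3f} TB/day before replication")
# prints "45.0 MB/s, 3.888 TB/day before replication"
```

Multiply the daily figure by the replication factor to estimate raw disk growth per day.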

RE: Schema disagreement errors

2014-05-13 Thread Donald Smith
I too have noticed that after doing “nodetool flush” (or “nodetool drain”), the commit logs are still there. I think they’re NEW (empty) commit logs, but I may be wrong. Anyone know? Don From: Gaurav Sehgal [mailto:gsehg...@gmail.com] Sent: Monday, May 12, 2014 12:31 PM To: user@cassandra.apach

subscription test - please ignore

2014-05-13 Thread Maciej Miklas

Re: How long are expired values actually returned?

2014-05-13 Thread Sebastian Schmidt
Ah thank you! On 12.05.2014 16:31, Peter Reilly wrote: > You need to set grace period as well. > > Peter > > > On Thu, May 8, 2014 at 8:44 AM, Sebastian Schmidt > wrote: > > Hi, > > I'm using the TTL feature for my application. In my tests, when > using a >

RE: Datacenter understanding question

2014-05-13 Thread Romain HARDOUIN
RF=1 means no replication. You have to set RF=2 in order to set up mirroring. -Romain ng wrote on 13/05/2014 19:37:08: > From: ng > To: "user@cassandra.apache.org" , > Date: 14/05/2014 04:37 > Subject: Datacenter understanding question > > If I have configuration of two data center with o
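How many copies of a row exist depends on the replication strategy, which the question does not state. A rough sketch, assuming the two standard strategies: SimpleStrategy takes one cluster-wide replication factor, while NetworkTopologyStrategy takes one per data center. The helper and the data center names DC1/DC2 are hypothetical, not a Cassandra API.

```python
def total_copies(strategy: str, rf) -> int:
    """Count copies of each row cluster-wide (illustrative helper only)."""
    if strategy == "SimpleStrategy":
        # rf is a single integer: total replicas across the whole ring.
        return int(rf)
    if strategy == "NetworkTopologyStrategy":
        # rf maps data center name -> replicas kept in that data center.
        return sum(rf.values())
    raise ValueError(f"unknown strategy: {strategy}")


single = total_copies("SimpleStrategy", 1)
mirrored = total_copies("NetworkTopologyStrategy", {"DC1": 1, "DC2": 1})
```

Under these assumptions, single is 1 (no mirroring) and mirrored is 2 (one copy in each data center).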