Hi All,
Sorry for the wide distribution.
Our Cassandra cluster is running 1.0.10. Recently we have been facing a weird
situation. We have a column family containing wide rows (each row might
have a few million columns). We delete the columns on a daily basis and
we also run a major compaction on it every day
I found this bug; it seems it is fixed. But in my situation I can still see the
decommissioned node in the LoadMap attribute in the JMX console.
Might this be the reason why Hector says there are not enough replicas?
Experts, any thoughts??
Thanks.
Not sure I understand you correctly, but if you are dealing with ghost
nodes that you want to remove, I have never seen a node that could resist an
"unsafeAssassinateEndpoint".
http://grokbase.com/t/cassandra/user/12b9eaaqq4/remove-crashed-node
http://grokbase.com/t/cassandra/user/133nmsm3hd/removin
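For anyone who wants to script the same thing rather than click through a JMX
console, here is a minimal Java sketch. It assumes a Cassandra version whose
Gossiper MBean exposes unsafeAssassinateEndpoint, JMX on the default port 7199,
and placeholder IP addresses:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class AssassinateGhostNode {
    public static void main(String[] args) throws Exception {
        // Connect to the JMX port of any live node (7199 by default).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName gossiper =
                    new ObjectName("org.apache.cassandra.net:type=Gossiper");
            // Tell gossip to forget the ghost node's IP for good.
            mbs.invoke(gossiper, "unsafeAssassinateEndpoint",
                    new Object[] { "10.0.0.99" },
                    new String[] { "java.lang.String" });
        }
    }
}

This is the same operation a JMX console or jmxterm would invoke by hand.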
@Rob: Thanks for the feedback.
Yet I still have a weird, unexplained behavior with repair. Are counters
supposed to be "repaired" too? I mean, while reading at CL.ONE I can get
different values depending on which node answers, even after a read
repair or a full repair. Shouldn't a repair make them converge?
Hi,
Adding vnodes is a big improvement to Cassandra, specifically because we
have a fluctuating load on our Cassandra cluster depending on the week, and it is
quite annoying to add some nodes for one week or two, move tokens, and then
have to remove them and move tokens again. Even more if we could
From this list and the NYC* conference it seems that the consensus
configuration of C* on EC2 is to put the data on an ephemeral drive and
then periodically back the drive up to S3, relying on C*'s inherent fault
tolerance to deal with any data loss.
Fine, and we're doing this...but we find that
Hi all,
I currently have 2 clusters, one running on 1.1.10 using CQL2 and one
running on 1.2.4 using CQL3 and Vnodes. The machines in the 1.2.4 cluster are
expected to have better IO performance as we are going from 1 SSD data disk per
node in the 1.1 cluster to 3 SSD data disks per node
I am not sure if the new default is to use compression, but I do not
believe compression is a good default. I find compression is better for
larger column families that are sparsely read. For high-throughput CFs I
feel that decompressing larger blocks hurts performance more than the
compression helps.
The biggest reason I'm using compression here is that my data lends itself well
to it due to the composite columns. My current compression ratio is 30.5%.
Not sure it matters, but my BF false positive ratio is 0.048.
From: Edward Capriolo <edlinuxg...@gmail.com>
Reply-To: "user@cassandr
When you use compression you should play with your block size. I believe
the default may be 32K, but I had more success with 8K: nearly the same
compression ratio, less young-gen memory pressure.
On Thu, May 16, 2013 at 10:42 AM, Keith Wright wrote:
> The biggest reason I'm using compression here is
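For what it's worth, on the 1.2/CQL3 cluster the chunk size can be changed with
a plain ALTER TABLE. A minimal sketch using the DataStax java-driver follows;
the keyspace and table names are placeholders, and LZ4Compressor assumes 1.2.2
or later:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TuneCompressionChunkSize {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // Shrink the compression block size from the 64 KB default to 8 KB.
        session.execute("ALTER TABLE my_ks.my_table WITH compression = "
                + "{'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 8}");
        cluster.shutdown();
    }
}

Existing SSTables keep their old chunk size until they are rewritten (via
compaction, scrub, or upgradesstables), which is why the thread below mentions
running an sstable upgrade after the change.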
Does Cassandra need to load the entire SSTable into memory to uncompress it, or
does it only load the relevant block? I ask because if it's the latter, that
would not explain why I'm seeing so much higher read MB/s in the 1.2 cluster, as
the block sizes are the same in both.
From: Edward Capriolo
Might you be experiencing this?
https://issues.apache.org/jira/browse/CASSANDRA-4417
/Janne
On May 16, 2013, at 14:49 , Alain RODRIGUEZ wrote:
> @Rob: Thanks for the feedback.
>
> Yet I still have a weird, unexplained behavior with repair. Are counters
> supposed to be "repaired" too?
On May 16, 2013, at 17:05 , Brian Tarbox wrote:
> An alternative that we had explored for a while was to do a two stage backup:
> 1) copy a C* snapshot from the ephemeral drive to an EBS drive
> 2) do an EBS snapshot to S3.
>
> The idea being that EBS is quite reliable, S3 is still the emergency
I indeed had some of those in the past. But my point is not so much to
understand how I can get different counts depending on the node (I consider
this a weakness of counters and I am aware of it); my question is rather
why those inconsistent, distinct counter values never converge, even after a
repair.
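One way to take per-replica divergence out of the picture while investigating
is to read the counters at QUORUM instead of ONE. A rough Hector sketch, with
cluster, keyspace, column family, and key names made up for the example:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.CounterQuery;

public class QuorumCounterRead {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("test-cluster",
                new CassandraHostConfigurator("10.0.0.1:9160"));

        // Read at QUORUM so a single stale replica cannot change the answer.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace keyspace = HFactory.createKeyspace("my_ks", cluster, ccl);

        CounterQuery<String, String> query = HFactory.createCounterColumnQuery(
                keyspace, StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("page_views");
        query.setKey("row1");
        query.setName("hits");

        HCounterColumn<String> column = query.execute().get();
        System.out.println(column == null ? "no value" : column.getValue());
    }
}

This does not fix the underlying divergence, but it makes it easy to see
whether the replicas ever agree at all.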
Boris,
We hit exactly the same issue, and you are correct: the newly created SSTables
are the reason most of the column tombstones are not being purged.
There is an improvement in the 1.2 train where both the minimum and maximum
timestamp for a row are now stored and used during the compaction to de
My 5 cents: I'd check blockdev --getra for the data drives - readahead values
that are too high (the default is 256 on Debian) can hurt read performance.
On 05/16/2013 05:14 PM, Keith Wright wrote:
Hi all,
I currently have 2 clusters, one running on 1.1.10 using CQL2 and
one running on 1.2.4 using CQ
We actually have it set to 512. I have tried decreasing my SSTable size to 5
MB and changing the chunk size to 8 KB (and ran an sstableupgrade to ensure
they took effect) but am still seeing similar performance. Is anyone running
lz4 compression in production? I'm thinking of reverting back t
512 sectors for read-ahead. Are your new fancy SSD drives using large
sectors? If your read-ahead is really reading 512 x 4KB per random IO,
then that 2 MB per read seems like a lot of extra overhead.
-Bryan
On Thu, May 16, 2013 at 12:35 PM, Keith Wright wrote:
> We actually have it set to
I was going to say something similar. I feel like the SSD drives read much
"more" than the standard drives. Read-ahead / large sectors could, and probably
does, explain it.
On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot wrote:
> 512 sectors for read-ahead. Are your new fancy SSD drives using large
> se
just in case it will be useful to somebody - here is my checklist for
better read performance from SSDs:
1. limit read-ahead to 16 or 32
2. enable 'trickle_fsync' (available starting from Cassandra 1.1.x)
3. use the 'deadline' I/O scheduler (much more important for rotational
drives than for SSDs)
4.
This makes sense. Unless you are running major compaction, a delete could
only be purged if the bloom filters confirmed the row was not in the sstables
not being compacted. If your rows are wide, the odds are that they are in
most/all sstables, and then finally removing them would be tricky.
On Thu, Ma
Thank you for that. I did not have trickle_fsync enabled and will give it a
try. I just noticed that when running a describe on my table, I do not see the
sstable size parameter (compaction_strategy_options = {'sstable_size_in_mb': 5})
included. Is that expected? Does it mean it's using the de
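If the option really did not take, it can be set explicitly; in CQL3 on 1.2 the
size lives inside the compaction map. A small sketch with the DataStax
java-driver (keyspace and table names are placeholders). Note that 5 MB is also
the 1.2 default for LeveledCompactionStrategy, so behavior would look the same
either way:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SetLeveledSSTableSize {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        // In CQL3 the sstable size is a sub-option of the compaction map.
        session.execute("ALTER TABLE my_table WITH compaction = "
                + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 5}");
        cluster.shutdown();
    }
}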
lz4 is supposed to achieve similar compression while using fewer resources
than snappy. It is easy to test: just change it, then run a 'nodetool rebuild'.
Not sure when lz4 was introduced, but given that it is new to Cassandra
there may not be many large deployments running it yet.
On Thu, May 16, 201
But the problem is that I would like to use Cassandra embedded. Is this not
possible any more?
2013/5/15 Edward Capriolo
>
> You are doing something wrong. What I was suggesting is only a hack for
> unit tests. You're not supposed to interact with CassandraServer directly
> like that as a client.
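Embedding is still possible without touching CassandraServer directly: start an
in-process node with EmbeddedCassandraService and then talk to it through a
normal client. A minimal sketch, where the yaml path is a placeholder for a
test configuration:

import java.io.IOException;

import org.apache.cassandra.service.EmbeddedCassandraService;

public class EmbeddedCassandraExample {
    public static void main(String[] args) throws IOException {
        // Point Cassandra at a test configuration before starting the service.
        System.setProperty("cassandra.config",
                "file:///tmp/cassandra-test/cassandra.yaml");

        EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
        cassandra.start();

        // The node now listens on the ports defined in that yaml; connect to it
        // with Hector, Thrift, or the java-driver like any other cluster.
    }
}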
Your nodes are overloaded.
I'd recommend using m1.xlarge instead.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 15/05/2013, at 1:59 PM, Rodrigo Felix
wrote:
> Hi,
>
>I'm executing a workload on YCSB (50
Try the IRC room for the java driver or submit a ticket on the JIRA system, see
the links here https://github.com/datastax/java-driver
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 15/05/2013, at 5:50 PM, bjbylh
Are you using a multi get or a range slice ?
Read Repair does not run for range slice queries.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 15/05/2013, at 6:51 PM, Sergey Naumov wrote:
>> see that RR works, bu
> When "drop column family" is executed irrespective of the existence of
> generation of Snapshot, $KS/$CF/ directory certainly remains.
I don't think there is any code there to delete the empty directories. We only
care about the files in there.
Cheers
-
Aaron Morton
Freelan
You should configure the seeds as recommended regardless of the snitch used.
You need to update the yaml file to start using the GossipingPropertyFileSnitch,
but after that it reads the cassandra-rackdc.properties file to get information
about the node. It uses the information in gossip to
We don't have cursors in the RDBMS sense of things.
If you are using thrift, the recommendation is to use connection pooling and
re-use connections for different requests. Note that you cannot multiplex
queries over the same thrift connection; you must wait for the response before
issuing anoth
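A rough Hector sketch of that pooling advice (host addresses, pool size, and
keyspace name are placeholders):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class PooledHectorClient {
    public static void main(String[] args) {
        // One configurator and cluster per application; Hector keeps a pool of
        // Thrift connections behind them and each request borrows one.
        CassandraHostConfigurator conf =
                new CassandraHostConfigurator("10.0.0.1:9160,10.0.0.2:9160");
        conf.setMaxActive(50);           // connections per host in the pool
        conf.setAutoDiscoverHosts(true); // pick up the rest of the ring

        Cluster cluster = HFactory.getOrCreateCluster("app-cluster", conf);
        Keyspace keyspace = HFactory.createKeyspace("my_ks", cluster);

        // Reuse 'keyspace' for every request; the pool ensures a connection is
        // never handed to a second request before the first response is back.
    }
}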
(Assuming you have enabled tcp_nodelay on the client socket)
Check the server side latency, using nodetool cfstats or nodetool cfhistograms.
Check the logs for messages from the GCInspector about ParNew pauses.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
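For the first point, a tiny Java illustration of what enabling tcp_nodelay on
the client socket means; host and port are placeholders, and most client
libraries expose this as a configuration flag rather than a raw socket call:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NoDelaySocket {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        // Disable Nagle's algorithm so small request frames are sent immediately
        // instead of being buffered, which otherwise shows up as extra latency.
        socket.setTcpNoDelay(true);
        socket.connect(new InetSocketAddress("10.0.0.1", 9160));
        socket.close();
    }
}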
what version of netty is on your classpath?
On 05/16/2013 07:33 PM, aaron morton wrote:
Try the IRC room for the java driver or submit a ticket on the JIRA
system, see the links here https://github.com/datastax/java-driver
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
Thanks. This is just the kind of expert advice I needed.
On Tue, 14 May 2013, aaron morton wrote:
After several cycles, pycassa starts getting connection failures.
Do you have the error stack? Are they TimedOutExceptions, socket timeouts, or
something else?
I figured out the problem here and made this ticket in jira:
https://issues.apa
Please give an example of the code you are trying to execute.
On Thu, May 16, 2013 at 6:26 PM, Everton Lima wrote:
> But the problem is that I would like to use Cassandra embedded. Is this
> not possible any more?
>
>
> 2013/5/15 Edward Capriolo
>
>>
>> You are doing something wrong. What I was
Mutagen Cassandra is a framework providing schema versioning and mutation
for Apache Cassandra. It is similar to Flyway for SQL databases.
https://github.com/toddfast/mutagen-cassandra
Mutagen is a lightweight framework for applying versioned changes (known as
mutations) to a resource, in this ca
On 5/16/13 10:22 PM, Todd Fast wrote:
Mutagen Cassandra is a framework providing schema versioning and
mutation for Apache Cassandra. It is similar to Flyway for SQL databases.
https://github.com/toddfast/mutagen-cassandra
Mutagen is a lightweight framework for applying versioned changes (known