Re: question on saved_cache_directory

2011-03-28 Thread Peter Schuller
> I have an SSD and a normal disk. I am using the SSD for the data directory; > should I also use the SSD for the saved_cache directory? It won't really hurt, but there's no need. It's sequential dumping and reading of data. No random I/O. -- / Peter Schuller

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
Iterate. Otherwise, if that will be too slow and you will do it often, the nosql way is to create a separate column family, updated with each row add/delete, to hold the answer for you. - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense a
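A minimal sketch of the "separate column family that holds the answer" idea, using plain Python dictionaries as stand-ins for the column families. The class and the names `rows`, `counts` and `total_rows` are hypothetical and not part of any Cassandra client API. The sketch also assumes the writer can tell cheaply whether a key already exists, which on a real cluster would cost a read.

```python
class RowCounter:
    """In-memory stand-in: a data column family plus a separate
    'counts' column family that holds the running row total."""

    def __init__(self):
        self.rows = {}                    # hypothetical data column family: key -> columns
        self.counts = {"total_rows": 0}   # hypothetical counter column family

    def insert(self, key, columns):
        # Only a brand-new key changes the row count; overwrites do not.
        if key not in self.rows:
            self.counts["total_rows"] += 1
        self.rows[key] = columns

    def delete(self, key):
        # Deleting a missing key must not decrement the count.
        if key in self.rows:
            del self.rows[key]
            self.counts["total_rows"] -= 1

    def total(self):
        # Reading the answer is a single lookup instead of a full iteration.
        return self.counts["total_rows"]
```

The trade-off is the usual one: every add/delete pays a little extra so that the count itself never requires iterating over all rows.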

balance between concurrent_[reads|writes] and feeding/reading threads in clients

2011-03-28 Thread Terje Marthinussen
Hi, I was pondering how the concurrent_reads and concurrent_writes settings balance against the max read/write threads in clients. Let's say we have 3 nodes, and concurrent reads/writes set to 8. That is, 8*3=24 threads for reading and writing. Replication factor is 3. Let's say we have clients that in total

Re: How to repair HintsColumnFamily?

2011-03-28 Thread Shotaro Kamio
I see. Then, I'll remove the HintsColumnFamily. Because our cluster has a lot of data, running repair takes a long time (more than a day), and it's kind of a pain. It often fills the disk, creates many sstables and degrades read performance. If it's easy to fix the hint, it could be less painful s

Help on how to configure an off-site DR node.

2011-03-28 Thread Brian Lycett
Hello. I'm setting up a cluster that has three nodes in our production rack. My intention is to have a replication factor of two for this. For disaster recovery purposes, I need to have another node (or two?) off-site. The off-site node is entirely for the purpose of having an offsite backup of t

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Joshua Partogi
Not all NoSQL is like that. Or perhaps the term NoSQL has become vague these days. On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly wrote: > iterate. > > otherwise if that will be too slow and you will do it often, the nosql way > is to create a separate column family updated with each row add/d

Something about cassandra API

2011-03-28 Thread An Zhuo
Hi, I've learned something about Cassandra and found that there are two packages for accessing Cassandra: Avro and Thrift. Which one should I choose for use with Java, Avro or Thrift? Thank you. 2011-03-28 An Zhuo

Re: Something about cassandra API

2011-03-28 Thread Norman Maurer
Hi there, you would be better off using a high-level client like Hector or Pelops. See: http://wiki.apache.org/cassandra/ClientOptions But to answer your question... If you really want to use something low-level then Thrift is the way to go... Bye, Norman 2011/3/28 An Zhuo > HI, I've lear

RE: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Or Yanay
I use one of two ways to achieve that: 1. run a map reduce. Pig is really helpful in these cases. Make sure you run your MR using Hadoop task tracker on your nodes - or your performance will take a hit. 2. dump all keys using sstablekeys script from relevant files on all machines and count u

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
OK, so not all NoSQL has column families... just s/nosql/cassandra/g on my previous post ;-) On 28 March 2011 13:38, Joshua Partogi wrote: > Not all NoSQL is like that. Or perhaps the term NoSQL has become vague > these days. > > On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly > wrote: >> i

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
For #2 you could pipe through wc -l to get the answer: sort -n keys.txt | uniq | wc -l. But both examples are just refinements of iterate: #1 is just a distributed iterate; #2 is just an optimized iterate based on knowledge of the on-disk format (and may give inaccurate results... tombstones...) On
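For readers who would rather do the counting step of #2 in code, here is a rough Python equivalent of the sort/uniq/wc pipeline. It is only a sketch: the function name is hypothetical, it assumes one key per line in the dumped output, and unlike the shell pipeline it ignores blank lines. The tombstone caveat above still applies, since deleted rows may still appear in the dumped keys.

```python
def count_unique_keys(lines):
    """Count distinct keys in an iterable of text lines (one key per line)."""
    keys = {line.rstrip("\n") for line in lines}
    keys.discard("")   # ignore blank lines, an assumption of this sketch
    return len(keys)
```

An open file can be passed directly, for example `count_unique_keys(open("keys.txt"))`, where keys.txt holds the combined key dumps from all machines.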

Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
FYI Avro is in all likelihood being removed in 0.8 2011/3/28 Norman Maurer : > Hi there, > > you would be better off to use a high-level client like hector or pelops. > > See: > http://wiki.apache.org/cassandra/ClientOptions > > But to answer your question... If you really want to use something low

Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Hi all, we're working on a Cassandra 0.7.0 production environment with a data store of nearly 500 GB. We need to periodically remove the tombstones from deleted/expired data by performing a major compaction through nodetool. After invoking the compaction on a single column family we can see

Re: Poor performance on small data set

2011-03-28 Thread Sébastien Kondov
Hi, Just to let you know that I finally compiled the Thrift extension to a .dll and performance has improved. I was forced to switch to a VC9 build of PHP; VC6 isn't supported by PHP anymore. Average access times were pretty bad before (70-100 ms per row) and now they are 5-10 ms. So nearly 10x faster thanks to the new extens

Re: Problem about freeing space after a major compaction

2011-03-28 Thread Ching-Cheng Chen
Tombstone removal also depends on your gc grace period setting. If you are pretty sure that you have a proper gc grace period set and are still on 0.7.0, then it is probably related to this bug: https://issues.apache.org/jira/browse/CASSANDRA-2059 Regards,
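The dependency on the gc grace period can be sketched as a simple time comparison. This is an illustration rather than Cassandra's actual code: the function and parameter names are hypothetical, times are plain seconds, and the strict comparison at the boundary is an assumption.

```python
def tombstone_purgeable(deleted_at, gc_grace_seconds, now):
    """Return True if a tombstone written at `deleted_at` is old enough
    for a compaction running at `now` to drop it."""
    gc_before = now - gc_grace_seconds   # cutoff: anything deleted before this may go
    return deleted_at < gc_before
```

With a gc grace period of 0, any tombstone written before the compaction starts qualifies. With a grace period of several days, a major compaction run shortly after the deletes will keep the tombstones and free little space.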

Re: Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Hi Chen, we've set the gc grace period of the column families to 0, as suggested for a single-node environment. Can this setting cause the problem? I don't think so... Thanks, Roberto On 28 March 2011 16:54, Ching-Cheng Chen wrote: > tombstones removal also depends on your gc grace period setting. >

Re: Problem about freeing space after a major compaction

2011-03-28 Thread Ching-Cheng Chen
AFAIK, setting gc_grace_period to 0 shouldn't cause this issue. In fact, that's what I'm using now in a single-node environment like yours. However, I'm using 0.7.2 with some patches. If you are still using 0.7.0, most likely you got hit by this bug. You might want to patch it or upgrade to la

Re: Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Thank you again, we're going to update our environment. Regards, Roberto On 28 March 2011 17:08, Ching-Cheng Chen wrote: > > AFAIK, setting gc_grace_period to 0 shouldn't cause this issue. In fact, > that's what I'm using now in a single node environment like yours. > > However, I'm using 0.7.2

Re: Something about cassandra API

2011-03-28 Thread Eric Evans
On Mon, 2011-03-28 at 14:21 +0100, Stephen Connolly wrote: > FYI Avro is in all likelihood being removed in 0.8 FWIW, Avro is long-gone at this point. -- Eric Evans eev...@rackspace.com

Re: Something about cassandra API

2011-03-28 Thread Eric Evans
On Mon, 2011-03-28 at 14:51 +0200, Norman Maurer wrote: > you would be better off to use a high-level client like hector or > pelops. > > See: > http://wiki.apache.org/cassandra/ClientOptions > > But to answer your question... If you really want to use something > lowlevel then Thrift is the way t

Re: debian/ubuntu mirror down?

2011-03-28 Thread Eric Evans
On Fri, 2011-03-25 at 13:54 -0700, Shashank Tiwari wrote: > The Ubuntu Software Update seems to complain -- > Failed to fetch > http://www.apache.org/dist/cassandra/debian/dists/unstable/main/binary-amd64/Packages.gz > 403 Forbidden [IP: 140.211.11.131 80] > Failed to fetch > http://www.apache.org

Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
On 28 March 2011 16:33, Eric Evans wrote: > On Mon, 2011-03-28 at 14:21 +0100, Stephen Connolly wrote: >> FYI Avro is in all likelihood being removed in 0.8 > > FWIW, Avro is long-gone at this point. You have the advantage of actually being a Cassandra Dev as opposed to being a Cassandra Hanger-o

Re: ParNew (promotion failed)

2011-03-28 Thread Peter Schuller
> But he's talking about "promotion failed" which is about heap > fragmentation, not "concurrent mode failure" which would indicate CMS > too late.  So increasing young generation size + tenuring threshold is > probably the way to go (especially in a read-heavy workload; > increasing tenuring will

Re: fabric script for cassandra

2011-03-28 Thread Sal Fuentes
I know you can find other scripts for provisioning a cassandra cluster on github but they may be outdated (thinking 0.6 releases): [Chef scripts] https://github.com/b/cookbooks/tree/cassandra https://github.com/fuentesjr/cass-pack [Shell scripts] https://github.com/digitalreasoning/PyStratus Hop

New committer Sylvain Lebresne

2011-03-28 Thread Jonathan Ellis
The Cassandra PMC has voted to add Sylvain as a committer. Welcome, Sylvain, and thanks for the hard work! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com

Re: memtable_threshold

2011-03-28 Thread ruslan usifov
2011/3/29 Narendra Sharma > This is because the memtable threshold is not correct to the last byte. The > threshold basically accounts for column name, value and timestamp (or the > serialized column). It doesn't account for all the in-memory overhead for > maintaining the data and references etc.

Re: memtable_threshold

2011-03-28 Thread Narendra Sharma
The following shows how the size of the memtable is updated: currentThroughput.addAndGet(cf.size()); JConsole/JMX shows this value, and it doesn't account for the overhead of holding the data in in-memory data structures. The size of CF, SuperColumn and Column is calculated as follows: Column Size: pub

Re: New committer Sylvain Lebresne

2011-03-28 Thread Edward Capriolo
Congratulations Sylvain! On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis wrote: > The Cassandra PMC has voted to add Sylvain as a committer. > > Welcome, Sylvain, and thanks for the hard work! > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for profe

Re: New committer Sylvain Lebresne

2011-03-28 Thread Chris Goffinet
Congratulations Sylvain! On Mon, Mar 28, 2011 at 2:56 PM, Edward Capriolo wrote: > Congratulations Sylvain! > > On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis wrote: > > The Cassandra PMC has voted to add Sylvain as a committer. > > > > Welcome, Sylvain, and thanks for the hard work! > > > > --

Re: memtable_threshold

2011-03-28 Thread Jonathan Ellis
It's closer to 8x than 2x for small values. Java objects simply use a lot more memory than you'd think, and it takes multiple objects to store a column. http://kohlerm.blogspot.com/2008/12/how-much-memory-is-used-by-my-java.html On Mon, Mar 28, 2011 at 4:15 PM, ruslan usifov wrote: > > > 2011/3/
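To make the 2x versus 8x difference concrete, here is a small worked estimate. The 64 MB threshold is a hypothetical figure chosen only for illustration, and the function name is not from Cassandra. The only input taken from the thread is the overhead factor itself.

```python
def estimated_heap_mb(serialized_mb, overhead_factor):
    """Rough heap footprint of a memtable whose serialized size
    (names, values, timestamps) is `serialized_mb`."""
    return serialized_mb * overhead_factor

# Hypothetical 64 MB threshold under the two factors discussed.
optimistic = estimated_heap_mb(64, 2)    # 128 MB
pessimistic = estimated_heap_mb(64, 8)   # 512 MB
```

Under these assumptions a memtable that flushes at 64 MB of serialized data could hold roughly half a gigabyte of heap when the values are small, which is why the threshold alone is a poor guide to heap sizing.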

design cassandra issue client when moving from version 0.6.* to 0.7.3

2011-03-28 Thread Anurag Gujral
Hi All, I am currently porting a Cassandra C++ client from 0.6.* to 0.7.3. The C++ client I had in 0.6.* used the function conn->client->send_multiget_slice, which took cseqid as a parameter. The signature of the function in 0.6.* was void CassandraClient::send_multiget_slice(const std::st

Re: New committer Sylvain Lebresne

2011-03-28 Thread Jake Luciani
Great job, well deserved Sylvain! On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis wrote: > The Cassandra PMC has voted to add Sylvain as a committer. > > Welcome, Sylvain, and thanks for the hard work! > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source

atomicity in cassandra

2011-03-28 Thread Saurabh Sehgal
I have seen this question pop up once or twice in mailing lists regarding atomicity when using batch_mutate() operations. I understand that the operations in batch_mutate() should be idempotent and do not get rolled back on failures. However, a client crashing (due to machine issues, networking iss

Re: atomicity in cassandra

2011-03-28 Thread Narendra Sharma
There is no undo or redo log in Cassandra. From Cassandra's perspective, if the operation gets logged in the commit log, it is considered committed. Remember the eventual consistency... On Mon, Mar 28, 2011 at 6:21 PM, Saurabh Sehgal wrote: > I have seen this question pop up once or twice in mailing
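Since there is no rollback, the practical consequence for a client that crashes partway through a batch is to replay the whole batch, which is safe when each mutation is idempotent. The sketch below models this with a dictionary and timestamped writes. All names are hypothetical, and the tie-breaking on equal timestamps is a simplification rather than Cassandra's exact rule.

```python
def apply_mutation(store, key, column, value, timestamp):
    """Last-write-wins: keep the value carrying the highest timestamp."""
    current = store.get((key, column))
    if current is None or timestamp >= current[1]:
        store[(key, column)] = (value, timestamp)

def apply_batch(store, batch):
    """Apply mutations one by one; nothing is undone if this stops halfway."""
    for key, column, value, timestamp in batch:
        apply_mutation(store, key, column, value, timestamp)

batch = [("user1", "name", "Ann", 10), ("user1", "city", "Oslo", 10)]

partial = {}
apply_batch(partial, batch[:1])   # client crashed after the first mutation
apply_batch(partial, batch)       # on restart, replay the whole batch

clean = {}
apply_batch(clean, batch)         # the same batch applied once, without a crash
```

Because each mutation carries its own timestamp, replaying the full batch after a partial failure leaves the same state as a single clean run.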

Gossip mysteries (0.7.4 on EC2)

2011-03-28 Thread Alexis Lê-Quôc
Hi, To make a long story short I'm trying to understand what the logic behind the gossip is. The following is an excerpt from a log captured today. 2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 has restarted, now UP again 2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 state jump to no