Re: Prevent queries from OOM nodes

2012-10-02 Thread Sylvain Lebresne
> Could you create one ? > https://issues.apache.org/jira/browse/CASSANDRA There's one already. See https://issues.apache.org/jira/browse/CASSANDRA-3702 that redirects to https://issues.apache.org/jira/browse/CASSANDRA-4415. -- Sylvain

Re: Why data tripled in size after repair?

2012-10-02 Thread Sylvain Lebresne
> It's in the 1.1 branch; I don't remember if it went into a release > yet. If not, it'll be in the next 1.1.x release. As the ticket says, this is in since 1.1.1. I don't pretend this is well documented, but it's in. -- Sylvain

Re: Advice on correct storage configuration

2012-10-02 Thread Lewis John Mcgibbney
Hi Dean, Thanks for the feedback. On Mon, Oct 1, 2012 at 3:12 PM, Hiller, Dean wrote: > What is really going to matter is what is the applications trying to read? > That is really the critical piece of context. Without knowing what the > application needs to read, it is very hard to design. >

Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Niteesh kumar
While looking at the netstat table, I observed that my cluster nodes are not using persistent connections to talk among themselves on port 9160 to redirect requests. I also observed that local write latency is around 30-40 microseconds, while it takes around 0.5 milliseconds if the chosen node is not the n

RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
9160 is a client port. Nodes are using messaging service on storage_port (7000) for inter-node communication. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuan

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread rohit bhatia
I guess 7000 is only for the gossip protocol. Cassandra still uses 9160 for RPC even among nodes. Also, I see connections over port 9160 among various Cassandra nodes in my cluster. Please correct me if I am wrong. PS: mentioned here: http://wiki.apache.org/cassandra/CloudConfig On Tue, Oct 2, 201

Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne wrote: >> It's in the 1.1 branch; I don't remember if it went into a release >> yet. If not, it'll be in the next 1.1.x release. > > As the ticket says, this is in since 1.1.1. I don't pretend this is > well documented, but it's in. > Nope. It is i

1000's of CF's. virtual CFs do NOT work... map/reduce

2012-10-02 Thread Hiller, Dean
So basically, with moving towards the 1000's of CF all being put in one CF, our performance is going to tank on map/reduce, correct? I mean, from what I remember we could do map/reduce on a single CF, but by stuffing 1000's of virtual Cf's into one CF, our map/reduce will have to read in all 999 v

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, to address your question, read my last post but to summarize, yes, there is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT when doing map/reduce. Doing map/reduce, you will now have HUGE overhead in reading a whole slew of rows you don't care about as you can't map/
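To illustrate the overhead described above, here is a minimal sketch, assuming virtual CFs are modeled as a key prefix inside one shared CF. The key layout (`cf<N>:<row>`), the helper names, and the row counts are all hypothetical and are not taken from the thread. It only shows that a job without a way to seek to one prefix has to read every row and discard most of them.

```python
from typing import Dict, Tuple


def build_rows(num_virtual_cfs: int, rows_per_cf: int) -> Dict[str, dict]:
    """Build one shared 'physical CF' whose keys carry a virtual-CF prefix (hypothetical layout)."""
    return {
        f"cf{c}:{r}": {"value": r}
        for c in range(num_virtual_cfs)
        for r in range(rows_per_cf)
    }


def scan_virtual_cf(rows: Dict[str, dict], prefix: str) -> Tuple[Dict[str, dict], int]:
    """Scan every row and keep only those in the requested virtual CF.

    Returns the matching rows and the total number of rows that had to be read.
    """
    wanted = {}
    rows_read = 0
    for key, value in rows.items():
        rows_read += 1
        # The ":" separator stops "cf4" from matching "cf42".
        if key.startswith(prefix + ":"):
            wanted[key] = value
    return wanted, rows_read


rows = build_rows(num_virtual_cfs=1000, rows_per_cf=10)
wanted, rows_read = scan_virtual_cf(rows, "cf42")
print(f"rows wanted: {len(wanted)}, rows read: {rows_read}")
```

With 1000 virtual CFs of equal size, the scan reads 1000 times more rows than the job needs, which is the cost the message is pointing at.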

Re: Read latency issue

2012-10-02 Thread Hiller, Dean
Interesting results. With PlayOrm, we did a 6 node test of reading 100 rows from 1,000,000 using PlayOrm Scalable SQL. It only took 60ms. Maybe we have better hardware though??? We are using 7200 RPM drives so nothing fancy on the disk side of things. More nodes puts at a higher throughput

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean wrote: > Ben, > to address your question, read my last post but to summarize, yes, there > is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT > when doing map/reduce. Doing map/reduce, you will now have HUGE overhead > i

RE: Read latency issue

2012-10-02 Thread Roshni Rajagopal
Arindam, Did you also try the cassandra stress tool & compare results? I haven't done a performance test as yet; the only ones published on the internet are of YCSB on an older version of Apache Cassandra, and it doesn't seem to be actively supported or updated http://www.brianfrankcooper.net/p

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Without putting too much thought into it... Given the underlying architecture, I think you could/would have to write your own partitioner, which would partition based on the prefix/virtual keyspace. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Scie

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Thanks for the idea but…(but please keep thinking on it)... 100% what we don't want since partitioned data resides on the same node. I want to map/reduce the column families and leverage the parallel disks :( :( I am sure others would want to do the same…..We almost need a feature of virtual Col

Re: 1000's of CF's. virtual CFs do NOT work... map/reduce

2012-10-02 Thread Brian O'Neill
Dean, Great point. I hadn't considered that either. Per my other email, think we would need a custom partitioner for this? (a mix of OrderPreservingPartitioner and RandomPartitioner, OPP for the prefix) -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The S

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Agreed. Do we know yet what the overhead is for each column family? What is the limit? If you have a SINGLE keyspace w/ 2+ CF's, what happens? Anyone know? -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Dr

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Hiller, Dean
Well, I think I know the direction we may follow so we can 1. Have Virtual CF's 2. Be able to map/reduce ONE Virtual CF Well, not map/reduce exactly but really really close. We use PlayOrm with its partitioning so I am now thinking what we will do is have a compute grid where we can have each n

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Brian O'Neill
Dean, We moved away from Hadoop and M/R, and instead we are using Storm as our compute grid. We queue keys in Kafka, then Storm distributes the work to the grid. It's working well so far, but we haven't taken it to prod yet. Data is read from Cassandra using a Cassandra-bolt. If you end up usin

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Brian, On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill wrote: > > Without putting too much thought into it... > > Given the underlying architecture, I think you could/would have to write > your own partitioner, which would partition based on the prefix/virtual > keyspace. I might be barking up the

Re: Getting serialized Rows from CommitLogSegment file

2012-10-02 Thread Felipe Schmidt
I found a way to do it, but now I have another issue. I'm getting a problem when trying to get the ColumnFamily using the CfId. This information is important to deserialize the stored ColumnFamily. When I try to use the method Schema.instance.getCF(cfId) to take the Pair it throws an 'UnknownCol

Re: Getting serialized Rows from CommitLogSegment file

2012-10-02 Thread Ben Hood
Felipe, On Tue, Oct 2, 2012 at 2:56 PM, Felipe Schmidt wrote: > Seems like the information was dropped or, maybe, not existent in this > instance of the Schema. But, as soon as I know, it's just one instance of > the schema in Cassandra, right? If I understand you correctly, you are trying to pr

RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
Never seen connections between nodes on port 9160, 7000 only. From the source code, for example, a thrift request goes to rpc port 9160 (org.apache.cassandra.thrift.CassandraDaemon, org.apache.cassandra.thrift.CassandraServer), then to StorageProxy (org.apache.cassandra.service.StorageProxy),

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 • healthmarketscience.com This information transmitted in this em

easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
If I understand -pr correctly… 1. -pr forces only the current node's sstables to be fixed (so I run it on each node once) 2. Can I run nodetool repair -pr on just 1/RF of my nodes if I pick the correct nodes? 3. Without -pr, it will fix all the stuff on the current node AND the nodes with

Re: 1000's of column families

2012-10-02 Thread Ben Hood
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill wrote: > Exactly. So you're back to the deliberation between using multiple CFs (potentially with some known working upper bound*) or feeding your map reduce in some other way (as you decided to do with Storm). In my particular scenario I'd like to be

Re: easy repair questions on -pr

2012-10-02 Thread Sylvain Lebresne
The short version is: there are 2 use cases for nodetool repair: 1) For periodic repair of the whole cluster (http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair). In that case, you should run repair *with* -pr and you should run it on *every* node. 2) When a node has been do
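The reasoning behind "with -pr, on every node" can be sketched with a toy token ring. This is a simplified model, not Cassandra code: it assumes SimpleStrategy-like placement, where each range is replicated on its owner and the next RF-1 nodes, and the tokens and function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Range = Tuple[int, int]


def primary_range(ring: List[int], i: int) -> Range:
    """Range (previous token, own token] that node i is the primary owner of."""
    return (ring[i - 1], ring[i])  # ring[-1] wraps around for node 0


def replicated_ranges(ring: List[int], i: int, rf: int) -> List[Range]:
    """All ranges node i holds: its primary range plus those of the rf-1 preceding nodes."""
    n = len(ring)
    return [primary_range(ring, (i - k) % n) for k in range(rf)]


def repair_counts(ring: List[int], rf: int, primary_only: bool) -> Counter:
    """Count how often each range is repaired when repair is run once on every node."""
    counts: Counter = Counter()
    for i in range(len(ring)):
        if primary_only:
            counts.update([primary_range(ring, i)])  # models: repair -pr
        else:
            counts.update(replicated_ranges(ring, i, rf))  # models: repair without -pr
    return counts


ring = [0, 25, 50, 75]  # hypothetical tokens, one per node
with_pr = repair_counts(ring, rf=3, primary_only=True)
without_pr = repair_counts(ring, rf=3, primary_only=False)
print("with -pr:   ", dict(with_pr))
print("without -pr:", dict(without_pr))
```

In this model, running with -pr on every node covers each range exactly once, while running without -pr on every node repairs each range RF times. Skipping any node under -pr leaves that node's primary range unrepaired.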

Re: easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
GREAT answer, thanks and one last question... So, I suspect I can expect those rows to finally go away when queried from cassandra-cli once GCGraceSeconds has passed then? Or will they always be there forever and ever and ever (this can't be true, right). Thanks, Dean On 10/2/12 9:34 AM, "Sylvain

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Niteesh kumar
Not only does a node make connections to other nodes; I can also see nodes making connections to themselves on port 9160. On Tuesday 02 October 2012 07:42 PM, Viktor Jevdokimov wrote: not a thrift por

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Nick Bailey
The comments here so far are correct. Cassandra itself will never open a thrift connection. Thrift is only for clients. Not sure what exactly you are seeing but I don't think it's cassandra. On Tue, Oct 2, 2012 at 10:48 AM, Niteesh kumar wrote: > not only a node make connection to other nodes. i

Re: easy repair questions on -pr

2012-10-02 Thread Sylvain Lebresne
> So, I suspect I can expect those rows to finally go away when queried from > cassandra-cli once GCGraceSeconds has passed then? Yes. -- Sylvain

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, By the way, PlayOrm offers a NoSqlTypedSession that is different than the ORM half of PlayOrm dealing in raw stuff that does indexing(so you can do Scalable SQL on data that has no ORM on top of it). That is what we use for our 1000's of CF's as we don't know the format of any of t

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Hiller, Dean
Can you just use netstat and dig into the process id and do a ps -ef | grep to clear up all the confusion? Doing so, you can tell which process communicates with which process (I am assuming you are on linux... on Mac or Windows the commands are different). Then, just paste all that in the email to th
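As a rough sketch of that check, the snippet below groups established TCP connections by port and owning process. It assumes the column layout of Linux `netstat -tnp` output (proto, recv-q, send-q, local address, foreign address, state, PID/program). The sample lines, addresses, and PIDs are made up for illustration and are not output from the poster's cluster.

```python
from typing import Dict, Iterable, List, Tuple

Conn = Tuple[str, str, str]  # (local address, remote address, "PID/program")


def classify_connections(
    netstat_lines: Iterable[str], thrift_port: int = 9160, storage_port: int = 7000
) -> Dict[str, List[Conn]]:
    """Group ESTABLISHED TCP connections by whether they touch the thrift or storage port."""
    summary: Dict[str, List[Conn]] = {"thrift": [], "storage": [], "other": []}
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines, or lines without a PID/program column
        local, remote, state, owner = fields[3], fields[4], fields[5], fields[6]
        if state != "ESTABLISHED":
            continue  # LISTEN lines have "*" as remote port, so skip them before parsing
        ports = {int(local.rsplit(":", 1)[1]), int(remote.rsplit(":", 1)[1])}
        if thrift_port in ports:
            summary["thrift"].append((local, remote, owner))
        elif storage_port in ports:
            summary["storage"].append((local, remote, owner))
        else:
            summary["other"].append((local, remote, owner))
    return summary


# Hypothetical sample in `netstat -tnp` layout.
sample = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name",
    "tcp 0 0 0.0.0.0:9160 0.0.0.0:* LISTEN 2101/java",
    "tcp 0 0 10.0.0.1:9160 10.0.0.9:51234 ESTABLISHED 2101/java",
    "tcp 0 0 10.0.0.1:48822 10.0.0.2:7000 ESTABLISHED 2101/java",
    "tcp 0 0 10.0.0.1:51300 10.0.0.1:9160 ESTABLISHED 3377/python",
]
result = classify_connections(sample)
for kind, conns in result.items():
    print(kind, conns)
```

In the made-up sample, the connection from the node to its own port 9160 is owned by a non-Cassandra process, which is the kind of detail the PID column would reveal.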

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family that it is part of. Then when doing mapreduce jobs, you

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Jeremy, On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote: > Another option that may or may not work for you is the support in Cassandra > 1.1+ to use a secondary index as an input to your mapreduce job. What you > might do is add a field to the column family that represents which virt

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Because the data for an index is not all together(ie. Need a multi get to get the data). It is not contiguous. The prefix in a partition they keep the data so all data for a prefix from what I understand is contiguous. QUESTION: What I don't get in the comment is I assume you are referring to

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: > Because the data for an index is not all together(ie. Need a multi get to get > the data). It is not contiguous. > > The prefix in a partition they keep the data so all data for a prefix from > what I understand is contiguous.

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows? Yes, that is how I "believe" it works. I am by

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
It's always had data locality (since hadoop support was added in 0.6). You don't need to specify a partition, you specify the input predicate with ConfigHelper or the cassandra.input.predicate property. On Oct 2, 2012, at 2:26 PM, "Hiller, Dean" wrote: > So you're saying that you can access th

Re: Cassandra vs Couchbase benchmarks

2012-10-02 Thread aaron morton
A few notes: * +1 for missing RF and CL cassandra stats. * Using striped EBS for m1.xlarge is a bad choice, unless they are using provisioned IOPS, which they do not say. * Cassandra JVM settings are *not* standard. It's a low new heap size and a larger than default heap size. * "memtable siz