If I understand CQL correctly, a wide row is backed by a B-tree behind the scenes. Indexing in CQL uses a B-tree as well, so CQL and PlayOrm are both really just using a wide-row approach. I don't think you can avoid that. Behind the scenes, both use the "compound" column name approach. CQL partitioning also requires a compound primary key, whereas PlayOrm does not, which is why you can partition the same data two different ways (say you partition trades by the accounts they belong to and also by the securities they are for).
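To make the compound column name idea concrete, here is a minimal in-memory sketch in Python. It is not PlayOrm or Cassandra code: the class names (WideRowIndex, PartitionedIndex) and the sample trades are made up for illustration. Each index partition is modeled as one sorted list of (indexed value, row key) pairs, which is roughly what a wide row with compound column names gives you, and the same trades are indexed under two different partitionings.

```python
import bisect
from collections import defaultdict


class WideRowIndex:
    """One index partition: a sorted list of compound names (value, row_key)."""

    def __init__(self):
        self._columns = []  # kept sorted by (value, row_key)

    def add(self, value, row_key):
        bisect.insort(self._columns, (value, row_key))

    def range_scan(self, low, high):
        """Return row keys whose indexed value v satisfies low <= v < high."""
        # A 1-tuple sorts before any 2-tuple sharing the same first element,
        # so (low,) lands on the first entry with value >= low.
        start = bisect.bisect_left(self._columns, (low,))
        end = bisect.bisect_left(self._columns, (high,))
        return [row_key for _, row_key in self._columns[start:end]]


class PartitionedIndex:
    """Hypothetical helper: one WideRowIndex per partition id."""

    def __init__(self):
        self._partitions = defaultdict(WideRowIndex)

    def add(self, partition_id, value, row_key):
        self._partitions[partition_id].add(value, row_key)

    def range_scan(self, partition_id, low, high):
        return self._partitions[partition_id].range_scan(low, high)


# The same trades indexed on numShares under two partitionings:
# once per account and once per security.
by_account = PartitionedIndex()
by_security = PartitionedIndex()

trades = [
    ("t1", "acct-1", "IBM", 100),
    ("t2", "acct-1", "AAPL", 20),
    ("t3", "acct-2", "IBM", 75),
]
for trade_id, account, security, num_shares in trades:
    by_account.add(account, num_shares, trade_id)
    by_security.add(security, num_shares, trade_id)

# Trades with numShares > 50 inside one partition of each index.
big_in_acct_1 = by_account.range_scan("acct-1", 51, 10**9)
big_in_ibm = by_security.range_scan("IBM", 51, 10**9)
```

Each scan only touches the one partition it is given, so the cost depends on the size of that partition and not on the total number of trades.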
The key is really the partitions, as each partition is backed by its own wide row or B-tree (whichever way you prefer to think about it). Obviously, you don't want a partition with billions of rows, as the B-tree starts to get a bit large. In both, you can have as many partitions as you like: billions, trillions.

PlayOrm is just doing a range scan on your behalf. If you do a complex query like left join trade.account where account.isActive=true and trade.numShares>50, it does a range scan on a few indices, but it does so in batches, and eventually it will do lookahead as well (i.e., it will request the next batches before it is done looping over the current batch), which can increase performance further in certain scenarios. It is actually quite interesting, as it dynamically flips to a hash join when it can as well.

Later,
Dean

From: Vivek Mishra <mishra.v...@gmail.com>
Date: Tuesday, October 9, 2012 7:39 AM
To: Nrel <dean.hil...@nrel.gov>, "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

Thanks. This is what I have tried with the cqlsh client. Is there any comparison matrix available between PlayOrm and the cqlsh command-line client? It would be interesting to see whether it is faster than the CQL client. I guess the problem is with secondary indexing, not the volume, because I don't want to go for the wide-row indexing/compound primary key approach.

-Vivek

On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:

Another option for you may be PlayOrm and its scalable SQL. We queried one million rows for 100 results in just 60 ms (and it does joins). Query CL = QUORUM.
Dean

From: Vivek Mishra <mishra.v...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 8, 2012 7:37 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

I did wait for at least 5 minutes before terminating it. Also, it sometimes results in a server crash as well, though the data volume is not very large.

-Vivek

On Tue, Oct 9, 2012 at 7:05 AM, Vivek Mishra <mishra.v...@gmail.com> wrote:

It was on 1 node, and there is no error in the server logs.

-Vivek

On Tue, Oct 9, 2012 at 1:21 AM, aaron morton <aa...@thelastpickle.com> wrote:

> get User where user_name = 'Vivek', it is taking ages to retrieve that data. Is there anything i am doing wrong?

How long is ages, and how many nodes do you have? Are there any errors in the server logs?

When you do a get by secondary index at a CL higher than ONE, every RFth node is involved.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2012, at 10:20 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Thanks Rishabh. But I want to search over duplicate columns only.
-Vivek

On Fri, Oct 5, 2012 at 2:45 PM, Rishabh Agrawal <rishabh.agra...@impetus.co.in> wrote:

Try making user_name a primary key in combination with some other unique column and see if the results improve.

-Rishabh

From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Friday, October 05, 2012 2:35 PM
To: user@cassandra.apache.org
Subject: Query over secondary indexes

I have a column family "User" which has an indexed column "user_name". My schema has only around 0.1 million records, and user_name is duplicated across all rows. Now when I try to retrieve it as:

get User where user_name = 'Vivek'

it takes ages to retrieve the data. Is there anything I am doing wrong?

Also, I tried get_indexed_slices via the Thrift API, setting IndexClause.setCount(1); still no luck. It hung without returning even a single result. I believe 0.1 million is not a huge amount of data.

Cassandra version: 1.1.2

Any idea?

-Vivek