Re: Query over secondary indexes

Hiller, Dean Tue, 09 Oct 2012 06:51:31 -0700

If I understand CQL correctly, behind the scenes in wide rows, there is a 
B-tree.  Even when doing the indexing in CQL, there is a B-tree, so CQL and 
PlayOrm are both really just using a wide row approach.  I don't think you 
can avoid that.  Behind the scenes, they are using the "compound" column name 
approach.  CQL partitioning requires the compound primary key approach as 
well, whereas PlayOrm does not (which is why you can partition it two 
different ways: say you partition trades by the accounts they belong to and 
also by the securities they are for).


The key is really the partitions, as each partition will be backed by that 
wide row or B-tree (whichever way you prefer to think about it).  Obviously, 
you don't want a partition with billions of rows, as the B-tree starts to get 
a bit large.  In both, you can have as many partitions as you like: billions, 
trillions.
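
To make the partitioned wide row idea concrete, here is a rough toy model in 
Python.  It is only a sketch under my own assumptions, not PlayOrm's or 
Cassandra's actual code: the class WideRowIndex, its methods and the sample 
trades are all made up for illustration.  Each partition is one sorted list 
standing in for a wide row, and each "compound" column name is an 
(indexed value, row key) pair.

```python
import bisect
from collections import defaultdict


class WideRowIndex:
    """Toy model: one sorted 'wide row' per partition (hypothetical class)."""

    def __init__(self):
        # partition id -> sorted list of compound column names (value, row_key)
        self._rows = defaultdict(list)

    def add(self, partition, value, row_key):
        # Keep the columns of this partition sorted, as a wide row would.
        bisect.insort(self._rows[partition], (value, row_key))

    def range_scan(self, partition, low, high):
        """Return row keys with low <= indexed value <= high in one partition."""
        cols = self._rows.get(partition, [])
        # (low,) sorts before every (low, row_key), so this finds the first match.
        start = bisect.bisect_left(cols, (low,))
        result = []
        for value, row_key in cols[start:]:
            if value > high:
                break
            result.append(row_key)
        return result


# The same trades indexed two ways: partitioned by account and by security.
by_account = WideRowIndex()
by_security = WideRowIndex()
trades = [("t1", "acc1", "IBM", 100), ("t2", "acc1", "AAPL", 30), ("t3", "acc2", "IBM", 75)]
for trade_id, account, security, num_shares in trades:
    by_account.add(account, num_shares, trade_id)
    by_security.add(security, num_shares, trade_id)

big_trades_for_acc1 = by_account.range_scan("acc1", 51, 10**9)
big_trades_in_ibm = by_security.range_scan("IBM", 51, 10**9)
```

A query only ever touches the one sorted row for its partition, which is why 
the size of each partition matters more than the number of partitions.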

PlayOrm is just doing a range scan on your behalf.  If you do a complex query 
like left join trade.account where account.isActive=true and 
trade.numShares>50, it does a range scan on a few indices, but it does so in 
batches, and eventually it will do lookahead as well (i.e. it will make 
requests for batches before it is done looping over the current batch), which 
can increase performance further in certain scenarios.  It is actually quite 
interesting, as it dynamically flips to a hash join when it can as well.
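
The batching-with-lookahead idea can be sketched roughly like this in 
Python.  Again, this is an illustration under my own assumptions and not 
PlayOrm code: fetch_batch is a hypothetical stand-in for one remote 
range-scan request, and the "index" is just an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(index, start, batch_size):
    """Hypothetical stand-in for one remote range-scan request."""
    return index[start:start + batch_size]


def scan_with_lookahead(index, batch_size):
    """Yield index entries, requesting the next batch before the current one is consumed."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_batch, index, 0, batch_size)
        start = 0
        while True:
            batch = pending.result()
            if not batch:
                return  # an empty batch means the scan is finished
            start += len(batch)
            # Lookahead: ask for the next batch now, while the caller is
            # still looping over the current one.
            pending = pool.submit(fetch_batch, index, start, batch_size)
            yield from batch


# Example: scan a sorted numShares index in batches of 2 and filter as we go.
num_shares_index = [10, 40, 55, 60, 75, 120]
over_fifty = [n for n in scan_with_lookahead(num_shares_index, 2) if n > 50]
```

With a real cluster the gain would come from overlapping the network round 
trip for the next batch with the processing of the current batch.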

Later,
Dean



From: Vivek Mishra <mishra.v...@gmail.com>
Date: Tuesday, October 9, 2012 7:39 AM
To: Nrel <dean.hil...@nrel.gov>, 
"user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

Thanks. This is what I have tried with the cqlsh client.

Is there any comparison matrix available between PlayOrm and the cqlsh 
command line client? It would be interesting to see whether it is faster than 
the CQL client.

I guess the problem is with secondary indexing, not the volume, because I 
don't want to go for the wide row indexing/compound primary key approach.

-Vivek

On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
Another option for you may be PlayOrm and its Scalable-SQL.  We queried one 
million rows for 100 results in just 60ms (and it does joins).  Query CL = 
QUORUM.

Dean

From: Vivek Mishra <mishra.v...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 8, 2012 7:37 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

I did wait for at least 5 minutes before terminating it. Also, it sometimes 
results in a server crash as well, though the data volume is not very large.

-Vivek

On Tue, Oct 9, 2012 at 7:05 AM, Vivek Mishra <mishra.v...@gmail.com> wrote:
It was on 1 node and there is no error in server logs.

-Vivek


On Tue, Oct 9, 2012 at 1:21 AM, aaron morton <aa...@thelastpickle.com> wrote:
get User where user_name = 'Vivek', it is taking ages to retrieve that data. Is 
there anything i am doing wrong?

How long is ages, and how many nodes do you have?
Are there any errors in the server logs?

When you do a get by secondary index at a CL higher than ONE, every RFth node 
is involved.

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2012, at 10:20 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Thanks Rishabh. But I want to search over duplicate columns only.

-Vivek

On Fri, Oct 5, 2012 at 2:45 PM, Rishabh Agrawal <rishabh.agra...@impetus.co.in> wrote:
Try making user_name part of the primary key, in combination with some other 
unique column, and see if the results improve.
-Rishabh
From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Friday, October 05, 2012 2:35 PM
To: user@cassandra.apache.org
Subject: Query over secondary indexes

I have a column family "User" which has an indexed column "user_name". My 
schema has only around 0.1 million records, and user_name is duplicated 
across all rows.

Now when I try to retrieve it as:

get User where user_name = 'Vivek', it takes ages to retrieve the data. Is 
there anything I am doing wrong?

Also, I tried get_indexed_slices via the Thrift API, setting 
IndexClause.setCount(1), but still no luck: it hung without returning even a 
single result. I believe 0.1 million is not a huge amount of data.


Cassandra version: 1.1.2

Any idea?


-Vivek

