Re: Cassandra search performance

Maxim Potekhin Sun, 29 Apr 2012 16:33:39 -0700

Jason,

I'm using plenty of secondary indexes with no problem at all.


Looking at your example, as I think you understand, you forgo the index by
combining two conditions in one query, along the lines of what is
often done in an RDBMS. A scan is expected in this case, and there is no
magic to avoid it.

However, if this query is important, you can easily index on the two
conditions together, using a composite type (look it up), or string
concatenation for a quick and easy solution. That is, you _create an
additional column_ which contains a combination of the two values you want
to use in the query. Then index on it.
Problem solved.
The composite solution is more elegant, but what I describe works in
simple cases.
It works for me.
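
To make the idea concrete, here is a rough sketch in Python. It is an in-memory model only, not Cassandra client code: the `Queue` class, the `status_partition` column name, the `:` separator and the sample values are all made up for illustration. The point is just that both conditions collapse into one equality lookup on the extra column.

```python
from collections import defaultdict

SEP = ":"  # hypothetical separator; it must not occur in either value


def combined_key(status, partition):
    """Build the value stored in the extra, indexed column."""
    return f"{status}{SEP}{partition}"


class Queue:
    """Toy column family with one secondary index on the combined column."""

    def __init__(self):
        self.rows = {}                  # row key -> dict of columns
        self.index = defaultdict(set)   # combined value -> set of row keys

    def insert(self, key, status, partition):
        # On overwrite, drop the stale index entry first.
        old = self.rows.get(key)
        if old is not None:
            self.index[old["status_partition"]].discard(key)
        cols = {
            "status": status,
            "partition": partition,
            "status_partition": combined_key(status, partition),
        }
        self.rows[key] = cols
        self.index[cols["status_partition"]].add(key)

    def find(self, status, partition):
        """A single index lookup stands in for the two-condition query."""
        return sorted(self.index.get(combined_key(status, partition), ()))


q = Queue()
q.insert("r1", "ready", "p0")
q.insert("r2", "done", "p0")
q.insert("r3", "ready", "p1")
print(q.find("ready", "p0"))  # ['r1']
```

The price is on the write side: the combined column has to be rewritten whenever either of the two source values changes.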

Maxim


On 4/25/2012 10:45 AM, Jason Tang wrote:
> 1.0.8
>
> On April 25, 2012 at 10:38 PM, Philip Shon <philip.s...@gmail.com
> <mailto:philip.s...@gmail.com>> wrote:
>
>     What version of Cassandra are you using? I found a big performance
>     hit when querying on the secondary index.
>
>     I came across this bug in versions prior to 1.1
>
>     https://issues.apache.org/jira/browse/CASSANDRA-3545
>
>     Hope that helps.
>
>     2012/4/25 Jason Tang <ares.t...@gmail.com
>     <mailto:ares.t...@gmail.com>>
>
>         And I found that if I only have the search condition "status", it
>         only scans 200 records.
>
>         But if I combine it with another condition, "partition", then it
>         scans all records, because the "partition" condition matches all
>         records.
>
>         But when combined with another condition such as "userName", even
>         though "userName" is the same in all 1,000,000 records, it only
>         scans 200 records.
>
>         So it is affected by the scan execution plan. If we have several
>         search conditions, how does it work? Is there a similar
>         execution plan in Cassandra?
>
>
>         On April 25, 2012 at 9:18 PM, Jason Tang <ares.t...@gmail.com
>         <mailto:ares.t...@gmail.com>> wrote:
>
>             Hi
>
>             We have the CF shown below, and use a secondary index to
>             search on a simple "status" column. Among 1,000,000 rows, we
>             have 200 records with the status we want.
>
>             But when we start to search, the performance is very poor.
>             Checking with the command "./bin/nodetool -h localhost -p
>             8199 cfstats", Cassandra read 1,000,000 records, and
>             "Read Latency" is 0.2 ms, so in total it took 200 seconds.
>
>             It uses lots of CPU, and checking the stack, all threads in
>             Cassandra are reading from sockets.
>
>             So I wonder how to really use the index to find the 200
>             records instead of scanning all rows. (Super Column?)
>
>             ColumnFamily: queue
>               Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>               Default column value validator: org.apache.cassandra.db.marshal.BytesType
>               Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>               Row cache size / save period in seconds / keys to save : 0.0/0/all
>               Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
>               Key cache size / save period in seconds: 0.0/0
>               GC grace seconds: 0
>               Compaction min/max thresholds: 4/32
>               Read repair chance: 0.0
>               Replicate on write: false
>               Bloom Filter FP chance: default
>               Built indexes: [queue.idxStatus]
>               Column Metadata:
>                 Column Name: status (737461747573)
>                   Validation Class: org.apache.cassandra.db.marshal.AsciiType
>                   Index Name: idxStatus
>                   Index Type: KEYS
>
>             BRs
>             //Jason
>
>
>
>
