Re: Why Cassandra secondary indexes are so slow on just 350k rows?

Dave Brosius Tue, 28 Aug 2012 20:30:16 -0700

If I understand you correctly, you are only ever querying for the rows where is_exported = false, and turning them into trues. What this means is that eventually you will have 1 row in the secondary index table with 350K columns that you will never look at.
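To illustrate the point above, here is a toy in-memory model of an index on is_exported where each distinct value owns one index row. It uses plain Python dicts, not Cassandra; the names index, data and set_is_exported are hypothetical. After every row is flipped from 'false' to 'true', all keys end up in the 'true' index row, which the poller never reads.

```python
from collections import defaultdict

# Hypothetical toy model of a secondary index on is_exported:
# each indexed value maps to one index row whose columns are data row keys.
index = defaultdict(dict)  # value -> {row_key: None}
data = {}                  # row_key -> current is_exported value

def set_is_exported(row_key, value):
    """Write the indexed column and keep the toy index in sync."""
    old = data.get(row_key)
    if old is not None:
        # Simplification: the stale entry is just dropped here; a real
        # Cassandra index records a deletion instead.
        index[old].pop(row_key, None)
    data[row_key] = value
    index[value][row_key] = None

# Load 1,000 rows as 'false', then export (flip to 'true') all of them.
for i in range(1000):
    set_is_exported(f"row{i}", "false")
for i in range(1000):
    set_is_exported(f"row{i}", "true")

print(len(index["true"]), len(index["false"]))  # 1000 0
```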

It seems to me that perhaps you should just keep your own "manual index" CF that points to non-exported rows, and just delete those columns when the rows are exported.
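A rough sketch of that manual-index pattern, again with plain dicts standing in for column families. The names data_cf, pending_index, load_row and export_batch are hypothetical, and keeping all pending keys in a single index row is an assumption. Each export deletes the exported keys from the index, so the index only ever holds rows that still need exporting.

```python
# Hypothetical toy model of a "manual index" column family (no Cassandra).
data_cf = {}        # row_key -> {column: value}
pending_index = {}  # assumed single index row: row_key -> '' for unexported rows

def load_row(row_key, columns):
    """Write a data row and register it as not yet exported."""
    data_cf[row_key] = dict(columns)
    pending_index[row_key] = ""

def export_batch(count):
    """Export up to `count` pending rows, then delete their index columns."""
    # sorted() stands in for the comparator order a column slice would return.
    batch = sorted(pending_index)[:count]
    exported = [(key, data_cf[key]) for key in batch]
    for key in batch:
        del pending_index[key]  # the index shrinks as rows are exported
    return exported

for i in range(10):
    load_row(f"row{i:02d}", {"payload": i})
first = export_batch(5)
print([key for key, _ in first])  # ['row00', 'row01', 'row02', 'row03', 'row04']
print(len(pending_index))         # 5
```

With pycassa, the same idea would roughly mean inserting a column into the index row with ColumnFamily.insert() when a data row is loaded, and removing it with ColumnFamily.remove() once the row has been exported.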



On 08/28/2012 05:23 PM, Edward Kibardin wrote:

I have a column family with a secondary index. The indexed field is basically a boolean field, but I'm storing it as a string. The field is called *is_exported* and can be *'true'* or *'false'*. After the request, all loaded rows are updated with *is_exported = 'false'*.
I'm polling this column family every ten minutes and exporting new rows as they appear.
But here is the problem: the time for this query grows pretty linearly with the amount of data in the column family, and it currently takes *from 12 to 20 seconds (!!!) to find 5000 rows*. From my understanding, an indexed request should not depend on the number of rows in the CF but on the number of rows per index value (cardinality), as the index is just another hidden CF like:
        "true" : rowKey1 rowKey2 rowKey3 ...
        "false": rowKey1 rowKey2 rowKey3 ...

I'm using Pycassa to query the data; here is the code I'm using:
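import pycassa
from pycassa.index import create_index_expression, create_index_clause

column_family = pycassa.ColumnFamily(cassandra_pool, column_family_name, read_consistency_level=2)  # 2 == QUORUM
is_exported_expr = create_index_expression('is_exported', 'false')
clause = create_index_clause([is_exported_expr], count=5000)
# get_indexed_slices() returns a generator: rows are fetched only as it is iterated
rows = list(column_family.get_indexed_slices(clause))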
Am I doing something wrong? I expect this operation to be MUCH faster.
Any ideas or suggestions?

Some config info:
 - Cassandra 1.1.0
 - RandomPartitioner
 - I have 2 nodes and replication_factor = 2 (each server has a full copy of the data)
 - Using AWS EC2, large instances
 - Software raid0 on ephemeral drives

Thanks in advance!
