On 06/06/2011 10:15, David Boxenhorn wrote:
Is there really a 10x difference between indexed CFs and non-indexed CFs?
Well, in my test it is!
I'm using 0.7.6-2 with 9 nodes, 3 replicas, write_consistency_level QUORUM, and about 90,000,000 rows (~1K per row).
I use 20 processes, with 20 rows per insertion.
Inserting whole rows without an index takes about 0.02 seconds.
After I add a secondary index and update every row with the indexed column, the insertion time is about 2 seconds.
If I remove the index and update the column, the time is about 0.002 seconds.
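
To put the three timings above side by side, here is a small sketch that just divides them. The numbers are the ones I reported, not re-measured, and the variable names and the slowdown() helper are made up for illustration.

```python
# Timings quoted above, in seconds (as reported; not re-measured here).
insert_no_index = 0.02     # inserting whole rows, no index
update_with_index = 2.0    # updating the indexed column
update_no_index = 0.002    # updating the same column after removing the index


def slowdown(slow, fast):
    """Return how many times slower `slow` is than `fast`."""
    return slow / fast


# Indexed update compared with the plain whole-row insert.
print(round(slowdown(update_with_index, insert_no_index)))
# Indexed update compared with the same update without the index.
print(round(slowdown(update_with_index, update_no_index)))
```

So with these figures the indexed update is about 100 times slower than the plain insert, and about 1000 times slower than the same update without the index.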

Another thing I noticed: if you do the insertions first, then build the secondary index with "update column family ...", and then select on the index, the results are not right (it seems the index is still being built even though the "update" command returns quickly). After a while, get_indexed_slices() also times out from time to time (with pycassa.ConnectionPool('keyspace1', ['host1','host2'], timeout=600, pool_size=1)).
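
A possible client-side workaround for the intermittent timeouts is to retry the query with a growing delay. This is only a rough sketch and not something I have run against the cluster: retry_on_timeout() and run_query are hypothetical names, run_query stands in for whatever wraps the get_indexed_slices() call, and which exception types to catch is an assumption you would need to adapt to your client.

```python
import time


def retry_on_timeout(run_query, retries=3, delay=1.0, timeout_errors=(Exception,)):
    """Call run_query(), retrying with exponential backoff on the given errors.

    run_query      -- hypothetical zero-argument callable wrapping the indexed query
    retries        -- total number of attempts (must be >= 1)
    delay          -- base sleep in seconds; doubled after each failed attempt
    timeout_errors -- exception types treated as retryable (an assumption)
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    last_error = None
    for attempt in range(retries):
        try:
            return run_query()
        except timeout_errors as err:
            last_error = err
            # Sleep only if another attempt will follow.
            if attempt < retries - 1:
                time.sleep(delay * (2 ** attempt))
    # Every attempt failed: re-raise the last error seen.
    raise last_error
```

This does not fix wrong results while the index is still being built; it only smooths over occasional timeouts.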

Has anyone else had similar experiences with secondary indexes?

--
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zan...@ihep.ac.cn
86 010 8823 6018

