Hi again,

Once you start playing with CCM it's hard to stop; it's such a great tool. My issue with secondary indexes is the following: neither an explicit 'nodetool repair' nor the implicit mechanisms (hinted handoff, read repair) resolve inconsistencies in the data I get back from secondary indexes. I observe this with both one- and two-datacenter deployments, regardless of caching settings. Rebuilding the index, dropping and re-creating it, or restarting the nodes doesn't help either.
In the following scenario I start up 2 nodes and insert some rows with CL.ONE. During this process I deliberately stop and start the nodes to trigger inconsistencies. I then query all the data by its index with read CL.ONE and stop as soon as I see that data is missing. I see that none of the C* repair mechanisms work for secondary indexes.

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner test2ndIndexRepair
$ ccm start
$ ccm node1 cli
  -> create keyspace and column family (please find the schemas attached)
$ python populate_repair.py
  (in the first terminal)
$ ccm node1 stop; sleep 10; ccm node1 start
  (in a second terminal, while populate_repair.py runs)
$ ccm node2 stop; sleep 10; ccm node2 start
  (in the second terminal, while populate_repair.py runs; hinted handoff does the work, but unfortunately not on secondary indexes)
$ python fetcher_repair.py
....
254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm cli
[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.
Elapsed time: 47 msec(s).

$ python fetcher_repair.py
  (running it one more time in the hope that read repair kicked in after the last query, but unfortunately not)
....
254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm node1 repair
$ ccm node2 repair
$ ccm cli
[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.

Both Cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M.

Thanks for your help.
Best regards,
Alexei

------START cassandra-cli schemas ------------
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 2}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : UTF8Type,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
------FINISH cassandra-cli schemas ------------

------START populate_repair.py ----------
import datetime
from pycassa.batch import Mutator
import pycassa

pool = pycassa.ConnectionPool('testks', timeout=5, server_list=['127.0.0.1:9160', '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

# 2000 users x 20 rows (itemId 0..19); every row is written 10 times,
# each insert repeating indexedColumn and adding one message column.
for userId in xrange(0, 2000):
    print userId
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
        for message_number in xrange(10):
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId, str(message_number): str(message_number)})
    b.send()  # flush whatever is still queued for this userId

pool.dispose()
------FINISH populate_repair.py ----------

------START fetcher_repair.py ----------
import pycassa
from pycassa.columnfamily import ColumnFamily
from pycassa.pool import ConnectionPool
from pycassa.index import *

pool = pycassa.ConnectionPool('testks', server_list=['127.0.0.1:9160', '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

# Every userId should come back from the secondary index with exactly 20 rows.
for userId in xrange(2000):
    print userId
    index_expr = create_index_expression('indexedColumn', 'userId_%s'%userId)
    index_clause = create_index_clause([index_expr], count=10000000)
    data = list(cf.get_indexed_slices(index_clause=index_clause))
    if len(data) != 20:
        raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))

pool.dispose()
------FINISH fetcher_repair.py ----------
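fetcher_repair.py stops at the first userId with fewer than 20 rows. To see how widespread the problem is, the check can be split into a pure function that compares what the index returns against the rows populate_repair.py should have written. This is a minimal sketch, assuming the index lookup is passed in as a function; `expected_rows`, `find_missing` and `fake_fetch` are hypothetical helpers, not part of pycassa, and the pycassa wiring in the comment is untested. It avoids Python 3-only syntax so it also runs on the Python 2 interpreter the scripts above need.

```python
# Hypothetical helpers (not part of pycassa): compare secondary-index results
# against the rows populate_repair.py writes, and report every incomplete userId.

ROWS_PER_USER = 20     # populate_repair.py writes itemId 0..19 per userId
MESSAGES_PER_ROW = 10  # columns '0'..'9' in addition to 'indexedColumn'


def expected_rows(user_id):
    """Rebuild {row_key: columns} as populate_repair.py would leave it for one userId."""
    rows = {}
    for item_id in range(ROWS_PER_USER):
        row_key = 'userId_%s:itemId_%s' % (user_id, item_id)
        columns = {'indexedColumn': 'userId_%s' % user_id}
        for n in range(MESSAGES_PER_ROW):
            columns[str(n)] = str(n)
        rows[row_key] = columns
    return rows


def find_missing(user_ids, fetch):
    """Return {user_id: sorted list of missing row keys} for incomplete index results.

    `fetch(value)` must return an iterable of (row_key, columns) pairs for the
    rows whose indexedColumn equals `value`.
    """
    missing = {}
    for user_id in user_ids:
        wanted = set(expected_rows(user_id))
        returned = set(key for key, _ in fetch('userId_%s' % user_id))
        lost = sorted(wanted - returned)
        if lost:
            missing[user_id] = lost
    return missing


# Possible wiring against the cluster above (untested sketch, needs pycassa):
#     def fetch(value):
#         clause = create_index_clause([create_index_expression('indexedColumn', value)],
#                                      count=10000000)
#         return cf.get_indexed_slices(clause)
#     report = find_missing(range(2000), fetch)

if __name__ == '__main__':
    # Offline demo: pretend the index lost every row of userId 256.
    def fake_fetch(value):
        user_id = int(value.split('_')[1])
        return [] if user_id == 256 else list(expected_rows(user_id).items())

    report = find_missing([254, 255, 256], fake_fetch)
    for user_id in sorted(report):
        print('userId %s: %d of %d rows missing' % (user_id, len(report[user_id]), ROWS_PER_USER))
```

Because it lists the missing itemId row keys per userId rather than only a count, running it before and after `ccm node1 repair` / `ccm node2 repair` would show whether a repair restores any index entries at all, or whether whole users versus individual rows are affected.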