neither 'nodetool repair' nor 'hinted handoff/read repair' work for secondary indexes

Alexei Bakanov Fri, 01 Feb 2013 06:15:22 -0800

Hi again,

Once you start playing with CCM it's hard to stop; it's such a great tool.
My issue with secondary indexes is the following: neither an explicit
'nodetool repair' nor the implicit 'hinted handoffs/read repairs' resolve
inconsistencies in the data I get from secondary indexes.
I observe this for both one- and two-datacenter deployments, independent
of caching settings. Rebuilding the index, dropping and re-creating it, or
restarting the nodes doesn't help.


In the following scenario I start up 2 nodes and insert some rows with
CL.ONE. During this process I deliberately stop and start the nodes in
order to trigger inconsistencies.
I then query all data by its index with read CL.ONE and stop as soon as I
see that data is missing. I see that none of the C* repair mechanisms work
for secondary indexes.

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner test2ndIndexRepair
$ ccm start
$ ccm node1 cli
-> create keyspace and column family  (please find schemas attached)
$ python populate_repair.py (in first terminal)
$ ccm node1 stop; sleep 10; ccm node1 start   (in second terminal,
while populate_repair.py runs)
$ ccm node2 stop; sleep 10; ccm node2 start   (in second terminal,
while populate_repair.py runs. Hinted Handoffs do the work but
unfortunately not on Secondary Indexes)
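
The two stop/start steps above could also be scripted so that the run is
easier to repeat. What follows is only a sketch: the helper names
bounce_commands and bounce are made up for illustration, and it assumes
that ccm is on the PATH and that test2ndIndexRepair is the current ccm
cluster. It uses nothing beyond the 'ccm <node> stop' and
'ccm <node> start' commands shown above.

```python
import subprocess
import time


def bounce_commands(node):
    """Return the ccm stop and start command lines for one node."""
    return [['ccm', node, 'stop'], ['ccm', node, 'start']]


def bounce(node, pause=10, run=subprocess.check_call, sleep=time.sleep):
    """Stop a node, wait `pause` seconds, then start it again.

    `run` and `sleep` are injectable (hypothetical parameters) so the
    sequence can be checked without a running cluster.
    """
    stop_cmd, start_cmd = bounce_commands(node)
    run(stop_cmd)
    sleep(pause)
    run(start_cmd)
```

Calling bounce('node1') and then bounce('node2') from a second terminal
while populate_repair.py is writing would correspond to the two commands
above.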

$ python fetcher_repair.py
....
254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm cli
[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.
Elapsed time: 47 msec(s).

$ python fetcher_repair.py  (running it once more in the hope that 'read
repair' had kicked in after the last query, but unfortunately not)
....
254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm node1 repair
$ ccm node2 repair
$ ccm cli

[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.
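
To tell whether the rows themselves are missing or only their index
entries, the index result could be compared with direct reads of the row
keys, which populate_repair.py builds deterministically as
'userId_N:itemId_M'. A rough sketch follows. expected_row_keys and
classify_rows are hypothetical helper names, and the two key collections
are assumed to come from pycassa (for example the keys yielded by
cf.get_indexed_slices and the keys returned by a direct multiget); that
wiring is not shown here.

```python
def expected_row_keys(user_id, items=20):
    """Row keys that populate_repair.py writes for one user."""
    return ['userId_%s:itemId_%s' % (user_id, item) for item in range(items)]


def classify_rows(user_id, keys_from_index, keys_from_direct_read, items=20):
    """Split the expected keys by where they are (not) visible."""
    expected = expected_row_keys(user_id, items)
    indexed = set(keys_from_index)
    direct = set(keys_from_direct_read)
    return {
        # Row is readable by key, but the index query does not return it.
        'index_only_missing': [k for k in expected
                               if k in direct and k not in indexed],
        # Row is not readable by key either.
        'row_missing': [k for k in expected if k not in direct],
    }
```

A non-empty 'index_only_missing' list for userId 256 would suggest that
the data is present and only the index lookup fails. Reads at CL.ONE can
be served by either replica, so this is an indication rather than proof.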


Both cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M

Thanks for any help.

Best regards,
Alexei

------START cassandra-cli schemas ------------
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 2}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : UTF8Type,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
------FINISH cassandra-cli schemas ------------

------START populate_repair.py ----------
import pycassa
from pycassa.batch import Mutator

# No consistency level is set explicitly; writes are meant to go out at CL.ONE.
pool = pycassa.ConnectionPool('testks', timeout=5,
                              server_list=['127.0.0.1:9160', '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

for userId in xrange(0, 2000):
    print userId
    # queue_size matches the 20 rows x 10 inserts queued per user.
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s' % (userId, itemId)
        for message_number in xrange(10):
            # Every insert carries the indexed column plus one message column.
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s' % userId,
                                  str(message_number): str(message_number)})
    b.send()

pool.dispose()
------FINISH populate_repair.py ----------

------START fetcher_repair.py ----------
import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('testks', server_list=['127.0.0.1:9160',
                                                     '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

for userId in xrange(2000):
    print userId
    # Look up all rows for this user through the secondary index.
    index_expr = create_index_expression('indexedColumn', 'userId_%s' % userId)
    index_clause = create_index_clause([index_expr], count=10000000)
    data = list(cf.get_indexed_slices(index_clause=index_clause))
    # populate_repair.py writes exactly 20 rows per user.
    if len(data) != 20:
        raise Exception('missing rows for userId %s, data length is %d'
                        % (userId, len(data)))
pool.dispose()

------FINISH fetcher_repair.py ----------
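
fetcher_repair.py stops at the first user with missing rows. A variant
that keeps going and reports every affected user might look like the
sketch below. find_incomplete_users and count_rows are hypothetical
names; count_rows stands for any callable that returns the number of rows
the index query yields for a user, for example a small wrapper around the
get_indexed_slices call above.

```python
def find_incomplete_users(count_rows, user_ids, expected=20):
    """Return {user_id: row_count} for users whose count differs from `expected`."""
    incomplete = {}
    for user_id in user_ids:
        found = count_rows(user_id)
        if found != expected:
            # Keep scanning instead of raising, so every gap is reported.
            incomplete[user_id] = found
    return incomplete
```

Running this once before and once after 'ccm node1 repair' and
'ccm node2 repair' would show whether the set of affected users changes
at all.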
