Hi,

We are having an issue with TTL on secondary-indexed columns: queries
on an indexed column that has a TTL return 0 rows.
Everything works fine with small amounts of data, but once we get over
a certain threshold, older rows seem to disappear from the index.
In the example below we create 70 rows with 45k columns each, plus one
indexed column whose value is just the row key, so there is exactly one
row per indexed value. When the script has finished, the index contains
only rows 66-69; rows 0-65 are gone from the index.
Writing 'indexedColumn' without a TTL fixes the problem.


------------- SCHEMA START -----------------
create keyspace ks123
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use ks123;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'AsciiType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : AsciiType,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
------------- SCHEMA FINISH -----------------

------------- POPULATE START -----------------
from pycassa.batch import Mutator
import pycassa

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')

for rowKey in xrange(70):
    # One batch per row: 45,000 data columns plus the indexed column.
    b = Mutator(pool)
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    # The indexed column holds the row key, so each index value maps to one row.
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()

pool.dispose()
------------- POPULATE FINISH -----------------
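
For reference, the two TTL values used in the script work out as shown in
the sketch below. The constant and function names are only illustrative
and are not part of the original script; the numbers are the ones passed
to insert() above.

```python
# TTL values copied from the populate script.
DATA_TTL = 7884000    # ttl on the 45,000 data columns
INDEX_TTL = 7887600   # ttl on 'indexedColumn'

SECONDS_PER_DAY = 24 * 60 * 60


def ttl_days(ttl_seconds):
    """Convert a TTL in seconds to days (float division on Python 2 and 3)."""
    return ttl_seconds / float(SECONDS_PER_DAY)


data_days = ttl_days(DATA_TTL)        # 91.25 days
index_days = ttl_days(INDEX_TTL)      # one hour longer than the data columns
gap_seconds = INDEX_TTL - DATA_TTL    # 3600 seconds
```

So both TTLs are roughly three months, and the indexed column is set to
expire one hour after the data columns. Nothing should have expired yet
when the queries below are run.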

------------- QUERY START -----------------
[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).
------------- QUERY FINISH -----------------
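
To check every indexed value rather than querying one at a time in the
CLI, something along these lines could be used. This is a minimal sketch,
not the script we actually ran: the helper names (missing_from_index,
make_pycassa_lookup, check_cluster) are made up for illustration, and it
assumes pycassa's get_indexed_slices() with an index clause built from
pycassa.index, against the same 'ks123' / 'cf1' schema as above.

```python
def missing_from_index(lookup, row_keys):
    """Return the row keys whose index lookup yields no rows.

    `lookup` is any callable taking an indexed value and returning an
    iterable of (key, columns) pairs.
    """
    missing = []
    for key in row_keys:
        # Only the first result matters: an empty result means the index
        # has no entry for this value.
        if next(iter(lookup(key)), None) is None:
            missing.append(key)
    return missing


def make_pycassa_lookup(cf, column='indexedColumn'):
    """Build a lookup callable backed by a pycassa ColumnFamily."""
    # Imported lazily so missing_from_index can be used without pycassa.
    from pycassa.index import create_index_clause, create_index_expression

    def lookup(value):
        expr = create_index_expression(column, value)  # column == value
        clause = create_index_clause([expr], count=1)
        return cf.get_indexed_slices(clause)

    return lookup


def check_cluster(keyspace='ks123', cf_name='cf1', rows=70):
    """Query the index for row keys '0'..'rows-1' and list the missing ones."""
    import pycassa

    pool = pycassa.ConnectionPool(keyspace)
    try:
        cf = pycassa.ColumnFamily(pool, cf_name)
        keys = [str(i) for i in range(rows)]
        return missing_from_index(make_pycassa_lookup(cf), keys)
    finally:
        pool.dispose()
```

Calling check_cluster() after the populate script should, given the
behaviour described above, list the keys '0' through '65' as missing.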

This is all using Cassandra 1.1.7 with default settings.

Best regards,

Alexei Bakanov
