Hi all,

The problem: a Map<key, value> is maintained as a simple Cassandra CF, and there is a stream of puts/deletes from clients. For newly inserted rows I need to update a Solr/Lucene index by polling Cassandra. (I know about Solandra; I am not asking about that.)

I want to use Cassandra as a classical write-ahead log, but with an extra twist: deduplication and aggregation of mutator operations. The idea behind this is a Map<Key, SortedList<timestamp, value>>, where the list, sorted by timestamp, contains the mutating operations (add(value) or delete).

In order to update the Solr index I need to see which keys have been modified since the last Solr commit. I do not know how to do this efficiently with Cassandra. After a "commit" to Solr I have to either:

a) remember the last timestamp and scan from there (secondary index on timestamp? Can Cassandra's native timestamp be used for this?), or
b) keep two CFs, "dirty" and "clean", and migrate records from dirty to clean on commit, or
c) ???

Somehow I do not like a) or b), as I know I do not yet understand Cassandra :(

Any best practices for such a use case?

Also, is there an efficient addIfNotAlreadyThere(key...) operation, i.e. if(!contains(key)) add(key, value) in one network call? As far as I understand, I need to do the check myself.

As an example:

add(1, AAA)
add(2, BBB)
add(1, CCC) //unconditional
addIfNotThere(1, DDD) //noop as key 1 is already there, not deleted
-------------------------------
should result in the following Solr indexing operations:

1, AAA
2, BBB

Another way to think of it is to identify the last add() or last delete() operation from the CF?

Thanks,
eks
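
P.S. To make the semantics concrete, here is a minimal in-memory sketch in Java of option b) and of the conditional add. It is only a model, not Cassandra client code: the class OpLogSketch, its method names, the logical clock, and the convention that a null value means "delete" are all made up for illustration. It also assumes that the latest operation per key is what should reach Solr.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

/** In-memory stand-in for the op-log CF plus a "dirty" key set (hypothetical names). */
class OpLogSketch {
    // key -> (timestamp -> value), sorted by timestamp; a null value marks a delete
    private final Map<Integer, TreeMap<Long, String>> log = new LinkedHashMap<>();
    // keys touched since the last commit (plays the role of the "dirty" CF)
    private final Set<Integer> dirty = new LinkedHashSet<>();
    // logical clock used instead of real timestamps (assumption for the sketch)
    private long clock = 0;

    /** Unconditional add: always appends an operation. */
    void add(int key, String value) {
        append(key, Objects.requireNonNull(value));
    }

    /** Appends a delete marker for the key. */
    void delete(int key) {
        append(key, null);
    }

    /** Adds only if the key is absent or its latest operation is a delete. */
    boolean addIfAbsent(int key, String value) {
        if (isLive(key)) {
            return false; // no-op: key is already there and not deleted
        }
        add(key, value);
        return true;
    }

    /** A key is live if it has operations and the latest one is not a delete. */
    boolean isLive(int key) {
        TreeMap<Long, String> ops = log.get(key);
        return ops != null && ops.lastEntry().getValue() != null;
    }

    private void append(int key, String valueOrNull) {
        log.computeIfAbsent(key, k -> new TreeMap<>()).put(++clock, valueOrNull);
        dirty.add(key);
    }

    /**
     * Returns key -> latest value for every dirty key (null means "delete
     * from the index"), then clears the dirty set, like migrating to "clean".
     */
    Map<Integer, String> commit() {
        Map<Integer, String> pending = new LinkedHashMap<>();
        for (int key : dirty) {
            pending.put(key, log.get(key).lastEntry().getValue());
        }
        dirty.clear();
        return pending;
    }

    public static void main(String[] args) {
        OpLogSketch oplog = new OpLogSketch();
        oplog.add(1, "AAA");
        oplog.add(2, "BBB");
        oplog.addIfAbsent(1, "DDD"); // no-op, key 1 is live
        oplog.delete(2);
        // Prints the operations that would be sent to Solr on this commit.
        System.out.println(oplog.commit());
    }
}
```

In a real setup the check inside addIfAbsent would be a read followed by a write, i.e. two network calls, and that is exactly the part I would like to avoid.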