Hello,

My company is working on transitioning our relational data model to Cassandra. Naturally, one of the basic requirements is secondary indexes, so that queries can be answered quickly according to the application's needs. After looking at Cassandra's native support for secondary indexes, we decided not to use them because of their poor performance on high-cardinality values. Instead, we decided to implement secondary indexes manually.

Some searching led us to http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html, which details a schema for such indexes. However, the method described there adds a dedicated index entries column family, whereas it seems that only two CFs are needed: one for the items and one for the indexes (assuming one has access to both the old and new values when updating an item).

The article itself notes that its approach is not the obvious solution, stating that the obvious one "for a number of reasons related to Cassandra's model of eventual consistency ... will not reliably work" and that "it's a really good idea to make sure you understand why this CF is necessary". However, it gives no further detail on what could be a critical issue: dealing with corrupt indexes in a large production environment is sure to be a nightmare.

What are the community's thoughts on this matter? Given the writer's credentials in the Cassandra realm, specifically regarding indexes, I'm inclined not to ignore his remarks. References to a document or system that implements similar indexes would be greatly appreciated as well.
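
To make the two-CF idea concrete, here is a rough sketch of the update path we have in mind. It is only an illustration: plain Python dicts stand in for the two column families, and every name in it (`items`, `index`, `update_item`, `find_by`) is made up for this post, not taken from the article or from any Cassandra client API.

```python
# In-memory stand-ins for the two column families (hypothetical names).
items = {}   # item_key -> {column: value}
index = {}   # (column, value) -> set of item_keys


def update_item(item_key, column, new_value):
    """Write a column and keep the manual index in step with it."""
    # Assumes the old value can be read reliably at update time.
    old_value = items.get(item_key, {}).get(column)
    if old_value is not None:
        # Drop the index entry that pointed at the old value.
        keys = index.get((column, old_value), set())
        keys.discard(item_key)
        if not keys:
            index.pop((column, old_value), None)
    items.setdefault(item_key, {})[column] = new_value
    index.setdefault((column, new_value), set()).add(item_key)


def find_by(column, value):
    """Answer 'which items have column == value' from the index alone."""
    return sorted(index.get((column, value), set()))


update_item("item1", "color", "red")
update_item("item2", "color", "red")
update_item("item1", "color", "blue")
print(find_by("color", "red"))
print(find_by("color", "blue"))
```

Note that this sketch is single-threaded and always sees the true old value. It does not model concurrent writers or replicas that lag behind, which is presumably where the article's consistency concerns come in.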
- alon