Hi

I would like to better understand the limitations of native indexes, their
potential side effects, and the scenarios where they are required.

My understanding so far:
- Each node stores index entries only for the data held locally on that
node.
- Indexes do not return values in sorted order (the hashes of the indexed row
keys define the order).
- Given the design described in the first bullet, a coordinator node
receiving a read on a native index needs to fan the read out to multiple
nodes (a set of nodes that together cover at least the complete key space,
plus potentially more to satisfy the read consistency level).
- Each write to an indexed column triggers an additional local read of the
index in order to update it (fairly obvious, but easily forgotten when tuning
a system for a write-only workload).
- A where clause in CQL must contain at least one equality condition on a
natively indexed column. Any additional conditions in the where clause are
applied as filters by the coordinator node receiving the CQL query.
- Native indexes do not work well for columns with a high number of
distinct values across the entire CF.
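
To make the model in the bullets above concrete, here is a toy sketch of how
I picture it. This is plain Python, not Cassandra code: the names Node,
Cluster, token and query are made up for illustration, there is no
replication or consistency level, and key placement is a simple hash modulo
the node count.

```python
import hashlib
from collections import defaultdict


def token(key):
    """Hash of the row key; stands in for the partitioner token."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Node:
    """One node: its own rows plus an index covering only those rows."""

    def __init__(self):
        self.rows = {}                 # row key -> indexed column value
        self.index = defaultdict(set)  # indexed value -> local row keys

    def write(self, key, value):
        # Read before write: the old value is needed to drop the stale entry.
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
        self.rows[key] = value
        self.index[value].add(key)

    def lookup(self, value):
        return set(self.index.get(value, ()))


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def write(self, key, value):
        # The row (and its index entry) lives on the node owning the key.
        self.nodes[token(key) % len(self.nodes)].write(key, value)

    def query(self, value):
        # Coordinator: ask every node, since any of them may hold matches.
        hits = set()
        for node in self.nodes:
            hits |= node.lookup(value)
        # Results come back in token order, not sorted by key or value.
        return sorted(hits, key=token)


cluster = Cluster(3)
for key, state in [("alice", "TX"), ("bob", "CA"), ("carol", "TX"), ("dave", "TX")]:
    cluster.write(key, state)
cluster.write("carol", "CA")  # update moves carol from the TX to the CA entry
print(cluster.query("TX"))    # alice and dave, ordered by token
```

The point of the sketch is only the shape of the work: one local read per
indexed write, and one request per node per index query.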

Is the above understanding correct and complete?
Some doubts:
- About the limitation on indexing columns with a high number of distinct
values:
I assume native indexes are implemented as an internally managed CF per
index. With high-cardinality values, in the worst case, the number of rows in
the index is identical to the number of rows in the indexed CF. Or are
there other reasons for the limitation? If so, is there a guideline on the
maximum cardinality that is still reasonable?
- Are column updates and the corresponding index updates (read + write
action) atomic and isolated from concurrent updates?
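
To illustrate the cardinality doubt, here is a small sketch under my
assumption that the index is stored as one row per distinct indexed value.
The function name index_shape and the sample data are invented for
illustration and say nothing about the real storage format.

```python
from collections import defaultdict


def index_shape(values_by_row_key):
    """Return (number of index rows, average row keys per index row)."""
    index = defaultdict(set)
    for row_key, value in values_by_row_key.items():
        index[value].add(row_key)
    index_rows = len(index)
    average = len(values_by_row_key) / index_rows if index_rows else 0.0
    return index_rows, average


# Low cardinality: 9000 rows spread over 3 distinct values.
low = {f"user{i}": ("TX", "CA", "NY")[i % 3] for i in range(9000)}
# High cardinality: every row has its own distinct value.
high = {f"user{i}": f"user{i}@example.com" for i in range(9000)}

print(index_shape(low))   # (3, 3000.0)
print(index_shape(high))  # (9000, 1.0)
```

In the high-cardinality case the index has as many rows as the indexed CF,
and each index lookup returns a single key, which is the worst case I
describe above.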

Txs!

David
