Re: What is performance gain of clustering columns

2017-10-03 Thread kurt greaves
Clustering info is stored in the index of an SSTable, so if you are only querying a subset of rows within a partition, you don't necessarily have to hit all SSTables, only the SSTables that contain the relevant clustering column values. They make a big improvement, and can also be used quite effectively i
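
To make the point about skipping SSTables concrete, here is a minimal toy sketch in Python. It is not Cassandra's actual code: the `SSTable` class, its `min_ck`/`max_ck` properties, and `candidate_sstables` are hypothetical names, and the model assumes each SSTable records the smallest and largest clustering value it holds for the partition.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Toy stand-in for an SSTable's view of ONE partition (hypothetical)."""
    name: str
    rows: list  # (clustering_key, value) tuples, sorted by clustering_key

    @property
    def min_ck(self):
        # Smallest clustering value held by this SSTable
        return self.rows[0][0]

    @property
    def max_ck(self):
        # Largest clustering value held by this SSTable
        return self.rows[-1][0]


def candidate_sstables(sstables, lo, hi):
    """Keep only SSTables whose clustering range overlaps [lo, hi]."""
    return [t for t in sstables if t.max_ck >= lo and t.min_ck <= hi]


sstables = [
    SSTable("old", [(1, "a"), (5, "b")]),
    SSTable("mid", [(10, "c"), (20, "d")]),
    SSTable("new", [(18, "e"), (30, "f")]),
]

# A slice query on clustering values 19..25 can skip "old" entirely.
hit = [t.name for t in candidate_sstables(sstables, 19, 25)]
```

In this model a query for clustering values 19 to 25 only has to read "mid" and "new"; how much is skipped in practice depends on how the data happens to be spread across SSTables.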

What is performance gain of clustering columns

2017-10-03 Thread eugene miretsky
Hi, Clustering columns are used to order the data in a partition. However, since data is split into SSTables, the rows are ordered by clustering key only within each SSTable. Cassandra still needs to check all SSTables, and merge the data if it is found in several SSTables. The only scenario where
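
The merge step described in this question can be sketched as a k-way merge of sorted runs. This is a simplified illustration, not Cassandra's implementation: `merge_partition` is a hypothetical name, rows are modeled as `(clustering_key, write_timestamp, value)` tuples, and conflicts are resolved by keeping the version with the highest timestamp.

```python
import heapq
from itertools import groupby


def merge_partition(sstables):
    """Merge per-SSTable rows of one partition into clustering order.

    Each input list must already be sorted by clustering key, which is
    the ordering each SSTable provides on its own.
    """
    # k-way merge of the sorted runs, ordered by clustering key
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    result = []
    for ck, versions in groupby(merged, key=lambda row: row[0]):
        # Several SSTables may hold the same row; keep the newest write
        newest = max(versions, key=lambda row: row[1])
        result.append((ck, newest[2]))
    return result


sstable_a = [(1, 100, "a"), (3, 100, "c")]
sstable_b = [(2, 200, "b"), (3, 300, "c2")]

rows = merge_partition([sstable_a, sstable_b])
```

Because each run is already sorted, the merge streams rows in clustering order without re-sorting the whole partition, but every SSTable passed in still has to be read.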