Thanks Sylvain. I guess I might have misunderstood the meaning of column_index_size_in_kb. My previous understanding was that it is the size threshold a row must exceed before its columns are indexed.
If I have understood it correctly now, it is the size of the "blocks (containing columns) that are kept together on the same index". So if you make it high, a large number of columns will need to be deserialized for a single column access within that block. And if you make it lower than optimal, the index size will grow, right? So I guess we should tune it depending on the size of our columns and not the size of rows? I have valueless columns for my use case.

On Mon, Feb 14, 2011 at 2:06 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> As said by aaron, if the whole row is under 64k, it won't matter. But since
> you spoke of very wide row, I'll assume the whole will be much more than
> 64k.
>
> If so, the row is indexed by block (of 64k, configurable). Then the read
> performance depends on how many of those block are needed for the query,
> since each block potentially means a seek (potentially because some block
> could happen to be sequential on disk). So if the columns you ask for are
> really randomly distributed, then yes, the biggest the row is, the biggest
> the chance is to have to hit many blocks and the biggest the chance is for
> these block to be far apart on disk.
>
> --
> Sylvain
>
> On Sun, Feb 13, 2011 at 10:19 PM, Aditya Narayan <ady...@gmail.com> wrote:
>
>> Jonathan,
>> If I ask for around 150-200 columns (totally random not sequential) from a
>> very wide row that contains more than a million or even more columns then,
>> is the read performance of the SliceQuery operation affected by or "depends
>> on the length of the row" ?? (For my use case, I would use the column names
>> list for this SliceQuery operation).
>>
>> Thanks
>> Aditya
>>
>> On Sun, Feb 13, 2011 at 8:41 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> On Sun, Feb 13, 2011 at 12:37 AM, E S <tr1skl...@yahoo.com> wrote:
>>> > I've gotten myself really confused by
>>> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping someone can
>>> > help me understand what the io behavior of this operation would be.
>>> >
>>> > When I do a get_slice for a column range, will it seek to every SSTable? I had
>>> > thought that it would use the bloom filter on the row key so that it would only
>>> > do a seek to SSTables that have a very high probability of containing columns
>>> > for that row.
>>>
>>> Yes.
>>>
>>> > In the linked doc above, it seems to say that it is only used for
>>> > exact column names. Am I misunderstanding this?
>>>
>>> Yes. You may be confusing multi-row behavior with multi-column.
>>>
>>> > On a related note, if instead of using a SliceRange I provide an explicit list
>>> > of columns, will I have to read all SSTables that have values for the columns
>>>
>>> Yes.
>>>
>>> > or is it smart enough to stop after finding a value from the most recent
>>> > SSTable?
>>>
>>> There is no way to know which value is most recent without having to
>>> read it first.
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
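
To make the block-size trade-off discussed in this thread a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not Cassandra's code: the function name index_block_stats, the fixed 32-byte column size, and the assumption that columns are laid out contiguously with a new index block every column_index_size_in_kb are all simplifications made up for illustration. It only counts how many index entries a wide row would need and how many distinct blocks a set of randomly chosen columns would fall into.

```python
import math
import random


def index_block_stats(num_columns, column_size_bytes, block_size_kb, requested_positions):
    """Return (index_entries, blocks_touched) for a simplified wide-row model.

    Assumptions (hypothetical, not Cassandra's actual on-disk logic):
    every column serializes to the same size, columns are stored
    contiguously in name order, and a new index block starts every
    block_size_kb kilobytes.
    """
    block_size = block_size_kb * 1024
    columns_per_block = max(1, block_size // column_size_bytes)
    # One index entry per block of columns.
    index_entries = math.ceil(num_columns / columns_per_block)
    # Each requested column position maps to exactly one block.
    blocks_touched = {pos // columns_per_block for pos in requested_positions}
    return index_entries, len(blocks_touched)


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed so runs are repeatable
    num_columns = 1_000_000          # "more than a million" columns, as in the thread
    wanted = rng.sample(range(num_columns), 200)  # ~200 random column names
    for kb in (4, 64, 1024):
        entries, touched = index_block_stats(num_columns, 32, kb, wanted)
        print(f"block={kb}KB index_entries={entries} blocks_touched={touched}")
```

Under this model a smaller block size means more index entries (a bigger index) but fewer columns deserialized per block hit, while a larger block size means a smaller index and more columns read per hit. With widely scattered column names, the number of blocks touched stays close to the number of columns requested until the blocks become large relative to the row.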