Re: A few quick questions to help me design a better schema..

Tyler Hobbs Sun, 09 Jan 2011 10:57:53 -0800

>
> 1. ) If certain columns in a row are mutated very frequently, or if new
> columns are added to the row frequently, are reads of old columns that
> rarely change also affected? In other words, is the performance of reads
> of infrequently changing columns affected in any manner when other columns
> in the same row are frequently updated or inserted?
>


Yes, the performance of reading columns that you haven't changed will still
be affected by changing other columns in the row.  Constantly updating a row
causes it to be split across multiple SSTables.  If you are asking for the
columns by name, you may not need to actually read any extra data from most
of the SSTables, but you will need to at least read the per-row Bloom Filter
on each (or read the index and scan a portion of the row for slices); this
costs one seek for each SSTable.
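
As a rough illustration of the point above, here is a toy model in Python. It
is only a sketch: the dicts standing in for SSTables and the read_columns
helper are made up for this example and are not Cassandra APIs. It shows that
a by-name read of a column that never changed still has to consult every
SSTable the row has been spread across.

```python
# Toy model only: each "SSTable" is a dict holding the columns of one row
# that were flushed together. None of these names are Cassandra APIs.

def read_columns(sstables, names):
    """Return (latest values, number of SSTables consulted) for a by-name read."""
    result = {}
    consulted = 0
    for table in sstables:          # oldest first, so later flushes overwrite
        consulted += 1              # every SSTable holding the row costs a check
        for name in names:
            if name in table:
                result[name] = table[name]
    return result, consulted


# One rarely changed column, one column rewritten on every flush.
sstables = [
    {"email": "a@example.com", "last_seen": 1},
    {"last_seen": 2},
    {"last_seen": 3},
]

values, consulted = read_columns(sstables, ["email"])
print(values, consulted)
```

Here "email" lives in only one of the three tables, yet all three are
consulted before the read can return.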


> 2. ) Are all columns inside a super column family supercolumns, or can it
> hold a mix of simple columns and supercolumns as well?
>

They are all super columns.  There is no mixing of column types.


> 3. ) When the row cache is enabled and certain columns of a row are read,
> will the entire row be put into the cache, or just the columns that were
> read?
>

The entire row will be put into the cache.  This is good motivation for
splitting timelines into multiple rows, each covering a relatively short
timespan, if you mainly read the very end of the timeline.  Note that there
has been discussion somewhere of allowing you to cache only the last N
columns of a row in the row cache.
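
One way to split a timeline by timespan is to put a time bucket into the row
key. The sketch below is only an assumption about how you might do it: the
timeline_row_key function, the "user:bucket" key format, and the one-day
default bucket are all hypothetical choices for illustration.

```python
def timeline_row_key(user_id, ts, bucket_seconds=86400):
    """Build a row key that groups a user's timeline entries by time bucket.

    ts is a Unix timestamp in seconds. The key format is a hypothetical
    choice for this example, not a Cassandra convention.
    """
    bucket = int(ts) // bucket_seconds
    return f"{user_id}:{bucket}"


# Entries from the same day share a row; the next day starts a new row.
same_day_a = timeline_row_key("alice", 1000)
same_day_b = timeline_row_key("alice", 80000)
next_day = timeline_row_key("alice", 90000)
print(same_day_a, same_day_b, next_day)
```

Reading the end of the timeline then only pulls the most recent bucket's row
into the row cache instead of the whole history.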


> 4. ) Does a larger number of column families have any impact on
> performance (I read about it somewhere)? Should information for a particular
> row key be split in multiple column families according to the specific query
> demands or should all data related to a particular row key be kept together
> in a single column family ?
>

A higher number of column families requires more memory to be used and more
compactions to occur.  I can't answer the rest of the question accurately
without more detail on the particular use case.


> 5. ) Are there any limitations of valueless columns to consider? I read in a
> ppt "Only works with <= 2B columns in 0.7 valueless colum". I couldn't
> understand the meaning of this statement.
>

I believe this is referring to the 2 billion column limit per row.  In the
real world, you generally don't want to get anywhere near that many columns
in a single row.

- Tyler
