Re: understanding of native indexes: limitations, potential side effects,...

Dave Brosius Wed, 16 May 2012 12:04:49 -0700

Each index you define on the source CF is created using an internal CF that has as its key the value of the column it's indexing, and as its columns, all the keys of all the rows in the source CF that have that value. So if all the rows in your source CF have the same value, then your index CF will have one row with N columns, one for each of the N rows in the original CF.
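
A minimal sketch of that layout in Python, assuming plain dicts stand in for column families. The function name `build_index` and the `country` column are made up for illustration and are not Cassandra internals; `serialNbr` is the example column from the question below.

```python
from collections import defaultdict

def build_index(source_cf, column):
    """Map each indexed value to the set of source row keys holding it.

    Each dict key plays the role of one index-CF row key; each member of
    the set plays the role of one column in that index row.
    """
    index_cf = defaultdict(set)
    for row_key, columns in source_cf.items():
        if column in columns:
            index_cf[columns[column]].add(row_key)
    return dict(index_cf)

# Low cardinality: every row has the same value -> one index row, N columns.
same_value = {f"row{i}": {"country": "BE"} for i in range(5)}
low = build_index(same_value, "country")

# High cardinality: every row has a unique value -> N index rows, 1 column each.
unique_value = {f"row{i}": {"serialNbr": f"SN-{i}"} for i in range(5)}
high = build_index(unique_value, "serialNbr")

print(len(low), len(low["BE"]))      # index rows, columns in the "BE" row
print(len(high), len(high["SN-0"]))  # index rows, columns in the "SN-0" row
```

The two extremes are the cases discussed in this thread: a single very wide index row when all values are equal, and roughly as many index rows as source rows when every value is distinct.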



On 05/16/2012 02:58 PM, David Vanderfeesten wrote:

Txs Jeremiah,

But I am not sure I am following "number of columns could be equal to number of rows". Is the native index implemented as one CF shared over all the indexes (one row in the index CF corresponding to one index), or is there an internal index CF per index? My (potentially wrong) mindset was the latter. In that case, if you index a column with a very high cardinality, for example serialNbr, the corresponding internal index CF will have almost the same number of rows as the original CF containing the serialNbr. I can't match that with what you are explaining...


- David

On Wed, May 16, 2012 at 6:23 PM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:


    The limitation is because number of columns could be equal to
    number of rows.  If number of rows is large this can become an issue.

    -Jeremiah

    ------------------------------------------------------------------------
    *From:* David Vanderfeesten [feest...@gmail.com]
    *Sent:* Wednesday, May 16, 2012 6:58 AM
    *To:* user@cassandra.apache.org
    *Subject:* understanding of native indexes: limitations, potential
    side effects,...

    Hi

    I'd like to better understand the limitations of native indexes,
    their potential side effects, and the scenarios where they are
    required.

    My understanding so far:
    - Each node stores index entries only for the data held locally
    on that node.
    - Indexes do not return values in sorted order (hashes of the
    indexed row keys define the order).
    - Given the design in the first bullet, a coordinator node
    receiving a read on a native index needs to send reads to
    multiple nodes (a set of nodes that together cover at least the
    complete key space, plus potentially more to satisfy the read
    consistency level).
    - Each write to an indexed column leads to an additional local
    read of the index to update the index (kind of obvious, but easily
    forgotten when tuning your system for a write-only workload).
    - When using a WHERE clause in CQL, you need to specify at least
    one equality condition on a natively indexed column. Additional
    conditions in the WHERE clause are applied as filters by the
    coordinator node receiving the CQL query.
    - Native indexes do not handle columns with a high number of
    distinct values throughout the entire CF very well.
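
The first three bullets can be sketched roughly as follows. This is a toy model under stated assumptions, not Cassandra's read path: `Node`, `owner`, and `indexed_read` are hypothetical names, rows are placed by a CRC32 hash of the row key, and replication and consistency levels are ignored.

```python
import zlib
from collections import defaultdict

class Node:
    """Hypothetical node that indexes only the rows it stores itself."""

    def __init__(self):
        self.local_index = defaultdict(set)  # indexed value -> local row keys

    def write(self, row_key, value):
        self.local_index[value].add(row_key)

    def lookup(self, value):
        # .get() avoids creating empty entries for values never written.
        return list(self.local_index.get(value, ()))

def owner(nodes, row_key):
    """Pick the node for a row key by hashing the key (assumed placement)."""
    return nodes[zlib.crc32(row_key.encode()) % len(nodes)]

def indexed_read(nodes, value):
    """Coordinator-side fan-out: ask every node, then merge the answers."""
    matches = []
    for node in nodes:
        matches.extend(node.lookup(value))
    return matches

nodes = [Node() for _ in range(3)]
for i in range(6):
    key = f"row{i}"
    owner(nodes, key).write(key, "even" if i % 2 == 0 else "odd")

result = indexed_read(nodes, "even")
print(result)  # order follows node placement, not row key or value order
```

Because no single node knows about matches stored elsewhere, the coordinator in this sketch has to visit all of them, and the merged result comes back in placement order rather than sorted.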

    Is the above understanding correct and complete?
    Some doubts:
    - About the limitation on indexing columns with a high number of
    distinct values:
    I assume native indexes are implemented with an internally
    managed CF per index. With high-cardinality values, in the worst
    case, the number of rows in the index is identical to the number
    of rows in the indexed CF. Or are there other reasons for the
    limitation, and if that's the case, is there a guideline on the
    maximum cardinality that is still reasonable?
    - Are column updates and the update of the indexes (read + write
    action) atomic and isolated from concurrent updates?
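
The "read + write" index update mentioned above might look something like this single-threaded sketch. It assumes the extra read exists to find the old column value so that its stale index entry can be removed; `write_indexed`, `data`, and `index` are hypothetical names, and the sketch says nothing about atomicity or isolation, which is exactly the open question.

```python
from collections import defaultdict

data = {}                 # row_key -> current value of the indexed column
index = defaultdict(set)  # indexed value -> row keys holding that value

def write_indexed(row_key, new_value):
    """Write an indexed column and keep the index consistent with it."""
    old_value = data.get(row_key)          # the extra local read
    if old_value is not None and old_value != new_value:
        index[old_value].discard(row_key)  # drop the stale index entry
        if not index[old_value]:
            del index[old_value]           # remove index rows left empty
    index[new_value].add(row_key)          # add the new index entry
    data[row_key] = new_value              # write the column itself

write_indexed("row1", "red")
write_indexed("row1", "blue")
print(dict(index))
```

Without the read of the old value, the entry under "red" would linger and an index lookup for "red" would still return row1.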

    Txs!

    David
