Re: How do secondary indices work

altanis Wed, 09 Feb 2011 07:13:57 -0800

One more question: does each node keep an index of its own values, or is
the index global?


Alexander

> Thank you very much, this is the information I was looking for. I started
> adding secondary index functionality to Cassandra myself, and it turns out
> I am doing almost exactly the same thing. I will try to change my code to
> use your implementation as well to compare results.
>
> Alexander
>
>> Alexander:
>>
>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a column
>> family, and are kept synchronized with the base data via locking on a local
>> node, meaning they are always consistent on the local node. Eventual
>> consistency still applies between nodes, but a returned result will always
>> match your query.
>>
>> This index column family stores a mapping from index values to a sorted list
>> of matching row keys. When you query for rows between x and y matching a
>> value z (via the get_indexed_slices call), Cassandra performs a lookup to
>> the index column family for the slice of columns in row z between x and y.
>> If any matches are found in the index, they are row keys that match the
>> index clause, and we query the base data to return you those rows.
>>
>> Iterating through all of the rows matching an index clause on your cluster
>> is guaranteed to touch N/RF of the nodes in your cluster, because each node
>> only knows about data that is indexed locally.
>>
>> Some portions of the indexing implementation are not fully baked yet: for
>> instance, although the API allows you to specify multiple columns, only one
>> index will actually be used per query, and the rest of the clauses will be
>> brute forced.
>>
>> A second secondary index implementation has been on the back burner for a
>> while: it provides an identical API, but does not use a column family to
>> store the index, and should be more efficient for append only data. See
>> https://issues.apache.org/jira/browse/CASSANDRA-1472
>>
>> Thanks,
>> Stu
>>
>> On Wed, Feb 9, 2011 at 2:35 AM, <alta...@ceid.upatras.gr> wrote:
>>
>>> Thank you for the links, I did read a bit in the comments of the ticket,
>>> but I couldn't get much out of it.
>>>
>>> I am mainly interested in how the index is stored and partitioned, not how
>>> it is used. I think the people in the dev list will probably be better
>>> qualified to answer that. My questions always seem to get moved to the
>>> user list, and usually with good cause, but I think this time it should be
>>> in the dev list :) Please move it back, if you can.
>>>
>>> Alexander
>>>
>>> > AFAIK this was the ticket the original work was done under
>>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>>> >
>>> > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>>> > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>>> >
>>> > (sorry on reflection the email prob did not need to be moved from dev, my
>>> > bad)
>>> > Aaron
>>> >
>>> > On 09 Feb, 2011,at 09:16 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>> >
>>> > Moving to the user group.
>>> >
>>> >
>>> >
>>> > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'd like some information about how secondary indices work under the hood.
>>> >
>>> > 1) Is data stored in some external data structure, or is it stored in an
>>> > actual Cassandra table, as columns within column families?
>>> > 2) Is data stored sorted or not? How is it partitioned?
>>> > 3) How can I access index data?
>>> >
>>> > Thanks in advance,
>>> >
>>> > Alexander Altanis
>>> >
>>>
>>
>
>
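
The index layout Stu describes in the quoted reply can be sketched roughly as
follows. This is a toy, single-node model in Python, not Cassandra code: the
class LocalNode and its put and query methods are hypothetical names made up
for illustration, and query only loosely mirrors what get_indexed_slices does.
It assumes one indexed column, keeps a map from each indexed value to a sorted
list of row keys, slices that list between two keys, and then filters any
remaining clauses by brute force against the fetched rows.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class LocalNode:
    """Toy model of one node: base rows plus an index on a single column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(list)  # indexed value -> sorted row keys

    def put(self, key, columns):
        # Base data and index are updated together, so the local index
        # always agrees with the local rows.
        row = self.rows.setdefault(key, {})
        old = row.get(self.indexed_column)
        row.update(columns)
        new = row.get(self.indexed_column)
        if old != new:
            if old is not None:
                self.index[old].remove(key)
            if new is not None:
                insort(self.index[new], key)

    def query(self, value, start, end, extra=None):
        # Slice the index entry for `value` between start and end (inclusive),
        # then read the matching rows from the base data.
        keys = self.index.get(value, [])
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        hits = [(k, self.rows[k]) for k in keys[lo:hi]]
        # Any further clauses are not index-assisted: filter row by row.
        extra = extra or {}
        return [(k, r) for k, r in hits
                if all(r.get(c) == v for c, v in extra.items())]


node = LocalNode("state")
node.put("k1", {"state": "UT", "year": 1970})
node.put("k2", {"state": "TX", "year": 1972})
node.put("k3", {"state": "UT", "year": 1972})
node.put("k1", {"state": "TX"})  # moves k1 from the UT entry to the TX entry

print(node.query("TX", "k0", "k9"))
print(node.query("TX", "k0", "k9", {"year": 1972}))
```

Because each instance only indexes the rows it holds, a cluster-wide query in
this model would have to ask every node that owns part of the data, which is
the per-node (not global) behaviour described in the quoted reply.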
