One more question: does each node keep an index of its own values, or is the index global?
Alexander

> Thank you very much, this is the information I was looking for. I started
> adding secondary index functionality to Cassandra myself, and it turns out
> I am doing almost exactly the same thing. I will try to change my code to
> use your implementation as well to compare results.
>
> Alexander
>
>> Alexander:
>>
>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
>> column family, and are kept synchronized with the base data via locking
>> on a local node, meaning they are always consistent on the local node.
>> Eventual consistency still applies between nodes, but a returned result
>> will always match your query.
>>
>> This index column family stores a mapping from index values to a sorted
>> list of matching row keys. When you query for rows between x and y
>> matching a value z (via the get_indexed_slices call), Cassandra performs
>> a lookup to the index column family for the slice of columns in row z
>> between x and y. If any matches are found in the index, they are row
>> keys that match the index clause, and we query the base data to return
>> you those rows.
>>
>> Iterating through all of the rows matching an index clause on your
>> cluster is guaranteed to touch N/RF of the nodes in your cluster,
>> because each node only knows about data that is indexed locally.
>>
>> Some portions of the indexing implementation are not fully baked yet:
>> for instance, although the API allows you to specify multiple columns,
>> only one index will actually be used per query, and the rest of the
>> clauses will be brute forced.
>>
>> A second secondary index implementation has been on the back burner for
>> a while: it provides an identical API, but does not use a column family
>> to store the index, and should be more efficient for append only data.
>> See https://issues.apache.org/jira/browse/CASSANDRA-1472
>>
>> Thanks,
>> Stu
>>
>> On Wed, Feb 9, 2011 at 2:35 AM, <alta...@ceid.upatras.gr> wrote:
>>
>>> Thank you for the links, I did read a bit in the comments of the
>>> ticket, but I couldn't get much out of it.
>>>
>>> I am mainly interested in how the index is stored and partitioned, not
>>> how it is used. I think the people in the dev list will probably be
>>> better qualified to answer that. My questions always seem to get moved
>>> to the user list, and usually with good cause, but I think this time it
>>> should be in the dev list :) Please move it back, if you can.
>>>
>>> Alexander
>>>
>>> > AFAIK this was the ticket the original work was done under
>>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>>> >
>>> > also http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>>> > and http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>>> >
>>> > (sorry on reflection the email prob did not need to be moved from
>>> > dev, my bad)
>>> > Aaron
>>> >
>>> > On 09 Feb, 2011, at 09:16 AM, Aaron Morton <aa...@thelastpickle.com>
>>> > wrote:
>>> >
>>> > Moving to the user group.
>>> >
>>> > On 08 Feb, 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'd like some information about how secondary indices work under the
>>> > hood.
>>> >
>>> > 1) Is data stored in some external data structure, or is it stored in
>>> > an actual Cassandra table, as columns within column families?
>>> > 2) Is data stored sorted or not? How is it partitioned?
>>> > 3) How can I access index data?
>>> >
>>> > Thanks in advance,
>>> >
>>> > Alexander Altanis
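
The index layout Stu describes (a node-local mapping from indexed value to a sorted list of row keys, consulted first and then followed by a read of the base data) can be modeled roughly as below. This is only a toy sketch in Python, not Cassandra code: the class `LocalNode` and its methods are hypothetical names, row keys are assumed to sort as plain strings (the real ordering depends on the partitioner), and the locking that keeps index and base data in sync is only noted in a comment.

```python
import bisect
from collections import defaultdict


class LocalNode:
    """Toy model of one node: base rows plus a node-local, KEYS-style index.

    Hypothetical illustration only; this is not Cassandra's implementation.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                   # row key -> {column name: value}
        self.index = defaultdict(list)   # indexed value -> sorted row keys

    def insert(self, row_key, columns):
        # The real system updates index and base data under a local lock;
        # this single-threaded sketch simply does both in one method.
        existing = self.rows.get(row_key, {})
        old_value = existing.get(self.indexed_column)
        if old_value is not None:
            # Drop the stale index entry before writing the new one.
            self.index[old_value].remove(row_key)
            if not self.index[old_value]:
                del self.index[old_value]
        merged = {**existing, **columns}
        self.rows[row_key] = merged
        new_value = merged.get(self.indexed_column)
        if new_value is not None:
            bisect.insort(self.index[new_value], row_key)

    def get_indexed_slice(self, value, start_key, end_key):
        # Step 1: slice the index "row" for this value between the two keys.
        keys = self.index.get(value, [])
        lo = bisect.bisect_left(keys, start_key)
        hi = bisect.bisect_right(keys, end_key)
        # Step 2: fetch the matching rows from the base data.
        return {key: self.rows[key] for key in keys[lo:hi]}


node = LocalNode("state")
node.insert("user1", {"state": "TX"})
node.insert("user2", {"state": "CA"})
node.insert("user3", {"state": "TX"})
node.insert("user4", {"state": "TX"})

matches = node.get_indexed_slice("TX", "user1", "user3")
print(list(matches))  # -> ['user1', 'user3']
```

Because each `LocalNode` indexes only the rows it holds itself, a cluster-wide query in this model would have to ask enough nodes to cover every replica set, which is the reason given above for touching N/RF of the nodes.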