Re: Homebrew CF-indexing vs secondary indexing

Mohit Anchlia Fri, 25 Feb 2011 12:10:48 -0800

Does this mean we should design the data model so that row keys
actually become columns (and create a secondary index) so that data
retrieval is faster? I am soon setting up big test instances to test
all this.
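
To make the question concrete, here is a minimal in-memory sketch of the
"row keys become columns" layout: one index row per indexed value, whose
column names are the row keys of the matching data rows. It is only a toy
model in plain Python, not a Cassandra or Hector client, and the class
name ToyStore, the column name "category", and the item keys are all
hypothetical. It also shows the index-maintenance step that has to run on
the client whenever an indexed value changes.

```python
from bisect import bisect_left, insort


class ToyStore:
    """In-memory stand-in for a data CF plus a hand-built index CF."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}   # data CF: row key -> {column name: value}
        self.index = {}  # index CF: indexed value -> sorted list of row keys

    def put(self, row_key, columns):
        """Write columns to a row and keep the index row in step."""
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        self.rows.setdefault(row_key, {}).update(columns)
        new = self.rows[row_key].get(self.indexed_column)
        if old == new:
            return
        if old is not None:
            # Client-side index maintenance: drop the stale entry first.
            keys = self.index[old]
            keys.pop(bisect_left(keys, row_key))
        if new is not None:
            insort(self.index.setdefault(new, []), row_key)

    def lookup(self, value, start="", count=100):
        """Slice one index row's columns (row keys), then fetch those rows."""
        keys = self.index.get(value, [])
        first = bisect_left(keys, start)
        page = keys[first:first + count]
        return [(key, self.rows[key]) for key in page]


store = ToyStore("category")
store.put("item1", {"category": "THS", "name": "a"})
store.put("item2", {"category": "THS", "name": "b"})
store.put("item3", {"category": "MS", "name": "c"})
store.put("item2", {"category": "MS"})  # value change moves the index entry

print([key for key, _ in store.lookup("THS")])
print([key for key, _ in store.lookup("MS")])
```

The lookup is two steps, a slice over one index row followed by a fetch
of the matching rows, and every write to the indexed column costs an
extra read plus up to two index updates.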


On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff <e...@anuff.com> wrote:
> It's nice to see some testing in this regard, however, it's worth pointing
> out something that gets lost in CF index vs secondary index discussions.
> What you're really proving is that get_slice (across columns) is faster than
> get_indexed_slices (across keys).  For up to a certain size (and it would be
> nice if there were some empirical testing to determine what that size is),
> get_slice should be one of the most performant operations Cassandra can do.
> CF index approaches are basically all about getting your data into a
> situation where you can use get_slice to quickly perform the search.  The
> reasons for using Cassandra's built-in secondary index support, IMHO, are
> that (1) it's easy to use whereas CF indexes are managed by the client and
> (2) there's concern about how large an index you'd be able to effectively
> store in a CF index row.  The first point is more about Cassandra being
> easier for newcomers, the latter point is something I'd like to see some
> more data around.  Maybe you want to run your tests up to much larger sizes
> and see if there's a point where the results change?  FWIW, I recently
> switched back to CF-based indexes from secondary indexes, largely for the
> flexibility in the types of queries that became possible, but it's nice to
> see there's some performance benefit.  The other thing that would be good to look
> at is timing the overhead of what it takes to update your index as you
> change the values that are being indexed.
>
>
>
> On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens <rsiem...@greatergood.com>
> wrote:
>>
>> I updated the Cassandra version in the Hector package from 0.7.0 to 0.7.2.
>>  The occasional slow-down in the CF-index went away.  I then upped the heap
>> to 512MB, and the secondary-indexing then works.  Seems awfully memory
>> hungry for my small dataset.  Even the CF-index was faster with more heap.
>>  These are the times with Cassandra-0.7.2 and 512M heap.  Slightly different
>> testing: I'm varying the index used, which gives different data size results.
>>  It still surprises me that the CF index does substantially better.
>>
>> Secondary Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 1051 ms
>> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
>> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
>> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
>> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
>> DEBUG Retrieved MS / 4523 rows, in 766 ms
>> DEBUG Retrieved PRS / 562 rows, in 40 ms
>> DEBUG Retrieved GGF / 1162 rows, in 122 ms
>> DEBUG Retrieved VET / 7313 rows, in 1193 ms
>> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
>> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
>>
>> CF Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
>> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
>> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
>> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
>> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
>> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
>> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
>> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
>> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
>> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
>> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
>>
>> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
>>
>> >
>> > I failed to mention: this is just doing repeated data retrievals using
>> > the index.
>> >
>> >> ...
>> >>
>> >> Sample run: Secondary index.
>> >>
>> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
>> > ...
>> >
>>
>
>
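
For comparing the two sets of timings quoted above, a short script can
parse the DEBUG lines and put the totals side by side. This is only a
sketch: it assumes the two numbers on each CF-index line ("17 + 759") are
two phases of the same retrieval that should be added together, which the
original post does not spell out.

```python
import re

SECONDARY = """\
DEBUG Retrieved THS / 7293 rows, in 1051 ms
DEBUG Retrieved TRS / 7289 rows, in 1448 ms
DEBUG Retrieved BCS / 7788 rows, in 1553 ms
DEBUG Retrieved ARS / 7426 rows, in 1479 ms
DEBUG Retrieved CHS / 7290 rows, in 1575 ms
DEBUG Retrieved MS / 4523 rows, in 766 ms
DEBUG Retrieved PRS / 562 rows, in 40 ms
DEBUG Retrieved GGF / 1162 rows, in 122 ms
DEBUG Retrieved VET / 7313 rows, in 1193 ms
DEBUG Retrieved AUT / 7287 rows, in 1746 ms
DEBUG Retrieved LIT / 7291 rows, in 1331 ms
"""

CF_INDEX = """\
DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
"""

LINE = re.compile(r"Retrieved (\w+) / (\d+) rows, in (\d+)(?: \+ (\d+))? ms")


def parse(log):
    """Return {label: (rows, total_ms)}, adding up 'a + b' timings."""
    results = {}
    for match in LINE.finditer(log):
        label, rows, first, second = match.groups()
        results[label] = (int(rows), int(first) + int(second or 0))
    return results


secondary = parse(SECONDARY)
cf_index = parse(CF_INDEX)

for label, (rows, sec_ms) in secondary.items():
    cf_ms = cf_index[label][1]
    print(f"{label:4} {rows:5} rows  secondary {sec_ms:5} ms  CF {cf_ms:5} ms")

print("total secondary ms:", sum(ms for _, ms in secondary.values()))
print("total CF index ms: ", sum(ms for _, ms in cf_index.values()))
```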
