Does this mean we should design the data model so that row keys actually become columns (and create a secondary index), so that data retrieval is faster? I will soon be setting up large test instances to test all this.
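To make the question concrete, here is a minimal sketch of the layout I have in mind, assuming a client-managed "CF index" means one index row per indexed value whose column names are the row keys of the matching data rows. Plain Python dictionaries stand in for the two column families; `data_cf`, `index_cf`, `put`, `find` and the `category` column are hypothetical names for illustration only, not Cassandra or Hector API calls.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
data_cf = {}                  # row key -> {column name: value}
index_cf = defaultdict(dict)  # indexed value -> {row key: ""} (row keys as column names)


def put(row_key, columns, indexed_column="category"):
    """Write a row and keep the client-managed index row in step with it."""
    existing = data_cf.get(row_key, {})
    old_value = existing.get(indexed_column)
    merged = {**existing, **columns}
    new_value = merged.get(indexed_column)
    # Updating an indexed value costs an extra delete on the old index row.
    if old_value is not None and old_value != new_value:
        index_cf[old_value].pop(row_key, None)
    if new_value is not None:
        index_cf[new_value][row_key] = ""
    data_cf[row_key] = merged


def find(value):
    """Slice one index row for the row keys, then fetch those rows."""
    row_keys = sorted(index_cf.get(value, {}))
    return {key: data_cf[key] for key in row_keys}
```

In this sketch the read path is one slice across the columns of a single index row followed by a fetch of the matching rows, and the write path shows where the index-maintenance overhead comes from when an indexed value changes.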
On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff <e...@anuff.com> wrote:
> It's nice to see some testing in this regard, however, it's worth pointing
> out something that gets lost in CF index vs secondary index discussions.
> What you're really proving is that get_slice (across columns) is faster than
> get_indexed_slices (across keys). For up to a certain size (and it would be
> nice if there were some empirical testing to determine what that size is),
> get_slice should be one of the most performant operations Cassandra can do.
> CF index approaches are basically all about getting your data into a
> situation where you can use get_slice to quickly perform the search. The
> reasons for using Cassandra's built-in secondary index support, IMHO, are
> that (1) it's easy to use whereas CF indexes are managed by the client and
> (2) there's concern about how large an index you'd be able to effectively
> store in a CF index row. The first point is more about Cassandra being
> easier for newcomers, the latter point is something I'd like to see some
> more data around. Maybe you want to run your tests up to much larger sizes
> and see if there's a point where the results change? FWIW, I recently
> switched back to CF-based indexes from secondary indexes, largely for the
> flexibility in the types of queries that became possible, but it's nice to
> see there's some performance benefit. The other thing that would be good to
> look at is timing the overhead of what it takes to update your index as you
> change the values that are being indexed.
>
>
>
> On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens <rsiem...@greatergood.com>
> wrote:
>>
>> I updated the cassandra version in the hector package from 7.0 to 7.2.
>> The occasional slow-down in the CF-index went away. I then upped the heap
>> to 512MB, and the secondary-indexing then works. Seems awfully memory
>> hungry for my small dataset. Even the CF-index was faster with more heap.
>> These are the times with Cassandra-0.7.2 and 512M heap. Slightly different
>> testing: I'm varying the index used which gives different data size results.
>> It still surprises me that the CF index does substantially better.
>>
>> Secondary Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 1051 ms
>> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
>> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
>> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
>> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
>> DEBUG Retrieved MS / 4523 rows, in 766 ms
>> DEBUG Retrieved PRS / 562 rows, in 40 ms
>> DEBUG Retrieved GGF / 1162 rows, in 122 ms
>> DEBUG Retrieved VET / 7313 rows, in 1193 ms
>> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
>> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
>>
>> CF Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
>> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
>> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
>> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
>> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
>> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
>> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
>> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
>> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
>> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
>> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
>>
>> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
>>
>> >
>> > I failed to mention: this is just doing repeated data retrievals using
>> > the index.
>> >
>> >> ...
>> >>
>> >> Sample run: Secondary index.
>> >>
>> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
>> > ...
>> >
>> >
>
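For reference, the 0.7.2 figures quoted above can be tallied with a few lines of Python. This is only a rough summary under one assumption: each CF-index entry of the form "a + b ms" is read as index-row lookup time plus row fetch time, which the post does not spell out. The names `TIMINGS`, `secondary_total`, `cf_total` and `cf_wins` are mine.

```python
# (index, rows, secondary-index ms, CF-index lookup ms, CF-index fetch ms)
# Values copied from the Cassandra-0.7.2 / 512M heap run quoted above.
# Assumption: "a + b ms" means index lookup time + row fetch time.
TIMINGS = [
    ("THS", 7293, 1051, 17, 759),
    ("TRS", 7289, 1448, 19, 734),
    ("BCS", 7788, 1553, 23, 736),
    ("ARS", 7426, 1479, 23, 1448),
    ("CHS", 7290, 1575, 18, 638),
    ("MS", 4523, 766, 32, 622),
    ("PRS", 562, 40, 2, 50),
    ("GGF", 1162, 122, 3, 79),
    ("VET", 7313, 1193, 17, 686),
    ("AUT", 7287, 1746, 17, 758),
    ("LIT", 7291, 1331, 17, 745),
]

secondary_total = sum(sec for _, _, sec, _, _ in TIMINGS)
cf_total = sum(lookup + fetch for _, _, _, lookup, fetch in TIMINGS)
# Indexes where the CF-index path (lookup + fetch) beat the secondary index.
cf_wins = [name for name, _, sec, lookup, fetch in TIMINGS if lookup + fetch < sec]

print(f"secondary index total: {secondary_total} ms")
print(f"CF index total:        {cf_total} ms")
print(f"ratio:                 {secondary_total / cf_total:.2f}x")
print(f"CF index faster on {len(cf_wins)} of {len(TIMINGS)} indexes")
```

On these numbers the secondary index totals 12304 ms against 7443 ms for the CF index, roughly 1.65x, with the CF index ahead on 10 of the 11 indexes (PRS is the exception, and ARS is nearly a tie). These are single runs on a small dataset, so the gap may well change at larger sizes.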