Here is a look at query plans http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
tl;dr - wide rows require an index to be read from disk; the fastest query uses no start and no finish.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 6:58 AM, Dave Brosius wrote:

> sorry, should have been: Given the hashtable nature of cassandra, finding a
> row is probably 'relatively' constant no matter how many *rows* you have.
>
> ----- Original Message -----
> From: "Dave Brosius" <dbros...@mebigfatguy.com>
> Sent: Tue, March 13, 2012 13:43
> Subject: Re: Why is row lookup much faster than column lookup
>
> Given the hashtable nature of cassandra, finding a row is probably
> 'relatively' constant no matter how many columns you have.
>
> The smaller the number of columns, I suppose the more likely that all the
> columns will be in one sstable. If you've got a ton of columns per row, it is
> much more likely that these columns will be spread out in multiple sstables.
> Plus, columns are read in chunks, depending on yaml settings.
>
> ----- Original Message -----
> From: "A J" <s5a...@gmail.com>
> Sent: Tue, March 13, 2012 13:35
> Subject: Why is row lookup much faster than column lookup
>
> From my tests, I am seeing that a CF that has fewer than 100 columns
> but millions of rows has a much lower latency to read a column in a
> row than a CF that has only a few thousand rows but wide rows with
> each having 20K columns.
>
> Example:
> cf1 has 6 Million rows and each row has about 100 columns.
> t1 = time.time()
> cf1.get(1234,column_count=1)
> t2 = time.time() - t1
> print int(t2*1000)
> takes 3 ms
>
> cf2 has 5K rows and each row has about 18K columns.
> t1 = time.time()
> cf2.get(1234,column_count=1)
> t2 = time.time() - t1
> print int(t2*1000)
> takes 82ms
>
> Anything in general on the Cassandra architecture that causes row
> lookup to be much faster than column lookup ?
>
> Thanks.
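
For anyone repeating the comparison in the quoted message: a single time.time() measurement around one get() is quite noisy, so it may be worth timing several reads and looking at the median. Below is a minimal sketch of that, not a definitive benchmark. The helper name time_read_ms and its repeats/warmup parameters are made up for illustration, and fake_read is only a stand-in so the snippet runs without a cluster. With pycassa you would presumably pass something like `lambda: cf1.get(1234, column_count=1)` instead.

```python
import statistics
import time


def time_read_ms(read_fn, repeats=20, warmup=3):
    """Call read_fn repeatedly and return the median latency in milliseconds.

    read_fn is any zero-argument callable that performs one read.
    The name and parameters are hypothetical, chosen for this sketch.
    """
    # Untimed warm-up calls so the first cold read does not skew the result.
    for _ in range(warmup):
        read_fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)


def fake_read():
    """Stand-in for a real column read (assumption: replace with cf.get)."""
    return sum(range(1000))


if __name__ == "__main__":
    print("median latency: %.3f ms" % time_read_ms(fake_read))
```

Running the same helper against both column families, with the same key and column_count, should give numbers that are easier to compare than two one-off timings.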