Hi, I was looking a bit on a case we have with columnfamily which has 439k supercolumns, each supercolumn with ~3 columns each so a total of some 1.3 million objects in total.
This takes about 9 second to read with thrift on first access, 4-5 second on second access. I took a little closer look at this, I noticed that roughly half of this time was spend in cassandra. I will look more at this, but I thought I would just ask people here as it could be that someone already had good explanations... Most of the time is spent here public void collectReducedColumns(IColumnContainer container, Iterator<IColumn> reducedColumns, int gcBefore) { int liveColumns = 0; AbstractType comparator = container.getComparator(); while (reducedColumns.hasNext()) { if (liveColumns >= count) break; IColumn column = reducedColumns.next(); if (logger.isDebugEnabled()) logger.debug(String.format("collecting %s of %s: %s", liveColumns, count, column.getString(comparator))); if (finish.length > 0 && ((!reversed && comparator.compare(column.name(), finish) > 0)) || (reversed && comparator.compare(column.name(), finish) < 0)) break; // only count live columns towards the `count` criteria if (!column.isMarkedForDelete() && (!container.isMarkedForDelete() || (ClockRelationship.GREATER_THAN == column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt())))) { liveColumns++; } // but we need to add all non-gc-able columns to the result for read repair: if (QueryFilter.isRelevant(column, container, gcBefore)) container.addColumn(column); } } Adding some time measuring print revealed a few interesting this. 1. First time I request the row (I request the entire row in this case), collectReducedColumns() is called twice. I have not had time to understand why yet, but there is one difference between the calls. All columns are returned both times, but the first call is done with MAXINT as "count" while the second call has the maxcount actually requested by the client as "count". The first call takes about 3.7 seconds to process, the second about 1.7 secs. Whatever reason, I think we should be able to remove one of these calls? 2. Almost half the time is spent in "container.addColumn". This is probably no big surprise as there is a lot of code hidden "down there". I am however very curious if it could not be significantly optimized, especially in the case where you have predicates which cases all columns to be included. That said, I have not manage to read enough cassandra code to understand tombstones or all the other things which is going on in that part of the code. 3. A bit more of a surprise, most of the remaining 50% of the time seems to occur at while (reducedColumns.hasNext()) That is, save system.nanoTime() at the end of the loop and after hasNext and sum up and it accounts for almost 50% of the total time. That seems quite weird to me. I will dig more, but I was curious if someone had answers already. Best regards, Terje