Hi, yes, I have been looking at this a bit, but not as much as I wanted to.
I kind of forgot about the whole double call on first access (probably something related to caching?), since even with that fixed, the performance is not good enough for what I need. The hasNext() call causes interactions with both apache commons-collections and google-collections, and simply put, the amount of code that ends up being executed for the hasNext()/next() part of this loop is a bit ridiculous. I started optimizing some code in commons-collections and got a 10-20% improvement (for my specific test case), but as I started to understand both the collections code and the Cassandra code better, it dawned on me that:

1. There is just too much code and too many "layers" involved when hasNext() is called. I suspect this requires a redesign, and the google-collections/commons-collections iterators may have to be thrown out. This seems to be a pretty critical area of Cassandra code for performance, so I suspect the cost of building functionality specifically tuned for Cassandra would be worth it (rough sketch below).

2. From what I understand, columns from all SSTables/Memtables are merged before tombstones are considered and the slice predicates applied. To reduce deserialization and Column object creation, maybe some of this filtering and merging could/should be done before Column objects are even created? While this may make things somewhat hairy code-wise, JVM performance and memory consumption tend not to be happy in scenarios with heavy object creation, and we get a lot of objects here (sketch below).

3. There also seem to be good opportunities to optimize the case where a complete set of columns is fetched (slices with null start/finish predicates), but I am not currently sure how much time that would save, or whether it would be worth the work and the added code complexity (sketch below as well).

I may make an attempt at 1., but I don't feel like I understand enough of the Cassandra code yet to do any of the above right now, so it may take a little while.
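To illustrate what I mean in 1.: instead of stacking commons-collections/google-collections iterators on top of each other, a single hand-rolled merge over the per-SSTable/Memtable iterators costs one heap operation per column and nothing else. This is only a sketch with made-up names, not a patch:

    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.PriorityQueue;

    // N-way merge over pre-sorted iterators. One comparison-driven heap
    // operation per element instead of a stack of wrapped hasNext()/next()
    // calls. A real version would also reconcile columns with equal names
    // across sources instead of emitting both.
    public class MergeIterator<T> implements Iterator<T>
    {
        private final PriorityQueue<Candidate<T>> heap;

        public MergeIterator(Iterable<Iterator<T>> sources, final Comparator<? super T> cmp)
        {
            heap = new PriorityQueue<Candidate<T>>(8, new Comparator<Candidate<T>>()
            {
                public int compare(Candidate<T> a, Candidate<T> b)
                {
                    return cmp.compare(a.head, b.head);
                }
            });
            for (Iterator<T> source : sources)
                if (source.hasNext())
                    heap.add(new Candidate<T>(source));
        }

        public boolean hasNext()
        {
            return !heap.isEmpty();
        }

        public T next()
        {
            Candidate<T> candidate = heap.poll();
            T head = candidate.head;
            if (candidate.advance())
                heap.add(candidate); // re-insert with its new head element
            return head;
        }

        public void remove()
        {
            throw new UnsupportedOperationException();
        }

        private static final class Candidate<T>
        {
            private final Iterator<T> source;
            T head;

            Candidate(Iterator<T> source)
            {
                this.source = source;
                this.head = source.next();
            }

            boolean advance()
            {
                if (!source.hasNext())
                    return false;
                head = source.next();
                return true;
            }
        }
    }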
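And the idea in 2. would roughly be: look at the serialized name bytes first, and skip the column in the stream if the predicate excludes it, so no Column object is ever created for it. The layout assumed below (u16 name length, name, 1 byte of flags, 8 byte timestamp, u32 value length, value) is made up for the sketch and is not Cassandra's actual serialization:

    import java.io.DataInput;
    import java.io.IOException;

    // Filter on raw name bytes before deserializing into a column object.
    public final class RawColumnFilter
    {
        // Returns the raw name bytes if the column falls inside [start, finish],
        // or null after skipping past the column. Empty bounds mean unbounded.
        public static byte[] readNameOrSkip(DataInput in, byte[] start, byte[] finish)
            throws IOException
        {
            byte[] name = new byte[in.readUnsignedShort()];
            in.readFully(name);

            boolean wanted = (start.length == 0 || compare(name, start) >= 0)
                          && (finish.length == 0 || compare(name, finish) <= 0);
            if (wanted)
                return name;

            // Outside the slice: skip flags + timestamp + value without
            // allocating anything. (A real version would loop until
            // skipBytes() has actually skipped everything.)
            in.skipBytes(1 + 8);
            in.skipBytes(in.readInt());
            return null;
        }

        // Plain lexicographic byte comparison; real column comparators differ.
        private static int compare(byte[] a, byte[] b)
        {
            int len = Math.min(a.length, b.length);
            for (int i = 0; i < len; i++)
            {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0)
                    return d;
            }
            return a.length - b.length;
        }
    }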
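For 3., the special-casing I am thinking of is simply splitting the loop so the unbounded "give me the whole row" case never touches the comparator. Generic names again, not a patch against collectReducedColumns:

    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;

    public final class SliceCollector
    {
        public static <T> void collect(Iterator<T> columns, Comparator<? super T> cmp,
                                       T finish, boolean reversed, int count, List<T> out)
        {
            if (finish == null)
            {
                // unbounded slice: copy up to count, no name comparisons at all
                while (out.size() < count && columns.hasNext())
                    out.add(columns.next());
                return;
            }
            while (out.size() < count && columns.hasNext())
            {
                T column = columns.next();
                int c = cmp.compare(column, finish);
                if (reversed ? c < 0 : c > 0)
                    break; // past the finish bound
                out.add(column);
            }
        }
    }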
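For completeness: the "almost 50% in hasNext()" number in my original mail below came from crude inline System.nanoTime() prints, but a wrapper like this collects the same kind of breakdown more cleanly (again just a sketch, not Cassandra code):

    import java.util.Iterator;

    // Decorator that accumulates the nanoseconds spent inside the wrapped
    // iterator's hasNext() and next() calls.
    public final class TimedIterator<T> implements Iterator<T>
    {
        private final Iterator<T> delegate;
        public long hasNextNanos;
        public long nextNanos;

        public TimedIterator(Iterator<T> delegate)
        {
            this.delegate = delegate;
        }

        public boolean hasNext()
        {
            long start = System.nanoTime();
            try { return delegate.hasNext(); }
            finally { hasNextNanos += System.nanoTime() - start; }
        }

        public T next()
        {
            long start = System.nanoTime();
            try { return delegate.next(); }
            finally { nextNanos += System.nanoTime() - start; }
        }

        public void remove()
        {
            delegate.remove();
        }
    }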
Regards,
Terje

On Thu, Jul 8, 2010 at 8:07 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Hi Terje,
>
> Sorry to not get to this sooner. Are you still looking into this?
>
> On Tue, Jun 22, 2010 at 12:47 PM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
> > Hi,
> >
> > I was looking a bit at a case we have with a column family which has 439k
> > supercolumns, each supercolumn with ~3 columns, so some 1.3 million
> > objects in total.
> >
> > This takes about 9 seconds to read with thrift on first access, 4-5
> > seconds on second access.
> >
> > I took a closer look at this and noticed that roughly half of this time
> > was spent inside Cassandra.
> >
> > I will look more at this, but I thought I would just ask people here, as
> > someone may already have good explanations...
> >
> > Most of the time is spent here:
> >
> >     public void collectReducedColumns(IColumnContainer container,
> >                                       Iterator<IColumn> reducedColumns,
> >                                       int gcBefore)
> >     {
> >         int liveColumns = 0;
> >         AbstractType comparator = container.getComparator();
> >
> >         while (reducedColumns.hasNext())
> >         {
> >             if (liveColumns >= count)
> >                 break;
> >
> >             IColumn column = reducedColumns.next();
> >             if (logger.isDebugEnabled())
> >                 logger.debug(String.format("collecting %s of %s: %s",
> >                                            liveColumns, count,
> >                                            column.getString(comparator)));
> >
> >             if (finish.length > 0
> >                 && ((!reversed && comparator.compare(column.name(), finish) > 0))
> >                     || (reversed && comparator.compare(column.name(), finish) < 0))
> >                 break;
> >
> >             // only count live columns towards the `count` criteria
> >             if (!column.isMarkedForDelete()
> >                 && (!container.isMarkedForDelete()
> >                     || (ClockRelationship.GREATER_THAN ==
> >                         column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()))))
> >             {
> >                 liveColumns++;
> >             }
> >
> >             // but we need to add all non-gc-able columns to the result
> >             // for read repair:
> >             if (QueryFilter.isRelevant(column, container, gcBefore))
> >                 container.addColumn(column);
> >         }
> >     }
> >
> > Adding some timing printouts revealed a few interesting things.
> >
> > 1. The first time I request the row (I request the entire row in this
> > case), collectReducedColumns() is called twice. I have not had time to
> > understand why yet, but there is one difference between the calls: all
> > columns are returned both times, but the first call is made with MAXINT
> > as "count", while the second call has the max count actually requested
> > by the client. The first call takes about 3.7 seconds to process, the
> > second about 1.7. Whatever the reason, I think we should be able to
> > remove one of these calls?
> >
> > 2. Almost half the time is spent in container.addColumn(). This is
> > probably no big surprise, as there is a lot of code hidden "down there".
> > I am however very curious whether it could be significantly optimized,
> > especially in the case where the predicates cause all columns to be
> > included. That said, I have not managed to read enough Cassandra code to
> > understand tombstones or all the other things going on in that part of
> > the code.
> >
> > 3. A bit more of a surprise: most of the remaining 50% of the time seems
> > to be spent at "while (reducedColumns.hasNext())". That is, save
> > System.nanoTime() at the end of the loop and after hasNext(), sum up the
> > deltas, and they account for almost 50% of the total time. That seems
> > quite weird to me.
> >
> > I will dig more, but I was curious if someone had answers already.
> >
> > Best regards,
> > Terje
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com