For the benefit of others: I found out that the CQL library I was using (https://github.com/gocql/gocql) at this time defaults to no paging unless a page size is set, so Cassandra was trying to pull all rows of the partition into memory at once. Setting the page size to a reasonable number seems to have done the trick.
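In case it helps to see why the page size matters, here is a rough sketch in plain Go. It does not use gocql or talk to Cassandra at all: `row`, `fetchPage` and `scanPartition` are made-up names, and the "paging state" is just a slice offset, so treat it as an illustration of the idea only. In gocql itself the knob is, as far as I know, the page size on the query (`Query.PageSize`) or on the cluster config (`ClusterConfig.PageSize`).

```go
package main

import "fmt"

// row stands in for one CQL row; the real schema is not given in this thread.
type row struct{ id int }

// fetchPage is a hypothetical stand-in for the server. It returns at most
// pageSize rows starting at the paging state (here just an offset) and the
// next paging state. pageSize <= 0 means "no paging": one giant response.
func fetchPage(partition []row, state, pageSize int) (page []row, next int, done bool) {
	if pageSize <= 0 {
		return partition[state:], len(partition), true
	}
	end := state + pageSize
	if end >= len(partition) {
		return partition[state:], len(partition), true
	}
	return partition[state:end], end, false
}

// scanPartition walks the whole partition and reports how many rows were
// seen and the size of the largest single response that had to be held.
func scanPartition(partition []row, pageSize int) (total, largest int) {
	state := 0
	for {
		page, next, done := fetchPage(partition, state, pageSize)
		total += len(page)
		if len(page) > largest {
			largest = len(page)
		}
		if done {
			return total, largest
		}
		state = next
	}
}

func main() {
	partition := make([]row, 10000)
	for _, size := range []int{0, 1000} {
		total, largest := scanPartition(partition, size)
		fmt.Printf("pageSize=%d rows=%d largestResponse=%d\n", size, total, largest)
	}
}
```

Both runs read the same number of rows; only the size of the largest single response changes, which is the part that was hurting the heap in my case.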
On Tue, Nov 25, 2014 at 2:54 PM, Dan Kinder <dkin...@turnitin.com> wrote:

> Thanks, very helpful Rob, I'll watch for that.
>
> On Tue, Nov 25, 2014 at 11:45 AM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Tue, Nov 25, 2014 at 10:45 AM, Dan Kinder <dkin...@turnitin.com> wrote:
>>
>>> To be clear, I expect this range query to take a long time and perform
>>> relatively heavy I/O. What I expected Cassandra to do was use auto-paging
>>> (https://issues.apache.org/jira/browse/CASSANDRA-4415,
>>> http://stackoverflow.com/questions/17664438/iterating-through-cassandra-wide-row-with-cql3)
>>> so that we aren't literally pulling the entire thing in. Am I
>>> misunderstanding this use case? Could you clarify why exactly it would slow
>>> way down? It seems like with each read it should be doing a simple range
>>> read from one or two sstables.
>>
>> If you're paging through a single partition, that's likely to be fine.
>> When you said "range reads ... over rows" my impression was you were
>> talking about attempting to page through millions of partitions.
>>
>> With that confusion cleared up, the likely explanation for lack of
>> availability in your case is heap pressure/GC time. Look for GCs around
>> that time. Also, if you're using authentication, make sure that your
>> authentication keyspace has a replication factor greater than 1.
>>
>> =Rob
>
> --
> Dan Kinder
> Senior Software Engineer
> Turnitin – www.turnitin.com
> dkin...@turnitin.com

--
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com