How big are the multi_get batches? How do the wide row get_slice calls behave when the multi_gets are not running?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/05/2012, at 1:47 AM, Luís Ferreira wrote:

> Maybe one of the problems is that I am reading both the columns in a row and
> the rows themselves in batches, using the count attribute in the SliceRange
> and changing the start column (or the corresponding start key in the
> KeyRange, for rows). According to your blog post, using a start key to read
> through millions of rows/columns has a lot of latency, but how else can I
> read an entire row that does not fit into memory?
>
> I'll have to run some tests again and check the tpstats. Still, do you think
> that adding more machines to the cluster will help a lot? I ask because
> I started with a 3 node cluster and have scaled to a 5 node cluster with
> little improvement...
>
> Thanks anyway.
>
> On May 8, 2012, at 9:54 AM, aaron morton wrote:
>
>> If I was rebuilding my power after spending the first thousand years of the
>> Third Age as a shapeless evil I would cast my Eye of Fire in the direction
>> of the filthy little multi_gets.
>>
>> A node can fail to respond to a query with rpc_timeout for two reasons:
>> either the command did not run, or the command started but did not complete.
>> The former is much more likely. If it is happening you will see large
>> pending counts and dropped messages in nodetool tpstats; you will also see
>> log entries about dropped messages.
>>
>> When you send a multi_get, each row you request becomes a message in the
>> read thread pool. If you request 100 rows you will put 100 messages in the
>> pool, which by default has 32 threads. If some clients are sending large
>> multi_gets (or batch mutations) you can overload nodes and starve other
>> clients.
>>
>> For background, there are some metrics here for selecting from 10 million
>> columns:
>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/05/2012, at 7:14 AM, Luís Ferreira wrote:
>>
>>> Hi,
>>>
>>> I'm doing get_slice on huge rows (3 million columns) and even though I am
>>> doing it iteratively I keep getting TimeoutExceptions. I've tried changing
>>> the number of columns fetched but it did not help.
>>>
>>> I have a 5 machine cluster, each with 4GB of RAM, of which 3GB are
>>> dedicated to Cassandra's heap, but they still all consume all of the
>>> memory and get huge IO wait due to the amount of reads.
>>>
>>> I am running tests with 100 clients all performing multiple operations,
>>> mostly get_slice, get and multi_get, but the timeouts only occur in the
>>> get_slice.
>>>
>>> Does this have anything to do with Cassandra's ability (or lack thereof) to
>>> keep the rows in memory? Or am I doing anything wrong? Any tips?
>>>
>>> Regards,
>>> Luís Ferreira
>
> Regards,
> Luís Ferreira
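
For reference, the two ideas discussed in this thread (paging through a wide row with a SliceRange count and a moving start column, and keeping multi_get batches small so that one request does not flood the read thread pool) could be sketched roughly as below. This is only an illustration in plain Python, not code from either poster: make_fake_get_slice is a hypothetical in-memory stand-in for the real Thrift get_slice call, and iter_row and chunked are helper names made up for the sketch.

```python
from bisect import bisect_left
from typing import Callable, Iterator, List, Sequence, Tuple

Column = Tuple[str, str]  # (column name, value)
GetSlice = Callable[[str, int], List[Column]]


def make_fake_get_slice(row: Sequence[Column]) -> GetSlice:
    """Hypothetical stand-in for a get_slice call against one row.

    `row` must be sorted by column name. The returned function gives back
    up to `count` columns whose name is >= `start`; an empty `start`
    means "from the beginning of the row".
    """
    names = [name for name, _ in row]

    def get_slice(start: str, count: int) -> List[Column]:
        i = bisect_left(names, start) if start else 0
        return list(row[i:i + count])

    return get_slice


def iter_row(get_slice: GetSlice, page_size: int) -> Iterator[Column]:
    """Yield every column of a row, fetching at most `page_size` per call."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = get_slice(start, page_size)
        # The start column is inclusive, so every page after the first
        # repeats the last column of the previous page; drop the repeat.
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            return  # a short page means the end of the row was reached
        start = page[-1][0]
        first_page = False


def chunked(keys: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split a large key list into small batches, one multi_get each."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


if __name__ == "__main__":
    row = [(f"c{i:04d}", str(i)) for i in range(10)]
    columns = list(iter_row(make_fake_get_slice(row), page_size=4))
    print(len(columns))
    print([len(batch) for batch in chunked([f"k{i}" for i in range(100)], 32)])
```

The batch size of 32 in the usage lines only mirrors the default read thread count mentioned above; it is an arbitrary choice for the example, not a tuning recommendation.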