Sounds like you're simply throwing more seq scans at it via m/r than your disks can handle. iostat could confirm that disk is the bottleneck, but "real" monitoring would be better: http://www.datastax.com/products/opscenter
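
(A quick sketch, assuming the sysstat version of iostat -- exact flags can differ by distro -- run on each node while the MR job is going:

    iostat -x -k 5

and watch %util and await on the raid10 device; sustained %util near 100 would confirm the disks are saturated.)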
On Thu, Dec 8, 2011 at 1:02 AM, Patrik Modesto <patrik.mode...@gmail.com> wrote:
> Hi Jake,
>
> I see the timeouts in the mappers as well as at the random-access backend
> daemons (for web services). There are now 10 mappers and 2 reducers on
> each node. There is one big 4-disk raid10 array on each node, on which
> cassandra lives together with HDFS. We store just a few GB of files
> on HDFS; otherwise we don't use it.
>
> Regards,
> P.
>
> On Wed, Dec 7, 2011 at 15:33, Jake Luciani <jak...@gmail.com> wrote:
>> Where do you see the timeout exceptions? In the mappers?
>>
>> How many mapper/reducer slots are you using? What does your disk setup
>> look like? Do you have HDFS on the same disk as the cassandra data dir?
>>
>> -Jake
>>
>>
>> On Tue, Dec 6, 2011 at 4:50 AM, Patrik Modesto <patrik.mode...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm quite desperate about Cassandra's performance in our production
>>> cluster. We have 8 real-HW nodes: 32-core CPU, 32GB memory, 4 disks in
>>> raid10, cassandra 0.8.8, RF=3 and Hadoop.
>>> We have four keyspaces; the large one has 2 CFs, one of which is a kind of
>>> index while the other holds the data. There are about 7 million rows, mean
>>> row size is 7kB. We run several mapreduce tasks; most of them just read
>>> from cassandra and write to HDFS, but one fetches rows from cassandra,
>>> computes something and writes it back. For each row we compute three new
>>> JSON values, about 1kB each (they get overwritten the next round).
>>>
>>> We get lots and lots of Timeout exceptions, LiveSSTablesCount over
>>> 100. Repair doesn't finish even in 24 hours, and reading from the other
>>> keyspaces times out as well. We set compaction_throughput_mb_per_sec:
>>> 0 but it didn't help.
>>>
>>> Did we choose the wrong DB for our use case?
>>>
>>> Regards,
>>> Patrik
>>>
>>> This is from one node:
>>>
>>> INFO 10:28:40,035 Pool Name               Active   Pending   Blocked
>>> INFO 10:28:40,036 ReadStage                   96       695         0
>>> INFO 10:28:40,037 RequestResponseStage         0         0         0
>>> INFO 10:28:40,037 ReadRepairStage              0         0         0
>>> INFO 10:28:40,037 MutationStage                1         1         0
>>> INFO 10:28:40,038 ReplicateOnWriteStage        0         0         0
>>> INFO 10:28:40,038 GossipStage                  0         0         0
>>> INFO 10:28:40,038 AntiEntropyStage             0         0         0
>>> INFO 10:28:40,039 MigrationStage               0         0         0
>>> INFO 10:28:40,039 StreamStage                  0         0         0
>>> INFO 10:28:40,040 MemtablePostFlusher          0         0         0
>>> INFO 10:28:40,040 FlushWriter                  0         0         0
>>> INFO 10:28:40,040 MiscStage                    0         0         0
>>> INFO 10:28:40,041 FlushSorter                  0         0         0
>>> INFO 10:28:40,041 InternalResponseStage        0         0         0
>>> INFO 10:28:40,041 HintedHandoff                1         5         0
>>> INFO 10:28:40,042 CompactionManager          n/a        27
>>> INFO 10:28:40,042 MessagingService           n/a   0,16559
>>>
>>> And here is the nodetool ring output:
>>>
>>> 10.2.54.91  NG  RAC1  Up  Normal  118.04 GB  12.50%  0
>>> 10.2.54.92  NG  RAC1  Up  Normal  102.74 GB  12.50%  21267647932558653966460912964485513216
>>> 10.2.54.93  NG  RAC1  Up  Normal   76.95 GB  12.50%  42535295865117307932921825928971026432
>>> 10.2.54.94  NG  RAC1  Up  Normal   56.97 GB  12.50%  63802943797675961899382738893456539648
>>> 10.2.54.95  NG  RAC1  Up  Normal   75.55 GB  12.50%  85070591730234615865843651857942052864
>>> 10.2.54.96  NG  RAC1  Up  Normal  102.57 GB  12.50%  106338239662793269832304564822427566080
>>> 10.2.54.97  NG  RAC1  Up  Normal   68.03 GB  12.50%  127605887595351923798765477786913079296
>>> 10.2.54.98  NG  RAC1  Up  Normal  194.6 GB   12.50%  148873535527910577765226390751398592512
>>
>>
>> --
>> http://twitter.com/tjake

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com