Sounds like you're simply throwing more seq scans at it via m/r than
your disk can handle.  iostat could confirm that disk is the
bottleneck.  But "real" monitoring would be better.
http://www.datastax.com/products/opscenter
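E.g., run something like this on each node while the jobs are going
(assuming sysstat is installed):

  # extended per-device stats every 5 seconds; look for %util pinned
  # near 100 and high await on the raid10 device
  iostat -x 5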

On Thu, Dec 8, 2011 at 1:02 AM, Patrik Modesto <patrik.mode...@gmail.com> wrote:
> Hi Jake,
>
> I see the timeouts in the mappers as well as in the random-access
> backend daemons (for the web services). There are currently 10 mapper
> and 2 reducer slots on each node. Each node has one big 4-disk raid10
> array that holds both the Cassandra data and HDFS. We store just a few
> GB of files on HDFS; otherwise we don't use it.
>
> Regards,
> P.
>
> On Wed, Dec 7, 2011 at 15:33, Jake Luciani <jak...@gmail.com> wrote:
>> Where do you see the timeout exceptions? In the mappers?
>>
>> How many mapper/reducer slots are you using? What does your disk setup
>> look like? Do you have HDFS on the same disk as the Cassandra data dir?
>>
>> -Jake
>>
>>
>> On Tue, Dec 6, 2011 at 4:50 AM, Patrik Modesto <patrik.mode...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm quite desperate about Cassandra's performance in our production
>>> cluster. We have 8 physical (non-virtualized) nodes with 32 CPU cores,
>>> 32GB of memory and 4 disks in raid10 each, running Cassandra 0.8.8
>>> with RF=3, plus Hadoop.
>>> We have four keyspaces; the large one has 2 CFs, one a kind of index,
>>> the other holding the data. There are about 7 million rows with a mean
>>> row size of 7kB. We run several mapreduce tasks; most of them just
>>> read from Cassandra and write to HDFS, but one fetches rows from
>>> Cassandra, computes something and writes it back: for each row we
>>> compute three new JSON values, about 1kB each (they get overwritten
>>> the next round).
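>>>
>>> For reference, the read side of those jobs is just the standard
>>> ColumnFamilyInputFormat setup, roughly the sketch below (the
>>> keyspace/CF names are placeholders, and I'm writing the ConfigHelper
>>> calls from memory following the 0.8 word_count example, so they may
>>> not match 0.8.8 exactly):
>>>
>>>   import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
>>>   import org.apache.cassandra.hadoop.ConfigHelper;
>>>   import org.apache.cassandra.thrift.SlicePredicate;
>>>   import org.apache.cassandra.thrift.SliceRange;
>>>   import org.apache.cassandra.utils.ByteBufferUtil;
>>>   import org.apache.hadoop.conf.Configuration;
>>>   import org.apache.hadoop.mapreduce.Job;
>>>
>>>   // Hadoop job that scans one CF; a smaller cassandra.range.batch.size
>>>   // means fewer rows per get_range_slices call, i.e. lighter requests.
>>>   Job job = new Job(new Configuration(), "export_to_hdfs");
>>>   job.setInputFormatClass(ColumnFamilyInputFormat.class);
>>>
>>>   Configuration c = job.getConfiguration();
>>>   ConfigHelper.setRpcPort(c, "9160");
>>>   ConfigHelper.setInitialAddress(c, "10.2.54.91");
>>>   ConfigHelper.setPartitioner(c,
>>>       "org.apache.cassandra.dht.RandomPartitioner");
>>>   ConfigHelper.setInputColumnFamily(c, "big_ks", "data_cf"); // placeholders
>>>
>>>   // ask for all columns of each row
>>>   SlicePredicate pred = new SlicePredicate().setSlice_range(
>>>       new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER,
>>>                      ByteBufferUtil.EMPTY_BYTE_BUFFER,
>>>                      false, Integer.MAX_VALUE));
>>>   ConfigHelper.setInputSlicePredicate(c, pred);
>>>
>>>   // default is 4096 rows per range slice IIRC; dropping it shrinks
>>>   // each read and is supposed to help with TimedOutExceptions
>>>   ConfigHelper.setRangeBatchSize(c, 256);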
>>>
>>> We get lots and lots of Timeout exceptions, and LiveSSTableCount is
>>> over 100. Repair doesn't finish even in 24 hours, and reads from the
>>> other keyspaces time out as well. We set
>>> compaction_throughput_mb_per_sec: 0 but it didn't help.
>>>
>>> Did we choose the wrong DB for our use case?
>>>
>>> Regards,
>>> Patrik
>>>
>>> This is from one node:
>>>
>>>  INFO 10:28:40,035 Pool Name                    Active   Pending   Blocked
>>>  INFO 10:28:40,036 ReadStage                        96       695         0
>>>  INFO 10:28:40,037 RequestResponseStage              0         0         0
>>>  INFO 10:28:40,037 ReadRepairStage                   0         0         0
>>>  INFO 10:28:40,037 MutationStage                     1         1         0
>>>  INFO 10:28:40,038 ReplicateOnWriteStage             0         0         0
>>>  INFO 10:28:40,038 GossipStage                       0         0         0
>>>  INFO 10:28:40,038 AntiEntropyStage                  0         0         0
>>>  INFO 10:28:40,039 MigrationStage                    0         0         0
>>>  INFO 10:28:40,039 StreamStage                       0         0         0
>>>  INFO 10:28:40,040 MemtablePostFlusher               0         0         0
>>>  INFO 10:28:40,040 FlushWriter                       0         0         0
>>>  INFO 10:28:40,040 MiscStage                         0         0         0
>>>  INFO 10:28:40,041 FlushSorter                       0         0         0
>>>  INFO 10:28:40,041 InternalResponseStage             0         0         0
>>>  INFO 10:28:40,041 HintedHandoff                     1         5         0
>>>  INFO 10:28:40,042 CompactionManager               n/a        27
>>>  INFO 10:28:40,042 MessagingService                n/a   0,16559
>>>
>>> And here is the nodetool ring output:
>>>
>>> Address         DC   Rack  Status State   Load       Owns    Token
>>> 10.2.54.91      NG   RAC1  Up     Normal  118.04 GB  12.50%  0
>>> 10.2.54.92      NG   RAC1  Up     Normal  102.74 GB  12.50%  21267647932558653966460912964485513216
>>> 10.2.54.93      NG   RAC1  Up     Normal  76.95 GB   12.50%  42535295865117307932921825928971026432
>>> 10.2.54.94      NG   RAC1  Up     Normal  56.97 GB   12.50%  63802943797675961899382738893456539648
>>> 10.2.54.95      NG   RAC1  Up     Normal  75.55 GB   12.50%  85070591730234615865843651857942052864
>>> 10.2.54.96      NG   RAC1  Up     Normal  102.57 GB  12.50%  106338239662793269832304564822427566080
>>> 10.2.54.97      NG   RAC1  Up     Normal  68.03 GB   12.50%  127605887595351923798765477786913079296
>>> 10.2.54.98      NG   RAC1  Up     Normal  194.6 GB   12.50%  148873535527910577765226390751398592512
>>
>>
>>
>>
>> --
>> http://twitter.com/tjake



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
