Thank you Jeremy. I've already changed the max.*.failures settings to 20;
it helps jobs finish but doesn't address the source of the timeouts. I'll
try the other tips.
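
For reference, a minimal sketch of that kind of change, placed in
mapred-site.xml or the job configuration. The three property names are
assumed here to be the Hadoop 0.20-style settings that the wiki page's
example refers to; only the value 20 comes from this thread, and the
right number depends on the cluster.

```xml
<!-- Assumed Hadoop 0.20-style property names; 20 is the value mentioned in this thread. -->
<property>
  <!-- task failures on one tasktracker before the job stops using it -->
  <name>mapred.max.tracker.failures</name>
  <value>20</value>
</property>
<property>
  <!-- maximum attempts per map task before the job is failed -->
  <name>mapred.map.max.attempts</name>
  <value>20</value>
</property>
<property>
  <!-- maximum attempts per reduce task before the job is failed -->
  <name>mapred.reduce.max.attempts</name>
  <value>20</value>
</property>
```

These settings only let Hadoop retry more before giving up on a job; they
do not reduce the number of timeouts coming back from Cassandra.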

Regards,
Patrik

On Wed, Dec 7, 2011 at 17:29, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> If you're getting lots of timeout exceptions with mapreduce, you might take a 
> look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> We saw that and tweaked a variety of things - all of which are listed there.  
> Ultimately, we also boosted hadoop's tolerance for them as well and it was 
> just fine - so that it could retry more.  A coworker had the same experience 
> running hadoop over elastic search - having to up that tolerance.  An example 
> configuration for modifying that is shown in the link above.
>
> Hopefully that will help for your mapreduce jobs at least.  We've had good 
> luck with MR/Pig over Cassandra, but it's after some lessons learned wrt 
> configuration of both Cassandra and Hadoop.
>
> On Dec 6, 2011, at 3:50 AM, Patrik Modesto wrote:
>
>> Hi,
>>
>> I'm quite desperate about Cassandra's performance in our production
>> cluster. We have 8 real-HW nodes, 32core CPU, 32GB memory, 4 disks in
>> raid10, cassandra 0.8.8, RF=3 and Hadoop.
>> We have four keyspaces; one is the large one. It has 2 CFs: one is a
>> kind of index, the other holds data. There are about 7 million rows;
>> the mean row size is 7kB. We run several mapreduce tasks. Most of them
>> just read from Cassandra and write to HDFS, but one fetches rows from
>> Cassandra, computes something and writes it back: for each row we
>> compute three new JSON values, about 1kB each (they get overwritten in
>> the next round).
>>
>> We get lots and lots of Timeout exceptions, and LiveSSTablesCount is
>> over 100. Repair doesn't finish even in 24 hours, and reads from the
>> other keyspaces time out as well. We set
>> compaction_throughput_mb_per_sec: 0 but it didn't help.
>>
>> Did we choose the wrong DB for our use case?
>>
>> Regards,
>> Patrik
>>
>> This is from one node:
>>
>> INFO 10:28:40,035 Pool Name                    Active   Pending   Blocked
>> INFO 10:28:40,036 ReadStage                        96       695         0
>> INFO 10:28:40,037 RequestResponseStage              0         0         0
>> INFO 10:28:40,037 ReadRepairStage                   0         0         0
>> INFO 10:28:40,037 MutationStage                     1         1         0
>> INFO 10:28:40,038 ReplicateOnWriteStage             0         0         0
>> INFO 10:28:40,038 GossipStage                       0         0         0
>> INFO 10:28:40,038 AntiEntropyStage                  0         0         0
>> INFO 10:28:40,039 MigrationStage                    0         0         0
>> INFO 10:28:40,039 StreamStage                       0         0         0
>> INFO 10:28:40,040 MemtablePostFlusher               0         0         0
>> INFO 10:28:40,040 FlushWriter                       0         0         0
>> INFO 10:28:40,040 MiscStage                         0         0         0
>> INFO 10:28:40,041 FlushSorter                       0         0         0
>> INFO 10:28:40,041 InternalResponseStage             0         0         0
>> INFO 10:28:40,041 HintedHandoff                     1         5         0
>> INFO 10:28:40,042 CompactionManager               n/a        27
>> INFO 10:28:40,042 MessagingService                n/a   0,16559
>>
>> And here is the nodetool ring output:
>>
>> 10.2.54.91  NG  RAC1  Up  Normal  118.04 GB  12.50%  0
>> 10.2.54.92  NG  RAC1  Up  Normal  102.74 GB  12.50%  21267647932558653966460912964485513216
>> 10.2.54.93  NG  RAC1  Up  Normal  76.95 GB   12.50%  42535295865117307932921825928971026432
>> 10.2.54.94  NG  RAC1  Up  Normal  56.97 GB   12.50%  63802943797675961899382738893456539648
>> 10.2.54.95  NG  RAC1  Up  Normal  75.55 GB   12.50%  85070591730234615865843651857942052864
>> 10.2.54.96  NG  RAC1  Up  Normal  102.57 GB  12.50%  106338239662793269832304564822427566080
>> 10.2.54.97  NG  RAC1  Up  Normal  68.03 GB   12.50%  127605887595351923798765477786913079296
>> 10.2.54.98  NG  RAC1  Up  Normal  194.6 GB   12.50%  148873535527910577765226390751398592512
>
