If you're getting lots of timeout exceptions with mapreduce, you might take a
look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
We saw the same thing and tweaked a variety of settings, all of which are
listed there.  Ultimately, we also boosted Hadoop's tolerance for task
failures so that it could retry more, and that worked out fine.  A coworker
had the same experience running Hadoop over Elasticsearch and had to raise
that tolerance too.  An example configuration for modifying it is shown in
the link above.
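For reference, raising that tolerance is usually done through the per-job
retry properties in mapred-site.xml (or on the Job's Configuration).  A
rough sketch for the Hadoop 0.20 line — the values here are just a starting
point, not what we ran with:

```xml
<!-- mapred-site.xml: let failed tasks retry more before the job is killed -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>12</value>  <!-- default is 4; each timed-out map gets more retries -->
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>12</value>
</property>
```

The same properties can be set programmatically on the job configuration if
you don't want to change them cluster-wide.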

Hopefully that will help for your mapreduce jobs at least.  We've had good
luck with MR/Pig over Cassandra, but only after some lessons learned about
the configuration of both Cassandra and Hadoop.

On Dec 6, 2011, at 3:50 AM, Patrik Modesto wrote:

> Hi,
> 
> I'm quite desperate about Cassandra's performance in our production
> cluster. We have 8 real-HW nodes, 32-core CPUs, 32GB memory, 4 disks in
> RAID10, Cassandra 0.8.8, RF=3 and Hadoop.
> We have four keyspaces; one is the large one, with 2 CFs: one is a kind
> of index, the other holds the data. There are about 7 million rows, mean
> row size 7kB. We run several mapreduce tasks; most of them just read
> from Cassandra and write to HDFS, but one fetches rows from Cassandra,
> computes something and writes it back. For each row we compute three new
> JSON values, about 1kB each (they get overwritten next round).
> 
> We get lots and lots of Timeout exceptions, with LiveSSTablesCount over
> 100. Repair doesn't finish even in 24 hours, and reading from the other
> keyspaces times out as well.  We set compaction_throughput_mb_per_sec:
> 0 but it didn't help.
> 
> Did we choose wrong DB for our usecase?
> 
> Regards,
> Patrik
> 
> This is from one node:
> 
> INFO 10:28:40,035 Pool Name                    Active   Pending   Blocked
> INFO 10:28:40,036 ReadStage                        96       695         0
> INFO 10:28:40,037 RequestResponseStage              0         0         0
> INFO 10:28:40,037 ReadRepairStage                   0         0         0
> INFO 10:28:40,037 MutationStage                     1         1         0
> INFO 10:28:40,038 ReplicateOnWriteStage             0         0         0
> INFO 10:28:40,038 GossipStage                       0         0         0
> INFO 10:28:40,038 AntiEntropyStage                  0         0         0
> INFO 10:28:40,039 MigrationStage                    0         0         0
> INFO 10:28:40,039 StreamStage                       0         0         0
> INFO 10:28:40,040 MemtablePostFlusher               0         0         0
> INFO 10:28:40,040 FlushWriter                       0         0         0
> INFO 10:28:40,040 MiscStage                         0         0         0
> INFO 10:28:40,041 FlushSorter                       0         0         0
> INFO 10:28:40,041 InternalResponseStage             0         0         0
> INFO 10:28:40,041 HintedHandoff                     1         5         0
> INFO 10:28:40,042 CompactionManager               n/a        27
> INFO 10:28:40,042 MessagingService                n/a   0,16559
> 
> And here is the nodetool ring output:
> 
> 10.2.54.91      NG          RAC1        Up     Normal  118.04 GB
> 12.50%  0
> 10.2.54.92      NG          RAC1        Up     Normal  102.74 GB
> 12.50%  21267647932558653966460912964485513216
> 10.2.54.93      NG          RAC1        Up     Normal  76.95 GB
> 12.50%  42535295865117307932921825928971026432
> 10.2.54.94      NG          RAC1        Up     Normal  56.97 GB
> 12.50%  63802943797675961899382738893456539648
> 10.2.54.95      NG          RAC1        Up     Normal  75.55 GB
> 12.50%  85070591730234615865843651857942052864
> 10.2.54.96      NG          RAC1        Up     Normal  102.57 GB
> 12.50%  106338239662793269832304564822427566080
> 10.2.54.97      NG          RAC1        Up     Normal  68.03 GB
> 12.50%  127605887595351923798765477786913079296
> 10.2.54.98      NG          RAC1        Up     Normal  194.6 GB
> 12.50%  148873535527910577765226390751398592512
