Hi Jeremy, thanks for the link. I doubled the rpc_timeout (to 20 seconds) and reduced the range-batch-size to 2048, but I still get timeouts... For reference, a sketch of the two settings I changed follows right below, and a sketch of our write path is at the very bottom, after the quoted mail.
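This is roughly what the change looks like (a minimal sketch only: the job class name is made up; rpc_timeout_in_ms in cassandra.yaml and the cassandra.range.batch.size job property are the actual knobs):

// cassandra.yaml on each node (doubled from the 10 second default):
//     rpc_timeout_in_ms: 20000
//
// Hadoop job side:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class StressJobTuning {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // property read by Cassandra's Hadoop support; limits how many rows
        // the ColumnFamilyRecordReader asks for per Thrift call
        conf.setInt("cassandra.range.batch.size", 2048);
        // equivalent, if you prefer the helper:
        // org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSize(conf, 2048);
        return new Job(conf, "cassandra-stress-writer");
    }
}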
Udo

On 29.04.2011, at 18:53, Jeremy Hanna wrote:

> It sounds like there might be some tuning you can do to your jobs - take a
> look at the wiki's HadoopSupport page, specifically the Troubleshooting
> section:
> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>
> On Apr 29, 2011, at 11:45 AM, Subscriber wrote:
>
>> Hi all,
>>
>> We want to share the experiences we gathered during our evaluation of
>> Cassandra together with Hadoop Map/Reduce.
>> Our question was whether Cassandra is suitable for massive distributed data
>> writes using Hadoop's Map/Reduce feature.
>>
>> Our setup is described in the attached file 'cassandra_stress_setup.txt'.
>>
>> <cassandra_stress_setup.txt>
>>
>> The stress test uses 800 map tasks to generate data and store it in
>> Cassandra.
>> Each map task writes 500,000 items (i.e. rows), resulting in 400,000,000
>> items in total.
>> There are at most 8 map tasks in parallel on each node. An item contains
>> (besides the key) two long and two double values,
>> so an item is a few hundred bytes in size. This leads to a total data size
>> of approximately 120 GB.
>>
>> The map tasks use the Hector API. Hector is fed with all three data nodes.
>> The data is written in chunks of 1000 items.
>> The ConsistencyLevel is set to ONE.
>>
>> We ran the stress test in several runs with different configuration
>> settings (for example, I started with Cassandra's default configuration and
>> used Pelops for another test).
>>
>> Our observations:
>>
>> 1) Cassandra is really fast - we are really impressed by the huge write
>> throughput. A map task writing 500,000 items (approx. 200 MB) usually
>> finishes in under 5 minutes.
>> 2) However - unfortunately all tests failed in the end.
>>
>> In the beginning there are no problems. The first 100 (in some tests even
>> the first 300!) map tasks look fine. But then the trouble starts.
>>
>> Hadoop's sample output after ~15 minutes:
>>
>> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>> map      14.99%       800         680       24        96         0        0 / 0
>> reduce   3.99%        1           0         1         0          0        0 / 0
>>
>> Some stats:
>>> top
>> PID    USER   PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+     COMMAND
>> 31159  xxxx   20   0  2569m  2.2g  9.8m  S   450  18.6  61:44.73  java
>>
>>> vmstat 1 5
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache    si   so    bi    bo     in     cs  us sy id wa
>>  2  1  36832 353688 242820 6837520    0    0    15    73      3      2   3  0 96  0
>> 11  1  36832 350992 242856 6852136    0    0  1024 20900   4508  11738  19  1 74  6
>>  8  0  36832 339728 242876 6859828    0    0     0  1068  45809 107008  69 10 20  0
>>  1  0  36832 330212 242884 6868520    0    0     0    80  42112  92930  71  8 21  0
>>  2  0  36832 311888 242908 6887708    0    0  1024     0  20277  46669  46  7 47  0
>>
>>> cassandra/bin/nodetool -h tirdata1 -p 28080 ring
>> Address         Status  State   Load     Owns    Token
>>                                                  113427455640312821154458202477256070484
>> 192.168.11.198  Up      Normal  6.72 GB  33.33%  0
>> 192.168.11.199  Up      Normal  6.72 GB  33.33%  56713727820156410577229101238628035242
>> 192.168.11.202  Up      Normal  6.68 GB  33.33%  113427455640312821154458202477256070484
>>
>> Hadoop's sample output after ~20 minutes:
>>
>> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>> map      15.49%       800         673       24        103        0        6 / 0
>> reduce   4.16%        1           0         1         0          0        0 / 0
>>
>> What went wrong? It's always the same: the clients cannot reach the nodes
>> anymore.
>>
>> java.lang.RuntimeException: work failed
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:109)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper.run(DelegatingMapper.java:40)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:625)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:107)
>>     ... 4 more
>> Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:47)
>>     at com.zfabrik.work.WorkUnit.work(WorkUnit.java:342)
>>     at com.zfabrik.impl.launch.ProcessRunnerImpl.work(ProcessRunnerImpl.java:189)
>>     ... 9 more
>> Caused by: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
>>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:203)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:200)
>>     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:200)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper._flush(HectorBasedMassItemGenMapper.java:122)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:103)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:1)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:45)
>>     ... 11 more
>> Caused by: UnavailableException()
>>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16485)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
>>     ... 27 more
>>
>> -------
>> Task attempt_201104291345_0001_m_000028_0 failed to report status for 602
>> seconds. Killing!
>>
>>
>> I also observed that when connecting with cassandra-cli during the stress
>> test, it was not possible to list the items written so far:
>>
>> [default@unknown] use ItemRepo;
>> Authenticated to keyspace: ItemRepo
>> [default@ItemRepo] list Items;
>> Using default limit of 100
>> Internal error processing get_range_slices
>>
>>
>> It seems to me that from the point I performed the read operation in the
>> CLI tool, the node somehow becomes confused.
>> Looking at jconsole shows that up to this point the heap is fine: it grows
>> and GC clears it again.
>> But from this point on, GC doesn't really help anymore (see attached
>> screenshot).
>>
>> <Bildschirmfoto 2011-04-29 um 18.30.14.png>
>>
>> This also has an impact on the other nodes, as one can see in the second
>> screenshot: the CPU usage goes down, as does the heap memory usage.
>>
>> <Bildschirmfoto 2011-04-29 um 18.36.34.png>
>>
>> I'll run another stress test over the weekend with
>>
>> * MAX_HEAP_SIZE="4G"
>> * HEAP_NEWSIZE="400M"
>>
>> Best Regards
>> Udo
>>
>
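PS: For reference, the write path in the mapper boils down to chunked Hector batch mutations, as described in the quoted mail. A simplified, self-contained sketch (cluster name and column names are made up; keyspace 'ItemRepo', column family 'Items', the three node addresses, the chunk size of 1000, and ConsistencyLevel ONE are as in our test - this is not our actual mapper code):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class ItemWriteSketch {

    private static final int CHUNK_SIZE = 1000;   // one batch mutation per 1000 items

    public static void main(String[] args) {
        // Hector is given all three data nodes
        Cluster cluster = HFactory.getOrCreateCluster("ItemCluster",
                new CassandraHostConfigurator("192.168.11.198,192.168.11.199,192.168.11.202"));

        // write with ConsistencyLevel ONE
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
        Keyspace keyspace = HFactory.createKeyspace("ItemRepo", cluster, policy);

        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

        int pending = 0;
        for (long i = 0; i < 500000; i++) {
            String key = "item-" + i;
            // the real items carry two longs and two doubles; plain string columns keep the sketch short
            mutator.addInsertion(key, "Items", HFactory.createStringColumn("value1", Long.toString(i)));
            mutator.addInsertion(key, "Items", HFactory.createStringColumn("value2", Double.toString(i * 0.5)));
            if (++pending == CHUNK_SIZE) {
                mutator.execute();   // flush the chunk of pending insertions
                pending = 0;
            }
        }
        if (pending > 0) {
            mutator.execute();       // flush the remainder
        }
        HFactory.shutdownCluster(cluster);
    }
}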