> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> Why is this split's data on two nodes? We have a 6-node Cassandra cluster + Hadoop slaves - every task should get a local input split from its local Cassandra - am I right?

My understanding is that it may get it locally, but it's not something that has to happen. One of the Hadoop guys will have a better idea.
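For what it's worth, the `@[p00nosql02.00, p00nosql01.00]` part of the split names the replica endpoints that own that token range (two entries would be consistent with a replication factor of 2); Hadoop treats them as preferred locations when scheduling the map task, not as guarantees. One way to double-check which nodes own a range or key is nodetool; a sketch (only the hostnames come from the log - the keyspace, column family, and row key below are hypothetical placeholders):

```shell
# Show token ownership per node (can be run against any node in the cluster)
nodetool -h p00nosql01 ring

# Show the replica endpoints for a specific row key
# ("MyKeyspace", "MyCF", and "some_row_key" are placeholders)
nodetool -h p00nosql01 getendpoints MyKeyspace MyCF some_row_key
```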
Try reducing cassandra.range.batch.size and/or, if you are using wide rows, enable cassandra.input.widerows.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/04/2013, at 7:55 PM, Shamim <sre...@yandex.ru> wrote:

> Hello Aaron,
> I have got the following log from the server (sorry for being late):
>
> job_201304231203_0004
> attempt_201304231203_0004_m_000501_0
>
> 2013-04-23 16:09:14,196 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2013-04-23 16:09:14,438 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/pigContext <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/pigContext
> 2013-04-23 16:09:14,453 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/dk <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/dk
> 2013-04-23 16:09:14,456 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/META-INF <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/META-INF
> 2013-04-23 16:09:14,459 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/org <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/org
> 2013-04-23 16:09:14,469 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/com <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/com
> 2013-04-23 16:09:14,471 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/.job.jar.crc <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/.job.jar.crc
> 2013-04-23 16:09:14,474 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/job.jar <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/job.jar
> 2013-04-23 16:09:17,329 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2013-04-23 16:09:17,387 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@256ef705
> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> 2013-04-23 16:09:18,088 INFO org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
> 2013-04-23 16:09:19,784 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: data[12,7],null[-1,-1],filtered[14,11],null[-1,-1],c1[23,5],null[-1,-1],updated[111,10] C: R:
> 2013-04-23 17:35:11,199 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-04-23 17:35:11,384 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2013-04-23 17:35:11,385 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName cassandra for UID 500 from the native implementation
> 2013-04-23 17:35:11,417 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: TimedOutException()
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
>     ... 17 more
> 2013-04-23 17:35:11,427 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>
> These two tasks hung for a long time and then crashed with a timeout exception. The very interesting part is the following:
> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> Why is this split's data on two nodes? We have a 6-node Cassandra cluster + Hadoop slaves - every task should get a local input split from its local Cassandra - am I right?
>
> --
> Best regards
> Shamim A.
>
> 24.04.2013, 10:59, "Shamim" <sre...@yandex.ru>:
>> Hello Aaron,
>> We have built up our new cluster from scratch with version 1.2 - partitioner Murmur3. We are not using vnodes at all.
>> Actually the log is clean with nothing serious; we are investigating the logs now and will post soon if we find something criminal.
>>
>>>>> Our cluster is evenly partitioned (Murmur3Partitioner)
>>>
>>> Murmur3Partitioner is only available in 1.2 and changing partitioners is not supported. Did you change from Random Partitioner under 1.1? Are you using virtual nodes in your 1.2 cluster?
>>>
>>>>> We have roughly 97 million rows in our cluster. Why are we getting the above behavior? Do you have any suggestion or clue to troubleshoot this issue?
>>>
>>> Can you make some of the logs from the tasks available?
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 23/04/2013, at 5:50 AM, Shamim wrote:
>>>> We are using Hadoop 1.0.3 and Pig version 0.11.1
>>>>
>>>> --
>>>> Best regards
>>>> Shamim A.
>>>>
>>>> 22.04.2013, 21:48, "Shamim":
>>>>> Hello all,
>>>>> recently we upgraded our cluster (6 nodes) from Cassandra version 1.1.6 to 1.2.1. Our cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse and compute aggregate data.
>>>>>
>>>>> When we submit a job through Pig, what I consistently see is that, while most of the tasks have 20-25k rows assigned each (map input records), only 2 of them (always 2) get more than 2 million rows. These 2 tasks always reach 100% complete and then hang for a long time. Also, most of the time we are getting killed tasks (2%) with a TimeoutException.
>>>>>
>>>>> We increased rpc_timeout to 60000 and also set cassandra.input.split.size=1024, but nothing helps.
>>>>>
>>>>> We have roughly 97 million rows in our cluster. Why are we getting the above behavior? Do you have any suggestion or clue to troubleshoot this issue? Any help would be highly appreciated. Thanks in advance.
>>>>>
>>>>> --
>>>>> Best regards
>>>>> Shamim A.
>>
>> --
>> Best regards
>> Shamim A.
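The settings discussed in this thread (cassandra.range.batch.size, cassandra.input.widerows, cassandra.input.split.size) are plain Hadoop job properties, so one way to pass them to a Pig job is through PIG_OPTS. A minimal sketch, assuming Pig forwards -D properties into the job configuration as on Hadoop 1.x; the values and the script name are examples only, not recommendations:

```shell
# Smaller range batch => fewer rows fetched per Thrift get_range_slices call,
# so each call is cheaper and less likely to hit rpc_timeout.
# cassandra.input.widerows pages within a single row when rows are very wide.
# cassandra.input.split.size caps the number of rows handed to one map task.
export PIG_OPTS="$PIG_OPTS \
  -Dcassandra.range.batch.size=512 \
  -Dcassandra.input.widerows=true \
  -Dcassandra.input.split.size=16384"

pig aggregate_job.pig   # placeholder script name
```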