Good luck. Share your results with us.

Daniel
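[Editor's note: for readers who hit the same UnknownHostException, the fix Daniel describes below — identical name-to-IP mappings on every node — can be sketched as follows. The 192.168.1.x addresses are placeholders (substitute your cluster's real IPs); the master/slave1..slave8 hostnames are the ones from this thread.]

```shell
#!/bin/sh
# Build one hosts fragment that maps every cluster node name to an IP.
# The addresses below are PLACEHOLDERS -- use your cluster's real ones.
cat > /tmp/cluster-hosts <<'EOF'
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3
192.168.1.14 slave4
192.168.1.15 slave5
192.168.1.16 slave6
192.168.1.17 slave7
192.168.1.18 slave8
EOF

# Every node (including the master) needs the SAME entries: the MapReduce
# ApplicationMaster may be scheduled on any slave and must be able to
# resolve every other slave's hostname when launching containers.
for host in master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8; do
  echo "would append /tmp/cluster-hosts to $host:/etc/hosts"
  # e.g.: ssh root@"$host" 'cat >> /etc/hosts' < /tmp/cluster-hosts
done
```

The ssh line is left commented out so the sketch is safe to run as-is; uncomment it (or use your configuration-management tool) to actually distribute the file.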
> On 24 Nov 2014, at 19:36, Amit Behera <amit.bd...@gmail.com> wrote:
>
> Hi Daniel,
>
> Thanks a lot. I will do that and rerun the query. :)
>
>> On Mon, Nov 24, 2014 at 10:59 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>> It is a problem, as the application master needs to contact the other nodes.
>>
>> Try updating the hosts file on all the machines and try again.
>>
>> Daniel
>>
>>> On 24 Nov 2014, at 19:26, Amit Behera <amit.bd...@gmail.com> wrote:
>>>
>>> I did not modify it on all the slaves, only on one slave.
>>>
>>> Will that be a problem?
>>>
>>> For small data (up to a 20 GB table) queries run, but on the 300 GB table only count(*) runs, and even that sometimes fails.
>>>
>>> Thanks,
>>> Amit
>>>
>>>> On Mon, Nov 24, 2014 at 10:37 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>>>> Did you copy the hosts file to all the nodes?
>>>>
>>>> Daniel
>>>>
>>>>> On 24 Nov 2014, at 19:04, Amit Behera <amit.bd...@gmail.com> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> The stack trace is the same for other queries; on different runs I sometimes get slave7, sometimes slave8...
>>>>>
>>>>> I also registered all the machine IPs in /etc/hosts.
>>>>>
>>>>> Regards,
>>>>> Amit
>>>>>
>>>>>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>>>>>> It seems that the application master can't resolve slave6's name to an IP.
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>> On 24 Nov 2014, at 18:49, Amit Behera <amit.bd...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Users,
>>>>>>>
>>>>>>> My cluster (1 master + 8 slaves) configuration:
>>>>>>>
>>>>>>> RAM:  32 GB each
>>>>>>> HDFS: 1.5 TB SSD
>>>>>>> CPU:  8 cores each
>>>>>>>
>>>>>>> -----------------------------------------------
>>>>>>>
>>>>>>> I am trying to query a 300 GB table, but I can only run SELECT queries. Every other query fails with the following exception:
>>>>>>>
>>>>>>> Total jobs = 1
>>>>>>> Stage-1 is selected by condition resolver.
>>>>>>> Launching Job 1 out of 1
>>>>>>> Number of reduce tasks not specified. Estimated from input data size: 183
>>>>>>> In order to change the average load for a reducer (in bytes):
>>>>>>>   set hive.exec.reducers.bytes.per.reducer=<number>
>>>>>>> In order to limit the maximum number of reducers:
>>>>>>>   set hive.exec.reducers.max=<number>
>>>>>>> In order to set a constant number of reducers:
>>>>>>>   set mapreduce.job.reduces=<number>
>>>>>>> Starting Job = job_1416831990090_0005, Tracking URL = http://master:8088/proxy/application_1416831990090_0005/
>>>>>>> Kill Command = /root/hadoop/bin/hadoop job -kill job_1416831990090_0005
>>>>>>> Hadoop job information for Stage-1: number of mappers: 679; number of reducers: 183
>>>>>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, reduce = 0%
>>>>>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 625.19 sec
>>>>>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, reduce = 100%
>>>>>>> MapReduce Total cumulative CPU time: 10 minutes 25 seconds 190 msec
>>>>>>> Ended Job = job_1416831990090_0005 with errors
>>>>>>> Error during job, obtaining debugging information...
>>>>>>> Examining task ID: task_1416831990090_0005_m_000005 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000042 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000035 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000065 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000002 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000007 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000058 (and more) from job job_1416831990090_0005
>>>>>>> Examining task ID: task_1416831990090_0005_m_000043 (and more) from job job_1416831990090_0005
>>>>>>>
>>>>>>> Task with the most failures(4):
>>>>>>> -----
>>>>>>> Task ID: task_1416831990090_0005_m_000005
>>>>>>> URL: http://master:8088/taskdetails.jsp?jobid=job_1416831990090_0005&tipid=task_1416831990090_0005_m_000005
>>>>>>> -----
>>>>>>> Diagnostic Messages for this Task:
>>>>>>> Container launch failed for container_1416831990090_0005_01_000112 : java.lang.IllegalArgumentException: java.net.UnknownHostException: slave6
>>>>>>>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>>>>>>>     at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:397)
>>>>>>>     at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:233)
>>>>>>>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:211)
>>>>>>>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:189)
>>>>>>>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:110)
>>>>>>>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
>>>>>>>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>>>>>>>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>> Caused by: java.net.UnknownHostException: slave6
>>>>>>>     ... 12 more
>>>>>>>
>>>>>>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>>>>>>> MapReduce Jobs Launched:
>>>>>>> Job 0: Map: 679  Reduce: 183  Cumulative CPU: 625.19 sec  HDFS Read: 0  HDFS Write: 0  FAIL
>>>>>>> Total MapReduce CPU Time Spent: 10 minutes 25 seconds 190 msec
>>>>>>>
>>>>>>> Please help me to fix the issue.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amit
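[Editor's note: before rerunning the 300 GB query, it is worth confirming that every node name actually resolves — from every node, since the ApplicationMaster can land anywhere. A minimal sketch, using `getent hosts` (available on typical Linux nodes) and hostnames taken from this thread; `localhost` is included only as a sanity check:]

```shell
#!/bin/sh
# check_resolution NAME -- succeeds only if NAME resolves via the system
# resolver (/etc/hosts or DNS), the same lookup path Hadoop uses.
check_resolution() {
  getent hosts "$1" > /dev/null 2>&1
}

# Run this loop on EACH node; an UNRESOLVED line means that node's
# /etc/hosts still lacks an entry and container launches there will fail
# with java.net.UnknownHostException.
for host in localhost master slave6 slave7 slave8; do
  if check_resolution "$host"; then
    echo "$host: ok"
  else
    echo "$host: UNRESOLVED -- fix /etc/hosts before running Hive"
  fi
done
```

Only when every node reports "ok" for every other node is the cluster in the state Daniel's fix requires.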