[ https://issues.apache.org/jira/browse/HIVE-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591369#comment-15591369 ]
Alexandre Linte commented on HIVE-15017:
----------------------------------------

Hi [~sershe],

The "yarn logs" command doesn't return the logs, as you can see below.
{noformat}
[root@namenode01 ~]# yarn logs -applicationId application_1475850791417_0105
/Products/YARN/logs/hdfs/logs/application_1475850791417_0105 does not exist.
Log aggregation has not completed or is not enabled.
{noformat}
So I decided to dig into the logs manually. I found interesting things on both datanode05 and datanode06. The error "255" appears regularly; I think this is the cause of the container crash. I uploaded the relevant part of the logs.


> Random job failures with MapReduce and Tez
> ------------------------------------------
>
>                 Key: HIVE-15017
>                 URL: https://issues.apache.org/jira/browse/HIVE-15017
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>         Environment: Hadoop 2.7.2, Hive 2.1.0
>            Reporter: Alexandre Linte
>            Priority: Critical
>         Attachments: hive_cli_mr.txt, hive_cli_tez.txt, nodemanager_logs_mr_job.txt, yarn_syslog_mr_job.txt, yarn_syslog_tez_job.txt
>
> Since Hive 2.1.0, we have been facing a blocking issue on our cluster. All jobs fail randomly, on both MapReduce and Tez.
> In both cases, we don't have any ERROR or WARN message in the logs. You can find attached:
> - the Hive CLI output errors
> - the YARN logs for a Tez job and a MapReduce job
> - the NodeManager logs (MR only; we have the same logs with Tez)
> Note: This issue doesn't exist with Pig jobs (MR + Tez) or Spark jobs (MR), so this cannot be a Hadoop / YARN issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
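For anyone else hitting the same "Log aggregation has not completed or is not enabled" message from "yarn logs": aggregation is controlled from yarn-site.xml on the NodeManagers. A minimal sketch, using the standard Hadoop 2.7 property names; the remote-app-log-dir value here is only inferred from the error path above ({remote dir}/{user}/{suffix}/{appId}), so adjust it to your actual cluster layout.

{code:xml}
<!-- yarn-site.xml: minimal sketch to enable YARN log aggregation. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Inferred from the error path /Products/YARN/logs/hdfs/logs/<appId>;
       replace with the remote directory used on your cluster. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/Products/YARN/logs</value>
</property>
{code}

After changing this, the NodeManagers need a restart, and only applications finishing afterwards will have their logs aggregated.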