[ https://issues.apache.org/jira/browse/HIVE-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591369#comment-15591369 ]
Alexandre Linte commented on HIVE-15017:
----------------------------------------

Hi [~sershe],

The "yarn logs" command doesn't return the logs, as you can see below.
{noformat}
[root@namenode01 ~]# yarn logs -applicationId application_1475850791417_0105
/Products/YARN/logs/hdfs/logs/application_1475850791417_0105 does not exist.
Log aggregation has not completed or is not enabled.
{noformat}
So I decided to dig into the logs manually. I found interesting things on both datanode05 and datanode06. The error "255" appears regularly; I think this is the cause of the container crash. I uploaded the relevant part of the logs.


> Random job failures with MapReduce and Tez
> ------------------------------------------
>
>                 Key: HIVE-15017
>                 URL: https://issues.apache.org/jira/browse/HIVE-15017
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>         Environment: Hadoop 2.7.2, Hive 2.1.0
>            Reporter: Alexandre Linte
>            Priority: Critical
>         Attachments: hive_cli_mr.txt, hive_cli_tez.txt, nodemanager_logs_mr_job.txt, yarn_syslog_mr_job.txt, yarn_syslog_tez_job.txt
>
> Since Hive 2.1.0, we have been facing a blocking issue on our cluster. All jobs fail randomly, on both MapReduce and Tez.
> In both cases, we don't have any ERROR or WARN message in the logs. You can find attached:
> - the Hive CLI output errors
> - the YARN logs for a Tez job and a MapReduce job
> - the NodeManager logs (MR only; we have the same logs with Tez)
> Note: This issue doesn't exist with Pig jobs (MR + Tez) or Spark jobs (MR), so this cannot be a Hadoop / YARN issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
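For anyone else hitting the same "Log aggregation has not completed or is not enabled" message from "yarn logs": aggregation is controlled from yarn-site.xml on the NodeManagers. A minimal sketch, using the standard Hadoop 2.7 property names; the remote-app-log-dir value here is only inferred from the error path above ({remote dir}/{user}/{suffix}/{appId}), so adjust it to your actual cluster layout.

{code:xml}
<!-- yarn-site.xml: minimal sketch to enable YARN log aggregation. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Inferred from the error path /Products/YARN/logs/hdfs/logs/<appId>;
       replace with the remote directory used on your cluster. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/Products/YARN/logs</value>
</property>
{code}

After changing this, the NodeManagers need a restart, and only applications finishing afterwards will have their logs aggregated.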