Hi,

I would like to know how to investigate a count(*) query failure on Hive 3.1.2 &
Tez 0.9.2, where the mapper task attempts are 'being failed for too many output errors'.

The query is as simple as "select count(*) from MY_DB.ORC_TABLE where
part_date='2020-06-30';", and the ORC files of MY_DB.ORC_TABLE are bucketed.

The same query against the same table (the same ORC files) runs normally on Hive
2.3.2 & Tez 0.9.1.
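
For completeness, here is a minimal sketch of the statements I am comparing on
both versions. The partition spec is taken from the failing query, and
DESCRIBE FORMATTED with a partition spec is standard HiveQL; this is only what
I am checking, not a known cause.

    -- check the partition's files and statistics on both Hive 2.3.2 and 3.1.2
    DESCRIBE FORMATTED MY_DB.ORC_TABLE PARTITION (part_date='2020-06-30');
    -- re-run the failing statement itself
    SELECT count(*) FROM MY_DB.ORC_TABLE WHERE part_date='2020-06-30';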


The error on Hive 3.1.2 is as follows:

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..         container       RUNNING     75         22        0       53      38       0
Reducer 2        container        INITED      1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [=======>>-------------------] 28%   ELAPSED TIME: 10.44 s
----------------------------------------------------------------------------------------------
20/07/01 15:36:49 ERROR SessionState: Status: Failed
20/07/01 15:36:49 ERROR SessionState: Vertex failed, vertexName=Map 1,
vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed,
taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt
0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being
failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]],
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1]
killed/failed due to:OWN_TASK_FAILURE]
20/07/01 15:36:49 ERROR SessionState: Vertex killed,
vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01,
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did
not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1,
Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]
20/07/01 15:36:49 ERROR SessionState: DAG did not succeed due to
VERTEX_FAILURE. failedVertices:1 killedVertices:1
20/07/01 15:36:49 ERROR ql.Driver: FAILED: Execution Error, return
code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00,
diagnostics=[Task failed,
taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt
0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being
failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]],
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1]
killed/failed due to:OWN_TASK_FAILURE]Vertex killed,
vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01,
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did
not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1,
Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE.
failedVertices:1 killedVertices:1
20/07/01 15:36:49 ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing
statement: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00,
diagnostics=[Task failed,
taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt
0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being
failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]],
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1]
killed/failed due to:OWN_TASK_FAILURE]Vertex killed,
vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01,
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did
not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1,
Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE.
failedVertices:1 killedVertices:1
        at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
~[hive-service-3.1.2.jar:3.1.2]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226)
~[hive-service-3.1.2.jar:3.1.2]
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
~[hive-service-3.1.2.jar:3.1.2]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
~[hive-service-3.1.2.jar:3.1.2]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_131]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
~[hadoop-common-3.1.1.3.1.2-14.jar:?]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
~[hive-service-3.1.2.jar:3.1.2]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_131]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[?:1.8.0_131]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex
failed, vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00,
diagnostics=[Task failed,
taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt
0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being
failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0],
TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3
being failed for too many output errors. failureFraction=1.0,
MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1,
MAX_ALLOWED_OUTPUT_FAILURES=10,
MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]],
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1]
killed/failed due to:OWN_TASK_FAILURE]Vertex killed,
vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01,
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did
not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1,
Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE.
failedVertices:1 killedVertices:1
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:236)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
~[hive-exec-3.1.2.jar:3.1.2]
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
~[hive-exec-3.1.2.jar:3.1.2]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
~[hive-exec-3.1.2.jar:3.1.2]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
~[hive-service-3.1.2.jar:3.1.2]
        ... 11 more


The plan on Hive 3.1.2 is:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Reducer 2                                    |
|       File Output Operator [FS_7]                  |
|         Group By Operator [GBY_5] (rows=1 width=8) |
|           Output:["_col0"],aggregations:["count(VALUE._col0)"] |
|         <-Map 1 [CUSTOM_SIMPLE_EDGE]               |
|           PARTITION_ONLY_SHUFFLE [RS_4]            |
|             Group By Operator [GBY_3] (rows=1 width=8) |
|               Output:["_col0"],aggregations:["count()"] |
|               Select Operator [SEL_2] (rows=44050597 width=4160) |
|                 TableScan [TS_0] (rows=44050597 width=4160) |
|                   MY_DB@ORC_TABLE,ORC_TABLE,Tbl:COMPLETE,Col:NONE |
|                                                    |
+----------------------------------------------------+


And the plan on Hive 2.3.2 is:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Reducer 2                                    |
|       File Output Operator [FS_7]                  |
|         Group By Operator [GBY_5] (rows=1 width=8) |
|           Output:["_col0"],aggregations:["count(VALUE._col0)"] |
|         <-Map 1 [CUSTOM_SIMPLE_EDGE]               |
|           PARTITION_ONLY_SHUFFLE [RS_4]            |
|             Group By Operator [GBY_3] (rows=1 width=8) |
|               Output:["_col0"],aggregations:["count()"] |
|               Select Operator [SEL_2] (rows=1 width=18325049344) |
|                 TableScan [TS_0] (rows=1 width=18325049344) |
|                   MY_DB@ORC_TABLE,ORC_TABLE,Tbl:PARTIAL,Col:NONE |
|                                                    |
+----------------------------------------------------+
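
The main difference I can see between the two plans is the TableScan statistics
(rows=44050597, Tbl:COMPLETE on 3.1.2 vs rows=1, Tbl:PARTIAL on 2.3.2). In case
stale or partial partition statistics are relevant, here is a minimal sketch of
how I would refresh them (standard HiveQL ANALYZE, partition spec from the
failing query; just something I plan to try, not a known fix):

    -- recompute basic statistics for the partition used in the failing query
    ANALYZE TABLE MY_DB.ORC_TABLE PARTITION (part_date='2020-06-30') COMPUTE STATISTICS;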


Best regards,
Eugene Chung (Korean : 정의근)
