Thanks for the reply, Zoltan.

I found the error in the reducer task attempt log. It looks exactly like the
one reported here:

https://issues.apache.org/jira/plugins/servlet/mobile#issue/TEZ-4071
(https://issues.apache.org/jira/plugins/servlet/mobile#issue/TEZ-3894)

The issue is said to be resolved in Tez 0.9.2, but I don't think it is.
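
For anyone who wants to pull the same task attempt logs, here is a minimal
Python sketch. It assumes that YARN log aggregation is enabled on the cluster
and relies on the Tez vertex/task/attempt IDs embedding the YARN application
ID (cluster timestamp plus sequence number). The helper names are hypothetical,
made up for illustration; only the "yarn logs -applicationId" command itself
is a real CLI call.

```python
import re


def yarn_application_id(tez_id):
    """Derive the YARN application ID embedded in a Tez dag/vertex/task/attempt ID.

    Hypothetical helper: it only rewrites the ID string and does not contact the cluster.
    """
    match = re.match(r"(?:dag|vertex|task|attempt)_(\d+)_(\d+)_", tez_id)
    if match is None:
        raise ValueError("not a Tez ID: %r" % tez_id)
    cluster_timestamp, app_sequence = match.groups()
    return "application_%s_%s" % (cluster_timestamp, app_sequence)


def yarn_logs_command(tez_id):
    """Build the argument list for fetching the aggregated container logs."""
    return ["yarn", "logs", "-applicationId", yarn_application_id(tez_id)]


# Failed attempt ID taken from the diagnostics in this thread.
print(" ".join(yarn_logs_command("attempt_1591769205146_436476_1_00_000055_0")))
```

Running the printed command on a cluster node dumps every container log of the
application, which can then be searched for the failing attempt ID.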


Regards,
Eugene

On Thu, Jul 2, 2020 at 5:59 Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey Eugene!
>
> I don't see any hints in these outputs as to what the issue could be...
> have you checked the Tez container logs?
>
> cheers,
> Zoltan
>
>
> On 7/1/20 9:58 AM, Eugene Chung wrote:
> > Hi,
> >
> > I want to know how to investigate a count(*) query error on Hive 3.1.2 &
> > Tez 0.9.2, where the Mapper is 'being failed for too many output errors'.
> >
> > The query is as simple as "select count(*) from MY_DB.ORC_TABLE where
> > part_date='2020-06-30';", where the ORC files of MY_DB.ORC_TABLE are
> > bucketed.
> >
> > But the same query on the same table (ORC files) runs normally on Hive
> > 2.3.2 & Tez 0.9.1.
> >
> >
> > The error on Hive 3.1.2 is as follows:
> >
> >
> > ----------------------------------------------------------------------------------------------
> >         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> > ----------------------------------------------------------------------------------------------
> > Map 1 ..        container       RUNNING     75         22        0       53      38       0
> > Reducer 2       container        INITED      1          0        0        1       0       0
> > ----------------------------------------------------------------------------------------------
> > VERTICES: 00/02  [=======>>-------------------] 28%   ELAPSED TIME: 10.44 s
> > ----------------------------------------------------------------------------------------------
> > 20/07/01 15:36:49 ERROR SessionState: Status: Failed
> > 20/07/01 15:36:49 ERROR SessionState: Vertex failed, vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
> > 20/07/01 15:36:49 ERROR SessionState: Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
> > 20/07/01 15:36:49 ERROR SessionState: DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> > 20/07/01 15:36:49 ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> > 20/07/01 15:36:49 ERROR operation.Operation: Error running hive query: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> >     at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335) ~[hive-service-3.1.2.jar:3.1.2]
> >     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.2.jar:3.1.2]
> >     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.2.jar:3.1.2]
> >     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) ~[hive-service-3.1.2.jar:3.1.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
> >     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.3.1.2-14.jar:?]
> >     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) ~[hive-service-3.1.2.jar:3.1.2]
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_131]
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
> >     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 failed, info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> >     at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:236) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
> >     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) ~[hive-service-3.1.2.jar:3.1.2]
> >     ... 11 more
> >
> >
> > The plan of Hive 3.1.2 is
> >
> > +----------------------------------------------------+
> > | Explain |
> > +----------------------------------------------------+
> > | Plan optimized by CBO. |
> > | |
> > | Vertex dependency in root stage |
> > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
> > | |
> > | Stage-0 |
> > | Fetch Operator |
> > | limit:-1 |
> > | Stage-1 |
> > | Reducer 2 |
> > | File Output Operator [FS_7] |
> > | Group By Operator [GBY_5] (rows=1 width=8) |
> > | Output:["_col0"],aggregations:["count(VALUE._col0)"] |
> > | <-Map 1 [CUSTOM_SIMPLE_EDGE] |
> > | PARTITION_ONLY_SHUFFLE [RS_4] |
> > | Group By Operator [GBY_3] (rows=1 width=8) |
> > | Output:["_col0"],aggregations:["count()"] |
> > | Select Operator [SEL_2] (rows=44050597 width=4160) |
> > | TableScan [TS_0] (rows=44050597 width=4160) |
> > | MY_DB@ORC_TABLE,ORC_TABLE,Tbl:COMPLETE,Col:NONE |
> > | |
> > +----------------------------------------------------+
> >
> >
> > And the plan on Hive 2.3.2 is
> >
> > +----------------------------------------------------+
> > | Explain |
> > +----------------------------------------------------+
> > | Plan optimized by CBO. |
> > | |
> > | Vertex dependency in root stage |
> > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
> > | |
> > | Stage-0 |
> > | Fetch Operator |
> > | limit:-1 |
> > | Stage-1 |
> > | Reducer 2 |
> > | File Output Operator [FS_7] |
> > | Group By Operator [GBY_5] (rows=1 width=8) |
> > | Output:["_col0"],aggregations:["count(VALUE._col0)"] |
> > | <-Map 1 [CUSTOM_SIMPLE_EDGE] |
> > | PARTITION_ONLY_SHUFFLE [RS_4] |
> > | Group By Operator [GBY_3] (rows=1 width=8) |
> > | Output:["_col0"],aggregations:["count()"] |
> > | Select Operator [SEL_2] (rows=1 width=18325049344) |
> > | TableScan [TS_0] (rows=1 width=18325049344) |
> > | MY_DB@ORC_TABLE,ORC_TABLE,Tbl:PARTIAL,Col:NONE |
> > | |
> > +----------------------------------------------------+
> >
> >
> > Best regards,
> > Eugene Chung (Korean : 정의근)
>
-- 
Best regards,
Eugene Chung (Korean : 정의근)
