Thanks for the reply, Zoltan. I found the error from the reducer task attempt log exactly like below;
https://issues.apache.org/jira/plugins/servlet/mobile#issue/TEZ-4071 ( https://issues.apache.org/jira/plugins/servlet/mobile#issue/TEZ-3894 ) They say the error is resolved at Tez 0.9.2 but I think it’s not. Regards, Eugene On Thu, Jul 2, 2020 at 5:59 Zoltan Haindrich <k...@rxd.hu> wrote: > Hey Eugene! > > I don't see any hints in these outputs what could be the issue...have you > checked the tez container logs? > > cheers, > Zoltan > > > On 7/1/20 9:58 AM, Eugene Chung wrote: > > Hi, > > > > I want to know how to investigate the count(*) query error on Hive 3.1.2 > & Tez 0.9.2, which is 'being failed for too many output errors' in the > Mapper. > > > > The query is just simple like "select count(*) from MY_DB.ORC_TABLE > where part_date='2020-06-30';" where ORC files of MY_DB.ORC_TABLE are > bucketed. > > > > But the query for the same table (ORC files) is running normally on Hive > 2.3.2 & Tez 0.9.1. > > > > > > The error on Hive 3.1.2 is like below; > > > > > |---------------------------------------------------------------------------------------------- > VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > ---------------------------------------------------------------------------------------------- > Map 1 .. container RUNNING 75 22 0 53 38 0 Reducer 2 container INITED 1 0 0 > 1 > > 0 0 > ---------------------------------------------------------------------------------------------- > VERTICES: 00/02 [=======>>-------------------] 28% ELAPSED TIME: 10.44 s > > > ---------------------------------------------------------------------------------------------- > 20/07/01 15:36:49 ERROR SessionState: Status: Failed 20/07/01 15:36:49 > ERROR > > SessionState: Vertex failed, vertexName=Map 1, > vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, > taskId=task_1591769205146_436476_1_00_000055, > > diagnostics=[TaskAttempt 0 failed, > info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too many > output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > > TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 > being failed for too many output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > > TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 > being failed for too many output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > > TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 > being failed for too many output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], > > Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] > killed/failed due to:OWN_TASK_FAILURE] > > 20/07/01 15:36:49 ERROR SessionState: Vertex killed, vertexName=Reducer > 2, vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received > Kill while in RUNNING > > state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, > failedTasks:0 killedTasks:1, Vertex vertex_1591769205146_436476_1_01 > [Reducer 2] killed/failed due > > to:OTHER_VERTEX_FAILURE] 20/07/01 15:36:49 ERROR SessionState: DAG did > not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 > 20/07/01 15:36:49 ERROR > > ql.Driver: FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map > 1, > > vertexId=vertex_1591769205146_436476_1_00, diagnostics=[Task failed, > taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 > failed, > > info=[attempt_1591769205146_436476_1_00_000055_0 being failed for too > many output errors. failureFraction=1.0, > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 1 failed, > > info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too > many output errors. failureFraction=1.0, > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 2 failed, > > info=[attempt_1591769205146_436476_1_00_000055_2 being failed for too > many output errors. failureFraction=1.0, > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 3 failed, > > info=[attempt_1591769205146_436476_1_00_000055_3 being failed for too > many output errors. failureFraction=1.0, > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex > did not succeed due to > > OWN_TASK_FAILURE, failedTasks:1 killedTasks:52, Vertex > vertex_1591769205146_436476_1_00 [Map 1] killed/failed due > to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, > > vertexId=vertex_1591769205146_436476_1_01, diagnostics=[Vertex received > Kill while in RUNNING state., Vertex did not succeed due to > OTHER_VERTEX_FAILURE, failedTasks:0 > > killedTasks:1, Vertex vertex_1591769205146_436476_1_01 [Reducer 2] > killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 > > killedVertices:1 20/07/01 15:36:49 ERROR operation.Operation: Error > running hive query: org.apache.hive.service.cli.HiveSQLException: Error > while processing statement: > > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map > 1, vertexId=vertex_1591769205146_436476_1_00, > > diagnostics=[Task failed, taskId=task_1591769205146_436476_1_00_000055, > diagnostics=[TaskAttempt 0 failed, > info=[attempt_1591769205146_436476_1_00_000055_0 being failed for > > too many output errors. failureFraction=1.0, > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 1 failed, info=[attempt_1591769205146_436476_1_00_000055_1 > being failed for too many output > > errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 > being failed for too many output > > errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 > being failed for too many output > > errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], > Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:52, Vertex > > vertex_1591769205146_436476_1_00 [Map 1] killed/failed due > to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, > vertexId=vertex_1591769205146_436476_1_01, > > diagnostics=[Vertex received Kill while in RUNNING state., Vertex did > not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, > Vertex > > vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due > to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:1 at > > > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335) > ~[hive-service-3.1.2.jar:3.1.2] at > > > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) > ~[hive-service-3.1.2.jar:3.1.2] at > > > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > ~[hive-service-3.1.2.jar:3.1.2] at > > > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) > ~[hive-service-3.1.2.jar:3.1.2] at > > java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_131] at javax.security.auth.Subject.doAs(Subject.java:422) > ~[?:1.8.0_131] at > > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > ~[hadoop-common-3.1.1.3.1.2-14.jar:?] at > > > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) > ~[hive-service-3.1.2.jar:3.1.2] at > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_131] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_131] at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[?:1.8.0_131] at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[?:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131] > Caused by: > > org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, > vertexName=Map 1, vertexId=vertex_1591769205146_436476_1_00, > diagnostics=[Task failed, > > taskId=task_1591769205146_436476_1_00_000055, diagnostics=[TaskAttempt 0 > failed, info=[attempt_1591769205146_436476_1_00_000055_0 being failed for > too many output errors. > > failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, > uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, > > readErrorTimespan=0], TaskAttempt 1 failed, > info=[attempt_1591769205146_436476_1_00_000055_1 being failed for too many > output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > > TaskAttempt 2 failed, info=[attempt_1591769205146_436476_1_00_000055_2 > being failed for too many output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], > > TaskAttempt 3 failed, info=[attempt_1591769205146_436476_1_00_000055_3 > being failed for too many output errors. failureFraction=1.0, > > MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, > MAX_ALLOWED_OUTPUT_FAILURES=10, > MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], > > Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:52, Vertex vertex_1591769205146_436476_1_00 [Map 1] > killed/failed due to:OWN_TASK_FAILURE]Vertex > > killed, vertexName=Reducer 2, vertexId=vertex_1591769205146_436476_1_01, > diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not > succeed due to > > OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex > vertex_1591769205146_436476_1_01 [Reducer 2] killed/failed due > to:OTHER_VERTEX_FAILURE]DAG did not succeed due to > > VERTEX_FAILURE. failedVertices:1 killedVertices:1 at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:236) > ~[hive-exec-3.1.2.jar:3.1.2] at > > org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) > ~[hive-exec-3.1.2.jar:3.1.2] at > > org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) > > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) > ~[hive-exec-3.1.2.jar:3.1.2] at > > org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) > > ~[hive-exec-3.1.2.jar:3.1.2] at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) > ~[hive-service-3.1.2.jar:3.1.2] ... 11 more| > > > > > > The plan of Hive 3.1.2 is > > > > |+----------------------------------------------------+ | Explain | > +----------------------------------------------------+ | Plan optimized by > CBO. | | | | Vertex > > dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | > | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 | | > File Output Operator > > [FS_7] | | Group By Operator [GBY_5] (rows=1 width=8) | | > Output:["_col0"],aggregations:["count(VALUE._col0)"] | | <-Map 1 > [CUSTOM_SIMPLE_EDGE] | | PARTITION_ONLY_SHUFFLE > > [RS_4] | | Group By Operator [GBY_3] (rows=1 width=8) | | > Output:["_col0"],aggregations:["count()"] | | Select Operator [SEL_2] > (rows=44050597 width=4160) | | TableScan > > [TS_0] (rows=44050597 width=4160) | | > > MY_DB@ORC_TABLE,ORC_TABLE,Tbl:COMPLETE,Col:NONE > | | | +----------------------------------------------------+| > > > > > > And one of Hive 2.3.2 is > > > > |+----------------------------------------------------+ | Explain | > +----------------------------------------------------+ | Plan optimized by > CBO. | | | | Vertex > > dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | > | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 | | > File Output Operator > > [FS_7] | | Group By Operator [GBY_5] (rows=1 width=8) | | > Output:["_col0"],aggregations:["count(VALUE._col0)"] | | <-Map 1 > [CUSTOM_SIMPLE_EDGE] | | PARTITION_ONLY_SHUFFLE > > [RS_4] | | Group By Operator [GBY_3] (rows=1 width=8) | | > Output:["_col0"],aggregations:["count()"] | | Select Operator [SEL_2] > (rows=1 width=18325049344) | | TableScan > > [TS_0] (rows=1 width=18325049344) | | > > MY_DB@ORC_TABLE,ORC_TABLE,Tbl:PARTIAL,Col:NONE > | | | +----------------------------------------------------+| > > > > > > Best regards, > > Eugene Chung (Korean : 정의근) > -- Best regards, Eugene Chung (Korean : 정의근)