Philippe Verhaeghe created HIVE-10316: -----------------------------------------
Summary: same query works with TEXTFILE and fails with ORC Key: HIVE-10316 URL: https://issues.apache.org/jira/browse/HIVE-10316 Project: Hive Issue Type: Bug Components: Compression Affects Versions: 0.14.0 Environment: hortonworks HDP 2.2 running on Linux Reporter: Philippe Verhaeghe See also related answer in mailing list : http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3CD15184D6.27779%25gopal%40hortonworks.com%3E I’m getting an error in Hive when executing a query on a table in ORC format. After several trials, I succeeded to run the same query on the same table in TEXTFILE format. I ‘ve been able to reproduce the error with the simple sql script below. I create the same table in TEXFILE and in ORC and I run a SELECT …GROUP BY on the tables. The first SELECT issued on the TEXTFILE table succeeds. The second SELECT issued on the ORC table fails. NB : There is a CONCAT in the query. If I remove the CONCAT the query is running ok with both tables … Example script to reproduce the error : USE pvr_temp; DROP TABLE IF EXISTS students_text; CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE; INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32); SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-'); DROP TABLE IF EXISTS students_orc; CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC; INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-'); 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32); Log where you can see the error : [pvr@tpcalr01s ~]$ cat test.log scan complete in 9ms Connecting to jdbc:hive2://tpcrmm03s:10000 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041) Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp; No rows affected (0.061 seconds) 0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text; No rows affected (0.12 seconds) 0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE; No rows affected (0.057 seconds) 0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32); INFO : Tez session hasn't been created yet. Opening session INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047) INFO : Map 1: -/- INFO : Map 1: 0/1 No rows affected (14.134 seconds) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table pvr_temp.students_text from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000 INFO : Table pvr_temp.students_text stats: [numFiles=1, numRows=2, totalSize=86, rawDataSize=84] 0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-'); INFO : Session is already open INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047) INFO : Map 1: -/- Reducer 2: 0/1 INFO : Map 1: 0/1 Reducer 2: 0/1 INFO : Map 1: 0(+1)/1 Reducer 2: 0/1 INFO : Map 1: 1/1 Reducer 2: 0(+1)/1 INFO : Map 1: 1/1 Reducer 2: 1/1 +--------------+------+--+ | _c0 | _c1 | +--------------+------+--+ | 2015-04-13- | 3.6 | +--------------+------+--+ 1 row selected (3.258 seconds) 0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc; No rows affected (0.109 seconds) 0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC; No rows affected (0.063 seconds) 0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32); No rows affected (2.125 seconds) INFO : Session is already open INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table pvr_temp.students_orc from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000 INFO : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2, totalSize=590, rawDataSize=508] 0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-'); INFO : Session is already open INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047) INFO : Map 1: -/- Reducer 2: 0/1 INFO : Map 1: 0(+1)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1 INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1 ERROR : Status: Failed ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed, taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162) ... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201) ... 14 more ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162) ... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201) ... 14 more ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162) ... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201) ... 14 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162) ... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201) ... 14 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null] ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null] ERROR : DAG failed due to vertex failure. failedVertices:1 killedVertices:1 Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2) Closing: 0: jdbc:hive2://tpcrmm03s:10000 -- This message was sent by Atlassian JIRA (v6.3.4#6332)