[ https://issues.apache.org/jira/browse/HIVE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493032#comment-14493032 ]
Sergey Shelukhin commented on HIVE-10316:
-----------------------------------------

[~prasanth_j] [~owen.omalley] fyi

> same query works with TEXTFILE and fails with ORC
> -------------------------------------------------
>
>                 Key: HIVE-10316
>                 URL: https://issues.apache.org/jira/browse/HIVE-10316
>             Project: Hive
>          Issue Type: Bug
>          Components: Compression
>    Affects Versions: 0.14.0
>         Environment: hortonworks HDP 2.2 running on Linux
>            Reporter: Philippe Verhaeghe
>
> See also the related answer on the mailing list:
> http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3CD15184D6.27779%25gopal%40hortonworks.com%3E
>
> I'm getting an error in Hive when executing a query on a table in ORC format.
> After several attempts, I managed to run the same query on the same table in
> TEXTFILE format.
> I've been able to reproduce the error with the simple SQL script below.
> I create the same table in TEXTFILE and in ORC, and I run a SELECT … GROUP BY
> on both tables.
> The first SELECT, issued on the TEXTFILE table, succeeds.
> The second SELECT, issued on the ORC table, fails.
> NB: There is a CONCAT in the query. If I remove the CONCAT, the query runs
> fine with both tables…
>
> Example script to reproduce the error:
> USE pvr_temp;
> DROP TABLE IF EXISTS students_text;
> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP,
>   gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
> INSERT INTO TABLE students_text VALUES
>   ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28),
>   ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text
>   GROUP BY CONCAT(TO_DATE(datetime), '-');
> DROP TABLE IF EXISTS students_orc;
> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP,
>   gpa DECIMAL(3, 2)) STORED AS ORC;
> INSERT INTO TABLE students_orc VALUES
>   ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28),
>   ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc
>   GROUP BY CONCAT(TO_DATE(datetime), '-');
>
> Log where you can see the error:
> [pvr@tpcalr01s ~]$ cat test.log
> scan complete in 9ms
> Connecting to jdbc:hive2://tpcrmm03s:10000
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
> Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp;
> No rows affected (0.061 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text;
> No rows affected (0.12 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
> No rows affected (0.057 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
> INFO : Tez session hasn't been created yet. Opening session
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)
> INFO : Map 1: -/-
> INFO : Map 1: 0/1
> No rows affected (14.134 seconds)
> INFO : Map 1: 0/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 1/1
> INFO : Loading data to table pvr_temp.students_text from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000
> INFO : Table pvr_temp.students_text stats: [numFiles=1, numRows=2, totalSize=86, rawDataSize=84]
> 0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
> INFO : Session is already open
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)
> INFO : Map 1: -/- Reducer 2: 0/1
> INFO : Map 1: 0/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
> INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
> INFO : Map 1: 1/1 Reducer 2: 1/1
> +--------------+------+--+
> |     _c0      | _c1  |
> +--------------+------+--+
> | 2015-04-13-  | 3.6  |
> +--------------+------+--+
> 1 row selected (3.258 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc;
> No rows affected (0.109 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
> No rows affected (0.063 seconds)
> 0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
> No rows affected (2.125 seconds)
> INFO : Session is already open
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)
> INFO : Map 1: 0/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 1/1
> INFO : Loading data to table pvr_temp.students_orc from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000
> INFO : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2, totalSize=590, rawDataSize=508]
> 0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');
> INFO : Session is already open
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)
> INFO : Map 1: -/- Reducer 2: 0/1
> INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
> INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed, taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
>     ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>     at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
>     ... 14 more
> ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
>     ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>     at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
>     ... 14 more
> ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
>     ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>     at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
>     ... 14 more
> ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
>     ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>     at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
>     ... 14 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null]
> ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null]
> ERROR : DAG failed due to vertex failure.
> failedVertices:1 killedVertices:1
> Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)
> Closing: 0: jdbc:hive2://tpcrmm03s:10000
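
The innermost exception in the log above is thrown from VectorGroupByOperator, so the failure occurs in Hive's vectorized execution path while it sets up the GROUP BY key. A possible session-level workaround is to turn vectorization off before rerunning the ORC query. This is an untested sketch and is not confirmed anywhere in this issue. `hive.vectorized.execution.enabled` is a standard Hive setting; that disabling it avoids this error is an assumption.

```sql
-- Assumption: the vectorized GROUP BY path is what fails on the ORC table.
-- Disable vectorized execution for the current session only.
SET hive.vectorized.execution.enabled=false;

-- Rerun the failing query from the report unchanged.
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '-');
```

If the query then succeeds, that would support the reading that the bug lies in vectorization rather than in the ORC reader itself.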