[ https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743828#comment-16743828 ]
Mani M commented on HIVE-21016:
-------------------------------

Hi [~pvary],

Is it correct to put the duplicate-column validation check before this line: [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java#L359]?

> Duplicate column name in GROUP BY statement causing Vertex failures
> -------------------------------------------------------------------
>
>                 Key: HIVE-21016
>                 URL: https://issues.apache.org/jira/browse/HIVE-21016
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Bjorn Olsen
>            Priority: Major
>
> Hive queries fail with "Vertex failure" messages when the user submits a
> query containing duplicate GROUP BY columns. The Hive query parser should
> detect and reject this scenario with a meaningful error message, rather than
> executing the query and failing with an obfuscated message. For complex
> queries this can result in a lot of debugging effort, whereas a simple error
> message could have saved some time.
> To repeat the issue, choose any table and perform a GROUP BY with a duplicate
> column name.
> For example:
> {{select count( * ), party_id from party group by party_id, party_id;}}
> Note the duplicate column in the GROUP BY.
> This will fail with messages similar to the following:
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:390)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:232)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>     ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:454)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:381)
>     ... 17 more
> *Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector*

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
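The issue asks for a query like the one above to be rejected up front with a meaningful error, rather than failing later at runtime. The following is a minimal, hypothetical sketch of such a duplicate-column check. It operates on plain column-name strings, not on Hive's real AST or `ExprNodeDesc` types. The class, method, and exception names (`GroupByDuplicateCheck`, `validateGroupByColumns`, `DuplicateGroupByColumnException`) are invented for illustration. The sketch does not answer where the check should live in `GroupByOptimizer` or the semantic analyzer. It compares names case-insensitively on the assumption that Hive identifiers are case-insensitive.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class GroupByDuplicateCheck {

    /** Hypothetical exception standing in for whatever error type Hive would actually raise. */
    public static class DuplicateGroupByColumnException extends RuntimeException {
        public DuplicateGroupByColumnException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical validation: returns the GROUP BY column names unchanged if all are
     * distinct, otherwise throws with a message naming the first repeated column.
     * Names are trimmed and compared case-insensitively (an assumption, see lead-in).
     */
    public static List<String> validateGroupByColumns(List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String column : columns) {
            String normalized = column.trim().toLowerCase(Locale.ROOT);
            // Set.add returns false when the element is already present.
            if (!seen.add(normalized)) {
                throw new DuplicateGroupByColumnException(
                        "Duplicate column '" + column.trim() + "' in GROUP BY clause");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Distinct columns pass through unchanged.
        validateGroupByColumns(List.of("party_id", "party_name"));
        // Mirrors the reproduction query: group by party_id, party_id
        try {
            validateGroupByColumns(List.of("party_id", "party_id"));
        } catch (DuplicateGroupByColumnException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This sketch rejects duplicates by name only. A real check would also need to treat equivalent expressions as duplicates, for example a qualified `party.party_id` and an unqualified `party_id`. Silently de-duplicating the keys would be an alternative design to rejecting the query.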