[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

Mani M (JIRA) Wed, 16 Jan 2019 02:24:25 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743828#comment-16743828
 ]


Mani M commented on HIVE-21016:
-------------------------------

Hi [~pvary],

Is it correct to put the validation check before this line?

[https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java#L359]
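For discussion, here is a minimal, standalone sketch of the kind of check being asked about. It is not taken from GroupByOptimizer.java and does not use any Hive classes: the class name GroupByDuplicateCheck, both method names, and the choice to represent GROUP BY keys as plain strings are assumptions made for illustration. It also assumes column names should be compared case-insensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive: flags GROUP BY keys that repeat.
class GroupByDuplicateCheck {

    /** Returns each key that occurs more than once, in order of first repeat. */
    static List<String> findDuplicateKeys(List<String> groupByKeys) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String key : groupByKeys) {
            // Assumption: names are compared case-insensitively, ignoring padding.
            String normalized = key.trim().toLowerCase(Locale.ROOT);
            if (!seen.add(normalized)) {
                // add() returned false, so this key was already seen.
                duplicates.add(normalized);
            }
        }
        return new ArrayList<>(duplicates);
    }

    /** Throws with a readable message if any GROUP BY key is repeated. */
    static void validate(List<String> groupByKeys) {
        List<String> duplicates = findDuplicateKeys(groupByKeys);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                "Duplicate column(s) in GROUP BY: " + duplicates);
        }
    }

    public static void main(String[] args) {
        // Mirrors the query from the issue: group by party_id, party_id
        try {
            validate(Arrays.asList("party_id", "party_id"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In Hive itself the keys would be expression nodes rather than strings, so an actual patch would need to compare those instead. The sketch only shows the shape of the check and the kind of error message the reporter asked for.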

 

> Duplicate column name in GROUP BY statement causing Vertex failures
> -------------------------------------------------------------------
>
>                 Key: HIVE-21016
>                 URL: https://issues.apache.org/jira/browse/HIVE-21016
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Bjorn Olsen
>            Priority: Major
>
> Hive queries fail with "Vertex failure" messages when the user submits a 
> query containing duplicate GROUP BY columns. The Hive query parser should 
> detect and reject this scenario with a meaningful error message, rather than 
> executing the query and failing with an obfuscated message. For complex 
> queries this can result in a lot of debugging effort, whereas a simple error 
> message could have saved some time.
> To repeat the issue, choose any table and perform a GROUP BY with a duplicate 
> column name.
> For example:
> select count(*), party_id from party group by party_id, party_id;
> Note the duplicate column in the GROUP BY.
> This will fail with messages similar to below:
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:390)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:232)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>  ... 14 more
>  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing vector batch (tag=0) 
> 0000ffb9-5fb1-3024-922a-10cc313a7c171
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:454)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:381)
>  ... 17 more
>  *Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
