[ 
https://issues.apache.org/jira/browse/HIVE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163093#comment-14163093
 ] 

Navis commented on HIVE-6198:
-----------------------------

[~jdere] Look like both patches are needed to fix this. I think field name 
could be case sensitive but it's not debated properly (see 
http://www.mail-archive.com/dev%40hive.apache.org/msg76572.html).

> ORC file and struct column names are case sensitive
> ---------------------------------------------------
>
>                 Key: HIVE-6198
>                 URL: https://issues.apache.org/jira/browse/HIVE-6198
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI, File Formats
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Viraj Bhat
>            Assignee: Navis
>         Attachments: HIVE-6198.1.patch.txt, HIVE-6198.2.patch.txt, 
> HIVE-6198.3.patch.txt
>
>
> HiveQL document states that the "Table names and column names are case 
> insensitive". But the struct behavior for ORC file is different. 
> Consider a sample text file:
> {code}
> $ cat data.txt
> line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
> line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
> line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six
> {code}
> Creating a table stored as txt and then using this to create a table stored 
> as orc 
> {code}
> CREATE TABLE orig (
>   str STRING,
>   mp  MAP<STRING,STRING>,
>   lst ARRAY<STRING>,
>   strct STRUCT<A:STRING,B:STRING>
> ) ROW FORMAT DELIMITED
>     FIELDS TERMINATED BY '|'
>     COLLECTION ITEMS TERMINATED BY ','
>     MAP KEYS TERMINATED BY ':';
> LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE orig;
> CREATE TABLE tableorc (
>   str STRING,
>   mp  MAP<STRING,STRING>,
>   lst ARRAY<STRING>,
>   strct STRUCT<A:STRING,B:STRING>
> ) STORED AS ORC;
> INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;
> {code}
> Suppose we project columns or read the *strct* columns for both table types, 
> here are the results. I have also tested the same with *RC*. The behavior is 
> similar to *txt* files.
> {code}
> hive> SELECT * FROM orig;
> line1   {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"] 
>  
> {"a":"one","b":"two"}
> line2   {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"] 
>  
> {"a":"three","b":"four"}
> line3   {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"] 
>  
> {"a":"five","b":"six"}
> Time taken: 0.126 seconds, Fetched: 3 row(s)
> hive> SELECT * FROM tableorc;
> line1   {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"] 
>  
> {"A":"one","B":"two"}
> line2   {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"] 
>  
> {"A":"three","B":"four"}
> line3   {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"] 
>  
> {"A":"five","B":"six"}
> Time taken: 0.178 seconds, Fetched: 3 row(s)
> hive> SELECT strct FROM tableorc;
> {"a":"one","b":"two"}
> {"a":"three","b":"four"}
> {"a":"five","b":"six"}
> hive>SELECT strct.A FROM orig;
> one
> three
> five
> hive>SELECT strct.a FROM orig;
> one
> three
> five
> hive>SELECT strct.A FROM tableorc;
> one
> three
> five
> hive>SELECT strct.a FROM tableorc;
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
> {code}
> So it seems that ORC behaves differently for struct columns. Also why are we 
> storing the column names for struct for the other types as CASE SENSITIVE? 
> What is the standard for Hive QL with respect to structs?
> Regards
> Viraj



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to