[ 
https://issues.apache.org/jira/browse/HIVE-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772216#comment-16772216
 ] 

Darrell Ross edited comment on HIVE-19580 at 2/19/19 6:39 PM:
--------------------------------------------------------------

Has anyone found a solution to this? I have not reproduced it outside of S3, 
because our Hortonworks cluster is running Hive 1.x.

EMR 5.2.2 runs Hive 2.1.0 and does not have the bug.

Using EMR 5.20.0 with Hive 2.3.2 to connect to a Glue-backed metastore, with 
AWS' Glue crawler creating the tables, produces the same problem.

Attempting to get Amazon to look into it.

 


> Hive 2.3.2 with ORC files stored on S3 are case sensitive
> ---------------------------------------------------------
>
>                 Key: HIVE-19580
>                 URL: https://issues.apache.org/jira/browse/HIVE-19580
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.2
>         Environment: AWS S3 to store files
> Spark 2.3 but also true for lower versions
> Hive 2.3.2
>            Reporter: Arthur Baudry
>            Priority: Major
>             Fix For: 2.3.2
>
>
> The original file is CSV:
> COL1,COL2
> 1,2
> ORC files are created with Spark 2.3:
> scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
> scala> df.printSchema
> root
>  |-- COL1: string (nullable = true)
>  |-- COL2: string (nullable = true)
> scala> df.write.orc("s3://bucket/prefix")
> In Hive:
> hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC 
> LOCATION 's3://bucket/prefix';
> hive> SELECT * FROM test_orc;
> OK
> NULL NULL
> *Every field is NULL. However, if the fields are generated with lower-case 
> names in the Spark schema, everything works.*
> The reason I'm raising this bug is that we have customers using Hive 2.3.2 
> to read files we generate through Spark, and our entire code base addresses 
> fields using upper case, which is incompatible with their Hive instance.
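The repro above suggests a Spark-side workaround (a minimal sketch, not from the original report): lower-case every column name before writing the ORC files, so the names embedded in the ORC schema match Hive's lower-cased metastore schema. The `spark` session, the input path, and the bucket are assumed from the repro.

```scala
// Workaround sketch (assumption: a Spark shell with `spark` in scope, and the
// paths from the repro above). In a Spark shell it would look like:
//
//   val df      = spark.read.option("header", "true").csv("/user/hadoop/file")
//   val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
//   lowered.write.orc("s3://bucket/prefix")
//
// The renaming step itself is plain Scala: map each column name to lower case
// before passing the names to toDF.
def lowerCols(cols: Array[String]): Array[String] = cols.map(_.toLowerCase)

println(lowerCols(Array("COL1", "COL2")).mkString(","))  // col1,col2
```

A Hive-side alternative sometimes suggested for this class of problem is the ORC setting `orc.force.positional.evolution=true`, which maps top-level columns by position instead of by name; whether it helps here would need testing against the affected 2.3.2 deployment.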



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
