[ 
https://issues.apache.org/jira/browse/HIVE-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved HIVE-8419.
-----------------------------
    Resolution: Fixed
      Assignee: Sergio Peña

I think that this was fixed by Sergio's recent work, as he pointed out. If not, 
please reopen the issue.

> Hive doesn't properly write NULL values in Parquet files when the type is 
> struct<...>.
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-8419
>                 URL: https://issues.apache.org/jira/browse/HIVE-8419
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Frédéric TERRAZZONI
>            Assignee: Sergio Peña
>
> Hive doesn't seem to be able to write NULL values in a column of type 
> "struct". Instead, it replaces them with empty objects (i.e. non-NULL objects 
> containing only NULL values).
> Here is a short example demonstrating the issue. We start with a small Avro 
> table "avro_table".
> {code} SELECT  * from avro_table {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | NULL |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> As you can see, the third row contains a NULL cell. Next, copy the table 
> using Hive (INSERT OVERWRITE ...) into a Parquet table named "parquet_table". 
> Displaying the copy then gives:
> {code} SELECT  * from parquet_table {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | {"field1":null,"field2":null} |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> I generated a (correct) Parquet file using our software (Dataiku), and Hive 
> had no problem reading NULL values from it, even when the column type was 
> "struct". 
> Consequently, I suspect the bug is located in the Parquet writer code.
> This bug also propagates recursively to nested types. For instance, a NULL 
> cell of type {code} struct<field1:struct<field3:string>,field2:double> {code} 
> will become {code} {"field1":{"field3":null},"field2":null} {code} when 
> written to a Parquet file.
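>
> A minimal sketch of the reproduction described above, assuming "avro_table" 
> already exists with the single column "mycol". The CREATE TABLE statement and 
> the NULL counts are illustrative assumptions, not the reporter's exact 
> commands; the report itself elides the INSERT OVERWRITE statement.
> {code:sql}
> -- Assumed target table; the column type matches the example above.
> CREATE TABLE parquet_table (mycol struct<field1:string,field2:double>)
> STORED AS PARQUET;
>
> -- Copy the Avro data into the Parquet table.
> INSERT OVERWRITE TABLE parquet_table
> SELECT mycol FROM avro_table;
>
> -- Count rows whose struct is NULL as a whole in each table.
> SELECT count(*) FROM avro_table WHERE mycol IS NULL;
> SELECT count(*) FROM parquet_table WHERE mycol IS NULL;
> {code}
> On an affected version, the two counts would be expected to differ, since the 
> NULL struct is written as a struct whose fields are all NULL.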



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
