[ https://issues.apache.org/jira/browse/HIVE-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved HIVE-8419.
-----------------------------
    Resolution: Fixed
      Assignee: Sergio Peña

I think that this was fixed by Sergio's recent work, as he pointed out. If not, please reopen the issue.

> Hive doesn't properly write NULL values in Parquet files when the type is struct<...>.
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-8419
>                 URL: https://issues.apache.org/jira/browse/HIVE-8419
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Frédéric TERRAZZONI
>            Assignee: Sergio Peña
>
> Hive doesn't seem to be able to write NULL values in a column of type
> "struct". Instead, it replaces them with empty objects (non-NULL objects
> containing only NULL values).
> Here is a short example demonstrating the issue. We start with a small Avro
> table "avro_table".
> {code}
> SELECT * from avro_table
> {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | NULL |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> As you can see, the third row contains a NULL cell. Next, we copy the table
> using Hive (INSERT OVERWRITE ...) into a Parquet table named "parquet_table".
> Finally, when we display the result:
> {code}
> SELECT * from parquet_table
> {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | {"field1":null,"field2":null} |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> I also generated a (correct) Parquet file using our software (Dataiku),
> and Hive had no problem reading the null values, even when the column type was
> "struct".
> Consequently, I suspect the bug is in the Parquet writer code.
> This bug also propagates recursively to nested types. For instance, a NULL
> cell of type
> {code}
> struct<field1:struct<field3:string>,field2:double>
> {code}
> becomes
> {code}
> {"field1":{"field3":null},"field2":null}
> {code}
> when written to a Parquet file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
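
To make the reported symptom concrete, here is a minimal Python sketch of the difference between a writer that descends into a NULL struct and one that preserves it. This is only an illustration of the behavior described in the issue, not Hive's Parquet writer code; `SCHEMA`, `write_buggy`, and `write_fixed` are hypothetical names, and the schema is modeled as a plain dict for simplicity.

```python
# Hypothetical schema model: a dict is a struct (field name -> nested schema),
# a string is a primitive type name. Mirrors the nested example in the report.
SCHEMA = {"field1": {"field3": "string"}, "field2": "double"}


def write_buggy(value, schema):
    """Mimic the reported symptom: a NULL struct is expanded field by field."""
    if isinstance(schema, dict):
        value = value or {}  # a NULL struct is silently treated as an empty one
        return {name: write_buggy(value.get(name), sub)
                for name, sub in schema.items()}
    return value


def write_fixed(value, schema):
    """Keep a NULL struct as NULL instead of descending into its fields."""
    if value is None:
        return None
    if isinstance(schema, dict):
        return {name: write_fixed(value.get(name), sub)
                for name, sub in schema.items()}
    return value


if __name__ == "__main__":
    print(write_buggy(None, SCHEMA))
    print(write_fixed(None, SCHEMA))
```

Under these assumptions, `write_buggy(None, SCHEMA)` yields a struct whose leaves are all null, matching the nested output shown in the report, while `write_fixed(None, SCHEMA)` returns `None`.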