Szehon:

Another question related to type support: if I convert an Avro field of type UNION to Parquet, does Hive support that UNION field? The UNION is needed because an Avro field cannot take NULL on its own, so I have to define every field as a UNION of the original type and NULL.

Thanks
Yang

On Mon, Feb 9, 2015 at 1:05 PM, Yang <teddyyyy...@gmail.com> wrote:

> Thanks Szehon!
>
> On Tue, Feb 3, 2015 at 7:33 PM, Szehon Ho <sze...@cloudera.com> wrote:
>
>> Hi Yang
>>
>> I saw you posted this question in several places. I answered the
>> timestamp query in HIVE-6394, as I saw that one first.
>>
>> I can't speak about date support, as it's not in my knowledge.
>>
>> Thanks
>> Szehon
>>
>> On Mon, Feb 2, 2015 at 4:15 PM, Yang <teddyyyy...@gmail.com> wrote:
>>
>>> The Parquet spec on logical types, and Timestamp specifically, seems
>>> to say
>>> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
>>>
>>> "TIMESTAMP_MILLIS is used for a combined logical date and time type. It
>>> must annotate an int64 that stores the number of milliseconds from the
>>> Unix epoch, 00:00:00.000 on 1 January 1970, UTC."
>>>
>>> i.e. it says that the type is only precise to the millisecond and that
>>> it starts from 1970.
>>>
>>> But if you look at the hive-parquet code in
>>>
>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142
>>>
>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54
>>>
>>> it seems that Hive's encoding of timestamps in Parquet follows a
>>> different spec: it is precise to the nanosecond and starts from "Monday,
>>> January 1, 4713 " (defined in jodd.datetime.JDateTime).
>>>
>>> So is Hive's Parquet timestamp storage completely different from the
>>> above spec?
>>>
>>> What about Date support?
>>> https://issues.apache.org/jira/browse/HIVE-8119
>>> Are we going to have a different on-disk binary encoding than the
>>> "int32" specified in the above doc?
>>>
>>> Thanks
>>> Yang
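
For reference, here is a rough sketch of how the two timestamp encodings compared in the quoted message relate to each other. It is not Hive's code. It assumes the 12-byte value is laid out as 8 bytes of nanoseconds-of-day followed by 4 bytes of Julian day number, both little-endian, which is my reading of the linked NanoTime.java. The class and method names are made up for illustration, and any time zone adjustment Hive applies is ignored. The Julian day number of 1 January 1970 is 2440588.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Hive or Parquet.
final class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // (Julian day, nanos of day) -> milliseconds since the Unix epoch.
    // Sub-millisecond precision is dropped, as TIMESTAMP_MILLIS cannot hold it.
    static long toEpochMillis(int julianDay, long nanosOfDay) {
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_EPOCH;
        return daysSinceEpoch * MILLIS_PER_DAY + nanosOfDay / NANOS_PER_MILLI;
    }

    // Decode 12 bytes, assuming: 8 bytes nanos of day, then 4 bytes Julian day,
    // both little-endian (an assumption about the on-disk layout).
    static long decode(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        return toEpochMillis(julianDay, nanosOfDay);
    }

    // Inverse of decode, under the same layout assumption.
    static byte[] encode(long epochMillis) {
        long daysSinceEpoch = Math.floorDiv(epochMillis, MILLIS_PER_DAY);
        long millisOfDay = Math.floorMod(epochMillis, MILLIS_PER_DAY);
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(millisOfDay * NANOS_PER_MILLI);
        buf.putInt((int) (daysSinceEpoch + JULIAN_DAY_OF_EPOCH));
        return buf.array();
    }

    public static void main(String[] args) {
        long millis = 1423515900000L; // an arbitrary instant in February 2015
        System.out.println(decode(encode(millis)) == millis);
    }
}
```

Under these assumptions the two encodings describe the same instants; they differ in epoch, width and precision, so a reader expecting an int64 TIMESTAMP_MILLIS column would not be able to read the 12-byte values directly.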