-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27404/#review59546
-----------------------------------------------------------
This is looking good overall, but I have a few comments below about how it works. I'd also like to see unit tests for the writeArray and, if added, writeMap methods. That would be a good way to demonstrate the ArrayWritable-of-ArrayWritables structure this relies on, and it would also make it easy to add more test cases.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment100810>

    I think this patch would benefit from some slight refactoring. The startGroup/endGroup calls have been changed so that each key-value pair or array element has its own startGroup/endGroup call. That's needed, but it seems awkward that this makes the assumptions of writeData and writeArray inconsistent: writeData expects startGroup and endGroup to have been called already, while writeArray does not.

    Another problem is that this still doesn't check for the LIST and MAP annotations. I think it would be much better to make this match the (draft) spec more closely, which would clean up the problem above. By looking for LIST or MAP, the call to writeArray could be moved to the level that wraps the repeated group. Then the startGroup/endGroup calls could be consistent.

    This would also allow you to do more validation. First, you could check that the group contained by a LIST or MAP annotation actually is a repeated group, and you could check whether LIST or MAP annotates the repeated group itself and adjust accordingly.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment100812>

    It would be much easier to read and understand this section if the names were more descriptive than subValue and subType. There are also validations that should be done at this level, like checking that the key-value type in a map is named "key_value", that the key is required, etc.
It would probably be worth breaking writeArray and writeMap into separate methods that make assumptions about the structure of the type, validate those assumptions, and then write out the values with more understandable code.


- Ryan Blue


On Oct. 30, 2014, 4:43 p.m., Sergio Pena wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27404/
> -----------------------------------------------------------
>
> (Updated Oct. 30, 2014, 4:43 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-8359
>     https://issues.apache.org/jira/browse/HIVE-8359
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> The patch changes the way the DataWritableWriter class writes an array of
> elements to the Parquet record. It wraps each array record in a new
> startGroup/endGroup block so that Parquet can detect null values on those
> optional fields.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java c7078efe27482df0a11dd68ac068da27dbcf51b3
>   ql/src/test/queries/clientpositive/parquet_map_null.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_map_null.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/27404/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sergio Pena
>