-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27404/#review60863
-----------------------------------------------------------
Great job! I think this is really close. I found one bug and some minor stuff, all noted below. Other than those, I have two major comments.

First, the organization duplicates code from writeFields in writeMap and writeArray, where each field of the inner group is written; writeFields could be called there instead. This also deviates from what the other writer implementations do, which is to require the schema to match a certain pattern and write exactly that pattern. See [1] for an example of what I'm talking about. But I think this was actually structured this way to work with multiple write schemas...

The second problem is that this works with multiple write schemas. This was probably a mis-communication on my part, but the backward-compatibility rules in PARQUET-113 are for the read side, not the write side. Hive already produces the correct structure when writing lists and maps, so we don't need to change that. The problem PARQUET-113 addresses is reading data written by parquet-avro or parquet-thrift (or similar variants) that uses the 2-level representation. We still need to fix compatibility with that data, but it should probably be done as a separate issue.

[1] - https://github.com/rdblue/incubator-parquet-mr/blob/PARQUET-113-nested-types-avro/parquet-avro/src/main/java/parquet/avro/AvroWriteSupport.java#L118


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment102276>

Minor: The size of map_values should be accessed with map_values.length rather than value.get().length.

Nit: I'm not sure what the variable-naming conventions are for this project, but Java projects usually prefer camelCase, as in mapValues and mapKeyValue.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment102277>

When would key_pair_value be null?
If this is something Hive needs to support, then we should note it with a comment. Otherwise, violating this assumption should be an exception, because Parquet can't handle a null value here. This differs slightly from the behavior for LIST, where null values are encoded at this level. For a map, however, the key-value pair is required and should always be non-null; only the map value, which is one level inside, can be null.


ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment102267>

This doesn't appear to be used in the test. Was this extra group in the record intended to test that Hive ignores extra columns? If so, can you note it in a comment?


ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
<https://reviews.apache.org/r/27404/#comment102291>

This test should also verify that null array values are supported by adding a createNull() call between 1 and 2. I tried this, and the test failed because the null check calls continue, which skips to the next iteration in writeArray and never calls endGroup(). Adding endGroup() there fixes it; good news!


- Ryan Blue


On Nov. 10, 2014, 11:04 a.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27404/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2014, 11:04 a.m.)
> 
> 
> Review request for hive, Ryan Blue and Brock Noland.
> 
> 
> Bugs: HIVE-8359
>     https://issues.apache.org/jira/browse/HIVE-8359
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The patch changes the way the DataWritableWriter class writes an array of elements to the Parquet record. It wraps each array element in a new startGroup/endGroup block so that Parquet can detect null values in those optional fields.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java c7078efe27482df0a11dd68ac068da27dbcf51b3 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_map_null.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_map_null.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27404/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
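
To make the writeFields reuse suggestion in the review concrete, here is a minimal, self-contained sketch. All names here are illustrative stand-ins, not the actual DataWritableWriter API, and the string output stands in for calls on Parquet's record consumer. The point is only the shape of the refactoring: writeArray delegates each element group's fields to one shared writeFields routine instead of duplicating the per-field loop.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SharedFieldsSketch {

    // One shared routine that writes every field of a group once,
    // standing in for a DataWritableWriter.writeFields-style helper.
    static void writeFields(Map<String, Object> group, StringBuilder out) {
        for (Map.Entry<String, Object> field : group.entrySet()) {
            out.append(field.getKey()).append('=').append(field.getValue()).append(';');
        }
    }

    // writeArray delegates each element group to writeFields instead of
    // re-implementing the per-field loop (the duplication the review notes).
    static String writeArray(List<Map<String, Object>> elements) {
        StringBuilder out = new StringBuilder();
        for (Map<String, Object> element : elements) {
            out.append('{');            // stands in for startGroup()
            writeFields(element, out);  // no duplicated field-writing code
            out.append('}');            // stands in for endGroup()
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("a", 1);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("b", 2);
        System.out.println(writeArray(Arrays.asList(first, second)));
        // prints {a=1;}{b=2;}
    }
}
```

The same delegation would apply to writeMap for each key-value group, matching the pattern the AvroWriteSupport example at [1] follows.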
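
The endGroup() bug from the comment102291 discussion can also be shown in isolation. The sketch below is hypothetical: a plain counter stands in for the real parquet RecordConsumer, and the two loops are simplified stand-ins for writeArray. It shows why a `continue` on a null element leaves the repeated group unbalanced, and why always calling endGroup() fixes it.

```java
import java.util.Arrays;
import java.util.List;

public class ArrayNullSketch {

    // Hypothetical stand-in for a record consumer that just balances
    // startGroup/endGroup calls.
    static class GroupBalance {
        int open = 0;
        void startGroup() { open++; }
        void endGroup() { open--; }
    }

    // Buggy shape: `continue` on a null element skips endGroup(),
    // leaving the repeated group open.
    static int writeArrayBuggy(List<Object> values) {
        GroupBalance consumer = new GroupBalance();
        for (Object value : values) {
            consumer.startGroup();
            if (value == null) {
                continue;           // bug: endGroup() never runs
            }
            // ... write the element value here ...
            consumer.endGroup();
        }
        return consumer.open;       // > 0 means unbalanced groups
    }

    // Fixed shape: close the group on every iteration, so a null
    // element is encoded as a group with no value written inside it.
    static int writeArrayFixed(List<Object> values) {
        GroupBalance consumer = new GroupBalance();
        for (Object value : values) {
            consumer.startGroup();
            if (value != null) {
                // ... write the element value here ...
            }
            consumer.endGroup();    // always runs, even for nulls
        }
        return consumer.open;
    }

    public static void main(String[] args) {
        List<Object> withNull = Arrays.asList(1, null, 2);
        System.out.println("buggy unbalanced groups: " + writeArrayBuggy(withNull));
        // prints buggy unbalanced groups: 1
        System.out.println("fixed unbalanced groups: " + writeArrayFixed(withNull));
        // prints fixed unbalanced groups: 0
    }
}
```

This mirrors the createNull() test case suggested above: with the element between 1 and 2 null, the buggy loop leaves one group open, while the fixed loop stays balanced.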