----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28372/#review65004 -----------------------------------------------------------
I don't know why I didn't catch this earlier, but I'd also like to see a unit test for the new schema converter. hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java <https://reviews.apache.org/r/28372/#comment107915> Should there be a comment here about why this was removed? metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java <https://reviews.apache.org/r/28372/#comment107917> It isn't obvious why this is done or why deepCopy is used. Could you add a comment? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java <https://reviews.apache.org/r/28372/#comment107898> This could easily be a Path object rather than a String. In fact, it would be more convenient for the caller also. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java <https://reviews.apache.org/r/28372/#comment107902> This should throw IOException instead of coercing it to RuntimeException. This currently hides the fact that this might throw an Exception in the SerDe initialization, which is a bad thing. I think it makes sense to use the checked IOException and put a try/catch in initialize. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java <https://reviews.apache.org/r/28372/#comment107910> I'm not sure we want to assume that an int96 is a timestamp. That's all it's used for today, but maybe we should make it an opt-in so that users must set the hive type to timestamp explicitly. That way, we keep from assuming that int96 will always be a timestamp or building behavior we might have to break later. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java <https://reviews.apache.org/r/28372/#comment107892> I'd rather not use parquet.file at all. Is there a compelling reason to keep it now that this can find files in the table location? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java <https://reviews.apache.org/r/28372/#comment107920> Why does this use a location property instead of the SD location? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java <https://reviews.apache.org/r/28372/#comment107911> ParquetSchemaReader throws RuntimeException (should be IOException though) and the converter throws UnsupportedOperationException and IllegalArgumentException (should that be UnsupportedOperationException?). I think this should have a try/catch with IOException and RuntimeException that prevents the initialization from failing. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java <https://reviews.apache.org/r/28372/#comment107912> This is another exception that shouldn't be thrown. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java <https://reviews.apache.org/r/28372/#comment107897> I think loc should be a Path rather than a String. This would avoid creating Path objects and converting them to Strings all the time. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java <https://reviews.apache.org/r/28372/#comment107895> This appears to store the location in the storage descriptor, so I think that the javadoc is no longer correct. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java <https://reviews.apache.org/r/28372/#comment107893> This will always be true. If the location is null, then the first branch of the containing conditional is always taken. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java <https://reviews.apache.org/r/28372/#comment107894> Why not return null here? - Ryan Blue On Dec. 12, 2014, 4:03 p.m., Ashish Singh wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28372/ > ----------------------------------------------------------- > > (Updated Dec. 12, 2014, 4:03 p.m.) > > > Review request for hive. > > > Bugs: HIVE-8950 > https://issues.apache.org/jira/browse/HIVE-8950 > > > Repository: hive-git > > > Description > ------- > > HIVE-8950: Add support in ParquetHiveSerde to create table schema from a > parquet file > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > fafd78e63e9b41c9fdb0e017b567dc719d151784 > > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java > 32186391e7e4cfc9b4d06d7376663e82ec08d9e6 > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > b137fcb86e27ac91ed3c733b4d8788228d379a09 > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java > 2db2658fbc57fba01c892c9213baef6c498e659b > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java > 4effe736fcf9d3715f03eed9885c299a7aa040dd > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java > cd3d349e8bd8785d6cadaf9ed8fa7598f223774a > > ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q > PRE-CREATION > ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema_ext.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q > PRE-CREATION > ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q > PRE-CREATION > ql/src/test/results/clientpositive/create_view_partitioned.q.out > ebf9a6bc4f2321d7f539b7a445b3f279e3285b8a > > ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema_ext.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > > Diff: https://reviews.apache.org/r/28372/diff/ > > > Testing > ------- > > Tested by adding appropriate qTests. > > > Thanks, > > Ashish Singh > >