[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371448#comment-14371448 ]
Sergio Peña commented on HIVE-10016:
------------------------------------

Looks good [~dongc]. Just a couple of small comments:

- In DataWritableRecordConverter.java
  Could you remove the imports that are no longer used?
  * import parquet.schema.MessageTypeParser;
  * import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;

- In DataWritableReadSupport.java
  I think the local variable 'MessageType tableSchema' is not needed. What if we just assign the value to hiveTableSchema, and use this variable in the rest of the block?

  MessageType tableSchema = new MessageType(TABLE_SCHEMA, typeListTable);
  hiveTableSchema = tableSchema;

  could it be:

  hiveTableSchema = new MessageType(TABLE_SCHEMA, typeListTable);

> Remove duplicated Hive table schema parsing in DataWritableReadSupport
> ----------------------------------------------------------------------
>
>                 Key: HIVE-10016
>                 URL: https://issues.apache.org/jira/browse/HIVE-10016
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>         Attachments: HIVE-10016-parquet.patch
>
>
> In {{DataWritableReadSupport.init()}}, the table schema is created and its
> string format is set in conf. When constructing the
> {{ParquetRecordReaderWrapper}}, the schema is fetched from conf and parsed
> several times.
> We could remove this repeated schema parsing and improve the speed of
> getRecordReader a bit.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)