----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/785/ -----------------------------------------------------------
(Updated 2011-06-02 20:36:48.205733) Review request for hive. Changes ------- -Fixed issues pointed out in the review. -Changed metric name to rawDataSize instead of uncompressedSize Summary ------- Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system. We introduce a new feature for collecting information about the sizes of uncompressed data, to be able to determine the efficiency of compression. On top of adding the new statistic collected, this patch extends the stats collection mechanism, so any new statistics could be added easily. 1. serializer/deserializer classes are amended to accommodate collecting sizes of uncompressed data, when serializing/deserializing objects. We support: Columnar SerDe LazySimpleSerDe LazyBinarySerDe For other SerDe classes the uncompressed siez will be 0. 2. StatsPublisher / StatsAggregator interfaces are extended to support multi-stats collection for both JDBC and HBase. 3. For both INSERT OVERWRITE and ANALYZE statements, FileSinkOperator and TableScanOperator respectively are extended to support multi-stats collection. (2) and (3) enable easy extension for other types of statistics. 4. Collecting uncompressed size can be disabled by setting: hive.stats.collect.uncompressedsize = false This addresses bug HIVE-2185. https://issues.apache.org/jira/browse/HIVE-2185 Diffs (updated) ----- trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1130791 trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java 1130791 trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/TypedBytesSerDe.java 1130791 trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 1130791 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 1130791 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java 1130791 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java 1130791 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java 1130791 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsUtils.java PRE-CREATION trunk/hbase-handler/src/test/queries/hbase_stats.q 1130791 trunk/hbase-handler/src/test/queries/hbase_stats2.q PRE-CREATION trunk/hbase-handler/src/test/results/hbase_stats.q.out 1130791 trunk/hbase-handler/src/test/results/hbase_stats2.q.out PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsSetupConst.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java 1130791 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java PRE-CREATION trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java 1130791 trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisherEnhanced.java PRE-CREATION trunk/ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java 1130791 trunk/ql/src/test/queries/clientpositive/stats14.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/stats15.q PRE-CREATION trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1130791 trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out 1130791 trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1130791 trunk/ql/src/test/results/clientpositive/bucketmapjoin4.q.out 1130791 trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out 1130791 trunk/ql/src/test/results/clientpositive/combine2.q.out 1130791 trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1130791 trunk/ql/src/test/results/clientpositive/join_map_ppr.q.out 1130791 trunk/ql/src/test/results/clientpositive/merge3.q.out 1130791 trunk/ql/src/test/results/clientpositive/merge4.q.out 1130791 trunk/ql/src/test/results/clientpositive/pcr.q.out 1130791 trunk/ql/src/test/results/clientpositive/sample10.q.out 1130791 trunk/ql/src/test/results/clientpositive/stats11.q.out 1130791 trunk/ql/src/test/results/clientpositive/stats14.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/stats15.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/union22.q.out 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/Deserializer.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/MetadataTypedColumnsetSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStatsStruct.java PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/Serializer.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/TypedSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java 1130791 trunk/serde/src/java/org/apache/hadoop/hive/serde2/thrift/ThriftDeserializer.java 1130791 trunk/serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/785/diff Testing ------- - additional JUnit test for Serializer/Deserializer amended classes - additional queries for TestCliDriver over multi-partition tables - all other JUnit tests - standalone setup Thanks, Tomasz