[ https://issues.apache.org/jira/browse/HIVE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039943#comment-13039943 ]
jirapos...@reviews.apache.org commented on HIVE-2185:
-----------------------------------------------------

bq. On 2011-05-26 21:12:30, Ning Zhang wrote:
bq. > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java, line 100
bq. > <https://reviews.apache.org/r/785/diff/1/?file=19586#file19586line100>
bq. >
bq. > should be >= here

Yes.

bq. On 2011-05-26 21:12:30, Ning Zhang wrote:
bq. > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java, line 82
bq. > <https://reviews.apache.org/r/785/diff/1/?file=19585#file19585line82>
bq. >
bq. > Shouldn't isValidStatics() take "key" as a parameter rather than "rowID"? "key" should indicate which statistic this is, right?

Yes. It was a bug; I had already fixed it after running the HBase JUnit tests :)

- Tomasz

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/785/#review719
-----------------------------------------------------------

On 2011-05-26 02:52:55, Tomasz Nykiel wrote:
bq. -----------------------------------------------------------
bq. https://reviews.apache.org/r/785/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-05-26 02:52:55)
bq.
bq. Review request for hive.
bq.
bq. Summary
bq. -------
bq.
bq. Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table.
bq. Other statistics (e.g., total table/partition size) are derived from the file system.
bq.
bq. We introduce a new feature that collects the size of uncompressed data, so that the efficiency of compression can be determined.
bq. In addition to collecting the new statistic, this patch extends the stats collection mechanism so that new statistics can be added easily.
bq.
bq. 1. Serializer/deserializer classes are amended to collect the size of uncompressed data when serializing/deserializing objects. We support:
bq.
bq.    ColumnarSerDe
bq.    LazySimpleSerDe
bq.    LazyBinarySerDe
bq.
bq.    For other SerDe classes, the uncompressed size will be 0.
bq.
bq. 2. The StatsPublisher/StatsAggregator interfaces are extended to support multi-stats collection for both the JDBC and HBase implementations.
bq.
bq. 3. FileSinkOperator (for INSERT OVERWRITE) and TableScanOperator (for ANALYZE) are extended to support multi-stats collection.
bq.
bq. Together, (2) and (3) make it easy to add other types of statistics.
bq.
bq. 4. Collecting the uncompressed size can be disabled by setting:
bq.
bq.    hive.stats.collect.uncompressedsize = false
bq.
bq. This addresses bug HIVE-2185.
bq. https://issues.apache.org/jira/browse/HIVE-2185
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1127756
bq. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java 1127756
bq. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/TypedBytesSerDe.java 1127756
bq. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 1127756
bq. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 1127756
bq. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java 1127756
bq. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java 1127756
bq. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java 1127756
bq. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsUtils.java PRE-CREATION
bq. trunk/hbase-handler/src/test/queries/hbase_stats.q 1127756
bq. trunk/hbase-handler/src/test/results/hbase_stats.q.out 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsSetupConst.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java 1127756
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java PRE-CREATION
bq. trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java 1127756
bq. trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisherEnhanced.java PRE-CREATION
bq. trunk/ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java 1127756
bq. trunk/ql/src/test/queries/clientpositive/stats14.q PRE-CREATION
bq. trunk/ql/src/test/queries/clientpositive/stats15.q PRE-CREATION
bq. trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/bucketmapjoin4.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/combine2.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/join_map_ppr.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/merge3.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/merge4.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/pcr.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/sample10.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/stats11.q.out 1127756
bq. trunk/ql/src/test/results/clientpositive/stats14.q.out PRE-CREATION
bq. trunk/ql/src/test/results/clientpositive/stats15.q.out PRE-CREATION
bq. trunk/ql/src/test/results/clientpositive/union22.q.out 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/Deserializer.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/MetadataTypedColumnsetSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java PRE-CREATION
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStatsStruct.java PRE-CREATION
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/Serializer.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/TypedSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java 1127756
bq. trunk/serde/src/java/org/apache/hadoop/hive/serde2/thrift/ThriftDeserializer.java 1127756
bq. trunk/serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/785/diff
bq.
bq. Testing
bq. -------
bq.
bq. - additional JUnit tests for the amended Serializer/Deserializer classes
bq. - additional TestCliDriver queries over multi-partition tables
bq. - all other JUnit tests
bq. - standalone setup
bq.
bq. Thanks,
bq.
bq. Tomasz
bq.

> extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-2185
> URL: https://issues.apache.org/jira/browse/HIVE-2185
> Project: Hive
> Issue Type: New Feature
> Components: Serializers/Deserializers, Statistics
> Reporter: Tomasz Nykiel
> Assignee: Tomasz Nykiel
> Attachments: HIVE-2185.patch
>
> Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system.
> Here, we want to collect information about the sizes of uncompressed data, to be able to determine the efficiency of compression.
> Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics.
> On top of adding the new statistic, it would be desirable to extend the collection mechanism so that new statistics can be added easily.
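A minimal sketch of the multi-stats idea in item (2) of the review request, assuming a simple in-memory store in place of the JDBC or HBase backends: each task publishes a map of named statistics under its own row ID, and aggregation sums one named statistic over every row ID that shares a partition prefix. It also reflects the review point that validity should be checked against the statistic name ("key"), not the row ID. The class InMemoryMultiStats, its method signatures, and the statistic names numRows and rawDataSize are hypothetical and are not Hive's actual StatsPublisher/StatsAggregator API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory stand-in for a stats publisher/aggregator pair.
 * Not Hive's StatsPublisher/StatsAggregator API; all names are illustrative.
 */
class InMemoryMultiStats {
  // Statistic names this store accepts (assumed names, for illustration only).
  static final String ROW_COUNT = "numRows";
  static final String RAW_DATA_SIZE = "rawDataSize";
  private static final Set<String> SUPPORTED = Set.of(ROW_COUNT, RAW_DATA_SIZE);

  // row ID (e.g. partition prefix + task ID) -> (statistic name -> value)
  private final Map<String, Map<String, Long>> rows = new HashMap<>();

  /** Validity depends on the statistic name, not on the row ID. */
  static boolean isValidStatistic(String statType) {
    return SUPPORTED.contains(statType);
  }

  /** Publishes several statistics for one task output in a single call. */
  boolean publishStat(String rowId, Map<String, Long> stats) {
    for (String statType : stats.keySet()) {
      if (!isValidStatistic(statType)) {
        return false; // reject the whole batch if any name is unknown
      }
    }
    rows.computeIfAbsent(rowId, k -> new HashMap<>()).putAll(stats);
    return true;
  }

  /** Sums one statistic over every row whose ID starts with keyPrefix. */
  long aggregateStats(String keyPrefix, String statType) {
    if (!isValidStatistic(statType)) {
      throw new IllegalArgumentException("unknown statistic: " + statType);
    }
    long total = 0;
    for (Map.Entry<String, Map<String, Long>> e : rows.entrySet()) {
      if (e.getKey().startsWith(keyPrefix)) {
        total += e.getValue().getOrDefault(statType, 0L);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    InMemoryMultiStats store = new InMemoryMultiStats();
    store.publishStat("ds=2011-05-26/task_0", Map.of(ROW_COUNT, 500L, RAW_DATA_SIZE, 5312L));
    store.publishStat("ds=2011-05-26/task_1", Map.of(ROW_COUNT, 250L, RAW_DATA_SIZE, 2700L));
    System.out.println(store.aggregateStats("ds=2011-05-26/", ROW_COUNT));     // 750
    System.out.println(store.aggregateStats("ds=2011-05-26/", RAW_DATA_SIZE)); // 8012
  }
}
```

In this sketch, adding a new statistic only requires a new name in SUPPORTED; the publish and aggregation paths stay unchanged, which is the kind of extensibility the patch aims for.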
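Items 1 and 4 of the review request, together with the compression-efficiency goal, can be illustrated with a hypothetical sketch that accumulates the uncompressed (raw) size of rows and compares it with the on-disk size derived from the file system. The raw-size definition used here (UTF-8 bytes of each field plus one delimiter byte between fields), the class RawSizeTracker, and the boolean standing in for hive.stats.collect.uncompressedsize are assumptions for illustration, not how any particular Hive SerDe computes the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical sketch: accumulate the uncompressed size of rows as a
 * delimited-text serializer might write them, then compare with on-disk size.
 */
class RawSizeTracker {
  // Stands in for hive.stats.collect.uncompressedsize (assumed wiring).
  private final boolean collectUncompressedSize;
  private long rawDataSize = 0;

  RawSizeTracker(boolean collectUncompressedSize) {
    this.collectUncompressedSize = collectUncompressedSize;
  }

  /** Adds UTF-8 field bytes plus one delimiter byte between adjacent fields. */
  void recordRow(List<String> fields) {
    if (!collectUncompressedSize) {
      return; // collection disabled: the size stays 0
    }
    long rowBytes = 0;
    for (String f : fields) {
      rowBytes += f.getBytes(StandardCharsets.UTF_8).length;
    }
    rowBytes += Math.max(0, fields.size() - 1); // field delimiters
    rawDataSize += rowBytes;
  }

  long getRawDataSize() {
    return rawDataSize;
  }

  /** Uncompressed bytes per on-disk byte; a higher value means better compression. */
  static double compressionRatio(long rawDataSize, long totalSize) {
    if (totalSize <= 0) {
      throw new IllegalArgumentException("totalSize must be positive");
    }
    return (double) rawDataSize / totalSize;
  }

  public static void main(String[] args) {
    RawSizeTracker t = new RawSizeTracker(true);
    t.recordRow(List.of("238", "val_238")); // 3 + 7 + 1 = 11 bytes
    t.recordRow(List.of("86", "val_86"));   // 2 + 6 + 1 = 9 bytes
    System.out.println(t.getRawDataSize());           // 20
    System.out.println(compressionRatio(4000, 1000)); // 4.0
  }
}
```

Under these assumptions, a ratio of 4.0 means the data occupies about a quarter of its uncompressed size on disk.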