-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/
-----------------------------------------------------------

(Updated 2011-04-28 08:32:17.534107)


Review request for hive, Ning Zhang and namit jain.


Changes
-------

Two changes made according to Namit's comments:
1. EXPLAIN will print out some information about the sampling. (It might not be the best way to print it, but it follows the existing framework.)
2. The granularity of sampling is reduced from split level to HDFS block level.


Summary
-------

We need better input sampling to serve at least two purposes:
1. Let users test their queries against a smaller data set.
2. Let users understand more about what the data looks like without scanning the whole table.

A simple function that returns a subset of the splits will help in those cases. It doesn't have to be strict sampling.

This diff allows a syntax of .. table TABLESAMPLE(n PERCENT), which samples input splits whose total size is at least n% of the original inputs.


This addresses bug HIVE-2121.
    https://issues.apache.org/jira/browse/HIVE-2121


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096852
  trunk/conf/hive-default.xml 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1096852
  trunk/ql/src/test/queries/clientnegative/split_sample_out_of_range.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/split_sample_wrong_format.q PRE-CREATION
  trunk/ql/src/test/queries/clientpositive/split_sample.q PRE-CREATION
  trunk/ql/src/test/results/clientnegative/split_sample_out_of_range.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/split_sample_wrong_format.q.out PRE-CREATION
  trunk/ql/src/test/results/clientpositive/bucket1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucket2.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucket3.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample10.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample2.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample3.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample4.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample5.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample6.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample7.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample8.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample9.q.out 1096852
  trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 1096852
  trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 1096852
  trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 1096852

Diff: https://reviews.apache.org/r/633/diff


Testing
-------

TestCliDriver, TestNegativeCliDriver, and manual tests on real clusters.


Thanks,

Siying
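
Illustration of the "at least n%" rule from the summary: the sketch below is not the code in this diff, only a rough Python model of the idea. The function name sample_splits, the choice to take splits in their listed order, and the accepted range of 0 < n <= 100 are all assumptions made for the example.

```python
def sample_splits(split_sizes, percent):
    """Return indexes of the leading splits whose combined size
    reaches at least `percent` of the total input size.

    Hypothetical helper: the name, the in-order selection, and the
    accepted range (0, 100] are assumptions, not taken from the patch.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    total = sum(split_sizes)
    target = total * percent / 100.0

    chosen = []
    covered = 0
    for index, size in enumerate(split_sizes):
        if covered >= target:
            break  # enough data selected already
        chosen.append(index)
        covered += size
    return chosen
```

With four equal 64 MB units, sample_splits([64, 64, 64, 64], 30) returns [0, 1]: one unit covers only 25% of the input, so a second one is taken and the sample ends up at 50%. This is why the sampling is not strict, and why a finer granularity (HDFS blocks rather than whole splits) keeps the sample closer to the requested percentage.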