-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/#review558
-----------------------------------------------------------



trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
<https://reviews.apache.org/r/633/#comment1178>

    The naming of this parameter is a little bit confusing: the parameter key 
is called "randomnumber" but the value of it is a fixed number.  Do you mean 
this number is actually the seed to generate samples?



trunk/conf/hive-default.xml
<https://reviews.apache.org/r/633/#comment1179>

    same as above.



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
<https://reviews.apache.org/r/633/#comment1195>

    better add a comment for this function explain what the rationale behind 
sampling splits.



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
<https://reviews.apache.org/r/633/#comment1192>

    we should declare retLists as interface (List) rather than implementation 
(ArrayList)



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
<https://reviews.apache.org/r/633/#comment1193>

    same here, should declare it as Map rather than HashMap.



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
<https://reviews.apache.org/r/633/#comment1194>

    can you add comments here? 



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/633/#comment1182>

    can you add a comment on what this Map is used for and what are the key and 
value of the Map?



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/633/#comment1184>

    I think we don't need to introduce this parameter at all. For one it is a 
new feature rather than a different code path for an old feature. We don't need 
the "fallback" protection by a new parameter. Secondly, throwing a 
SemanticException here can only make the user asking how to solve this problem, 
which is to set the parameter to true. So it seems that it doesn't make sense 
to set the parameter to false in any cases. So why not remove the this 
parameter to make it cleaner. 



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/633/#comment1189>

    you may want to check the percentage number (if it is a valid double and 
within the range [0,100]) and throw SemanticException if it is invalid before 
creating a SplitSample object.



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/633/#comment1190>

    why limit cannot be combined with block sampling?



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java
<https://reviews.apache.org/r/633/#comment1191>

    This comment doesn't belong to this class.


- Ning


On 2011-04-20 18:28:29, Siying Dong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/633/
> -----------------------------------------------------------
> 
> (Updated 2011-04-20 18:28:29)
> 
> 
> Review request for hive, Ning Zhang and namit jain.
> 
> 
> Summary
> -------
> 
> We need a better input sampling to serve at least two purposes:
> 1. test their queries against a smaller data set
> 2. understand more about how the data look like without scanning the whole 
> table.
> A simple function that gives a subset splits will help in those cases. It 
> doesn't have to be strict sampling.
> 
> This diff allows a syntax of .. table TABLESAMPLE(n PERCENT), which samples 
> input splits with size at least n% of the original inputs.
> 
> 
> This addresses bug HIVE-2121.
>     https://issues.apache.org/jira/browse/HIVE-2121
> 
> 
> Diffs
> -----
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1095244 
>   trunk/conf/hive-default.xml 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 1095244 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java 
> PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1095244 
>   trunk/ql/src/test/queries/clientnegative/split_sample_disabled.q 
> PRE-CREATION 
>   trunk/ql/src/test/queries/clientnegative/split_sample_wrong_format.q 
> PRE-CREATION 
>   trunk/ql/src/test/queries/clientpositive/split_sample.q PRE-CREATION 
>   trunk/ql/src/test/results/clientnegative/split_sample_disabled.q.out 
> PRE-CREATION 
>   trunk/ql/src/test/results/clientnegative/split_sample_wrong_format.q.out 
> PRE-CREATION 
>   trunk/ql/src/test/results/clientpositive/bucket1.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/bucket2.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/bucket3.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample1.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample10.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample2.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample3.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample4.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample5.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample6.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample7.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample8.q.out 1095244 
>   trunk/ql/src/test/results/clientpositive/sample9.q.out 1095244 
> 
> Diff: https://reviews.apache.org/r/633/diff
> 
> 
> Testing
> -------
> 
> TestCliDriver TestNegativeCliDriver, manual tests on real clusters.
> 
> 
> Thanks,
> 
> Siying
> 
>

Reply via email to