Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Eugene Koifman Tue, 22 Jan 2019 17:34:33 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212198
-----------------------------------------------------------





common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2702 (patched)
<https://reviews.apache.org/r/69367/#comment297881>

    I think this needs a better description



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Lines 848 (patched)
<https://reviews.apache.org/r/69367/#comment297879>

    Why is this needed?  Shouldn't the compiler set this?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 132 (patched)
<https://reviews.apache.org/r/69367/#comment297882>

    what is orc.rows.between.memory.checks'='1 for?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 196 (patched)
<https://reviews.apache.org/r/69367/#comment297884>

    I don't understand the logic here.  Since major compaction was done above, 
there should only be base/bucket0 and base/bucket1 so there is nothing for this 
query to group.  Also, I would think SPLIT_GROUPING_MODE should be "query" 
here...  if it's not, where is it set?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 255 (patched)
<https://reviews.apache.org/r/69367/#comment297885>

    nit: this could just do ShowCompactions to see if anything got queued up



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 309 (patched)
<https://reviews.apache.org/r/69367/#comment297886>

    How does (3,3,x) end up in bucket0?  with bucketing_version=1 it should be 
(val mod num_buckets)=bucketId.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 312 (patched)
<https://reviews.apache.org/r/69367/#comment297887>

    And smimilarly, (4,4) is in bucket1...



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 315 (patched)
<https://reviews.apache.org/r/69367/#comment297888>

    since you just ran a major compaction, there is only 1 file per bucket so 
would split grouper do anything?  would there be > 1 split?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 325 (patched)
<https://reviews.apache.org/r/69367/#comment297889>

    It would be useful to add a test of a table w/o buckets.  Ideally one that 
has > 1 reducer during Insert so that there is > 1 output file.  I think there 
is some propety to specify number of reducers...  not sure if Tez respects it.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 328 (patched)
<https://reviews.apache.org/r/69367/#comment297883>

    should this be set in setUp()?
    Alternatively, should conf be cloned?  seems error prone as it modifies 
state outside the method



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
Lines 189 (patched)
<https://reviews.apache.org/r/69367/#comment297890>

    What is this for?  It seems fragile since it forces some behavior on all 
tests.  Do any newly added tests rely on this?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Line 168 (original), 172 (patched)
<https://reviews.apache.org/r/69367/#comment297900>

    nit: are empty param decls needed?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 202 (patched)
<https://reviews.apache.org/r/69367/#comment297905>

    It should be rowidoffset or splitstart.  For 'original' splits (w/o acid 
meta cols in the file) SyntheticBucketProperties should always be there and so 
rowIdOffset is there.  For 'native' acid files, OrcSplit doesn't have the 1st 
rowid in the split so splitStart is used to sort.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 203 (patched)
<https://reviews.apache.org/r/69367/#comment297906>

    Would be useful to describe what that invariant is.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 241 (patched)
<https://reviews.apache.org/r/69367/#comment297907>

    This is important to add



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 281 (patched)
<https://reviews.apache.org/r/69367/#comment297908>

    what is this TODO for?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java
Line 1233 (original)
<https://reviews.apache.org/r/69367/#comment297896>

    Seems that now the class level JavaDoc is out of sync



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 637 (patched)
<https://reviews.apache.org/r/69367/#comment297891>

    What throws the IAE?  Above I see
    if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {
    
    shouldn't it bail out there if there is no index?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 638 (patched)
<https://reviews.apache.org/r/69367/#comment297892>

    is there a followup Jira for this?



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
Lines 2201 (patched)
<https://reviews.apache.org/r/69367/#comment297893>

    it would be helpful to add COMPACTOR_CRUD_QUERY_BASED property name to the 
error msg



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 248 (patched)
<https://reviews.apache.org/r/69367/#comment297919>

    What does this do for MM table?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 371 (patched)
<https://reviews.apache.org/r/69367/#comment297922>

    should 'conf' be cloned?  will this affect 'conf' for something else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 537 (patched)
<https://reviews.apache.org/r/69367/#comment297923>

    why does it need "0+ ..."



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Lines 101 (patched)
<https://reviews.apache.org/r/69367/#comment297895>

    Useful to include table/part name in the msg



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 17 (patched)
<https://reviews.apache.org/r/69367/#comment297912>

    ROW__ID.bucket_column - you mean bucketId?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 57 (patched)
<https://reviews.apache.org/r/69367/#comment297915>

    This doesn't compare statemetId anywhere but it should.
    
    I think the easiest is to compare bucketProperty or you could extract 
statemetId from it and do it explicitly



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 60 (patched)
<https://reviews.apache.org/r/69367/#comment297917>

    I don't think equals makes sense



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment297918>

    it maybe useful to include both ROW__IDs in the message.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 74 (patched)
<https://reviews.apache.org/r/69367/#comment297916>

    nit: make class and fields final to make sure compareTo is inlined?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 91 (patched)
<https://reviews.apache.org/r/69367/#comment297914>

    when is it ok for 2 consecutive ROW_IDs to be equal?


- Eugene Koifman


On Jan. 21, 2019, 11:04 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2019, 11:04 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> d6a41919bf 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 
> 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0e5b3e5473 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> a0df82cb20 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/7/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>

Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Reply via email to