Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Eugene Koifman Thu, 15 Nov 2018 19:02:49 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
-----------------------------------------------------------





common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2685 (patched)
<https://reviews.apache.org/r/69367/#comment295290>

    "And minor compaction will be disabled." - should make sure Initiator 
doesn't start minor and that Alter Table commands requesting Minor are no-op or 
throw so that these don't get into the compactor queue.  We should also, 
perhaps think about how Initiator triggers Major compactions - are current 
config params adequate?  Should do at least the 2nd part in a follow up jira, 
maybe both.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Line 180 (original), 183 (patched)
<https://reviews.apache.org/r/69367/#comment295291>

    I guess all this should be no-op for compactor since it only looks at 1 
partition at a time and for acid serde and IF/OF don't change.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 197 (patched)
<https://reviews.apache.org/r/69367/#comment295292>

    bucketSplitMultiMap?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 206 (patched)
<https://reviews.apache.org/r/69367/#comment295293>

    the error should include table name if easily available here or if not 
maybe a file path from any of the splits...



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 214 (patched)
<https://reviews.apache.org/r/69367/#comment295294>

    should we assert that schemaSplitMultiMap has size=1 since that is what we 
expect for compactor?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 276 (patched)
<https://reviews.apache.org/r/69367/#comment295295>

    Add a comment that this is trully a bucketId (rather than bucket property - 
BucketCodec.java since 3.0) that is derived from file name
    
    WriteId is also from containing file name and for files that have min/max 
wrieid, it's the starting one.  Now that I look at the code in 
TransactionMetadata.findWriteIDForSynthetcRowIDs() - the assert there will 
throw.  It should be removed since where we have to handle files that come from 
compacted dirs so min <> max for all deltas.
    
    maybe these comments should be on OrcSplit where getter methods are defined.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java
Lines 68 (patched)
<https://reviews.apache.org/r/69367/#comment295296>

    mark these transient for clarity since we don't serialize them



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 245 (patched)
<https://reviews.apache.org/r/69367/#comment295297>

    Ideally this should be prevented before it gets into the compction_queue. 
throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS 
and prevent auto-scheduling of new ones.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 399 (patched)
<https://reviews.apache.org/r/69367/#comment295298>

    should this be in a finally{}?  SessionState is threadLocal so it may get 
reused... or do we shutdown the session each time?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 481 (patched)
<https://reviews.apache.org/r/69367/#comment295299>

    current write id should always be the same as original.  Only delete event 
can have these be different but major compaction absorbs delete events.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 503 (patched)
<https://reviews.apache.org/r/69367/#comment295300>

    what's the value of specifying location for tmp table?  I'm surprised it's 
even legal.  Would this be a security hole potentially?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 510 (patched)
<https://reviews.apache.org/r/69367/#comment295302>

    why overwrite?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 513 (patched)
<https://reviews.apache.org/r/69367/#comment295301>

    why do you need partition key/values in the query? we are always reading a 
single partition.  This is achieved by getAcidState() which takes partition dir 
as input (i.e. all the files it returns are within a given partition)



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 542 (patched)
<https://reviews.apache.org/r/69367/#comment295303>

    need to think about this.  maybe it's ok...



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 565 (patched)
<https://reviews.apache.org/r/69367/#comment295304>

    there should be something in AcidUtils to parse original bucket file name


- Eugene Koifman


On Nov. 15, 2018, 4:59 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2018, 4:59 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 92c74e1d06 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>

Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Reply via email to