Answers inline.

Alan.

> On Mar 29, 2017, at 03:08, Riccardo Iacomini <riccardo.iacom...@rdslab.com> 
> wrote:
> 
> Hello,
> I have some questions about the compaction process. I need to manually 
> trigger compaction operations on a standard partitioned orc table (not ACID), 
> and be able to get back the list of compacted files. I could achieve this via 
> HDFS, getting the directory listing and then triggering the compaction, but 
> will imply stopping the underlying processing to avoid new files to be added 
> in between. Here are some questions I could not answer myself from the 
> material I found online:
>       • Is the compaction executed as a MapReduce job?
Yes.
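
If you want to kick one off by hand rather than wait for the initiator, the 
usual route (for ACID tables; the table and partition names below are just 
placeholders, so check the exact syntax against your Hive version) is roughly:

    -- ask the metastore to queue a major compaction of one partition
    ALTER TABLE mydb.mytable PARTITION (ds='2017-03-29') COMPACT 'major';

    -- then watch it move through initiated -> working -> ready for cleaning
    SHOW COMPACTIONS;

ALTER TABLE ... COMPACT only enqueues the request; a worker thread picks it up 
and runs the actual MapReduce job.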

> 
>       • Is there a way to get back the list of compacted files?
No.  Note that even listing the directory in HDFS will be somewhat confusing, 
because production of the new delta or base file (depending on whether it's a 
minor or major compaction) is decoupled from removal of the old delta and/or 
base files.  This is because readers may still be using the old files, and the 
cleanup cannot happen until those readers have finished.
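
To make that concrete, a partition directory caught between a major compaction 
finishing and the cleaner running might look something like this (the paths 
and transaction ids are purely illustrative of the base_N / delta_start_end 
naming the compactor uses):

    .../mytable/ds=2017-03-29/base_0000016/              old base, awaiting cleanup
    .../mytable/ds=2017-03-29/delta_0000017_0000020/      old deltas, awaiting cleanup
    .../mytable/ds=2017-03-29/base_0000020/               new base written by the compactor

So a listing taken at the wrong moment shows the old and the new files side by 
side.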

> 
>       • How can you customize the compaction criteria?
You can modify when Hive decides to initiate compaction and how many resources 
it allocates to compacting.  See 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
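
For reference, the knobs on that page that matter most here are roughly the 
following (the defaults in parentheses are the ones I recall from the wiki, so 
verify them for your version); they are metastore-side settings, so they 
normally go into the metastore's hive-site.xml:

    hive.compactor.initiator.on          (false)  turn on the thread that decides when to compact
    hive.compactor.worker.threads        (0)      how many compactions can run at once
    hive.compactor.check.interval        (300s)   how often the initiator looks for work
    hive.compactor.delta.num.threshold   (10)     minor compaction once a partition has this many deltas
    hive.compactor.delta.pct.threshold   (0.1)    major compaction once deltas exceed this fraction of the base size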

Alan.

> Also, any link to documentation/material is really appreciated. 
> 
> Thank you all for your time.
> 
> Riccardo
