Answers inline. Alan.
> On Mar 29, 2017, at 03:08, Riccardo Iacomini <riccardo.iacom...@rdslab.com> wrote:
>
> Hello,
> I have some questions about the compaction process. I need to manually trigger compaction operations on a standard partitioned ORC table (not ACID), and be able to get back the list of compacted files. I could achieve this via HDFS by getting the directory listing and then triggering the compaction, but that would imply stopping the underlying processing to avoid new files being added in between. Here are some questions I could not answer myself from the material I found online:
>
> • Is the compaction executed as a MapReduce job?

Yes. (A sketch of triggering and monitoring a compaction from HiveQL is appended at the end of this mail.)

> • Is there a way to get back the list of compacted files?

No. Note that even doing a listing in HDFS will be somewhat confusing, because production of the new delta or base file (depending on whether it's a minor or major compaction) is decoupled from removal of the old delta and/or base files. This is because readers may still be using the old files, and the cleanup cannot be done until those readers have finished.

> • How can you customize the compaction criteria?

You can modify when Hive decides to initiate compaction and how many resources it allocates to compacting. See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions (a sketch of the per-table overrides from that page is also appended below).

Alan.

> Also, any link to documentation/material is really appreciated.
>
> Thank you all for your time.
>
> Riccardo
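
P.S. A rough sketch of driving this from HiveQL, in case it helps. The table and partition names below (web_logs, ds) are made up, and the statements assume a transactional (ACID) table, since that is what the compactor described above operates on. ALTER TABLE ... COMPACT only enqueues the request; the metastore's Worker threads pick it up and launch the MapReduce job.

  -- Queue a major compaction for one partition (hypothetical table/partition).
  ALTER TABLE web_logs PARTITION (ds = '2017-03-29') COMPACT 'MAJOR';

  -- A minor compaction merges the existing deltas without rewriting the base.
  ALTER TABLE web_logs PARTITION (ds = '2017-03-29') COMPACT 'MINOR';

  -- Watch the request move through the queue. While the state shows
  -- "ready for cleaning", the new base/delta and the old files still
  -- coexist on HDFS, which is the window described above.
  SHOW COMPACTIONS;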
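
P.P.S. On customizing the criteria: the page linked above also documents per-table overrides of the thresholds the Initiator uses, in addition to the global hive.compactor.* settings in the metastore's hive-site.xml. A sketch with example values only, same hypothetical table as above (the WITH OVERWRITE form needs a reasonably recent release):

  -- Per-table thresholds: minor compaction after 4 deltas, major compaction
  -- once the deltas reach 50% of the base size (example values, not advice).
  ALTER TABLE web_logs SET TBLPROPERTIES (
    "compactorthreshold.hive.compactor.delta.num.threshold" = "4",
    "compactorthreshold.hive.compactor.delta.pct.threshold" = "0.5"
  );

  -- Give the compaction job more memory for a single manually queued run.
  ALTER TABLE web_logs PARTITION (ds = '2017-03-29') COMPACT 'MINOR'
    WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb" = "3072");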