Okay, I got the cleaner to work as well :) It does not work when the files are owned by a user other than the one that runs the metastore ("hive"). To test this, I changed the ownership of the table's files to "hive" and ran a major compaction, and it did the job:

drwxrwxrwt   - hive hive       0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
-rw-r--r--   3 hive hive  451008 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024/bucket_00000

I am going to follow this piece of documentation, so it works:

Configuration Values to Set for Compaction

If the data in your system is not owned by the Hive user (i.e., the user that the Hive metastore runs as), then Hive will need permission to run as the user who owns the data in order to perform compactions. If you have already set up HiveServer2 to impersonate users, then the only additional work to do is assure that Hive has the right to impersonate users from the host running the Hive metastore. This is done by adding the hostname to hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file.

If you have not already done this, then you will need to configure Hive to act as a proxy user. This requires you to set up keytabs for the user running the Hive metastore and add hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file. See the Hadoop documentation on secure mode for your version of Hadoop (e.g., for Hadoop 2.5.1 it is at Hadoop in Secure Mode <http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SecureMode.html>).
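
To see quickly which paths under a table directory are not owned by the metastore user, something like the following might help. It is only a sketch: it assumes the usual eight-column layout of `hdfs dfs -ls -R` output, and the function name `paths_not_owned_by` and the sample text are made up for illustration (the sample lines are taken from the listings in this thread).

```python
def paths_not_owned_by(listing, metastore_user="hive"):
    """Return (owner, path) pairs whose owner differs from metastore_user.

    `listing` is text in the assumed `hdfs dfs -ls -R` layout:
    permissions, replication, owner, group, size, date, time, path.
    """
    mismatched = []
    for line in listing.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.strip().split(None, 7)
        # Skip blank lines and headers such as "Found 3 items".
        if len(fields) < 8 or fields[0][0] not in "d-":
            continue
        owner, path = fields[2], fields[7]
        if owner != metastore_user:
            mismatched.append((owner, path))
    return mismatched


# Sample lines copied from the listings in this thread.
SAMPLE = """\
drwxrwxrwt   - admin    hive       0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
-rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
drwxrwxrwt   - hive     hive       0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
"""

if __name__ == "__main__":
    for owner, path in paths_not_owned_by(SAMPLE):
        print(owner, path)
```

Any path this reports is one the compactor would need to reach either through a change of ownership (what I did for the test) or through the proxy-user setup described in the documentation quoted above.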

On Wed, Nov 16, 2016 at 5:43 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:

> Further update:
>
> I see this:
>
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
> | dbname    | tabname  | partname   | type   | state               | workerid  | starttime      |
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
> | Database  | Table    | Partition  | Type   | State               | Worker    | Start Time     |
> | txntest   | txntab3  | NULL       | MAJOR  | ready for cleaning  | NULL      | 1479346924000  |
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>
> However, I don't see the cleaner actually run in the Hive logs (I am
> looking for a string like "compactor.Cleaner").
>
> On Wed, Nov 16, 2016 at 5:30 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>
>> Quick update:
>>
>> After each compaction, the files under the base directory (for the
>> buckets) have the latest data. However, I expected all delta files (and
>> directories) to be gone, since they should have been merged into the base
>> directory. Otherwise, we'll start seeing too many small files on HDFS,
>> which is a problem. Am I understanding this feature correctly in assuming so?
>>
>> On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are trying to implement the transaction feature in Hive. I created the
>>> following table:
>>>
>>> +----------------------------------------------------------------------------------------+--+
>>> | createtab_stmt                                                                         |
>>> +----------------------------------------------------------------------------------------+--+
>>> | CREATE TABLE `txntest.txntab3`(                                                        |
>>> |   `id` int,                                                                            |
>>> |   `name` string)                                                                       |
>>> | CLUSTERED BY (                                                                         |
>>> |   id)                                                                                  |
>>> | INTO 2 BUCKETS                                                                         |
>>> | ROW FORMAT SERDE                                                                       |
>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                          |
>>> | STORED AS INPUTFORMAT                                                                  |
>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                    |
>>> | OUTPUTFORMAT                                                                           |
>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                   |
>>> | LOCATION                                                                               |
>>> |   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'  |
>>> | TBLPROPERTIES (                                                                        |
>>> |   'COLUMN_STATS_ACCURATE'='true',                                                      |
>>> |   'numFiles'='22',                                                                     |
>>> |   'numRows'='90000',                                                                   |
>>> |   'rawDataSize'='0',                                                                   |
>>> |   'totalSize'='3564019',                                                               |
>>> |   'transactional'='true',                                                              |
>>> |   'transient_lastDdlTime'='1479329198')                                                |
>>> +----------------------------------------------------------------------------------------+--+
>>>
>>> I inserted 90000 rows into it over multiple iterations, so it created 22
>>> files (as is visible above). I have run multiple compactions (major and
>>> minor), but nothing seems to happen on HDFS. What am I missing?
>>>
>>> I have the following configuration:
>>>
>>> Metastore:
>>>
>>> hive.compactor.initiator.on = true;
>>> hive.compactor.worker.threads = 2;
>>>
>>> Client:
>>>
>>> hive.support.concurrency = true;
>>> hive.exec.dynamic.partition.mode = nonstrict;
>>> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>>
>>> Can someone point me in the right direction? The compaction process did
>>> run (verified via SHOW COMPACTIONS), and I also see that a base directory
>>> was created on HDFS.
>>> I was expecting all the delta directories to be gone when a major compaction runs.
>>>
>>> drwxrwxrwt   - admin    hive       0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
>>> -rw-r--r--   3 admin    hive  227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
>>> -rw-r--r--   3 nex37045 hive     636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
>>> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
>>> drwxrwxrwt   - admin    hive       0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
>>> -rw-r--r--   3 admin    hive     531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
>>> -rw-r--r--   3 nex37045 hive     644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
>>> -rw-r--r--   3 nex37045 hive  156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
>>> drwxrwxrwt   - admin    hive       0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
>>> -rw-r--r--   3 admin    hive 2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
>>> -rw-r--r--   3 nex37045 hive    4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
>>> -rw-r--r--   3 nex37045 hive  192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
>>> -rw-r--r--   3 nex37045 hive  192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
>>> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
>>> -rw-r--r--   3 nex37045 hive  201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>>>
>>> Thanks,
>>>
>>> Manoj
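
For reference, the expectation in the original question (that delta directories should disappear after a major compaction) can be written down as a small check over directory names. This is a simplified sketch and not Hive's actual cleaner logic: it assumes that `base_N` covers every transaction up to N, so any `delta_X_Y` with Y <= N (and any older base) is a removal candidate, and it ignores open transactions and active readers. The name `obsolete_dirs` is hypothetical.

```python
import re

# Directory naming as it appears in the listing above.
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def obsolete_dirs(dir_names):
    """Return names covered by the newest base directory (simplified assumption)."""
    # Find the highest transaction id among the base directories.
    newest_base = None
    for name in dir_names:
        match = BASE_RE.match(name)
        if match:
            txn = int(match.group(1))
            if newest_base is None or txn > newest_base:
                newest_base = txn
    if newest_base is None:
        return []  # no major compaction has produced a base yet

    obsolete = []
    for name in dir_names:
        base = BASE_RE.match(name)
        delta = DELTA_RE.match(name)
        if base and int(base.group(1)) < newest_base:
            obsolete.append(name)  # superseded by a newer base
        elif delta and int(delta.group(2)) <= newest_base:
            obsolete.append(name)  # all of its transactions are in the base
    return sorted(obsolete)


if __name__ == "__main__":
    names = [
        "base_0000021",
        "delta_0000003_0000003",
        "delta_0000017_0000019",
        "delta_0000021_0000021",
    ]
    for name in obsolete_dirs(names):
        print(name)
```

Under that assumption, every delta directory in the listing above is covered by base_0000021, which matches the state "ready for cleaning" reported by SHOW COMPACTIONS: the data is already merged, and only the cleaner step is outstanding.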