Further update: I see this:
+-----------+----------+------------+--------+---------------------+-----------+----------------+--+
| dbname    | tabname  | partname   | type   | state               | workerid  | starttime      |
+-----------+----------+------------+--------+---------------------+-----------+----------------+--+
| Database  | Table    | Partition  | Type   | State               | Worker    | Start Time     |
| txntest   | txntab3  | NULL       | MAJOR  | ready for cleaning  | NULL      | 1479346924000  |
+-----------+----------+------------+--------+---------------------+-----------+----------------+--+

However, I don't see the cleaner actually run in the Hive logs (I am looking for a string like "compactor.Cleaner").

On Wed, Nov 16, 2016 at 5:30 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:

> Quick update:
>
> After each compaction, the files under the base directory (for the buckets) have
> the latest data. However, I am expecting to see all delta files (and
> directories) gone, as they should be merged into the base directory.
> Otherwise, we'll start seeing too many small files on HDFS, which is a
> problem. Am I understanding this feature correctly in assuming so?
>
> On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are trying to implement the transaction feature in Hive.
>> I created the following table:
>>
>> +----------------------------------------------------------------------------------------+--+
>> |                                     createtab_stmt                                     |
>> +----------------------------------------------------------------------------------------+--+
>> | CREATE TABLE `txntest.txntab3`(                                                        |
>> |   `id` int,                                                                            |
>> |   `name` string)                                                                       |
>> | CLUSTERED BY (                                                                         |
>> |   id)                                                                                  |
>> | INTO 2 BUCKETS                                                                         |
>> | ROW FORMAT SERDE                                                                       |
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                          |
>> | STORED AS INPUTFORMAT                                                                  |
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                    |
>> | OUTPUTFORMAT                                                                           |
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                   |
>> | LOCATION                                                                               |
>> |   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'  |
>> | TBLPROPERTIES (                                                                        |
>> |   'COLUMN_STATS_ACCURATE'='true',                                                      |
>> |   'numFiles'='22',                                                                     |
>> |   'numRows'='90000',                                                                   |
>> |   'rawDataSize'='0',                                                                   |
>> |   'totalSize'='3564019',                                                               |
>> |   'transactional'='true',                                                              |
>> |   'transient_lastDdlTime'='1479329198')                                                |
>> +----------------------------------------------------------------------------------------+--+
>>
>> I inserted 90000 rows into it in multiple iterations, so it created 22
>> files (as is visible above). I have run multiple compactions (major and
>> minor), but nothing seems to happen on HDFS. What am I missing?
>>
>> I have the following configuration:
>>
>> Metastore:
>>
>> hive.compactor.initiator.on = true;
>> hive.compactor.worker.threads = 2;
>>
>> Client:
>>
>> hive.support.concurrency = true;
>> hive.exec.dynamic.partition.mode = nonstrict;
>> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>
>> Can someone point me in the right direction? The compaction process did run
>> (verified via SHOW COMPACTIONS), and I also see that a base directory was
>> created on HDFS.
>> I was expecting all the delta directories to be gone once a major compaction runs.
>>
>> drwxrwxrwt - admin hive 0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
>> -rw-r--r-- 3 admin hive 227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
>> -rw-r--r-- 3 nex37045 hive 636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
>> drwxrwxrwt - admin hive 0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
>> -rw-r--r-- 3 admin hive 531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
>> -rw-r--r-- 3 nex37045 hive 156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
>> drwxrwxrwt - admin hive 0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
>> -rw-r--r-- 3 admin hive 2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
>> -rw-r--r-- 3 nex37045 hive 4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
>> -rw-r--r-- 3 nex37045 hive 192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
>> -rw-r--r-- 3 nex37045 hive 192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
>> -rw-r--r-- 3 nex37045 hive 201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>>
>> Thanks,
>>
>> Manoj
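
For reference, here is a rough sketch of the check I am effectively doing by eye on the listing above. It assumes the directory naming seen in the listing (base_<txn> and delta_<min>_<max>) and my understanding that a delta whose highest transaction id is not above the newest base is already folded into that base, so it should be a candidate for the cleaner. The function name obsolete_deltas is made up for illustration; it is not part of Hive.

```python
import re

# Naming convention taken from the HDFS listing in this thread:
#   base_<txn>          e.g. base_0000021
#   delta_<min>_<max>   e.g. delta_0000017_0000019
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def obsolete_deltas(dir_names):
    """Return delta directories whose max txn id is <= the newest base txn id.

    Hypothetical helper: assumes such deltas are fully covered by the base.
    """
    base_txns = []
    for name in dir_names:
        match = BASE_RE.match(name)
        if match:
            base_txns.append(int(match.group(1)))
    if not base_txns:
        return []  # no base directory yet, so nothing is covered
    newest_base = max(base_txns)

    covered = []
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match and int(match.group(2)) <= newest_base:
            covered.append(name)
    return covered


# A few directory names copied from the listing above.
dirs = [
    "base_0000021",
    "delta_0000003_0000003",
    "delta_0000017_0000019",
    "delta_0000021_0000021",
]
print(obsolete_deltas(dirs))
```

Under that assumption, every delta directory in the listing is covered by base_0000021, which is why I expected them all to disappear once the cleaner runs.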