Quick update: after each compaction, the bucket files under the base directory have the latest data. However, I expected all the delta files (and directories) to be gone, since their contents should have been merged into the base directory. Otherwise we'll end up with too many small files on HDFS, which is a problem. Am I understanding this feature correctly?
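A minimal sketch of that expectation, assuming the Hive ACID directory naming visible in the quoted listing (base_<txn> and delta_<minTxn>_<maxTxn>). Once a base_N exists, any delta whose maximum transaction id is at most N is already contained in that base and is obsolete. As I understand it, deleting obsolete directories is handled by the compactor's separate Cleaner step after the compaction finishes, not by the major compaction that writes the base. The helper name obsolete_deltas is hypothetical, and the rule is applied to directory names only.

```python
import re

# Assumed Hive ACID directory names: base_<txn> and delta_<minTxn>_<maxTxn>.
# Newer Hive versions may add suffixes, which this sketch does not handle.
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def obsolete_deltas(dir_names):
    """Return delta directories already covered by the newest base.

    Hypothetical helper: a delta counts as obsolete when its maximum
    transaction id is <= the transaction id of the newest base directory.
    """
    bases = [int(m.group(1)) for m in map(BASE_RE.match, dir_names) if m]
    if not bases:
        return []  # no base yet, so no delta is superseded by a base
    newest_base = max(bases)
    obsolete = []
    for name in dir_names:
        m = DELTA_RE.match(name)
        if m and int(m.group(2)) <= newest_base:
            obsolete.append(name)
    return obsolete


if __name__ == "__main__":
    # Names from the quoted listing, plus one hypothetical newer delta
    # (delta_0000022_0000022) that a base_0000021 would NOT cover.
    listing = [
        "base_0000021",
        "delta_0000003_0000003",
        "delta_0000015_0000016",
        "delta_0000017_0000019",
        "delta_0000021_0000021",
        "delta_0000022_0000022",
    ]
    print(obsolete_deltas(listing))
```

Under this rule every delta in the quoted listing (the highest is delta_0000021_0000021) is covered by base_0000021, so all of them would be candidates for cleanup.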
On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
> Hi,
>
> We are trying to implement the transaction feature in Hive. I created the following table:
>
> CREATE TABLE `txntest.txntab3`(
>   `id` int,
>   `name` string)
> CLUSTERED BY (
>   id)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='true',
>   'numFiles'='22',
>   'numRows'='90000',
>   'rawDataSize'='0',
>   'totalSize'='3564019',
>   'transactional'='true',
>   'transient_lastDdlTime'='1479329198')
>
> I inserted 90000 rows into it over multiple iterations, so it created 22 files (as visible above). I have run multiple compactions (major and minor), but nothing seems to happen on HDFS. What am I missing?
>
> I have the following configuration:
>
> Metastore:
>
> hive.compactor.initiator.on = true;
> hive.compactor.worker.threads = 2;
>
> Client:
>
> hive.support.concurrency = true;
> hive.exec.dynamic.partition.mode = nonstrict;
> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>
> Can someone point me in the right direction? The compaction process did run (verified via SHOW COMPACTIONS), and I also see that a base directory was created on HDFS.
> I was expecting all the delta directories to be gone when the major compaction runs.
>
> drwxrwxrwt   - admin    hive       0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
> -rw-r--r--   3 admin    hive  227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
> -rw-r--r--   3 nex37045 hive     636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
> -rw-r--r--   3 nex37045 hive     640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
> drwxrwxrwt   - admin    hive       0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
> -rw-r--r--   3 admin    hive     531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
> -rw-r--r--   3 nex37045 hive     644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
> -rw-r--r--   3 nex37045 hive  156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
> drwxrwxrwt   - admin    hive       0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
> -rw-r--r--   3 admin    hive 2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
> -rw-r--r--   3 nex37045 hive    4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
> -rw-r--r--   3 nex37045 hive  192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
> -rw-r--r--   3 nex37045 hive  192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
> drwxr-xr-x   - nex37045 hive       0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
> -rw-r--r--   3 nex37045 hive  201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>
> Thanks,
>
> Manoj
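To put a number on the small-file concern, here is a minimal sketch that parses lines in the `hdfs dfs -ls -R` layout shown in the listing (permissions, replication, owner, group, size, date, time, path) and totals the data files and bytes under base_* versus delta_* directories. It assumes one entry per line, paths without spaces, and that each data file sits directly inside its base or delta directory; the function name summarize_listing is hypothetical.

```python
from collections import defaultdict


def summarize_listing(lines):
    """Count data files and bytes under base_* vs delta_* directories.

    Hypothetical helper for `hdfs dfs -ls -R`-style output. Directory
    entries (permissions starting with 'd') and malformed lines are skipped.
    """
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip blank lines and directory entries
        size, path = int(fields[4]), fields[7]
        parent = path.rstrip("/").split("/")[-2]  # e.g. "delta_0000003_0000003"
        kind = parent.split("_")[0]  # "base" or "delta"
        if kind not in ("base", "delta"):
            continue  # ignore files outside ACID base/delta directories
        totals[kind]["files"] += 1
        totals[kind]["bytes"] += size
    return dict(totals)


if __name__ == "__main__":
    # A few entries copied from the listing in this thread.
    sample = [
        "-rw-r--r--   3 admin    hive  227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000",
        "drwxr-xr-x   - nex37045 hive       0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003",
        "-rw-r--r--   3 nex37045 hive     640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000",
        "-rw-r--r--   3 nex37045 hive     636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000",
    ]
    print(summarize_listing(sample))
    # -> {'base': {'files': 1, 'bytes': 227916}, 'delta': {'files': 2, 'bytes': 1276}}
```

Fed the full listing, this would show how many small delta files remain next to the single base file after compaction.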