By default, I have these:

hadoop.proxyuser.hive.hosts=*
hadoop.proxyuser.hive.groups=*

Also, I have impersonation enabled. These settings should have allowed the
compaction to run without issues, but it does not. What am I missing?

On Wed, Nov 16, 2016 at 6:00 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:

> Okay, made the cleaner work as well :)
>
> When the files are owned by a user that is different from the user that
> runs the metastore ("hive"), this doesn't work. Just to test this, I changed
> ownership of the files for the table to "hive" and ran a major compaction,
> and it does the job.
>
> drwxrwxrwt - hive hive 0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
> -rw-r--r-- 3 hive hive 451008 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024/bucket_00000
>
> I am going to follow this piece of documentation, so it works:
>
> Configuration Values to Set for Compaction
>
> If the data in your system is not owned by the Hive user (i.e., the user
> that the Hive metastore runs as), then Hive will need permission to run as
> the user who owns the data in order to perform compactions. If you have
> already set up HiveServer2 to impersonate users, then the only additional
> work to do is assure that Hive has the right to impersonate users from the
> host running the Hive metastore. This is done by adding the hostname to
> hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file. If you have
> not already done this, then you will need to configure Hive to act as a
> proxy user. This requires you to set up keytabs for the user running the
> Hive metastore and add hadoop.proxyuser.hive.hosts and
> hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file. See the Hadoop
> documentation on secure mode for your version of Hadoop (e.g., for Hadoop
> 2.5.1 it is at Hadoop in Secure Mode
> <http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SecureMode.html>).
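For reference, here is a minimal sketch of how the two proxy-user properties named in that documentation would appear in Hadoop's core-site.xml. It uses the wildcard values quoted at the top of this message; the layout is the standard Hadoop property format, and nothing here has been verified against this cluster. A stricter setup would presumably list the metastore host and specific groups instead of "*".

```xml
<!-- core-site.xml fragment: an illustrative sketch, not a verified configuration. -->
<configuration>
  <!-- Hosts from which the "hive" user may impersonate other users.
       "*" is the wildcard value quoted in this thread; the documentation
       suggests adding the hostname of the Hive metastore here. -->
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <!-- Groups whose members the "hive" user may impersonate. -->
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
</configuration>
```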
>
> On Wed, Nov 16, 2016 at 5:43 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>
>> Further update:
>>
>> I see this:
>>
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>> | dbname    | tabname  | partname   | type   | state               | workerid  | starttime      |
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>> | Database  | Table    | Partition  | Type   | State               | Worker    | Start Time     |
>> | txntest   | txntab3  | NULL       | MAJOR  | ready for cleaning  | NULL      | 1479346924000  |
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>>
>> However, I don't see the cleaner actually run in the Hive logs (I am
>> looking for a string like "compactor.Cleaner").
>>
>> On Wed, Nov 16, 2016 at 5:30 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>>
>>> Quick update:
>>>
>>> After each compaction, the files under the base directory (for the
>>> buckets) have the latest data. However, I am expecting to see all delta
>>> files (and directories) gone, as they should be merged into the base
>>> directory. Otherwise, we'll start seeing too many small files on HDFS,
>>> which is a problem. Am I understanding this feature correctly in
>>> assuming so?
>>>
>>> On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to implement the transaction feature in Hive. I created
>>>> the following table:
>>>>
>>>> +----------------------------------------------------------------------------------------+--+
>>>> |                                     createtab_stmt                                     |
>>>> +----------------------------------------------------------------------------------------+--+
>>>> | CREATE TABLE `txntest.txntab3`(                                                        |
>>>> |   `id` int,                                                                            |
>>>> |   `name` string)                                                                       |
>>>> | CLUSTERED BY (                                                                         |
>>>> |   id)                                                                                  |
>>>> | INTO 2 BUCKETS                                                                         |
>>>> | ROW FORMAT SERDE                                                                       |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                          |
>>>> | STORED AS INPUTFORMAT                                                                  |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                    |
>>>> | OUTPUTFORMAT                                                                           |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                   |
>>>> | LOCATION                                                                               |
>>>> |   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'  |
>>>> | TBLPROPERTIES (                                                                        |
>>>> |   'COLUMN_STATS_ACCURATE'='true',                                                      |
>>>> |   'numFiles'='22',                                                                     |
>>>> |   'numRows'='90000',                                                                   |
>>>> |   'rawDataSize'='0',                                                                   |
>>>> |   'totalSize'='3564019',                                                               |
>>>> |   'transactional'='true',                                                              |
>>>> |   'transient_lastDdlTime'='1479329198')                                                |
>>>> +----------------------------------------------------------------------------------------+--+
>>>>
>>>> I inserted 90000 rows into it in multiple iterations, so it created 22
>>>> files (as is visible above). I have run multiple compactions (major and
>>>> minor), but nothing seems to happen on HDFS. What am I missing?
>>>>
>>>> I have the following configuration:
>>>>
>>>> Metastore:
>>>>
>>>> hive.compactor.initiator.on = true;
>>>> hive.compactor.worker.threads = 2;
>>>>
>>>> Client:
>>>>
>>>> hive.support.concurrency = true;
>>>> hive.exec.dynamic.partition.mode = nonstrict;
>>>> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>>>
>>>> Can someone point me in the right direction? The compaction process did
>>>> run (verified via SHOW COMPACTIONS) and I also see that a base directory
>>>> was created on HDFS.
>>>> I was expecting all the delta directories to be gone when major
>>>> compaction runs.
>>>>
>>>> drwxrwxrwt - admin hive 0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
>>>> -rw-r--r-- 3 admin hive 227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
>>>> -rw-r--r-- 3 nex37045 hive 636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
>>>> -rw-r--r-- 3 nex37045 hive 640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
>>>> drwxrwxrwt - admin hive 0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
>>>> -rw-r--r-- 3 admin hive 531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
>>>> -rw-r--r-- 3 nex37045 hive 644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
>>>> -rw-r--r-- 3 nex37045 hive 156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
>>>> drwxrwxrwt - admin hive 0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
>>>> -rw-r--r-- 3 admin hive 2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
>>>> -rw-r--r-- 3 nex37045 hive 4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
>>>> -rw-r--r-- 3 nex37045 hive 192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
>>>> -rw-r--r-- 3 nex37045 hive 192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
>>>> drwxr-xr-x - nex37045 hive 0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
>>>> -rw-r--r-- 3 nex37045 hive 201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>>>>
>>>> Thanks,
>>>>
>>>> Manoj