By default, I have these:

hadoop.proxyuser.hive.hosts=*
hadoop.proxyuser.hive.groups=*

Also, I have impersonation enabled. These settings should have allowed the
compaction to run without issues, but it still does not. What am I missing?
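
To narrow this down, it may help to list which paths under the table directory are not owned by the metastore user. Below is a minimal sketch, assuming the input is `hdfs dfs -ls -R` output with one entry per line (permissions, replication, owner, group, size, date, time, path). The function name `foreign_owned` and its `metastore_user` parameter are hypothetical; the sample lines are taken from the listings quoted in this thread.

```python
def foreign_owned(ls_lines, metastore_user="hive"):
    """Return (owner, path) pairs for entries not owned by metastore_user.

    Assumes the `hdfs dfs -ls -R` layout:
    permissions, replication, owner, group, size, date, time, path.
    """
    result = []
    for line in ls_lines:
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.strip().split(None, 7)
        if len(fields) < 8:
            continue  # skip blank lines and "Found N items" headers
        owner, path = fields[2], fields[7]
        if owner != metastore_user:
            result.append((owner, path))
    return result


# Sample lines copied from the listings quoted in this thread.
listing = """\
drwxrwxrwt   - admin    hive          0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
drwxrwxrwt   - hive hive          0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
""".splitlines()

for owner, path in foreign_owned(listing):
    print(owner, path)
```

Any path this reports is one the compactor would have to touch as another user, which is where the proxy-user settings come into play.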

On Wed, Nov 16, 2016 at 6:00 PM, Manoj Murumkar <manoj.murum...@gmail.com>
wrote:

> Okay, made the cleaner work as well :)
>
> When the files are owned by a user other than the one that runs the
> metastore ("hive"), this doesn't work. Just to test this, I changed the
> ownership of the table's files to "hive" and ran a major compaction,
> and it did the job.
>
> drwxrwxrwt   - hive hive          0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
> -rw-r--r--   3 hive hive     451008 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024/bucket_00000
>
> I am going to follow this piece of documentation to make it work:
> Configuration Values to Set for Compaction
>
> If the data in your system is not owned by the Hive user (i.e., the user
> that the Hive metastore runs as), then Hive will need permission to run as
> the user who owns the data in order to perform compactions.  If you have
> already set up HiveServer2 to impersonate users, then the only additional
> work to do is assure that Hive has the right to impersonate users from the
> host running the Hive metastore.  This is done by adding the hostname to
> hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file.  If you have
> not already done this, then you will need to configure Hive to act as a
> proxy user.  This requires you to set up keytabs for the user running the
> Hive metastore and add hadoop.proxyuser.hive.hosts and
> hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file.  See the
> Hadoop documentation on secure mode for your version of Hadoop (e.g., for
> Hadoop 2.5.1 it is at Hadoop in Secure Mode
> <http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SecureMode.html>).
>
>
> On Wed, Nov 16, 2016 at 5:43 PM, Manoj Murumkar <manoj.murum...@gmail.com>
> wrote:
>
>> Further update:
>>
>> I see this:
>>
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>> |  dbname   | tabname  |  partname  |  type  |        state        | workerid  |   starttime    |
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>> | Database  | Table    | Partition  | Type   | State               | Worker    | Start Time     |
>> | txntest   | txntab3  | NULL       | MAJOR  | ready for cleaning  | NULL      | 1479346924000  |
>> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>>
>> However, I don't see the cleaner actually run in the Hive logs (I am
>> looking for a string like "compactor.Cleaner").
>>
>>
>> On Wed, Nov 16, 2016 at 5:30 PM, Manoj Murumkar <manoj.murum...@gmail.com>
>> wrote:
>>
>>> Quick update:
>>>
>>> After each compaction, the files under the base directory (for the
>>> buckets) have the latest data. However, I expected all the delta files
>>> (and directories) to be gone, since they should have been merged into the
>>> base directory. Otherwise, we'll start seeing too many small files on
>>> HDFS, which is a problem. Am I understanding this feature correctly?
>>>
>>> On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to implement the transaction feature in Hive. I created
>>>> the following table:
>>>>
>>>> +----------------------------------------------------------------------------------------+--+
>>>> |                                     createtab_stmt                                     |
>>>> +----------------------------------------------------------------------------------------+--+
>>>> | CREATE TABLE `txntest.txntab3`(                                                        |
>>>> |   `id` int,                                                                            |
>>>> |   `name` string)                                                                       |
>>>> | CLUSTERED BY (                                                                         |
>>>> |   id)                                                                                  |
>>>> | INTO 2 BUCKETS                                                                         |
>>>> | ROW FORMAT SERDE                                                                       |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                          |
>>>> | STORED AS INPUTFORMAT                                                                  |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                    |
>>>> | OUTPUTFORMAT                                                                           |
>>>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                   |
>>>> | LOCATION                                                                               |
>>>> |   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'  |
>>>> | TBLPROPERTIES (                                                                        |
>>>> |   'COLUMN_STATS_ACCURATE'='true',                                                      |
>>>> |   'numFiles'='22',                                                                     |
>>>> |   'numRows'='90000',                                                                   |
>>>> |   'rawDataSize'='0',                                                                   |
>>>> |   'totalSize'='3564019',                                                               |
>>>> |   'transactional'='true',                                                              |
>>>> |   'transient_lastDdlTime'='1479329198')                                                |
>>>> +----------------------------------------------------------------------------------------+--+
>>>>
>>>> I inserted 90000 rows into it over multiple iterations, which created 22
>>>> files (as is visible above). I have run multiple compactions (major and
>>>> minor), but nothing seems to happen on HDFS. What am I missing?
>>>>
>>>> I have the following configuration:
>>>>
>>>> Metastore:
>>>>
>>>> hive.compactor.initiator.on = true;
>>>> hive.compactor.worker.threads = 2;
>>>>
>>>> Client:
>>>>
>>>> hive.support.concurrency = true;
>>>> hive.exec.dynamic.partition.mode = nonstrict;
>>>> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>>>
>>>> Can someone point me in the right direction? The compaction process did
>>>> run (verified via SHOW COMPACTIONS), and I also see that a base directory
>>>> was created on HDFS.
>>>> I was expecting all the delta directories to be gone once a major
>>>> compaction runs.
>>>>
>>>> drwxrwxrwt   - admin    hive          0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
>>>> -rw-r--r--   3 admin    hive     227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
>>>> -rw-r--r--   3 nex37045 hive        636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
>>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
>>>> drwxrwxrwt   - admin    hive          0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
>>>> -rw-r--r--   3 admin    hive        531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
>>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
>>>> -rw-r--r--   3 nex37045 hive     156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
>>>> drwxrwxrwt   - admin    hive          0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
>>>> -rw-r--r--   3 admin    hive    2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
>>>> -rw-r--r--   3 nex37045 hive       4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
>>>> -rw-r--r--   3 nex37045 hive     192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
>>>> -rw-r--r--   3 nex37045 hive     192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
>>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
>>>> -rw-r--r--   3 nex37045 hive     201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Manoj
>>>>
>>>>
>>>
>>
>
