Okay, made the cleaner work as well :)

Compaction doesn't work when the files are owned by a user other than the
one that runs the metastore ("hive"). To test this, I changed the ownership
of the table's files to "hive" and ran a major compaction, and it did the
job:

drwxrwxrwt   - hive hive          0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
-rw-r--r--   3 hive hive     451008 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024/bucket_00000
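
For anyone who wants to spot this ownership mismatch quickly, here is a minimal Python sketch that scans listing text and reports entries not owned by the metastore user. It assumes each entry sits on a single line in the usual `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path). The function name `entries_not_owned_by` is made up for illustration, not part of any Hive or Hadoop tool.

```python
def entries_not_owned_by(listing, expected_owner="hive"):
    """Return (owner, path) pairs for listing entries not owned by expected_owner.

    Assumes one entry per line with the columns:
    permissions, replication, owner, group, size, date, time, path.
    """
    mismatched = []
    for line in listing.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        # Skip blank lines and headers such as "Found 3 items".
        if len(fields) < 8 or fields[0][0] not in "d-":
            continue
        owner, path = fields[2], fields[7].strip()
        if owner != expected_owner:
            mismatched.append((owner, path))
    return mismatched


# Sample lines taken from the listings in this thread, joined onto one line each.
sample = """\
drwxrwxrwt   - hive hive          0 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024
-rw-r--r--   3 hive hive     451008 2016-11-17 01:35 /user/hive/warehouse/txntest.db/txntab3/base_0000024/bucket_00000
drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
"""
print(entries_not_owned_by(sample))
```

Any entry it reports belongs to a user the metastore would have to impersonate in order to compact or clean that path.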

To make it work when the files are owned by other users, I am going to
follow this piece of the Hive documentation, "Configuration Values to Set
for Compaction":

If the data in your system is not owned by the Hive user (i.e., the user
that the Hive metastore runs as), then Hive will need permission to run as
the user who owns the data in order to perform compactions.  If you have
already set up HiveServer2 to impersonate users, then the only additional
work to do is assure that Hive has the right to impersonate users from the
host running the Hive metastore.  This is done by adding the hostname to
hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file.  If you have
not already done this, then you will need to configure Hive to act as a
proxy user.  This requires you to set up keytabs for the user running the
Hive metastore and add hadoop.proxyuser.hive.hosts and
hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file.  See the
Hadoop documentation on secure mode for your version of Hadoop (e.g., for
Hadoop 2.5.1 it is at Hadoop in Secure Mode
<http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SecureMode.html>).
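
As a rough way to confirm that the two proxy-user entries named above are present, the Python sketch below parses core-site.xml content and lists whichever of them is missing or empty. The host and group values in the example are placeholders, not values from this cluster, and the helper name `missing_proxyuser_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two property names come from the documentation quoted above.
REQUIRED = ("hadoop.proxyuser.hive.hosts", "hadoop.proxyuser.hive.groups")


def missing_proxyuser_properties(core_site_xml, required=REQUIRED):
    """Return the required property names that are absent or have an empty value."""
    root = ET.fromstring(core_site_xml)
    configured = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        configured[name] = value
    return [name for name in required if not configured.get(name)]


# Hypothetical core-site.xml content; host and group values are placeholders.
example = """
<configuration>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>metastore-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>hive-users</value>
  </property>
</configuration>
"""
print(missing_proxyuser_properties(example))
```

In practice the XML text would be read from the cluster's own core-site.xml; an empty result only means both properties are set, not that the values are right for the setup.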


On Wed, Nov 16, 2016 at 5:43 PM, Manoj Murumkar <manoj.murum...@gmail.com>
wrote:

> Further update:
>
> I see this:
>
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
> |  dbname   | tabname  |  partname  |  type  |        state        | workerid  |   starttime    |
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
> | Database  | Table    | Partition  | Type   | State               | Worker    | Start Time     |
> | txntest   | txntab3  | NULL       | MAJOR  | ready for cleaning  | NULL      | 1479346924000  |
> +-----------+----------+------------+--------+---------------------+-----------+----------------+--+
>
> However, I don't see the cleaner actually run in the Hive logs (I am
> looking for a string like "compactor.Cleaner").
>
>
> On Wed, Nov 16, 2016 at 5:30 PM, Manoj Murumkar <manoj.murum...@gmail.com>
> wrote:
>
>> Quick update:
>>
>> After each compaction, the files under the base directory (for the buckets)
>> have the latest data. However, I am expecting to see all delta files (and
>> directories) gone, as they should be merged into the base directory.
>> Otherwise, we'll start seeing too many small files on HDFS, which is a
>> problem. Am I understanding this feature correctly in assuming so?
>>
>> On Wed, Nov 16, 2016 at 5:24 PM, Manoj Murumkar <manoj.murum...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are trying to implement the transaction feature in Hive. I created
>>> the following table:
>>>
>>> createtab_stmt:
>>>
>>> CREATE TABLE `txntest.txntab3`(
>>>   `id` int,
>>>   `name` string)
>>> CLUSTERED BY (
>>>   id)
>>> INTO 2 BUCKETS
>>> ROW FORMAT SERDE
>>>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
>>> STORED AS INPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>>> LOCATION
>>>   'hdfs://or1010051016175.corp.adobe.com:8020/user/hive/warehouse/txntest.db/txntab3'
>>> TBLPROPERTIES (
>>>   'COLUMN_STATS_ACCURATE'='true',
>>>   'numFiles'='22',
>>>   'numRows'='90000',
>>>   'rawDataSize'='0',
>>>   'totalSize'='3564019',
>>>   'transactional'='true',
>>>   'transient_lastDdlTime'='1479329198')
>>>
>>> I inserted 90000 rows in it in multiple iterations, so it created 22
>>> files (as is visible above). I have run multiple compactions (major and
>>> minor), but nothing seems to happen on HDFS. What am I missing?
>>>
>>> I have the following configuration:
>>>
>>> Metastore:
>>>
>>> hive.compactor.initiator.on = true;
>>> hive.compactor.worker.threads = 2;
>>>
>>> Client:
>>>
>>> hive.support.concurrency = true;
>>> hive.exec.dynamic.partition.mode = nonstrict;
>>> hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>>
>>> Can someone point me in the right direction? The compaction process did run
>>> (verified via SHOW COMPACTIONS), and I also see the base directory created
>>> on HDFS. I was expecting all the delta directories to be gone once a major
>>> compaction runs.
>>>
>>> drwxrwxrwt   - admin    hive          0 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021
>>> -rw-r--r--   3 admin    hive     227916 2016-11-16 20:47 /user/hive/warehouse/txntest.db/txntab3/base_0000021/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000003_0000003/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000004_0000004/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:33 /user/hive/warehouse/txntest.db/txntab3/delta_0000005_0000005/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000006_0000006/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007
>>> -rw-r--r--   3 nex37045 hive        636 2016-11-16 01:34 /user/hive/warehouse/txntest.db/txntab3/delta_0000007_0000007/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000008_0000008/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:36 /user/hive/warehouse/txntest.db/txntab3/delta_0000009_0000009/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010
>>> -rw-r--r--   3 nex37045 hive        640 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000010_0000010/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000011_0000011/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000012_0000012/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:44 /user/hive/warehouse/txntest.db/txntab3/delta_0000013_0000013/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 01:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000014_0000014/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 02:02 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000015/bucket_00000
>>> drwxrwxrwt   - admin    hive          0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016
>>> -rw-r--r--   3 admin    hive        531 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000015_0000016/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016
>>> -rw-r--r--   3 nex37045 hive        644 2016-11-16 02:03 /user/hive/warehouse/txntest.db/txntab3/delta_0000016_0000016/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017
>>> -rw-r--r--   3 nex37045 hive     156395 2016-11-16 20:37 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000017/bucket_00000
>>> drwxrwxrwt   - admin    hive          0 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019
>>> -rw-r--r--   3 admin    hive    2598250 2016-11-16 20:40 /user/hive/warehouse/txntest.db/txntab3/delta_0000017_0000019/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018
>>> -rw-r--r--   3 nex37045 hive       4737 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000018_0000018/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019
>>> -rw-r--r--   3 nex37045 hive     192658 2016-11-16 20:39 /user/hive/warehouse/txntest.db/txntab3/delta_0000019_0000019/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020
>>> -rw-r--r--   3 nex37045 hive     192835 2016-11-16 20:45 /user/hive/warehouse/txntest.db/txntab3/delta_0000020_0000020/bucket_00000
>>> drwxr-xr-x   - nex37045 hive          0 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021
>>> -rw-r--r--   3 nex37045 hive     201206 2016-11-16 20:46 /user/hive/warehouse/txntest.db/txntab3/delta_0000021_0000021/bucket_00000
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Manoj
>>>
>>>
>>
>
