Re: Optimize Hive Query

Eugene Koifman Mon, 27 Jun 2016 15:22:14 -0700

If you have many ACID tables, you almost certainly want more than 2 workers.
With 2 workers (and a single metastore instance) you can run at most 2
compaction jobs at a time.  Unless the tables are very small, compaction may
fall behind if it's configured to run too serially.

In order for compactions to run automatically, at a minimum you have to have 
hive.compactor.initiator.on=true for one standalone metastore instance.
hive.compactor.delta.num.threshold determines when compaction is triggered for 
a given table/partition.
There are more details at
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration

Look for log messages in metastore.log from the Initiator/Cleaner classes.  If
you don't see any, automatic compaction must be disabled.

SHOW COMPACTIONS is a command you can run at the CLI to see whether any
compactions are currently running.

You can also use ALTER TABLE
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact)
to launch compaction on demand.
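
As a rough sketch of the two commands above, assuming the table from this
thread lives in a database named PRDDB (inferred from the warehouse path, not
confirmed) and is not partitioned:

```sql
-- Assumption: the database name PRDDB is inferred from the path
-- /apps/hive/warehouse/PRDDB.db/tuning_dd_key.
USE PRDDB;

-- Enqueue a major compaction request. The statement returns immediately;
-- a compactor worker picks the request up later and rewrites the deltas
-- into a base file.
ALTER TABLE tuning_dd_key COMPACT 'major';

-- List compaction requests and their current state.
SHOW COMPACTIONS;
```

For a partitioned table the ALTER TABLE statement would also need a PARTITION
clause naming the partition to compact.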

Could you send the results of dfs -ls /apps/hive/warehouse/PRDDB.db/tuning_dd_key?

thanks,
Eugene

From: "@Sanjiv Singh" <sanjiv.is...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>,
"sanjiv.is...@gmail.com" <sanjiv.is...@gmail.com>
Date: Sunday, June 26, 2016 at 1:11 PM
To: Gopal Vijayaraghavan <gop...@apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Optimize Hive Query

Thanks, Gopal, for your input. For now I have created a non-ACID table and
loaded the data. As the log line below shows, splits are now grouped properly.

2016-06-25 12:52:00,160 [INFO] [InputInitializer {Map 1} #0] 
|tez.HiveSplitGenerator|: Number of grouped splits: 512

On the compaction issue: compaction is enabled with two workers. Why has
compaction not happened? I will check the metastore logs.

I have many ACID tables in Hive. How many workers should be configured?
Currently there are 2.

Thanks a lot once again.

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Jun 24, 2016 at 9:14 PM, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote:
Thanks, Gopal, for your input. Let me run compaction explicitly on the table
and then see how the query performs.

Let

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Jun 24, 2016 at 7:53 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> Yes, ACID is enabled for this table. It has only 256 bucket files. These
> were created only when data was initially loaded into this table.

Yes, the initial load goes in as an insert DELTA too - that requires
another compaction to move into base files.

The fact that they haven't been automatically compacted yet suggests that
the compactor isn't working for some reason (check hive metastore logs).

> One thing that I am not able to understand is that it is running with 1
> mapper.

The size of deltas shows up as 0, till the compaction goes through - in
Hive2, it will be -1 which will be correctly interpreted as "unknown size".

> | -rw-r--r--   3 H56473 hdfs  215973009 2016-06-23 17:38
> /apps/hive/warehouse/PRDDB.db/tuning_dd_key/delta_0001570_0001570/bucket_00000  |

Clearly an issue due to the lack of compaction - I see a single delta with
255 buckets and no base_* files at all.

Cheers,
Gopal
