Re: Hive on Tez: Diagnosing query execution issues

Pala M Muthaia Fri, 23 May 2014 17:54:15 -0700

Adding the right hive users alias.


On Fri, May 23, 2014 at 5:52 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:

> Hi,
>
> I am trying to run a relatively heavy Hive query that joins 3 tables. The
> query succeeds on MR after increasing the mapper and reducer container
> memory:
>
> set mapreduce.map.memory.mb=4096;
> set mapreduce.reduce.memory.mb=8192;
>
> However, the same query, with the same settings, seems to get stuck in
> Reducer 2 on Tez. (The query is a join between 3 tables, hence has 3 Map and
> 2 Reduce nodes in the DAG.)
>
> By stuck, I mean that I see only the following in the container logs for a
> long time:
> 2014-05-23 19:08:54,729 INFO [AMRM Callback Handler Thread]
> org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu:
> 0 taskAllocations: 301
>
>
> I need help with the following 2 questions:
>
> 1. Is there a separate setting for Tez to specify the amount of memory
> for a container, equivalent to the *.memory.mb settings for MapReduce?
> Maybe that value needs to be updated.
>
> 2. I already looked at the logs on the AM, and I only see the above log
> statements. How do I get more information on why the Reduce node in the
> query DAG is not progressing? Can I get more info from the reduce task
> logs? How do I determine the machines on which the reduce tasks were
> scheduled, so that I can look up the task logs, if any? The YARN resource
> manager UI doesn't show such information.
>
> When I reduced the amount of data read from one of the large tables by
> introducing sampling, the query succeeded. I suspect a memory
> issue, but I am not sure how much memory was allocated in the first place.
>
>
> Thanks.
> -pala
>
>
>
>
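
For reference on question 1, a minimal sketch of the Tez-side counterparts to
the mapreduce.*.memory.mb settings quoted above. It assumes Hive 0.13 or
later, where hive.tez.container.size and hive.tez.java.opts are available.
The values are illustrative assumptions, not tuned recommendations, and
nothing in this thread confirms that they resolve the stuck Reducer 2.

```sql
-- Assumed illustrative values; adjust to the cluster's YARN container limits.
-- Container size in MB requested for each Tez task. When left unset, Hive
-- falls back to the mapper container size (mapreduce.map.memory.mb).
set hive.tez.container.size=8192;

-- JVM options for Tez tasks. The heap (-Xmx) is kept below the container
-- size to leave headroom for non-heap memory.
set hive.tez.java.opts=-Xmx6554m;
```

Because a single container size applies to all Tez tasks in the query, the
sketch uses the larger (reducer) value from the MapReduce settings above.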
