Hi Stephen,

Thank you for your reply. It was very helpful.
This weekend everything was working fine, but when I came back to the office the
performance was slow again. I checked the JobTracker and TaskTracker and narrowed
it down to:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

After some research I found out that this might be related to DNS issues. I also
noticed that my home network address (10.0.0.1) is still visible on the TaskTracker
web page, while in the office I have 10.20.30.152, so I suspect a host-related
configuration issue in Hadoop.

Does anyone know in which configuration file I can set 0.0.0.0 (or a hostname) as
the default listening address? I have put a sketch of what I am currently trying
below the quoted thread.

Best regards,

Jone

On 22 Feb 2014, at 05:24, Stephen Sprague <sprag...@gmail.com> wrote:

> Hi Jone,
>
> um. i can say for sure something is wrong. :)
>
> i would _start_ by going to the tasktracker. this is your friend. find your
> job and look for failed reducers. That's the starting point anyway, IMHO.
>
>
> On Fri, Feb 21, 2014 at 11:35 AM, Jone Lura <jone.l...@ecc.no> wrote:
>
> Hi,
>
> I have tried some variations of queries with aggregation functions, such as
> the following:
>
> select max(total) from my_table;
>
> and
>
> select id, sum(total) from my_table group by id;
>
> In my JUnit tests I only have two rows of data, but the queries are
> extremely slow.
>
> The job detail output shows me the following:
>
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2014-02-21 17:31:42,544 Stage-1 map = 0%, reduce = 0%
> 2014-02-21 17:31:45,548 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:31:46,899 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:31:55,446 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:34,358 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:40,040 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:45,653 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:46,999 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:55,544 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:34,454 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:40,130 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:45,742 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:47,093 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:55,632 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,005 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,461 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,311 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,574 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,932 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:49,727 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:47,995 Stage-1 map = 100%, reduce = 100%
> 2014-02-21 19:29:48,997 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,018 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,019 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,824 Stage-1 map = 100%, reduce = 0%
>
> I am relatively new to Hadoop and Hive, and I do not know whether this is
> normal or whether I have missed some configuration details.
>
> In my application I expect to have 500M or more rows.
>
> Best regards,
>
> Jone
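P.S. To make the question more concrete, below is the kind of change I have been
experimenting with. I am only guessing that these are the right knobs (the property
name is what I could find in the Hadoop 1.x defaults, and the hostname below is
made up), so please correct me if I am looking in the wrong place.

In mapred-site.xml on the node that reports the wrong address:

  <property>
    <!-- HTTP server the reducers fetch map output from during the shuffle;
         0.0.0.0 should make it listen on all interfaces -->
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>

And in /etc/hosts, making sure the office address maps to the machine's hostname
(worker01.example.com is just a placeholder for my real host):

  127.0.0.1      localhost
  10.20.30.152   worker01.example.com worker01

Is this the right direction, or is there a single property that controls the
default listening address?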