Hi Stephen,

Thank you for your reply. It was very helpful.
This weekend everything was working fine, but when I came back to the office the
performance was slow again. I checked the JobTracker and TaskTracker and narrowed
it down to:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

After some research I found out that this might be related to DNS issues. I also
noticed that my home network address (10.0.0.1) is still visible on the TaskTracker
web page, while in the office I have 10.20.30.152, so I suspect a host-related
configuration issue in Hadoop.

Does anyone know in which configuration file I can set 0.0.0.0 (or a hostname) as
the default listening address? I have put a sketch of what I am currently trying
below the quoted thread.

Best regards,

Jone

On 22 Feb 2014, at 05:24, Stephen Sprague <sprag...@gmail.com> wrote:

> Hi Jone,
>
> um. i can say for sure something is wrong. :)
>
> i would _start_ by going to the tasktracker. this is your friend. find your
> job and look for failed reducers. That's the starting point anyway, IMHO.
>
>
> On Fri, Feb 21, 2014 at 11:35 AM, Jone Lura <jone.l...@ecc.no> wrote:
>
> Hi,
>
> I have tried some variations of queries with aggregation functions, such as
> the following:
>
> select max(total) from my_table;
>
> and
>
> select id, sum(total) from my_table group by id;
>
> In my JUnit tests I only have two rows of data, but the queries are
> extremely slow.
>
> The job detail output shows me the following:
>
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2014-02-21 17:31:42,544 Stage-1 map = 0%, reduce = 0%
> 2014-02-21 17:31:45,548 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:31:46,899 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:31:55,446 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:34,358 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:40,040 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:45,653 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:46,999 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:32:55,544 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:34,454 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:40,130 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:45,742 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:47,093 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 17:33:55,632 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,005 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,461 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,311 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,574 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:27:48,932 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:28:49,727 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:47,995 Stage-1 map = 100%, reduce = 100%
> 2014-02-21 19:29:48,997 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,018 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,019 Stage-1 map = 100%, reduce = 0%
> 2014-02-21 19:29:49,824 Stage-1 map = 100%, reduce = 0%
>
> I am relatively new to Hadoop and Hive, and I do not know whether this is
> normal or whether I have missed some configuration details.
>
> In my application I expect to have 500M or more rows.
>
> Best regards,
>
> Jone
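P.S. To make the question more concrete, below is the kind of change I have been
experimenting with. I am only guessing that these are the right knobs (the property
name is what I could find in the Hadoop 1.x defaults, and the hostname below is
made up), so please correct me if I am looking in the wrong place.

In mapred-site.xml on the node that reports the wrong address:

  <property>
    <!-- HTTP server the reducers fetch map output from during the shuffle;
         0.0.0.0 should make it listen on all interfaces -->
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>

And in /etc/hosts, making sure the office address maps to the machine's hostname
(worker01.example.com is just a placeholder for my real host):

  127.0.0.1      localhost
  10.20.30.152   worker01.example.com worker01

Is this the right direction, or is there a single property that controls the
default listening address?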