Hi, I have tried a few variations of queries with aggregation functions, such as the following queries:
select max(total) from my_table;

and

select id, sum(total) from my_table group by id;

In my JUnit tests, I only have two rows with data, but the queries are extremely slow. The job detail output shows me the following:

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-02-21 17:31:42,544 Stage-1 map = 0%, reduce = 0%
2014-02-21 17:31:45,548 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:31:46,899 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:31:55,446 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:32:34,358 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:32:40,040 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:32:45,653 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:32:46,999 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:32:55,544 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:33:34,454 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:33:40,130 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:33:45,742 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:33:47,093 Stage-1 map = 100%, reduce = 0%
2014-02-21 17:33:55,632 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:27:48,005 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:27:48,461 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:27:48,311 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:27:48,574 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:27:48,932 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:28:48,915 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:28:48,933 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:28:49,727 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:29:47,995 Stage-1 map = 100%, reduce = 100%
2014-02-21 19:29:48,997 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:29:49,018 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:29:49,019 Stage-1 map = 100%, reduce = 0%
2014-02-21 19:29:49,824 Stage-1 map = 100%, reduce = 0%

I am relatively new to Hadoop and Hive and I do not know if this is normal,
or if I have missed some configuration details. In my application I am expecting to have 500M or more rows.

Best regards,
Jone