At first look, this seems like a heap out-of-memory issue, but a more detailed analysis is needed to nail it down further. The Hive logs can provide more insight.
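As a starting point, the aggregated container logs usually show why the ApplicationMaster killed the attempt. A minimal sketch, assuming log aggregation is enabled on the cluster; the application id below is inferred from the attempt id `attempt_1404402886929_0036_m_000000_0` in the logs you posted, and the hive.log path is only the default location:

```shell
# Pull the aggregated logs for the whole application (AM + task containers).
yarn logs -applicationId application_1404402886929_0036 | less

# Watch the Hive client log while the query runs; /tmp/$USER/hive.log is
# the default hive.log.dir and may differ on your install.
tail -f /tmp/$USER/hive.log
```

The task-container stderr/syslog in the aggregated output will typically show the last operator that was making progress before the kill.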
We gave a presentation at Hadoop Summit on "Debugging Hive with Hadoop". Slide 42 and the next couple of slides detail how to investigate stuck Hive jobs. Hope this helps.

Slide deck link: http://www.slideshare.net/altiscale/debugging-hive-with-hadoop-in-the-cloud

--Bala G.

On Tue, Jul 8, 2014 at 1:18 PM, Tim Harsch <thar...@yarcdata.com> wrote:

> Hi,
> I asked a question on Stack Overflow
> (http://stackoverflow.com/questions/24621002/hive-job-stuck-at-map-100-reduce-0)
> which hasn't seemed to get much traction, so I'd like to ask it here as well.
>
> I'm running hive-0.12.0 on hadoop-2.2.0. After submitting the query:
>
> select i_item_desc
>        ,i_category
>        ,i_class
>        ,i_current_price
>        ,i_item_id
>        ,sum(ws_ext_sales_price) as itemrevenue
>        ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
>            (partition by i_class) as revenueratio
> from item JOIN web_sales ON (web_sales.ws_item_sk = item.i_item_sk)
>           JOIN date_dim ON (web_sales.ws_sold_date_sk = date_dim.d_date_sk)
> where item.i_category in ('Jewelry', 'Sports', 'Books')
>   and date_dim.d_date between '2001-01-12' and '2001-02-11'
>   and ws_sold_date between '2001-01-12' and '2001-02-11'
> group by i_item_id
>        ,i_item_desc
>        ,i_category
>        ,i_class
>        ,i_current_price
> order by i_category
>        ,i_class
>        ,i_item_id
>        ,i_item_desc
>        ,revenueratio
> limit 100;
>
> I get the following errors in the logs:
>
> Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
> 2014-07-07 15:26:16,893 Stage-3 map = 0%, reduce = 0%
> 2014-07-07 15:26:22,033 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.32 sec
>
> And then the last line repeats every second or so ad infinitum.
> If I look at the container logs I see:
>
> 2014-07-07 17:12:17,477 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> report from attempt_1404402886929_0036_m_000000_0: Container killed by the
> ApplicationMaster.
> Container killed on request. Exit code is 143
>
> I've searched for exit code 143, but most of what's out there refers to
> memory issues, and I have memory set pretty large (following the advice in
> "Container is running beyond memory limits"
> <http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits>).
> I have even tried adding 6GB to each of the settings in that post; still no luck.
>
> I've also run the job with:
>
> hive -hiveconf hive.root.logger=DEBUG,console
>
> which really just produces a lot more info, but nothing I see makes clear
> what the issue is.
> I'm not sure where else to look...
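One thing worth double-checking, since exit code 143 is YARN killing the container: the JVM heap (`*.java.opts`) must fit comfortably inside the container size (`*.memory.mb`), or the container gets killed no matter how large both values are. A minimal sketch using the standard MRv2 property names; the specific sizes below are illustrative assumptions only, with the heap set to roughly 80% of the container:

```sql
-- Illustrative values only: keep -Xmx about 75-80% of the container size,
-- leaving headroom for JVM overhead, or YARN kills the container (exit 143).
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m;
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3276m;
```

If the heap is already sized that way and the map task still sits at 100% while the log line repeats, the slides above on stuck jobs (jstack on the task JVM, checking which operator is spinning) are the next step.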