hive> explain select i_item_desc, i_category, i_class, i_current_price, i_item_id, sum(ws_ext_sales_price) as itemrevenue, sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio
    > from item JOIN web_sales ON (web_sales.ws_item_sk = item.i_item_sk) JOIN date_dim ON (web_sales.ws_sold_date_sk = date_dim.d_date_sk)
    > where item.i_category in ('Jewelry', 'Sports', 'Books') and date_dim.d_date between '2001-01-12' and '2001-02-11' and ws_sold_date between '2001-01-12' and '2001-02-11'
    > group by i_item_id, i_item_desc, i_category, i_class, i_current_price
    > order by i_category, i_class, i_item_id, i_item_desc, revenueratio
    > limit 100;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME item)) (TOK_TABREF (TOK_TABNAME web_sales)) (= (. (TOK_TABLE_OR_COL web_sales) ws_item_sk) (. (TOK_TABLE_OR_COL item) i_item_sk))) (TOK_TABREF (TOK_TABNAME date_dim)) (= (. (TOK_TABLE_OR_COL web_sales) ws_sold_date_sk) (. (TOK_TABLE_OR_COL date_dim) d_date_sk)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL i_item_desc)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_category)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_class)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_current_price)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_item_id)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL ws_ext_sales_price)) itemrevenue) (TOK_SELEXPR (/ (* (TOK_FUNCTION sum (TOK_TABLE_OR_COL ws_ext_sales_price)) 100) (TOK_FUNCTION sum (TOK_FUNCTION sum (TOK_TABLE_OR_COL ws_ext_sales_price)) (TOK_WINDOWSPEC (TOK_PARTITIONINGSPEC (TOK_DISTRIBUTEBY (TOK_TABLE_OR_COL i_class)))))) revenueratio)) (TOK_WHERE (and (and (TOK_FUNCTION in (. (TOK_TABLE_OR_COL item) i_category) 'Jewelry' 'Sports' 'Books') (TOK_FUNCTION between KW_FALSE (. (TOK_TABLE_OR_COL date_dim) d_date) '2001-01-12' '2001-02-11')) (TOK_FUNCTION between KW_FALSE (TOK_TABLE_OR_COL ws_sold_date) '2001-01-12' '2001-02-11'))) (TOK_GROUPBY (TOK_TABLE_OR_COL i_item_id) (TOK_TABLE_OR_COL i_item_desc) (TOK_TABLE_OR_COL i_category) (TOK_TABLE_OR_COL i_class) (TOK_TABLE_OR_COL i_current_price)) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_category)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_class)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_item_id)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_item_desc)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL revenueratio))) (TOK_LIMIT 100)))
STAGE DEPENDENCIES:
  Stage-10 is a root stage
  Stage-3 depends on stages: Stage-10
  Stage-4 depends on stages: Stage-3
  Stage-5 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-10
    Map Reduce Local Work
      Alias -> Map Local Tables:
        date_dim
          Fetch Operator
            limit: -1
        item
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        date_dim
          TableScan
            alias: date_dim
            Filter Operator
              predicate:
                  expr: d_date BETWEEN '2001-01-12' AND '2001-02-11'
                  type: boolean
              HashTable Sink Operator
                condition expressions:
                  0 {_col1} {_col4} {_col5} {_col10} {_col12} {_col47}
                  1
                handleSkewJoin: false
                keys:
                  0 [Column[_col24]]
                  1 [Column[d_date_sk]]
                Position of Big Table: 0
        item
          TableScan
            alias: item
            Filter Operator
              predicate:
                  expr: (i_category) IN ('Jewelry', 'Sports', 'Books')
                  type: boolean
              HashTable Sink Operator
                condition expressions:
                  0 {i_item_id} {i_item_desc} {i_current_price} {i_class} {i_category}
                  1 {ws_sold_date_sk} {ws_ext_sales_price}
                handleSkewJoin: false
                keys:
                  0 [Column[i_item_sk]]
                  1 [Column[ws_item_sk]]
                Position of Big Table: 1

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        web_sales
          TableScan
            alias: web_sales
            Filter Operator
              predicate:
                  expr: ws_sold_date BETWEEN '2001-01-12' AND '2001-02-11'
                  type: boolean
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {i_item_id} {i_item_desc} {i_current_price} {i_class} {i_category}
                  1 {ws_sold_date_sk} {ws_ext_sales_price}
                handleSkewJoin: false
                keys:
                  0 [Column[i_item_sk]]
                  1 [Column[ws_item_sk]]
                outputColumnNames: _col1, _col4, _col5, _col10, _col12, _col24, _col47
                Position of Big Table: 1
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col1} {_col4} {_col5} {_col10} {_col12} {_col47}
                    1
                  handleSkewJoin: false
                  keys:
                    0 [Column[_col24]]
                    1 [Column[d_date_sk]]
                  outputColumnNames: _col1, _col4, _col5, _col10, _col12, _col47
                  Position of Big Table: 0
                  Select Operator
                    expressions:
                          expr: _col1
                          type: string
                          expr: _col4
                          type: string
                          expr: _col12
                          type: string
                          expr: _col10
                          type: string
                          expr: _col5
                          type: float
                          expr: _col47
                          type: float
                    outputColumnNames: _col1, _col4, _col12, _col10, _col5, _col47
                    Group By Operator
                      aggregations:
                            expr: sum(_col47)
                      bucketGroup: false
                      keys:
                            expr: _col1
                            type: string
                            expr: _col4
                            type: string
                            expr: _col12
                            type: string
                            expr: _col10
                            type: string
                            expr: _col5
                            type: float
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      Reduce Output Operator
                        key expressions:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                              expr: _col2
                              type: string
                              expr: _col3
                              type: string
                              expr: _col4
                              type: float
                        sort order: +++++
                        Map-reduce partition columns:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                              expr: _col2
                              type: string
                              expr: _col3
                              type: string
                              expr: _col4
                              type: float
                        tag: -1
                        value expressions:
                              expr: _col5
                              type: double
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE._col0)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
                expr: KEY._col1
                type: string
                expr: KEY._col2
                type: string
                expr: KEY._col3
                type: string
                expr: KEY._col4
                type: float
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-4
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:9000/tmp/hive-tharsch/hive_2014-07-08_16-34-32_453_181611671719021152-1/-mr-10004
            Reduce Output Operator
              key expressions:
                    expr: _col3
                    type: string
                    expr: _col3
                    type: string
              sort order: ++
              Map-reduce partition columns:
                    expr: _col3
                    type: string
              tag: -1
              value expressions:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col3
                    type: string
                    expr: _col4
                    type: float
                    expr: _col5
                    type: double
      Reduce Operator Tree:
        Extract
        PTF Operator
          Select Operator
            expressions:
                  expr: _col1
                  type: string
                  expr: _col2
                  type: string
                  expr: _col3
                  type: string
                  expr: _col4
                  type: float
                  expr: _col0
                  type: string
                  expr: _col5
                  type: double
                  expr: ((_col5 * 100) / _wcol0)
                  type: double
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:9000/tmp/hive-tharsch/hive_2014-07-08_16-34-32_453_181611671719021152-1/-mr-10005
            Reduce Output Operator
              key expressions:
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col4
                    type: string
                    expr: _col0
                    type: string
                    expr: _col6
                    type: double
              sort order: +++++
              tag: -1
              value expressions:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col3
                    type: float
                    expr: _col4
                    type: string
                    expr: _col5
                    type: double
                    expr: _col6
                    type: double
      Reduce Operator Tree:
        Extract
        Limit
        File Output Operator
          compressed: false
          GlobalTableId: 0
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: 100

Time taken: 1.102 seconds, Fetched: 278 row(s)

From: Hari Subramaniyan <hsubramani...@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, July 8, 2014 2:12 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: hive job stuck at map=100%, reduce 0%

Hi Tim,

Can you please provide the "explain plan" for the query? If this happens because of memory issues with the map join in the query, you can set hive.auto.convert.join=false and give it a try.
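For example, as a minimal sketch (a session-level setting; whether it helps depends on the actual cause):

    -- turn off automatic conversion of joins into map-side joins, so the
    -- joins run as common (reduce-side) joins instead of loading the
    -- small tables into the local task's memory
    set hive.auto.convert.join=false;
    -- then re-run the original select ... limit 100; query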
Thanks,
Hari

On Tue, Jul 8, 2014 at 1:18 PM, Tim Harsch <thar...@yarcdata.com> wrote:

Hi,
I asked a question on Stack Overflow (http://stackoverflow.com/questions/24621002/hive-job-stuck-at-map-100-reduce-0) which hasn't seemed to get much traction, so I'd like to ask it here as well. I'm running hive-0.12.0 on hadoop-2.2.0.

After submitting the query:

select i_item_desc, i_category, i_class, i_current_price, i_item_id,
  sum(ws_ext_sales_price) as itemrevenue,
  sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio
from item JOIN web_sales ON (web_sales.ws_item_sk = item.i_item_sk)
  JOIN date_dim ON (web_sales.ws_sold_date_sk = date_dim.d_date_sk)
where item.i_category in ('Jewelry', 'Sports', 'Books')
  and date_dim.d_date between '2001-01-12' and '2001-02-11'
  and ws_sold_date between '2001-01-12' and '2001-02-11'
group by i_item_id, i_item_desc, i_category, i_class, i_current_price
order by i_category, i_class, i_item_id, i_item_desc, revenueratio
limit 100;

I get the following output in the logs:

Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2014-07-07 15:26:16,893 Stage-3 map = 0%, reduce = 0%
2014-07-07 15:26:22,033 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.32 sec

And then the last line repeats every second or so, ad infinitum. If I look at the container logs I see:

2014-07-07 17:12:17,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1404402886929_0036_m_000000_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

I've searched for exit code 143 (128 + 15, i.e. the task JVM was sent SIGTERM), but most of what's out there refers to memory issues, and I have memory set pretty large, following the advice of "Container is running beyond memory limits" (http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits). I have even tried adding 6GB to each of the settings in that post; still no luck. I've also run the job with:

hive -hiveconf hive.root.logger=DEBUG,console

which really just produces a lot more info, but nothing I see makes the issue clear. I'm not sure where else to look...
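For reference, a sketch of the kind of memory settings the linked post describes. The property names are standard Hadoop 2.x/YARN keys and can be overridden per-session from the Hive CLI; the values below are purely illustrative, not taken from this thread:

    -- example values only; container sizes must fit within
    -- yarn.scheduler.maximum-allocation-mb on the cluster
    set mapreduce.map.memory.mb=4096;         -- YARN container size for map tasks
    set mapreduce.reduce.memory.mb=4096;      -- YARN container size for reduce tasks
    set mapreduce.map.java.opts=-Xmx3276m;    -- JVM heap, kept below the container size
    set mapreduce.reduce.java.opts=-Xmx3276m;

One way to confirm or rule out the memory theory: when YARN kills a container for exceeding its limit, the diagnostics normally say so explicitly ("running beyond physical memory limits") rather than just reporting exit code 143.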