[ https://issues.apache.org/jira/browse/HIVE-21130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muthu Manickam updated HIVE-21130:
----------------------------------
    Description: 
Hive queries get stuck after initializing MapOperator. These Hive queries are
simple CTAS statements reading from a Hive table stored in RCFile format. The
table has 7500 partitions and 110 columns, with column data types restricted to
string and int. The queries run on an EMR cluster with 100 data nodes and
sufficient memory and cores.

 

After the query is submitted, YARN allocates the necessary containers. All the
mapper tasks are in RUNNING state, and every map task reaches the stage of
initializing MapOperator and then gets stuck. Here are the log messages from the
map tasks:

2019-01-17 *15:02:06,262* INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Initializing operator MAP[0]

2019-01-17 *15:08:22,093* INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper:

 

_*As the timestamps show, the task is stuck for about 6 minutes. This can grow
to 20 minutes depending on the number of parallel queries on the cluster. Once
ExecMapper starts, the query completes within a minute.*_ 

I also found thread dumps in the logs showing that the task spends nearly all
of this time in the method *setReadNestedColumnPathConf()*:

"main" #1 prio=5 os_prio=0 tid=0x00007f4cd805e800 nid=0x18074 runnable 
[0x00007f4cded7f000]
 java.lang.Thread.State: RUNNABLE
 at java.lang.String.toLowerCase(String.java:2670)
 at org.apache.hadoop.hive.serde2.ColumnProjectionUtils.*setReadNestedColumnPathConf(ColumnProjectionUtils.java:223)*
 at org.apache.hadoop.hive.serde2.ColumnProjectionUtils.appendNestedColumnPaths(ColumnProjectionUtils.java:145)
 at org.apache.hadoop.hive.ql.exec.MapOperator.cloneConfsForNestedColPruning(MapOperator.java:365)
 at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:419)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
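To illustrate how time could pile up in toLowerCase, here is a minimal Java model built on stated assumptions that the trace does not confirm: that appendNestedColumnPaths is called once per partition, that each call prepends the same column list to the value already stored, and that setReadNestedColumnPathConf lowercases the whole combined string on every call. The class NestedPathAppendModel, the property name and the column names are hypothetical; this is a sketch, not Hive's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NestedPathAppendModel {
    // Hypothetical property name standing in for the real Hive configuration key.
    static final String KEY = "hypothetical.nested.column.paths";

    // Assumed shape of one append step: join the new paths, prepend them to the
    // existing value, lowercase the whole combined string and store it again.
    // Returns the number of characters passed to toLowerCase.
    static int appendPaths(Map<String, String> conf, List<String> paths) {
        String joined = String.join(",", paths);
        String old = conf.get(KEY);
        String combined = (old == null || old.isEmpty()) ? joined : joined + "," + old;
        conf.put(KEY, combined.toLowerCase(Locale.ROOT));
        return combined.length();
    }

    // Simulates one append per partition, each carrying the same column list,
    // and returns the total number of characters lowercased.
    static long simulate(int partitions, int columns) {
        List<String> paths = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            paths.add("Col" + c); // hypothetical column names
        }
        Map<String, String> conf = new HashMap<>();
        long total = 0;
        for (int p = 0; p < partitions; p++) {
            total += appendPaths(conf, paths);
        }
        return total;
    }

    public static void main(String[] args) {
        // Doubling the partition count roughly quadruples the work,
        // and more columns make every step longer.
        for (int columns : new int[] {5, 110}) {
            for (int partitions : new int[] {100, 200, 400}) {
                System.out.printf("columns=%d partitions=%d chars lowercased=%d%n",
                        columns, partitions, simulate(partitions, columns));
            }
        }
    }
}
```

Under these assumptions the work grows with the square of the partition count and linearly with the column count. For 7500 partitions and 110 short column names the model lowercases on the order of 10^10 characters during one task's initialization, which would fit a multi-minute stall, but the real cost depends on what Hive's code actually does.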

 

Based on this, *I tried running the same query with only 1 to 5 columns in the
select clause, and it runs quickly as expected. If I include more columns in the
select clause, it runs into the same issue, with long pauses between
MapOperator initialization and ExecMapper.*
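If the slowdown comes from the same column paths being accumulated once per partition (an assumption, not something the logs establish), one possible mitigation is to collect the distinct paths first and write the property once, so each path is lowercased exactly once and the stored value stays at one entry per column. The sketch below shows that idea; DedupedNestedPaths, setPathsOnce and the property name are hypothetical, and this is not the fix adopted in Hive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class DedupedNestedPaths {
    // Hypothetical property name standing in for the real Hive configuration key.
    static final String KEY = "hypothetical.nested.column.paths";

    // Lowercases every incoming path once, keeps only distinct paths in
    // first-seen order, then writes the property a single time.
    // The work is linear in the total number of paths seen.
    static void setPathsOnce(Map<String, String> conf, List<List<String>> perPartitionPaths) {
        Set<String> distinct = new LinkedHashSet<>();
        for (List<String> paths : perPartitionPaths) {
            for (String path : paths) {
                distinct.add(path.toLowerCase(Locale.ROOT));
            }
        }
        conf.put(KEY, String.join(",", distinct));
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        for (int c = 0; c < 110; c++) {
            columns.add("Col" + c); // hypothetical column names
        }
        List<List<String>> perPartition = new ArrayList<>();
        for (int p = 0; p < 7500; p++) {
            perPartition.add(columns);
        }
        Map<String, String> conf = new HashMap<>();
        setPathsOnce(conf, perPartition);
        System.out.println(conf.get(KEY).split(",").length); // prints 110
    }
}
```

Here 7500 partitions of 110 columns cost 825,000 lowercase calls on short strings, instead of repeatedly lowercasing an ever-growing value.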

 

This issue is very similar to the one reported in this Jira:

https://issues.apache.org/jira/browse/HIVE-16969


  was:
Hive queries get stuck after initializing MapOperator. These Hive queries are
simple CTAS statements reading from a Hive table stored in RCFile format. The
table has 7500 partitions and 110 columns, with column data types restricted to
string and int. The queries run on an EMR cluster with 100 data nodes and
sufficient memory and cores.

 

After the query is submitted, YARN allocates the necessary containers. All the
mapper tasks are in RUNNING state, and all the map tasks reach this stage of
initializing MapOperator. Here are the log messages from the map tasks:

2019-01-17 *15:02:06,262* INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Initializing operator MAP[0]

2019-01-17 *15:08:22,093* INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper:

_*After this message, they are stuck for 6 minutes. This can grow to 20
minutes depending on the number of parallel queries on the cluster. Once
ExecMapper starts, the query completes within a minute.*_ 

I also found thread dumps in the logs showing that the task spends nearly all
of this time in the method *setReadNestedColumnPathConf()*:

"main" #1 prio=5 os_prio=0 tid=0x00007f4cd805e800 nid=0x18074 runnable 
[0x00007f4cded7f000]
 java.lang.Thread.State: RUNNABLE
 at java.lang.String.toLowerCase(String.java:2670)
 at org.apache.hadoop.hive.serde2.ColumnProjectionUtils.*setReadNestedColumnPathConf(ColumnProjectionUtils.java:223)*
 at org.apache.hadoop.hive.serde2.ColumnProjectionUtils.appendNestedColumnPaths(ColumnProjectionUtils.java:145)
 at org.apache.hadoop.hive.ql.exec.MapOperator.cloneConfsForNestedColPruning(MapOperator.java:365)
 at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:419)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)

 

Based on this, *I tried running the same query with only 1 to 5 columns in the
select clause, and it runs quickly as expected. If I include more columns in the
select clause, it runs into the same issue, with long pauses between
MapOperator initialization and ExecMapper.*

 

This issue is very similar to the one reported in this Jira:

https://issues.apache.org/jira/browse/HIVE-16969



> Mappers stuck after initializing MapOperator
> --------------------------------------------
>
>                 Key: HIVE-21130
>                 URL: https://issues.apache.org/jira/browse/HIVE-21130
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>    Affects Versions: 2.3.2
>            Reporter: Muthu Manickam
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
