[jira] [Commented] (HIVE-16368) Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.

zhihai xu (JIRA) Mon, 03 Apr 2017 22:27:02 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954570#comment-15954570
 ]


zhihai xu commented on HIVE-16368:
----------------------------------

The Reduce Operator Tree in the query plan of the MR job that contains the 
LateralViewJoinOperator is:
{code}
|   Stage: Stage-3 |
|     Map Reduce |
|       Map Operator Tree: |
|       ......

|       Reduce Operator Tree: |
|         Join Operator |
|           condition map: |
|                Inner Join 0 to 1 |
|           keys: |
|             0 _col7 (type: string) |
|             1 msg.chain_uuid (type: string) |
|           outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col44 |
|           Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator |
|             expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col44.all_points (type: array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>) |
|             outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col37 |
|             Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE |
|             Lateral View Forward |
|               Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE |
|               Select Operator |
|                 expressions: _col0 (type: string), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col3 (type: string), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int) |
|                 outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9 |
|                 Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE |
|                 Lateral View Join Operator |
|                   outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9, _col38 |
|                   Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE |
|                   File Output Operator |
|                     compressed: false |
|                     table: |
|                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
|                         output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                           |
|                         serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                   |
|               Select Operator                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                 expressions: _col37 (type: 
array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>)
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          |
|                 outputColumnNames: _col0                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                 Statistics: Num rows: 35439950 Data size: 1169518365 Basic 
stats: COMPLETE Column stats: NONE                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
|                 UDTF Operator                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                   Statistics: Num rows: 35439950 Data size: 1169518365 Basic 
stats: COMPLETE Column stats: NONE                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                     |
|                   function name: explode                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                   Lateral View Join Operator                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                     outputColumnNames: _col0, _col10, _col11, _col12, _col14, 
_col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, 
_col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, 
_col7, _col8, _col9, _col38                                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
|                     Statistics: Num rows: 70879900 Data size: 2339036730 
Basic stats: COMPLETE Column stats: NONE                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
|                     File Output Operator                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                       compressed: false                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                       table:                                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|                           input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                          |
|                           output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                         |
|                           serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                 
{code}

The query plan of the Map Operator Tree for the following MR job (Stage-4), 
which starts with a TableScanOperator, is:
{code}
|   Stage: Stage-4                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|     Map Reduce                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|       Map Operator Tree:                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|           TableScan                                                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|             Reduce Output Operator                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|               key expressions: _col27 (type: string), _col7 (type: string), 
_col38.ts (type: bigint)                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
|               sort order: +++                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                    |
|               Map-reduce partition columns: _col27 (type: string), _col7 (type: string) |
|               Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE |
|               value expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col38 (type: struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>) |
|       Reduce Operator Tree: |
{code}

> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView 
> Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16368
>                 URL: https://issues.apache.org/jira/browse/HIVE-16368
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>
> A query fails with an unexpected java.lang.ArrayIndexOutOfBoundsException in
> the LateralView operation when running Hive on MR. The cause is that column
> pruning changes the column order in the LateralView operation. For
> back-to-back ReduceSink operators on the MR engine, a FileSinkOperator and a
> TableScanOperator are added before the second ReduceSink operator. The
> serialization column order used by the FileSinkOperator in LazyBinarySerDe of
> the previous reducer differs from the deserialization column order, taken
> from the table desc, used by the MapOperator/TableScanOperator in
> LazyBinarySerDe of the failing mapper.
> The serialization order is decided by the outputObjInspector from
> LateralViewJoinOperator:
> {code}
>     ArrayList<String> fieldNames = conf.getOutputInternalColNames();
>     outputObjInspector = ObjectInspectorFactory
>         .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is decided by getOutputInternalColNames
> in LateralViewJoinOperator.
> The deserialization order is decided by the TableScanOperator, which is
> created in GenMapRedUtils.splitTasks:
> {code}
>     TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>         .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
>     // Create the temporary file, its corresponding FileSinkOperator, and
>     // its corresponding TableScanOperator.
>     TableScanOperator tableScanOp =
>         createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is decided by the rowSchema of
> LateralViewJoinOperator.
> However, ColumnPrunerLateralViewJoinProc changes the order of
> outputInternalColNames but keeps the original order of rowSchema,
> which causes a mismatch between serialization and deserialization for two
> back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column
> order of its child select operator's colList but not the rowSchema.
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>       at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>       at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
>       at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}
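
The failure mode described in the quoted issue (one column order used to write a row, another used to read it) can be reproduced in miniature without Hive. The sketch below is only an illustration under stated assumptions: the class ColumnOrderMismatchDemo, its Type enum, and the toy byte layout (a 4-byte length prefix for strings, 8 bytes for doubles) are all hypothetical and are not LazyBinarySerDe's actual format. It shows how a reader that walks the fields in the wrong order can take part of a double for a string length and end up indexing far past the end of the row.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ColumnOrderMismatchDemo {

    /** Toy column types (hypothetical); Hive's real type system is much richer. */
    enum Type { STRING, DOUBLE }

    /** Writes fields in the given order: STRING as [4-byte length][bytes], DOUBLE as 8 bytes. */
    static byte[] serialize(List<Type> order, List<Object> values) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        for (int i = 0; i < order.size(); i++) {
            if (order.get(i) == Type.STRING) {
                byte[] s = ((String) values.get(i)).getBytes(StandardCharsets.UTF_8);
                buf.putInt(s.length).put(s);
            } else {
                buf.putDouble((Double) values.get(i));
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Reads an n-byte big-endian value straight from the array, with no bounds check of its own. */
    static long readBytes(byte[] b, int off, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    /** Locates field `index` by walking the reader's column order, then decodes it as a double. */
    static double readDouble(byte[] row, List<Type> order, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            // Skip earlier fields; a STRING's length prefix is trusted blindly.
            pos += (order.get(i) == Type.STRING) ? 4 + (int) readBytes(row, pos, 4) : 8;
        }
        return Double.longBitsToDouble(readBytes(row, pos, 8));
    }

    public static void main(String[] args) {
        // Order used by the writer versus the (stale) order used by the reader.
        List<Type> writerOrder = Arrays.asList(Type.DOUBLE, Type.STRING);
        List<Type> readerOrder = Arrays.asList(Type.STRING, Type.DOUBLE);
        byte[] row = serialize(writerOrder, Arrays.<Object>asList(12.5, "taxi"));

        // Same order on both sides: the double decodes cleanly.
        System.out.println(readDouble(row, writerOrder, 0));

        // Mismatched order: the first bytes of the double are taken for a
        // string length, so the computed offset lands far outside the row.
        try {
            readDouble(row, readerOrder, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("mismatched order: " + e);
        }
    }
}
```

In this toy layout the bogus offset comes from reading the leading bytes of the double as a length. That is at least consistent with the very large index in the stack trace above, although the real encoding differs.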



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
