Hi,
I played my query further, and found out it is very puzzle to explain the 
following behaviors:
1) The following query works:
select c_poi.provider_str, c_poi.name from (select darray(search_results, 
c.rank) as c_poi from nulf_search lateral view explode(search_clicks) 
clickTable as c) a
I get get all the result from the above query without any problem.
2) The following query NOT works:
select c_poi.provider_str, c_poi.name from (select darray(search_results, 
c.rank) as c_poi from nulf_search lateral view explode(search_clicks) 
clickTable as c) a where c_poi.provider_str = 'POI'
As long as I add the where criteria on provider_str, or even I added another 
level of sub query like following:
selectps, namefrom (select c_poi.provider_str as ps, c_poi.name as name from 
(select darray(search_results, c.rank) as c_poi from nulf_search lateral view 
explode(search_clicks) clickTable as c) a ) bwhere ps = 'POI'
any kind of criteria I tried to add on provider_str, the hive MR jobs failed in 
the same error I shown below.
Any idea why this happened? Is it related to the data? But provider_str is just 
a simple String type.
Thanks
Yong
From: java8...@hotmail.com
To: user@hive.apache.org
Subject: RE: Array index support non-constant expresssion
Date: Wed, 12 Dec 2012 12:15:27 -0500





OK. 
I followed the hive source code of 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the 
UDF. It is quite simple. 
It works fine as I expected for simple case, but when I try to run it under 
some complex query, the hive MR jobs failed with some strange errors. What I 
mean is that it failed in HIVE code base, from stuck trace, I can not see this 
failure has anything to do with my custom code.
I would like some help if some one can tell me what went wrong.
For example, I created this UDF called darray, stand for dynamic array, which 
supports the non-constant value as the index location of the array.
The following query works fine as I expected:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from 
(select darray(search_results, c.index_loc) as c_poi from search_table lateral 
view explode(search_clicks) clickTable as c) a limit 5;POI                      
   xxxxADDRESS               some addressPOI                        xxxxPOI     
                   xxxxADDRESSS             some address
Of course, in this case, I only want the provider_str = 'POI' returned, and 
filter out any rows with provider_str != 'POI', so it sounds simple, I changed 
the query to the following:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from 
(select darray(search_results, c.rank) as c_poi from search_table lateral view 
explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI' 
limit 5;Total MapReduce jobs = 1Launching Job 1 out of 1Number of reduce tasks 
is set to 0 since there's no reduce operatorCannot run job locally: Input Size 
(= 178314025) is larger than hive.exec.mode.local.auto.inputbytes.max (= 
134217728)Starting Job = job_201212031001_0100, Tracking URL = 
http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100Kill 
Command = /home/yzhang/hadoop/bin/hadoop job  
-Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_01002012-12-12 
11:45:24,090 Stage-1 map = 0%,  reduce = 0%2012-12-12 11:45:43,173 Stage-1 map 
= 100%,  reduce = 100%Ended Job = job_201212031001_0100 with errorsFAILED: 
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I am only add a Where limitation, but to my surprise, the MR jobs generated by 
HIVE failed. I am testing this in my local standalone cluster, which is running 
CDH3U3 release. When I check the hadoop userlog, here is what I got:
2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 
SELECT 
struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct<lat:double,lon:double,query_text_raw:string,query_text_normalized:string,query_string:string,llcountry:string,ipcountry:string,request_cnt:int,address:struct<country:string,state:string,zip:string,city:string,street:string,house:string>,categories_id:array<int>,categories_name:array<string>,lang_raw:string,lang_rose:string,lang:string,viewport:struct<top_lat:double,left_lon:double,bottom_lat:double,right_lon:double>>,_col15:struct<versions:int,physical_host:string,nose_request_id:string,client_type:string,ip:int,time_taken:int,user_agent:string,http_host:string,http_referrer:string,http_status:smallint,http_size:int,accept_language:string,md5:string,datacenter:string,tlv_map_data_version:string,tlv_devide_software_version:string,csid:int,rid:string,xncrid:string,cbfn:string,sources:array<struct<tm:bigint,tm_date:string,tm_time:string,md5:string,time_taken:int>>>,_col16:array<struct<provider_str:string,name:string,lat:double,lon:double,dyn:boolean,authoritative:boolean,search_center:boolean>>,_col17:array<struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>,_col18:string,_col19:struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>2012-12-12
 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running 
childjava.lang.RuntimeException: Error in configuring object        at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)      
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)    
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)    
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)        
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)        at 
org.apache.hadoop.mapred.Child$4.run(Child.java:270)        at 
java.security.AccessController.doPrivileged(Native Method)        at 
javax.security.auth.Subject.doAs(Subject.java:396)        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)Caused by: 
java.lang.reflect.InvocationTargetException        at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)   
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)        at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)      
  ... 9 moreCaused by: java.lang.RuntimeException: Error in configuring object  
      at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)      
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)    
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)    
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)        
... 14 moreCaused by: java.lang.reflect.InvocationTargetException        at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)   
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)        at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)      
  ... 17 moreCaused by: java.lang.RuntimeException: Map operator initialization 
failed        at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)        
... 22 moreCaused by: java.lang.RuntimeException: cannot find field results 
from [0:_col0, 1:_col1, 2:_col2, 3:_col3, 4:_col4, 5:_col5, 6:_col6, 7:_col7, 
8:_col8, 9:_col9, 10:_col10, 11:_col11, 12:_col12, 13:_col13, 14:_col14, 
15:_col15, 16:_col16, 17:_col17, 18:_col18, 19:_col19]        at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
        at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)       
 at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
        at 
org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)        at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)   
     at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) 
       at 
org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.initializeOp(LateralViewJoinOperator.java:109)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)        at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)   
     at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) 
       at 
org.apache.hadoop.hive.ql.exec.UDTFOperator.initializeOp(UDTFOperator.java:85)  
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)  
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)  
      at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)   
     at 
org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)        at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)   
     at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) 
       at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) 
       at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) 
       at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)   
     at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)        at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)   
     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)   
     at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) 
       ... 22 more2012-12-12 11:40:22,443 INFO org.apache.hadoop.mapred.Task: 
Runnning cleanup for the task
Of course, this whole table is built on very complex nested (struct + array 
....., you can see the struct in the top part of stuck trace)  structure with 
my own custom InputFormat/SerDe class. But everything so far works fine. I can 
query this table anyway I want without seeing errors. This is the first problem 
I met when I tried to query it in a way I want, but is given an error I am not 
sure the reason. If you need more information, I will be happy to provide, but 
right now I am stuck with the above error, and not sure is it a HIVE bug, or I 
did something wrong.
Thanks
Yong
From: java8...@hotmail.com
To: user@hive.apache.org
Subject: Array index support non-constant expresssion
Date: Tue, 11 Dec 2012 17:24:52 -0500





Hi, In our project to use the HIVE on CDH3U4 release (Hive 0.7.1), I have a 
hive table like the following:
Table foo (   search_results array<struct<id:bigint,..............>>   
search_clicks array<struct<index_loc:int, ..............>>)
As you can see, the 2nd column, which represents the list of search results 
clicked, contains the index location of which results was clicked (starting 
from 0). Now I need to find out the click count of some IDs, but this IDs only 
exists in the search_result struct.
When I tried to write a query for this, I will try to do following first:

select search_results[c.index_loc] from foo lateral view explode(search_clicks) 
clickTable as c;
But it won't work in hive, as the following error message returned:AILED: Error 
in semantic analysis: Line 1:7 Non-constant expressions for array indexes not 
supported rank.
It looks like that currently Hive (at least 0.7) does NOT support non-constant 
expressions as the array index.
I searched on google, and found out the following HIVE jira ticket:
https://issues.apache.org/jira/browse/HIVE-1955
It looks like someone has the same request to support it, but not yet.
But there is a comment in the above ticket that it can be done in an UDF. My 
question is, can anyone share some ideas about how to archive this in an UDF, 
as it maybe the only option for me right now?
Thanks                                                                          
                                          

Reply via email to