Hi,

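You need to add the script to the distributed cache so that the other nodes can
access it: register it with ADD FILE, then refer to it in USING by its file name
only, without the local path. See: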

http://wiki.apache.org/hadoop/Hive/GettingStarted#STREAMING
http://wiki.apache.org/hadoop/Hive/LanguageManual/Cli#Hive_Resources

hive> add file /home/pc/mypython.py;
hive> select transform(a.col) using 'mypython.py' as (col string)
from tmp_table a where a.col2='01';
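For reference, here is a minimal sketch of what a script like mypython.py might
look like. It assumes the usual TRANSFORM contract: Hive writes each input row
to the script's standard input as one line of tab-separated columns and reads
output rows back from standard output in the same format. The upper-casing in
the hypothetical transform() function is only a placeholder for your own logic.
Because ADD FILE places the script in each task's working directory, USING
refers to it by its base name; if the file is not executable or lacks the #!
line, you can write using 'python mypython.py' instead.

```python
#!/usr/bin/env python
"""Hypothetical mypython.py: a minimal Hive TRANSFORM script (sketch only).

Assumes Hive sends each row on stdin as tab-separated columns, one row per
line, and reads output rows from stdout in the same format.
"""
import sys


def transform(value):
    # Placeholder per-row logic (hypothetical): trim and upper-case the column.
    return value.strip().upper()


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        # Drop the trailing newline, then split the row into its columns.
        cols = line.rstrip("\n").split("\t")
        # Emit one output column per input row, newline-terminated.
        stdout.write(transform(cols[0]) + "\n")


if __name__ == "__main__":
    main()
```

You can test it locally before running the query, for example with
printf 'abc\tx\n' | python mypython.py, which should print ABC.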




________________________________
From: Jianhua Wang <wjh_had...@163.com>
To: user <user@hive.apache.org>; "d...@hive.apache.org" <d...@hive.apache.org>
Sent: Mon, February 28, 2011 6:19:45 PM
Subject: About user scripts in HiveQL

  
Hi all,
 
      Recently I ran into a problem that I have not been able to solve despite
some effort, so I am asking for help here. Any help will be appreciated.
Thanks!
 
      Here is my case:
 
      I want to execute the following HiveQL command:
 
select transform(a.col) using '/home/pc/mypython.py' as (col string) from
tmp_table a where a.col2='01';
 
where mypython.py is a Python script of mine.
 
I have set up a single-node Hadoop environment in a VMware virtual machine on
my home PC, and the command works well in that single-node environment.
 
I also have a cluster of three PC servers: nodes A, B, and C.
 
I stored /home/pc/mypython.py on node A.
 
However, every time I run the command on the cluster, I get the following
error:

-------------------------------------------------------------------------------------------------------------------


Caused by: java.io.IOException: Cannot run program "/home/pc/mypython.py": java.io.IOException: error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
        ... 20 more
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 21 more
-------------------------------------------------------------------------------------------------------------------


The job logs show that these errors were reported by nodes B and C. It seems
that the TaskTrackers on B and C cannot find the script.

I couldn't find any instructions on the Hive wiki about where to place a user
script. Where should I put my script so that every node can find it?
Thanks in advance for any reply!
 
2011-03-01 
________________________________

Jianhua Wang  

