Hi,

You need to add the script to the distributed cache so that the other machines can access it, and then refer to it by file name rather than by its absolute path, as in the commands below.
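The original script was not posted, so here is only a rough sketch of the kind of script this applies to. It assumes the usual streaming convention: rows arrive on stdin one per line with tab-separated columns, and output rows are written to stdout the same way. The function name `transform_line` and the upper-casing logic are placeholders of mine, not taken from the real mypython.py.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for mypython.py (the real script was not posted)."""
import sys


def transform_line(line):
    # Each input row is assumed to arrive as one line, columns separated by tabs.
    fields = line.rstrip("\n").split("\t")
    # Placeholder logic (an assumption): upper-case the first input column.
    return fields[0].upper()


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Emit one output row per input row, matching "as (col string)".
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

If the script is called by its bare name, it probably needs a shebang line and execute permission; calling it as 'python mypython.py' in the using clause may be a safer choice.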
http://wiki.apache.org/hadoop/Hive/GettingStarted#STREAMING
http://wiki.apache.org/hadoop/Hive/LanguageManual/Cli#Hive_Resources

hive> add file /home/pc/mypython.py;
hive> select transform(a.col) using 'mypython.py' as (col string) from tmp_table a where a.col2='01';

________________________________
From: Jianhua Wang <wjh_had...@163.com>
To: user <user@hive.apache.org>; "d...@hive.apache.org" <d...@hive.apache.org>
Sent: Mon, February 28, 2011 6:19:45 PM
Subject: about User scripte in HiveQL

Hi all,

Recently I ran into a problem that I have not been able to solve after some effort, so I am looking for help here. Any help will be appreciated. Thanks!

My case is as follows. I want to execute this HiveQL command:

select transform(a.col) using '/home/pc/mypython.py' as (col string) from tmp_table a where a.col2='01';

where 'mypython.py' is a Python script of mine.

I have built a Hadoop environment in a VMware machine on my single-node home PC, and the command works well in that single-node environment. I also have a cluster of three PC servers: nodes A, B, and C. I stored '/home/pc/mypython.py' on node A. However, every time I issue the command to the cluster, I get error output like this:

-------------------------------------------------------------------
Caused by: java.io.IOException: Cannot run program "/home/pc/mypython.py": java.io.IOException: error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 20 more
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 21 more
-------------------------------------------------------------------

According to the job logs, these errors were reported by node B and node C. It seems that the tasktrackers on B and C cannot find the script. I did not find any instructions on the Hive wiki about where to place a user script. Where should I put my script?

Thanks in advance for any reply!

2011-03-01
________________________________
Jianhua Wang