Cheolsoo Park created PIG-4124:
----------------------------------
Summary: Command for Python streaming udf should be configurable
Key: PIG-4124
URL: https://issues.apache.org/jira/browse/PIG-4124
Project: Pig
Issue Type: Improvement
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.14.0
In my cluster, multiple versions of Python are installed, such as python2.6
and python2.7. Since some modules are only available for a non-default Python
version, it would be nice if the user could configure the python command used
to run streaming UDFs.
For example, I have a streaming UDF that imports pytz. It fails with the
following error if it runs with {{python}}:
{code}
: Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE 4: ImportError: No module named pytz
: File /mnt1/var/lib/hadoop/nm-local-dir/usercache/cheolsoop/appcache/application_1407968511815_0021/container_1407968511815_0021_01_001322/tmp/udfs.py, line 4, in <module>
: import pytz
: at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:519)
{code}
It works if I use {{python2.7}} as the command.
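
A minimal sketch of the requested behavior, assuming the command is read from a job property and falls back to {{python}} when the property is unset. The property name and both helper functions are hypothetical and only illustrate the idea; they are not Pig's actual API.

```python
import shlex

# Hypothetical property name, used only for illustration.
PYTHON_COMMAND_KEY = "streaming.udf.python.command"
DEFAULT_PYTHON_COMMAND = "python"


def resolve_python_command(properties):
    """Return the interpreter command as an argument list.

    Falls back to the default command when the property is missing or blank.
    """
    value = properties.get(PYTHON_COMMAND_KEY, "").strip()
    if not value:
        value = DEFAULT_PYTHON_COMMAND
    # shlex.split lets the user pass a command with arguments,
    # for example "/usr/bin/env python2.7".
    return shlex.split(value)


def build_udf_command(properties, script_path):
    """Build the full command line used to launch the UDF script."""
    return resolve_python_command(properties) + [script_path]


if __name__ == "__main__":
    print(build_udf_command({}, "udfs.py"))
    print(build_udf_command({PYTHON_COMMAND_KEY: "python2.7"}, "udfs.py"))
```

With such a setting, the pytz UDF above could be pointed at {{python2.7}} without changing the default for other jobs.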