Here is the actual fix (it's quite loony): don't wrap the name of the file
in the REGISTER statement in single quotes.  That's it.
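
Applied to the REGISTER statement from the quoted message below, that would
look roughly like this (a sketch, assuming the path, engine class, and alias
stay exactly as they were and only the quotes are dropped):

```
REGISTER /home/scs/woodcock/SD411/lab_udf/test.py USING
org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
```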

Catastrophic problems here:

1) Obviously not backwards compatible.

2) If this is a problem, why not just indicate that a) the format is wrong,
b) a path that started with a single quote did not yield a valid Python
file, or c) anything understandable, instead of getting into the middle of
the M/R computation and throwing wacky (mkey?  NullPointerException) errors?

On Wed, Oct 30, 2024 at 4:12 PM Mark Woodcock <woodc...@usna.edu> wrote:

> pig-0.17.0/bin/pig  -x local
>
> very basic UDF file:
>
> #!/usr/bin/python3
>
> from pig_util import outputSchema
>
> @outputSchema("as:int")
> def square(num):
>     if num == None:
>         return None
>     return ((num) * (num))
>
> @outputSchema("word:chararray")
> def concat(word):
>     return word + word
>
> Exceedingly simple pig script:
>
> REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING
> org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
>
> A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING
> PigStorage(',') AS (state:int,name:chararray);
>
> B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
>
>
>
> If I do a "DUMP A" I get exactly what I would expect.
>
> But, on a "DUMP B", I get a failed job:
>
> java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException:
> LINE :
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
> at
> org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
>
> grunt> Exception in thread "Thread-82" java.lang.NullPointerException:
> Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the
> return value of
> "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)"
> is null
> at
> org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
> 2024-10-29 13:02:15,296 [communication thread] INFO
> org.apache.hadoop.mapred.LocalJobRunner - map > map
>
> ?
>
