Here is the actual fix (and it's quite loony): don't wrap the file name in the REGISTER statement in single quotes. That's it.
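In case anyone has more than one script to change, here is a minimal sketch of that edit as a small Python helper. The function name `unquote_register_paths` and the regex are my own illustration, not anything that ships with Pig. It assumes the REGISTER statement starts its own line and that the quoted path ends in `.py`.

```python
import re

# Matches a line starting with REGISTER followed by a single-quoted *.py path.
# Assumption: the path itself contains no single quotes.
_QUOTED_REGISTER = re.compile(
    r"^(\s*REGISTER\s+)'([^']+\.py)'",
    re.IGNORECASE | re.MULTILINE,
)


def unquote_register_paths(pig_script: str) -> str:
    """Return pig_script with the quotes removed from REGISTER '<file>.py'."""
    return _QUOTED_REGISTER.sub(r"\1\2", pig_script)


# The REGISTER statement from the original message, joined onto one line.
original = (
    "REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING "
    "org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;"
)

# Prints the same statement with the path unquoted.
print(unquote_register_paths(original))
```

Only REGISTER lines are touched; quoted paths in LOAD statements are left alone, since the pattern is anchored to a line that begins with REGISTER.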
Catastrophic problems here:

1) Obviously not backwards compatible.
2) If this is a problem, why not just indicate that a) the format is wrong, or b) a path that started with a single quote did not yield a valid Python file, or c) anything understandable, instead of getting into the middle of the M/R computation and throwing wacky (mkey? NullPointerException) errors?

On Wed, Oct 30, 2024 at 4:12 PM Mark Woodcock <woodc...@usna.edu> wrote:

> pig-0.17.0bin/pig -x local
>
> very basic UDF file:
>
> #!/usr/bin/python3
>
> from pig_util import outputSchema
>
> @outputSchema("as:int")
> def square(num):
>     if num == None:
>         return None
>     return ((num) * (num))
>
> @outputSchema("word:chararray")
> def concat(word):
>     return word + word
>
> Exceedingly simple pig script:
>
> REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
>
> A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING PigStorage(',') AS (state:int,name:chararray);
>
> B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
>
> If I do a "DUMP A" I get exactly what I would expect.
>
> But, on a "DUMP B", I get a failed job:
>
> java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
>     at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
>
> grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
>     at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
> 2024-10-29 13:02:15,296 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
>
> ?