I've been able to isolate the problem, but have no idea what is causing it.

The input is in this form (this is correct):

{({(a),(b),(c)}),({(a),(b),(c)}),({(a),(b),(c)})}

and the output is in this form:

{(b,c,3),(c,a,3),(b,a,3)}

which is also correct. By adding print statements, I can see that the
error occurs after I return the second bag.

        public DataBag exec(Tuple input) throws IOException {
                try {
                        accumulate(input);
                        DataBag bag = getValue();
                        System.out.println(input.get(0).toString());
                        System.out.println(bag.toString());
                        return bag;

                } catch (Exception e) {
                        int errCode = 31415;
                        String msg = "Error while accumulating graphs (exec) "
                                        + this.getClass().getSimpleName();
                        throw new ExecException(msg, errCode, PigException.BUG, e);
                }
        }

The prints are how I saw that it calculated properly, and I know it's not an
error within exec because it's not throwing an exception. So something weird
is going on afterwards.
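
One check that might narrow this down: the trace from grunt fails in writeDatum, inside writeTuple, inside writeBag, which suggests a single field of a tuple in the returned bag has a Java class the serializer does not recognize, even though toString() prints it fine. The sketch below is a Pig-free stand-in and not the UDF itself: nested java.util.List values play the role of bags and tuples, and the whitelist of scalar classes is my assumption about which Java types are safe (it leaves out Pig's own classes such as DataByteArray). The names FieldTypeCheck and findUnsupported are made up for illustration, and Short is used only as an example of a class outside the whitelist. In the real UDF the equivalent check would presumably iterate the DataBag and call Pig's DataType.findType(Object) on each field, looking for the -1 from the error message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FieldTypeCheck {
    // Assumption: scalar Java classes expected to map cleanly onto Pig types.
    private static final Set<Class<?>> SCALARS = new HashSet<>(Arrays.asList(
            String.class, Integer.class, Long.class,
            Float.class, Double.class, Boolean.class));

    /** Returns "path: ClassName" for every leaf whose class is not in SCALARS. */
    public static List<String> findUnsupported(Object value) {
        List<String> problems = new ArrayList<>();
        walk(value, "$", problems);
        return problems;
    }

    private static void walk(Object value, String path, List<String> problems) {
        if (value == null) {
            return; // treat null fields as acceptable
        }
        if (value instanceof List) {
            // A List stands in for a bag or a tuple; recurse into each element.
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                walk(items.get(i), path + "[" + i + "]", problems);
            }
        } else if (!SCALARS.contains(value.getClass())) {
            problems.add(path + ": " + value.getClass().getName());
        }
    }

    public static void main(String[] args) {
        // Mirrors the shape {(b,c,3)}: bag -> tuple -> fields.
        List<Object> good = Arrays.asList(Arrays.asList("b", "c", 3));
        List<Object> bad = Arrays.asList(Arrays.asList("b", "c", (short) 3));
        System.out.println(findUnsupported(good)); // prints []
        System.out.println(findUnsupported(bad));  // prints [$[0][2]: java.lang.Short]
    }
}
```

Both bags print as {(b,c,3)}, which is why the println output alone cannot rule this kind of problem in or out.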

It'd be great to understand what is going on, because I think this is what
was plaguing an algebraic version of another script...

Is there something special you have to do if the form of your output is
significantly different from the form of your input? Here is the script that
generates this:

register /path/to/myudf.jar;
A = LOAD 'test.txt' as (a:chararray, b:chararray);
B = GROUP A BY a;
C = FOREACH B GENERATE A.b;
D = GROUP C ALL;
E = FOREACH D GENERATE myudf.fun.udf(C.b);

So it's weird: I'm getting the output I want, it is a DataBag, and I return
that, but something is exploding.

Any ideas what it could be? As always, thanks.

Here's from grunt:

java.io.IOException: java.lang.RuntimeException: Unexpected data type -1 found in stream.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:438)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.lang.RuntimeException: Unexpected data type -1 found in stream.
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
        at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:436)

Here's from the logfile:

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias E

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E
        at org.apache.pig.PigServer.openIterator(PigServer.java:754)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
        at org.apache.pig.Main.run(Main.java:465)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:744)
        ... 7 more
================================================================================
