I played around with this some more and got some interesting results. It turns out that having two STORE commands in the script is what causes it to fail. If I comment out either of them, the script runs and produces the other result, so we know the code used in each of those paths is OK on its own.
Also, if I copy the script & run it from the grunt prompt, both outputs work fine. I imagine this is because the prompt runs one output at a time. I'd say at this point this looks like a bug in pig. Especially with the NPE in the stack trace, I'd say this is not expected.

> From the line numbers in the version of pig that I'm running, it appears to be
> this bit of code (line 789). It looks like operationID is not present in the
> globalCounters map, and thus when you call iterator() you get the NPE.

(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pig/pig/0.11.1/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java?av=f#787)

787  while(operationIDs.hasNext()) {
788      String operationID = operationIDs.next();
789      Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
790      Pair<String,Long> pair = null;
791      while(itPairs.hasNext()) {
792          pair = itPairs.next();
793          conf.setLong(pair.first, pair.second);
794      }
795  }

-Matt

On Monday, October 6, 2014 at 8:54 PM, Sunil S Nandihalli wrote:
> The input file-directory tarred and gzipped is here
> (https://transfer.sh/Nmnkk/rawlogs.tgz) . The Jar file which contains all the
> udfs is here (https://transfer.sh/JpSKg/pigpen.jar)
>
> On Tue, Oct 7, 2014 at 9:07 AM, Sunil S Nandihalli
> <[email protected] (mailto:[email protected])> wrote:
> > Hi Everybody,
> > The pig script mba.pig (https://gist.github.com/97073ae7bf16d8be5532) is
> > giving me the following error when run. This is a PigPen generated script.
> > the log (https://gist.github.com/228a84351440f7b15e62) is here
> > (https://gist.github.com/228a84351440f7b15e62).
> > The last few lines of the stdout is
> >
> > 2014-10-07 03:18:14,252 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader
> > - Current split being processed
> > file:/tmp/temp-923128527/tmp204410789/part-r-00000:0+0
> > 2014-10-07 03:18:14,259 [LocalJobRunner Map Task Executor #0] WARN
> > org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already
> > been initialized
> > 2014-10-07 03:18:14,281 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map
> > - Aliases being processed per job phase (AliasName[line,offset]): M:
> > generate6660[329,15],union6387[332,12],generate6661[336,15],generate6662[341,15],generate6663[349,15]
> > C: R:
> > 2014-10-07 03:18:14,291 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.LocalJobRunner -
> > 2014-10-07 03:18:14,291 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.Task - Task:attempt_local710497996_0012_m_000001_0
> > is done. And is in the process of committing
> > 2014-10-07 03:18:14,294 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.LocalJobRunner -
> > 2014-10-07 03:18:14,294 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.Task - Task attempt_local710497996_0012_m_000001_0
> > is allowed to commit now
> > 2014-10-07 03:18:14,296 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output
> > of task 'attempt_local710497996_0012_m_000001_0' to
> > file:/home/hdfs/sunil/mobster-knowledge-clj/hadoop-repl/sunil/output/mba/app-install.clj/_temporary/0/task_local710497996_0012_m_000001
> > 2014-10-07 03:18:14,298 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output
> > of task 'attempt_local710497996_0012_m_000001_0' to
> > file:/tmp/temp-923128527/tmp927324561/_temporary/0/task_local710497996_0012_m_000001
> > 2014-10-07 03:18:14,299 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.LocalJobRunner - map
> > 2014-10-07 03:18:14,299 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.Task - Task
> > 'attempt_local710497996_0012_m_000001_0' done.
> > 2014-10-07 03:18:14,299 [LocalJobRunner Map Task Executor #0] INFO
> > org.apache.hadoop.mapred.LocalJobRunner - Finishing task:
> > attempt_local710497996_0012_m_000001_0
> > 2014-10-07 03:18:14,299 [Thread-147] INFO
> > org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
> > 2014-10-07 03:18:14,722 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 63% complete
> > 2014-10-07 03:18:14,724 [main] WARN
> > org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for
> > job job_local710497996_0012
> > 2014-10-07 03:18:14,728 [main] INFO org.apache.pig.tools.pigstats.JobStats
> > - using output size reader:
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
> > 2014-10-07 03:18:14,731 [main] INFO
> > org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> > to the job
> > 2014-10-07 03:20:16,529 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> > 2014-10-07 03:20:16,534 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - Reduce phase detected, estimating # of required reducers.
> > 2014-10-07 03:20:16,535 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - Setting Parallelism to 1
> > 2014-10-07 03:20:16,547 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - Setting up multi store job
> > 2014-10-07 03:20:16,561 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 2017: Internal error creating job configuration.
> >
> >
> > Can somebody help me figure out what is happening.
> > Thanks,
> > Sunil.
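
For anyone who wants to see the suspected failure mode from line 789 in isolation, here is a minimal Java sketch. It is not Pig's code: the class name, the method names, and the "op_1"/"op_2" IDs are all made up for illustration, java.util.Map.Entry stands in for Pig's Pair, and a plain Map stands in for conf. It only shows that calling iterator() on the result of get() for a missing key throws an NPE, and what a null check would look like. Whether a guard like that would be the right fix inside Pig is an open question.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalCountersSketch {

    // Hypothetical stand-in for globalCounters: operation ID -> (counter name, value) pairs.
    // Only "op_1" is registered; any other ID is missing from the map.
    static Map<String, List<Map.Entry<String, Long>>> sampleCounters() {
        Map<String, List<Map.Entry<String, Long>>> counters = new HashMap<>();
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("counter_a", 42L));
        counters.put("op_1", pairs);
        return counters;
    }

    // Same shape as the loop quoted from JobControlCompiler: no null check.
    // get() returns null for a missing key, so iterating over it throws NullPointerException.
    static Map<String, Long> copyUnguarded(Map<String, List<Map.Entry<String, Long>>> counters,
                                           List<String> operationIDs) {
        Map<String, Long> conf = new HashMap<>();
        for (String operationID : operationIDs) {
            for (Map.Entry<String, Long> pair : counters.get(operationID)) {
                conf.put(pair.getKey(), pair.getValue());
            }
        }
        return conf;
    }

    // One possible defensive variant (an assumption, not Pig's actual behavior):
    // skip operation IDs that have no entry in the map.
    static Map<String, Long> copyGuarded(Map<String, List<Map.Entry<String, Long>>> counters,
                                         List<String> operationIDs) {
        Map<String, Long> conf = new HashMap<>();
        for (String operationID : operationIDs) {
            List<Map.Entry<String, Long>> pairs = counters.get(operationID);
            if (pairs == null) {
                continue;
            }
            for (Map.Entry<String, Long> pair : pairs) {
                conf.put(pair.getKey(), pair.getValue());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, List<Map.Entry<String, Long>>> counters = sampleCounters();
        List<String> ids = Arrays.asList("op_1", "op_2"); // "op_2" has no counters

        System.out.println("guarded: " + copyGuarded(counters, ids));
        try {
            copyUnguarded(counters, ids);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NPE for an operation ID with no counters");
        }
    }
}
```

If the multi-store job setup registers an operation ID for one STORE path but never populates its counters, that would match what I'm seeing, but I haven't confirmed it against the Pig source.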
