Hi Malcolm,

Thank you for sharing that. I am glad to hear that it worked. :-)

>> We're only processing ~200 rows at the most when we run the script, not
>> sure if that helps you narrow down the cause.

Very interesting. That's surprisingly small. In my test, I used 10M rows of
random integers as input. I am wondering whether it's your data that
triggers a race condition. Hard to tell. But interestingly, FindBugs
identifies the static field in question as a potential bug, so I filed
PIG-3050 to fix it.

>> I assume we just use the patch you gave me on 0.10.0 until the fix comes
>> out in a later release ?

Yes. It's a bit too late to get the fix into 0.11 now, but I will aim to
fix it in 0.12.

Regards,
Cheolsoo

P.S. I did more testing with my patch and found some regressions in
streaming. If you're not using streaming, you should be fine, but I wanted
to let you know.

On Mon, Nov 19, 2012 at 12:30 PM, Malcolm Tye <[email protected]> wrote:

> Hi Cheolsoo,
>         The patch works as expected. We've not seen one error in the
> test system since we installed the new jar file.
>
> We're only processing ~200 rows at the most when we run the script, not
> sure if that helps you narrow down the cause.
>
> I assume we just use the patch you gave me on 0.10.0 until the fix comes
> out in a later release ?
>
> Many thanks for your quick response, it's very much appreciated.
>
>
> Malc
>
> -----Original Message-----
> From: Cheolsoo Park [mailto:[email protected]]
> Sent: 15 November 2012 00:16
> To: [email protected]
> Subject: Re: Intermittent NullPointerException
>
> Hi Malcolm,
>
> I have been running your script with 10M rows for half a day but couldn't
> reproduce your error, so my analysis may be baseless here.
>
> That being said, it looks like a race condition to me.
> The callstack in the log shows:
>
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
>
> Now if you look at PhysicalOperator.java:286, it's like this:
>
> if (reporter != null) {
>     reporter.progress();  // ---> NullPointerException is thrown here
> }
>
> So 'reporter' became null between 'if (reporter != null)' and
> 'reporter.progress()'.
>
> Given that 'reporter' is a static field, this is entirely possible:
>
> public static PigProgressable reporter;
>
> Even though you're setting default_parallel to 1, it only controls the
> number of reducers; the number of mappers is determined by the size of
> the input data. So you will still run multiple mapper threads in parallel
> in LocalJobRunner, and they might be stepping on each other.
>
> One possible fix is changing reporter to a thread-local variable.
> I will send a patch that does this to your email address. I based it on
> branch-0.10, so you should be able to apply it cleanly to the 0.10 source
> tarball by running:
>
> patch -p0 -i <patch file>
>
> Can you please apply the patch, rebuild Pig, and see if that fixes your
> problem? If it does, I will try to write a unit test case and commit the
> fix upstream as well.
>
> Thanks,
> Cheolsoo
>
> On Wed, Nov 14, 2012 at 4:32 AM, Malcolm Tye
> <[email protected]> wrote:
>
> > Hi,
> >         Looks like zip files get rejected. Here's the log file
> > unzipped
> >
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Malcolm Tye [mailto:[email protected]]
> > Sent: 14 November 2012 12:01
> > To: '[email protected]'
> > Subject: RE: Intermittent NullPointerException
> >
> > Hi Cheolsoo,
> >         Even with the recompiled Pig, we still see the error.
> > Here's a debug log from Pig. It doesn't seem to give any more
> > information.
> >
> > Any ideas ?
> >
> >
> > Thanks
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Malcolm Tye [mailto:[email protected]]
> > Sent: 13 November 2012 12:58
> > To: '[email protected]'
> > Subject: RE: Intermittent NullPointerException
> >
> > Hi Cheolsoo,
> >         I tried setting default_parallel to 1 to rule out
> > parallel processing, but the problem still happened.
> >
> > I've recompiled Pig and have put that into the test environment with
> > the debug option set.
> >
> > I don't have recreate steps that fail every time. When the problem
> > occurs, we can run the same script again on the input file and the
> > file gets processed OK the next time !
> >
> > Thanks
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Cheolsoo Park [mailto:[email protected]]
> > Sent: 12 November 2012 23:00
> > To: [email protected]
> > Subject: Re: Intermittent NullPointerException
> >
> > Hi Malcolm,
> >
> > If you're not running in parallel, it may be a different issue. But I
> > am surprised that Pig 0.10 local mode fails intermittently like you
> > describe without parallelism. You might have discovered a real issue.
> > If you could provide steps that reproduce the error, that would be
> > great!
> >
> > >> How do I tell which pig jar file I'm using currently ?
> >
> > "pig -secretDebugCmd" will show which pig jar file in the file system
> > is picked up. For example, it shows the following output for me:
> >
> > /usr/bin/hadoop jar /home/cheolsoo/pig-svn/bin/../pig-withouthadoop.jar
> >
> > Thanks,
> > Cheolsoo
> >
> > On Mon, Nov 12, 2012 at 2:46 PM, Malcolm Tye
> > <[email protected]> wrote:
> >
> > > Hi Cheolsoo,
> > >         I'm not specifically setting default_parallel in my
> > > script anywhere, and I see this in the log file:
> > >
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > > - Neither PARALLEL nor default parallelism is set for this job.
> > > Setting number of reducers to 1
> > >
> > > So I guess I'm not using parallel.
> > > Is it worth trying to compile Pig
> > > to use the Hadoop 0.23.x LocalJobRunner ? How do I tell which pig
> > > jar file I'm using currently ?
> > >
> > > Thanks
> > >
> > > Malc
> > >
> > >
> > > -----Original Message-----
> > > From: Cheolsoo Park [mailto:[email protected]]
> > > Sent: 12 November 2012 16:29
> > > To: [email protected]
> > > Subject: Re: Intermittent NullPointerException
> > >
> > > Hi Malcolm,
> > >
> > > How do you run your script? Do you run your script in parallel?
> > > Hadoop 1.0.x LocalJobRunner is not thread-safe, and Pig is by
> > > default built with Hadoop 1.0.x. I have seen a similar problem
> > > before (https://issues.apache.org/jira/browse/PIG-2852).
> > >
> > > If you're running your script in parallel, one workaround is to use
> > > the Hadoop 0.23.x LocalJobRunner, which is thread-safe. You can do
> > > the following:
> > > - If you're using the standalone pig.jar, please download the Pig
> > > source tarball and run "ant clean jar -Dhadoopversion=23" to build
> > > pig.jar.
> > > - If you're using installed Hadoop with pig-withouthadoop.jar,
> > > please install Hadoop 0.23.x, download the Pig source tarball, and
> > > run "ant clean jar-withouthadoop -Dhadoopversion=23" to build
> > > pig-withouthadoop.jar.
> > >
> > > Hope this is helpful.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > > On Mon, Nov 12, 2012 at 7:14 AM, Malcolm Tye
> > > <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm running Pig 0.10.0 in local mode on some small text files.
> > > > There is no intention to run it on Hadoop at all. We have a job
> > > > that runs every 5 minutes, and about 3% of the time, the job fails
> > > > with the error below.
> > > > It happens at random places within the Pig script.
> > > >
> > > > 2012-10-19 14:15:37,719 [Thread-15] WARN
> > > > org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> > > > java.lang.NullPointerException
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:360)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > >         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:165)
> > > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
> > > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
> > > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > >
> > > > In the Pig log, I get:
> > > >
> > > > ERROR 2244: Job failed, hadoop does not return any error message
> > > >
> > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2244:
> > > > Job failed, hadoop does not return any error message
> > > >         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
> > > >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> > > >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > >         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> > > >         at org.apache.pig.Main.run(Main.java:555)
> > > >         at org.apache.pig.Main.main(Main.java:111)
> > > >
> > > > ================================================================
> > > >
> > > > Pig script is attached.
> > > >
> > > > Any help gratefully received
> > > >
> > > > Thanks
> > > >
> > > > Malc
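[For readers finding this thread later: the check-then-use hazard on the static `reporter` field, and the thread-local shape of the fix, can be sketched as below. The field and interface names echo Pig's `PhysicalOperator`, but the surrounding class and harness are a simplified illustration of the pattern, not Pig's actual code or the exact PIG-3050 patch.]

```java
// Sketch of the race discussed above: a static field shared by all mapper
// threads can be nulled by one thread between another thread's null check
// and its use. A ThreadLocal gives each thread its own slot, so the
// check-then-use sequence can no longer observe another thread's write.
public class ReporterRaceSketch {

    interface PigProgressable {
        void progress();
    }

    // Unsafe: a single field shared by every thread (the pre-fix shape).
    static PigProgressable reporter;

    // Safe: one independent slot per thread.
    static final ThreadLocal<PigProgressable> threadLocalReporter =
            new ThreadLocal<>();

    static void processInputUnsafe() {
        if (reporter != null) {
            // Another thread may set 'reporter' to null right here,
            // producing an intermittent NullPointerException.
            reporter.progress();
        }
    }

    static void processInputSafe() {
        PigProgressable r = threadLocalReporter.get();
        if (r != null) {
            r.progress(); // 'r' is this thread's own value; no other thread can null it
        }
    }

    public static void main(String[] args) throws InterruptedException {
        threadLocalReporter.set(() -> System.out.println("progress from main thread"));

        // A second thread sees its own (unset) slot, not main's value:
        Thread other = new Thread(() ->
                System.out.println("other thread sees: " + threadLocalReporter.get()));
        other.start();
        other.join();

        processInputSafe(); // prints "progress from main thread"
    }
}
```

Note that simply copying the static field into a local variable before the null check would also avoid this particular NPE; a thread-local goes further by keeping each mapper thread's reporter fully independent, which matches the thread-safety goal described in the emails above.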
