Hi Cheolsoo,
When we encounter the problem, we can reprocess the file
with no problems in a later run. If you want a sample file, I can pick one
up for you.
OK, we'll use your patch on top of 0.10.0 until we see the fix included in
a later release.
We're not using streaming.
Many thanks
Malc
-----Original Message-----
From: Cheolsoo Park [mailto:[email protected]]
Sent: 20 November 2012 05:19
To: [email protected]
Subject: Re: Intermittent NullPointerException
Hi Malcolm,
Thank you for sharing it. I am glad to hear that it worked. :-)
>> We're only processing ~200 rows at the most when we run the script,
>> not sure if that helps you narrow down the cause.
Very interesting. That's surprisingly small. In my test, I used 10M rows of
random integers as input. I am wondering whether it's your data that
triggers a race condition. Hard to tell. But what's interesting is that
FindBugs identifies the static field in question as a potential bug, so I
filed PIG-3050 to fix it.
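Editorial illustration, not part of the original message: the sketch below shows why a check-then-act sequence on a shared static field can throw a NullPointerException. It is not Pig code. The class name, the `Progressable` interface, and both methods are hypothetical, and the `Runnable` argument stands in for whatever another thread might do between the null check and the call, so that the interleaving is deterministic here.

```java
class StaticReporterRace {
    /** Hypothetical stand-in for a progress-reporting interface. */
    interface Progressable {
        void progress();
    }

    // One field shared by every thread in the JVM, as with any static field.
    static volatile Progressable reporter;

    // Reads the field twice: once for the check, once for the call.
    // 'otherThread' simulates a second thread running between the two reads.
    static boolean checkThenAct(Runnable otherThread) {
        if (reporter != null) {
            otherThread.run();
            try {
                reporter.progress(); // second read may now see null
            } catch (NullPointerException e) {
                return false; // the failure mode described in this thread
            }
        }
        return true;
    }

    // Reads the field once into a local, so the check and the call agree.
    static boolean readOnce(Runnable otherThread) {
        Progressable local = reporter;
        if (local != null) {
            otherThread.run();
            local.progress(); // 'local' cannot become null
        }
        return true;
    }

    public static void main(String[] args) {
        Runnable clear = () -> reporter = null;
        reporter = () -> { };
        System.out.println("check-then-act survived: " + checkThenAct(clear)); // false
        reporter = () -> { };
        System.out.println("read-once survived: " + readOnce(clear)); // true
    }
}
```

Reading into a local only avoids the exception. Threads would still share one reporter, so it does not by itself make the field safe to use from several tasks at once.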
>> I assume we just use the patch you gave me on 0.10.0 until the fix
>> comes out in a later release ?
Yes. It's a bit too late to get the fix in 0.11 now, but I will aim to fix
it in 0.12.
Regards,
Cheolsoo
p.s. I did more testing with my patch by myself and found some regressions
in streaming. If you're not using streaming, you should be fine, but I am
just letting you know.
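Editorial illustration, not part of the original message: the patch discussed in this thread is described as changing the reporter to a thread-local variable. The actual patch is not shown here, so the following is only a minimal sketch of that general technique using `java.lang.ThreadLocal`. The class name, the `Progressable` interface, and the accessor methods are hypothetical.

```java
class ThreadLocalReporterSketch {
    /** Hypothetical stand-in for a progress-reporting interface. */
    interface Progressable {
        void progress();
    }

    // Each thread gets its own slot, so one thread cannot null another's value.
    private static final ThreadLocal<Progressable> REPORTER = new ThreadLocal<>();

    static void setReporter(Progressable p) {
        REPORTER.set(p);
    }

    static Progressable getReporter() {
        return REPORTER.get();
    }

    // Sets a reporter on the calling thread, lets a second thread clear
    // "the" reporter, and checks that the caller's value is unaffected.
    static boolean isolatedAcrossThreads() {
        setReporter(() -> { });
        Thread other = new Thread(() -> setReporter(null));
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return getReporter() != null;
    }

    public static void main(String[] args) {
        System.out.println(isolatedAcrossThreads()); // true
    }
}
```

One general property of this design is that a value set on one thread is invisible to every other thread, so any helper thread that needs the reporter has to be given it explicitly.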
On Mon, Nov 19, 2012 at 12:30 PM, Malcolm Tye
<[email protected]> wrote:
> Hi Cheolsoo,
> The patch works as expected. We've not seen one error
> in the test system since we installed the new jar file.
>
> We're only processing ~200 rows at the most when we run the script,
> not sure if that helps you narrow down the cause.
>
> I assume we just use the patch you gave me on 0.10.0 until the fix
> comes out in a later release ?
>
> Many thanks for your quick response, it's very much appreciated.
>
>
> Malc
>
> -----Original Message-----
> From: Cheolsoo Park [mailto:[email protected]]
> Sent: 15 November 2012 00:16
> To: [email protected]
> Subject: Re: Intermittent NullPointerException
>
> Hi Malcolm,
>
> I have been running your script with 10M rows for half a day but
> couldn't reproduce your error. So my analysis may be baseless here.
>
> That being said, it looks like a race condition to me. The call stack
> in the log shows the following:
>
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
>
> Now if you look at PhysicalOperator.java:286, it's like this:
>
> if(reporter!=null) {
>     reporter.progress(); // ---> NullPointerException is thrown here
> }
>
> So 'reporter' became null between 'if(reporter!=null)' and
> 'reporter.progress()'.
>
> Given that 'reporter' is a static field, this is totally possible.
>
> public static PigProgressable reporter;
>
> Even though you're setting default_parallel to 1, it only controls the
> number of reducers, and the number of mappers is determined by the
> size of input data. So you will still run multiple mapper threads in
> parallel in LocalJobRunner, and they might be stepping on each other.
>
> One possible fix is to change reporter to a thread-local variable.
> I will send a patch that does this to your email address. I based it
> on branch-0.10, so you should be able to apply it cleanly to the 0.10
> source tarball by running:
>
> patch -p0 -i <patch file>
>
> Can you please try to apply the patch, rebuild pig and see if that
> fixes your problem? If this does, I will try to write a unit test case
> and commit the fix upstream as well.
>
> Thanks,
> Cheolsoo
>
> On Wed, Nov 14, 2012 at 4:32 AM, Malcolm Tye
> <[email protected]> wrote:
>
> > Hi,
> > Looks like zip files get rejected. Here's the log file
> > unzipped
> >
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Malcolm Tye [mailto:[email protected]]
> > Sent: 14 November 2012 12:01
> > To: '[email protected]'
> > Subject: RE: Intermittent NullPointerException
> >
> > Hi Cheolsoo,
> > Even with the recompiled Pig, we still see the error.
> > Here's a debug log from Pig. It doesn't seem to give any more
> > information.
> >
> > Any ideas ?
> >
> >
> > Thanks
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Malcolm Tye [mailto:[email protected]]
> > Sent: 13 November 2012 12:58
> > To: '[email protected]'
> > Subject: RE: Intermittent NullPointerException
> >
> > Hi Cheolsoo,
> > I tried setting default_parallel to 1 to rule out
> > parallel processing, but the problem still happened.
> >
> > I've recompiled Pig and have put that into the test environment with
> > the debug option set.
> >
> > I don't have steps to reproduce that fail every time. When the problem
> > occurs, we can run the same script again on the input file and the
> > file gets processed OK the next time!
> >
> > Thanks
> >
> > Malc
> >
> >
> > -----Original Message-----
> > From: Cheolsoo Park [mailto:[email protected]]
> > Sent: 12 November 2012 23:00
> > To: [email protected]
> > Subject: Re: Intermittent NullPointerException
> >
> > Hi Malcolm,
> >
> > If you're not running in parallel, it may be a different issue. But
> > I am surprised that Pig 0.10 local mode fails intermittently like
> > you describe w/o parallelism. You might have discovered a real
> > issue. If you could provide steps that reproduce the error, that would
> > be great!
> >
> > >> How do I tell which pig jar file I'm using currently ?
> >
> > "pig -secretDebugCmd" will show which pig jar file in file system is
> > picked up. For example, it shows the following output for me:
> >
> > /usr/bin/hadoop jar
> > /home/cheolsoo/pig-svn/bin/../pig-withouthadoop.jar
> >
> > Thanks,
> > Cheolsoo
> >
> > On Mon, Nov 12, 2012 at 2:46 PM, Malcolm Tye
> > <[email protected]> wrote:
> >
> > > Hi Cheolsoo,
> > > I'm not specifically setting default_parallel in
> > > my script anywhere and I see this in the log file :-
> > >
> > >
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > > - Neither PARALLEL nor default parallelism is set for this job.
> > > Setting number of reducers to 1
> > >
> > > So I guess I'm not using parallel. Is it worth trying to compile
> > > Pig to use the Hadoop 0.23.x LocalJobRunner ? How do I tell which
> > > pig jar file I'm using currently ?
> > >
> > > Thanks
> > >
> > > Malc
> > >
> > >
> > > -----Original Message-----
> > > From: Cheolsoo Park [mailto:[email protected]]
> > > Sent: 12 November 2012 16:29
> > > To: [email protected]
> > > Subject: Re: Intermittent NullPointerException
> > >
> > > Hi Malcolm,
> > >
> > > How do you run your script? Do you run your script in parallel?
> > > Hadoop 1.0.x LocalJobRunner is not thread-safe, and Pig is by
> > > default built with Hadoop 1.0.x. I have seen a similar problem
> > > before ( https://issues.apache.org/jira/browse/PIG-2852).
> > >
> > > If you're running your script in parallel, one workaround is to
> > > use Hadoop 0.23.x LocalJobRunner, which is thread-safe. You can do
> > > the following:
> > > - If you're using the standalone pig.jar, please download the Pig
> > > source tarball and run "ant clean jar -Dhadoopversion=23" to build
> > > pig.jar.
> > > - If you're using installed Hadoop with pig-withouthadoop.jar,
> > > please install Hadoop 0.23.x, download the Pig source tarball, and
> > > run "ant clean jar-withouthadoop -Dhadoopversion=23" to build
> > > pig-withouthadoop.jar.
> > >
> > > Hope this is helpful.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > > On Mon, Nov 12, 2012 at 7:14 AM, Malcolm Tye
> > > <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm running Pig 0.10.0 in local mode on some small text files.
> > > > There is no intention to run it on Hadoop at all. We have a job
> > > > that runs every 5 minutes and about 3% of the time, the job
> > > > fails with the error below. It happens at random places within
> > > > the Pig script.
> > > >
> > > >
> > > > 2012-10-19 14:15:37,719 [Thread-15] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> > > > java.lang.NullPointerException
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:360)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:165)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > >
> > > > In the Pig log, I get:
> > > >
> > > > ERROR 2244: Job failed, hadoop does not return any error message
> > > >
> > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
> > > >     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
> > > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> > > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > >     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> > > >     at org.apache.pig.Main.run(Main.java:555)
> > > >     at org.apache.pig.Main.main(Main.java:111)
> > > >
> > > > ================================================================================
> > > >
> > > > Pig script is attached.
> > > >
> > > > Any help gratefully received.
> > > >
> > > > Thanks
> > > >
> > > > Malc
> > > >
> > >
> > >
> >
>
>