I just ran your script in local mode with -Dhadoopversion=20 (Hadoop 1)
using Apache Pig.
1) Branches 0.11 and 0.12 fail with an NPE:
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.Counters.getShortName(Counters.java:669)
    at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:405)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:360)
    ... 12 more
This is fixed in trunk.
2) Trunk works fine.
So I cannot reproduce your problem. If you're using a specific distribution
(such as MapR), it is likely a distribution-specific issue.
Thanks,
Cheolsoo
On Thu, Mar 6, 2014 at 7:38 PM, Suhas Satish <[email protected]> wrote:
> The example that reproduces the issue, along with the data, is attached to
> the very first email in this thread.
>
> On Thursday, March 6, 2014, Cheolsoo Park <[email protected]> wrote:
>
> > So that's the backend. It has nothing to do with the filter extractor; the
> > filter extractor is for predicate push-down on the frontend.
> >
> > The code that you're showing is the entry point where the Pig mapper
> > begins, so it doesn't tell us much. The mapper is given a segment of the
> > physical plan (the pipeline), and the getNext() call pulls records from
> > the roots to the leaves, one by one.
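[Editor's note: to make the pull model described above concrete, here is a minimal sketch in plain Java. These are not Pig's actual classes; `Operator`, `Source`, and `Filter` are invented stand-ins for `PhysicalOperator`, the load operator, and `POFilter`, shown only to illustrate how a single getNext() call on the leaf drains records from the roots.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for Pig's PhysicalOperator; null plays the role
// of POStatus.STATUS_EOP (end of pipeline).
interface Operator {
    Integer getNext();
}

// Root of the pipeline: yields one record per call.
class Source implements Operator {
    private final Iterator<Integer> it;
    Source(List<Integer> rows) { this.it = rows.iterator(); }
    public Integer getNext() { return it.hasNext() ? it.next() : null; }
}

// Filter operator: keeps pulling from its upstream until a record
// passes the predicate or the upstream is exhausted.
class Filter implements Operator {
    private final Operator input;
    private final Predicate<Integer> pred;
    Filter(Operator input, Predicate<Integer> pred) {
        this.input = input;
        this.pred = pred;
    }
    public Integer getNext() {
        Integer row;
        while ((row = input.getNext()) != null) {
            if (pred.test(row)) return row;
        }
        return null; // propagate end-of-pipeline
    }
}

public class PullPipeline {
    public static void main(String[] args) {
        Operator leaf = new Filter(new Source(List.of(1, 2, 3, 4, 5)),
                                   r -> r % 2 == 0);
        List<Integer> out = new ArrayList<>();
        Integer r;
        // Analogous to runPipeline(): loop on the leaf until EOP.
        while ((r = leaf.getNext()) != null) out.add(r);
        System.out.println(out); // prints [2, 4]
    }
}
```

Note how the filter itself loops internally: if the predicate never matches, the time is spent inside the filter's getNext(), which is why a hang can surface there even though the outer loop looks innocent.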
> >
> > You need to find where the time is spent in the pipeline. If you suspect
> > that FILTER BY is slow, then it should be POFilter. Please take thread
> > dumps several times and look at the stack traces. Unless you provide an
> > example that reproduces the error, I cannot help you further.
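[Editor's note: the usual way to take the thread dumps suggested above is `jstack <pid>` (or `kill -QUIT <pid>`) against the task JVM. As a sketch, the same stack traces can also be captured programmatically from inside a JVM with the standard ThreadMXBean API; this is generic JMX usage, not a Pig facility.]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump every live thread's stack. Taking several dumps a few
        // seconds apart shows which frames (e.g. POFilter.getNext)
        // stay on top of the stack, i.e. where the time is going.
        for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
            System.out.println("\"" + ti.getThreadName()
                    + "\" state=" + ti.getThreadState());
            for (StackTraceElement frame : ti.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```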
> >
> >
> >
> > On Thu, Mar 6, 2014 at 6:03 PM, Suhas Satish <[email protected]> wrote:
> >
> > > Hi Cheolsoo,
> > > This is where it's hanging:
> > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > >
> > > org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:
> > >
> > > protected void runPipeline(PhysicalOperator leaf) throws IOException,
> > >         InterruptedException {
> > >     while (true) {
> > >         Result res = leaf.getNext(DUMMYTUPLE);
> > >         if (res.returnStatus == POStatus.STATUS_OK) {
> > >             collect(outputCollector, (Tuple) res.result);
> > >             continue;
> > >         }
> > >         ....
> > >
> > > Cheers,
> > > Suhas.
> > >
> > >
> > > On Thu, Mar 6, 2014 at 5:56 PM, Cheolsoo Park <[email protected]>
> > > wrote:
> > >
> > > > Hi Suhas,
> > > >
> > > > No. The issue in PIG-3461 is that Pig hangs at query compilation, with
> > > > a big filter expression, before the job is even submitted. In addition,
> > > > the filter extractor was totally rewritten in 0.12:
> > > > https://issues.apache.org/jira/browse/PIG-3461
> > > >
> > > > Where exactly is your job hanging, backend or frontend? Are you
> > > > running it in local mode or remote mode?
> > > >
> > > > Thanks,
> > > > Cheolsoo
> > > >
> > > > p.s.
> > > > There are two known issues with the new filter extractor in 0.12.0,
> > > > although these are probably not related to your issue:
> > > > https://issues.apache.org/jira/browse/PIG-3510
> > > > https://issues.apache.org/jira/browse/PIG-3657
> > > >
> > > >
> > > > On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish <[email protected]> wrote:
> > > >
> > > > > I seem to be hitting this issue in pig-0.12, although it is claimed
> > > > > to be fixed in pig-0.12:
> > > > > https://issues.apache.org/jira/browse/PIG-3395
> > > > > (Large filter expression makes Pig hang)
> > > > >
> > > > > Cheers,
> > > > > Suhas.
> > > > >
> > > > >
> > > > > On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish <[email protected]> wrote:
> > > > >
> > > > > > This is the Pig script:
> > > > > >
> > > > > > %default previousPeriod $pPeriod
> > > > > >
> > > > > > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage('|') AS
> > > > > >     (WEEK:int, DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> > > > > >
> > > > > > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> > > > > >
> > > > > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > > > > >
> > > > > > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> > > > > >
> > > > > > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > > > > > STORE gpWeekRanked INTO 'gpWeekRanked';
> > > > > > DESCRIBE gpWeekRanked;
> > > > > >
> > > > > >
> > > > > > Without the filter statement, the code runs without hanging.
> > > > > >
> > > > > > Cheers,
> > > > > > Suhas.
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <[email protected]> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >> I launched the attached Pig job on pig-0.12 with Hadoop MRv1 and
> > > > > >> the attached data, but the FILTER statement causes the job to get
> > > > > >> stuck in an infinite loop:
> > > > > >>
> > > > > >> pig -p pPeriod=201312 -f test.pig
> > > > > >>
> > > > > >> The thread in question seems to be stuck forever inside the while
> > > > > >> loop of runPipeline
>
>
>
> --
> Cheers,
> Suhas.
>