Hi Suhas,

No. The issue with PIG-3461 is that Pig hangs during query compilation, before the job is submitted, when the filter expression is big. In addition, the filter extractor was totally rewritten in 0.12.
https://issues.apache.org/jira/browse/PIG-3461

Where exactly is your job hanging? Backend or frontend? Are you running it in local mode or remote mode?

Thanks,
Cheolsoo

p.s. There are two known issues with the new filter extractor in 0.12.0 although these are probably not related to your issue-
https://issues.apache.org/jira/browse/PIG-3510
https://issues.apache.org/jira/browse/PIG-3657

On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish <[email protected]> wrote:

> I seem to be hitting this issue in pig-0.12 although it claims to be fixed
> in pig-0.12
> https://issues.apache.org/jira/browse/PIG-3395
> Large filter expression makes Pig hang
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish <[email protected]> wrote:
>
> > This is the pig script -
> >
> > %default previousPeriod $pPeriod
> >
> > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> >
> > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> >
> > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> >
> > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> >
> > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > store gpWeekRanked INTO 'gpWeekRanked';
> > describe gpWeekRanked;
> >
> >
> > Without the filter statement, the code runs without hanging.
> >
> > Cheers,
> > Suhas.
> >
> >
> > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <[email protected]> wrote:
> >
> >> Hi
> >> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> >> attached data, but the FILTER function causes the job to get stuck in an
> >> infinite loop.
> >>
> >> pig -p pPeriod=201312 -f test.pig
> >>
> >> The thread in question seems to be stuck forever inside while loop of
> >> runPipeline method.
> >>
> >> stack trace:
> >> -----------
> >>
> >> "main" prio=10 tid=0x00007fd74800b000 nid=0x2f63 runnable [0x00007fd750d50000]
> >>    java.lang.Thread.State: RUNNABLE
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:217)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:271)
> >>
> >>
> >> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:
> >>
> >> protected void *runPipeline*(PhysicalOperator leaf) throws IOException, InterruptedException {
> >>     while(true){
> >>         Result res = leaf.getNext(DUMMYTUPLE);
> >>         if(res.returnStatus==POStatus.STATUS_OK){
> >>             collect(outputCollector,(Tuple)res.result);
> >>             continue;
> >>         }
> >>         ....
> >>
> >>
> >> Whats the suggested code fix here?
> >>
> >>
> >> Thanks,
> >> Suhas.
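
For reference, the quoted trace sits inside a map task, in the pull loop of runPipeline. The following is only a minimal, self-contained sketch of that kind of loop, not Pig's actual code. The Status enum, Result, Operator, fromList and the maxPulls guard are all hypothetical stand-ins, and the handling of the statuses other than OK is an assumption, since the quoted snippet elides those branches. The point it illustrates is that such a loop only ends when the leaf operator eventually reports end-of-pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RunPipelineSketch {
    // Hypothetical stand-ins for operator status codes (not Pig's real constants).
    enum Status { OK, NULL, EOP, ERR }

    // Hypothetical stand-in for a result: a status plus an optional payload.
    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) {
            this.status = status;
            this.value = value;
        }
    }

    // Hypothetical pull-based operator: each call yields one result.
    interface Operator {
        Result getNext();
    }

    // Pulls from the leaf until it reports EOP. The maxPulls guard is an
    // addition for this sketch so that a stuck operator fails fast instead
    // of spinning forever; it is assumed, not taken from the quoted code.
    static List<Object> runPipeline(Operator leaf, int maxPulls) {
        List<Object> collected = new ArrayList<>();
        for (int pulls = 0; pulls < maxPulls; pulls++) {
            Result res = leaf.getNext();
            if (res.status == Status.OK) {
                collected.add(res.value);   // emit the tuple and keep pulling
                continue;
            }
            if (res.status == Status.EOP) {
                return collected;           // the only normal exit from the loop
            }
            if (res.status == Status.NULL) {
                continue;                   // assumed: nothing to emit, pull again
            }
            throw new IllegalStateException("operator reported an error");
        }
        throw new IllegalStateException("no end-of-pipeline after " + maxPulls + " pulls");
    }

    // Operator that replays a fixed list and then reports EOP.
    static Operator fromList(List<?> tuples) {
        Iterator<?> it = tuples.iterator();
        return () -> it.hasNext()
                ? new Result(Status.OK, it.next())
                : new Result(Status.EOP, null);
    }

    public static void main(String[] args) {
        // A well-behaved operator: the loop drains it and returns.
        System.out.println(runPipeline(fromList(Arrays.asList(1, 2, 3)), 100));

        // An operator that never reports EOP: without the guard this would spin.
        Operator stuck = () -> new Result(Status.NULL, null);
        try {
            runPipeline(stuck, 100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real loop behaves like this sketch, a thread that stays RUNNABLE inside it would point at an upstream operator that never signals end-of-pipeline, which is a backend symptom rather than the frontend compilation hang described in the JIRA.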
