[jira] [Created] (PIG-2932) Setting high default_parallel causes IOException in local mode

Gianmarco De Francisci Morales (JIRA) Wed, 26 Sep 2012 06:43:12 -0700

Gianmarco De Francisci Morales created PIG-2932:
---------------------------------------------------


             Summary: Setting high default_parallel causes IOException in local 
mode
                 Key: PIG-2932
                 URL: https://issues.apache.org/jira/browse/PIG-2932
             Project: Pig
          Issue Type: Bug
            Reporter: Gianmarco De Francisci Morales
            Priority: Critical


This bug has been confirmed only in local mode.

When setting a high default_parallel, Pig fails on some operations.
The following data and script reproduce the bug.

Data:
{code}
grunt> cat file.txt                                  
11      1       qwer
12      2       qwerty
13      3       ert
13      3       ertyu
14      4       zxcv
16      6       fsdfg
16      6       fdfghj
18      8       fjklopi
{code}

Script:
{code}
SET default_parallel 9
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = group a by (id1,id2);
c = foreach b generate flatten(group), a;
d = order c by group::id1 ASC, group::id2 ASC;
dump d
{code}

Error:
{code}
2012-09-26 15:28:13,230 [Thread-32] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map
 - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] 
C:  R: 
2012-09-26 15:28:13,232 [Thread-32] WARN  
org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
        at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{code}

The script succeeds if default_parallel is set to 2.
I guess it depends on the fact that the default_parallel is higher than the 
number of unique keys, probably some quirk with ORDER BY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (PIG-2932) Setting high default_parallel causes IOException in local mode

Reply via email to