[ 
https://issues.apache.org/jira/browse/PIG-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896368#comment-16896368
 ] 

Jeffrey Brownlow edited comment on PIG-4449 at 7/30/19 6:02 PM:
----------------------------------------------------------------

{code:java}
grouped_data_set = group data_set by id;

capped_data_set = foreach grouped_data_set
{
  ordered = order data_set by timestamp desc;
  capped = limit ordered $num;
  generate ordered, flatten(capped);
};{code}
Including the ordered alias in the generate statement triggers this error:
{code:java}
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias itaConversionsFinal
    at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
    at org.apache.pig.PigServer.store(PigServer.java:1086)
    at org.apache.pig.PigServer.openIterator(PigServer.java:999)
    ... 26 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule NestedLimitOptimizer. Try -t NestedLimitOptimizer
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:281)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1462)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
    ... 28 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
    at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:430)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:281)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:263)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:116)
    ... 31 more{code}


> Optimize the case of Order by + Limit in nested foreach
> -------------------------------------------------------
>
>                 Key: PIG-4449
>                 URL: https://issues.apache.org/jira/browse/PIG-4449
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>            Priority: Major
>              Labels: Performance
>             Fix For: 0.18.0
>
>
> This is a very frequently used pattern:
> {code}
> grouped_data_set = group data_set by id;
> capped_data_set = foreach grouped_data_set
> {
>   ordered = order data_set by timestamp desc;
>   capped = limit ordered $num;
>   generate flatten(capped);
> };
> {code}
> But this performs very poorly, with a lot of spills, when a key in the
> group-by has millions of rows. This can be easily optimized by pushing the
> limit into the InternalSortedBag so that it maintains only $num records at
> any time, which avoids memory pressure.
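
To make the idea in the description concrete, here is a minimal sketch of a bounded top-N selection in plain Java. It is not Pig's InternalSortedBag and not the code from the patch; the class name BoundedTopN and the method topN are hypothetical, chosen only for illustration. It shows how a sort followed by a limit can be answered while holding at most $num rows in memory, assuming a comparator that ranks rows the way the nested order-by does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: a bounded top-N buffer, not Pig's InternalSortedBag. */
final class BoundedTopN {

    /**
     * Returns the n largest elements of rows under cmp, largest first,
     * while never holding more than n elements in the heap.
     */
    static <T> List<T> topN(Iterable<T> rows, int n, Comparator<? super T> cmp) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap under cmp: the head is the smallest of the rows kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(n, cmp);
        for (T row : rows) {
            if (heap.size() < n) {
                heap.add(row);
            } else if (cmp.compare(row, heap.peek()) > 0) {
                heap.poll();   // evict the smallest kept row
                heap.add(row); // keep the larger incoming row
            }
        }
        // Emit in descending order, matching "order ... desc" then "limit".
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps for one group key; keep the 2 most recent.
        List<Long> timestamps = Arrays.asList(5L, 1L, 9L, 3L, 7L);
        System.out.println(topN(timestamps, 2, Comparator.<Long>naturalOrder())); // prints [9, 7]
    }
}
```

Memory use is bounded by n rather than by the number of rows per key, which is the property the description is after. Each incoming row costs at most O(log n) heap work.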



