[ https://issues.apache.org/jira/browse/PIG-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896368#comment-16896368 ]
Jeffrey Brownlow edited comment on PIG-4449 at 7/30/19 6:02 PM:
----------------------------------------------------------------

{code:java}
grouped_data_set = group data_set by id;
capped_data_set = foreach grouped_data_set {
    ordered = order joined_data_set by timestamp desc;
    capped = limit ordered $num;
    generate ordered, flatten(capped);
};
{code}

Including the sorted alias in the generate statement triggers this error:

{code:java}
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias itaConversionsFinal
    at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
    at org.apache.pig.PigServer.store(PigServer.java:1086)
    at org.apache.pig.PigServer.openIterator(PigServer.java:999)
    ... 26 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule NestedLimitOptimizer. Try -t NestedLimitOptimizer
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:281)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1462)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
    ... 28 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
    at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:430)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:281)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:263)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:116)
    ... 31 more
{code}

was (Author: jbrownlow):

{code:java}
grouped_data_set = group data_set by id;
capped_data_set = foreach grouped_data_set {
    ordered = order joined_data_set by timestamp desc;
    capped = limit ordered $num;
    generate order, flatten(capped);
};
{code}

Included the sorted alias in the generate statement fires off this error:

{code:java}
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias itaConversionsFinal
    at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
    at org.apache.pig.PigServer.store(PigServer.java:1086)
    at org.apache.pig.PigServer.openIterator(PigServer.java:999)
    ... 26 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule NestedLimitOptimizer. Try -t NestedLimitOptimizer
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:281)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1462)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
    ... 28 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
    at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:430)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:281)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:263)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:116)
    ... 31 more
{code}

> Optimize the case of Order by + Limit in nested foreach
> -------------------------------------------------------
>
>                 Key: PIG-4449
>                 URL: https://issues.apache.org/jira/browse/PIG-4449
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>            Priority: Major
>              Labels: Performance
>             Fix For: 0.18.0
>
>
> This is a very frequently used pattern:
> {code}
> grouped_data_set = group data_set by id;
> capped_data_set = foreach grouped_data_set
> {
>   ordered = order joined_data_set by timestamp desc;
>   capped = limit ordered $num;
>   generate flatten(capped);
> };
> {code}
> But it performs very poorly, with a lot of spills, when there are millions
> of rows for a key in the group-by. This can be easily optimized by pushing
> the limit into the InternalSortedBag so that it maintains only $num records
> at any time, which avoids memory pressure.
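
The idea in the issue description (keep only $num records while sorting instead of sorting the whole group and then limiting) can be illustrated with a minimal sketch. This is not Pig's InternalSortedBag code: the class name BoundedTopN and its methods are hypothetical, and a plain java.util.PriorityQueue is assumed as the backing structure purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not Pig's InternalSortedBag. */
public class BoundedTopN<T> {
    private final int limit;
    private final Comparator<T> order;   // desired output order (first = best)
    private final PriorityQueue<T> heap; // head = worst element currently retained

    public BoundedTopN(int limit, Comparator<T> order) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
        this.order = order;
        // Reverse the comparator so the heap's head is the element that
        // would come last in the requested order.
        this.heap = new PriorityQueue<>(Math.max(1, limit), order.reversed());
    }

    /** Offers one record; memory use never exceeds `limit` records. */
    public void add(T item) {
        if (limit == 0) {
            return;
        }
        if (heap.size() < limit) {
            heap.add(item);
        } else if (order.compare(item, heap.peek()) < 0) {
            // The new record sorts before the worst retained one: swap them.
            heap.poll();
            heap.add(item);
        }
        // Otherwise the record cannot be in the top `limit`; drop it.
    }

    /** Returns the retained records in the requested order. */
    public List<T> sorted() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }

    public int size() {
        return heap.size();
    }

    public static void main(String[] args) {
        // Rough analogue of: order ... by timestamp desc; limit ... 3
        BoundedTopN<Long> top = new BoundedTopN<>(3, Comparator.<Long>reverseOrder());
        for (long ts : new long[] {5, 1, 9, 7, 3, 8}) {
            top.add(ts);
        }
        System.out.println(top.sorted()); // prints [9, 8, 7]
    }
}
```

With this shape, a group with millions of rows costs O(n log $num) time and holds at most $num records in memory, so nothing has to spill to disk for the sort.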