[ https://issues.apache.org/jira/browse/PIG-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-5323:
------------------------------------
    Status: Patch Available  (was: Open)

> Implement LastInputStreamingOptimizer in Tez
> --------------------------------------------
>
>                 Key: PIG-5323
>                 URL: https://issues.apache.org/jira/browse/PIG-5323
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.18.0
>
>         Attachments: PIG-5323-1.patch
>
>
> http://pig.apache.org/docs/r0.17.0/perf.html#join-optimizations
> {quote}
> Optimization for regular joins ensures that the last table in the join is not
> brought into memory but streamed through instead. Optimization reduces the
> amount of memory used which means you can avoid spilling the data and also
> should be able to scale your query to larger data volumes.
> To take advantage of this optimization, make sure that the table with the
> largest number of tuples per key is the last table in your query. In some of
> our tests we saw 10x performance improvement as the result of this
> optimization.
> {quote}
> We are not doing that in Tez, and both tables are materialized as
> InternalCachedBag.
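To illustrate what "streaming the last input" means for a regular join, here is a minimal sketch of the per-key step. It is not Pig's actual LastInputStreamingOptimizer or packager code. The class `LastInputStreamingJoin`, the method `joinKey`, and the use of plain `String`s as tuples are all hypothetical simplifications. For one join key, inputs 0..n-2 arrive already buffered in memory, which is the role InternalCachedBag plays. The last input is read through an `Iterator` exactly once and is never collected, so memory use does not grow with the size of that input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of last-input streaming for an n-way inner join on a
 * single key. Not Pig's API: names and the String-as-tuple model are
 * illustrative assumptions.
 */
public final class LastInputStreamingJoin {

    /**
     * Joins all tuples that share one key.
     *
     * @param bufferedInputs tuples from inputs 0..n-2, already held in memory
     * @param lastInput      iterator over the last input, consumed exactly once
     * @param out            receives each joined row
     * @return number of joined rows emitted
     */
    public static long joinKey(List<List<String>> bufferedInputs,
                               Iterator<String> lastInput,
                               Consumer<List<String>> out) {
        // Build the cross product of the buffered inputs once per key.
        List<List<String>> prefixes = new ArrayList<>();
        prefixes.add(new ArrayList<>());
        for (List<String> input : bufferedInputs) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : prefixes) {
                for (String tuple : input) {
                    List<String> row = new ArrayList<>(prefix);
                    row.add(tuple);
                    next.add(row);
                }
            }
            prefixes = next; // empty if any buffered input is empty (inner join)
        }

        long emitted = 0;
        // Stream the last input: each tuple is paired with every prefix and
        // then discarded, so the last input is never materialized.
        while (lastInput.hasNext()) {
            String tuple = lastInput.next();
            for (List<String> prefix : prefixes) {
                List<String> row = new ArrayList<>(prefix);
                row.add(tuple);
                out.accept(row);
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> buffered = List.of(List.of("a1", "a2"));
        Iterator<String> last = List.of("b1", "b2", "b3").iterator();
        long n = joinKey(buffered, last, row -> System.out.println(row));
        System.out.println(n + " rows");
    }
}
```

Running `main` prints six joined rows, then `6 rows`. Only the two tuples of the first input are held in memory. This is why the quoted documentation recommends placing the table with the most tuples per key last.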