[ https://issues.apache.org/jira/browse/PIG-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5323:
------------------------------------
    Status: Patch Available  (was: Open)

> Implement LastInputStreamingOptimizer in Tez
> --------------------------------------------
>
>                 Key: PIG-5323
>                 URL: https://issues.apache.org/jira/browse/PIG-5323
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.18.0
>
>         Attachments: PIG-5323-1.patch
>
>
> http://pig.apache.org/docs/r0.17.0/perf.html#join-optimizations
> {quote}
> Optimization for regular joins ensures that the last table in the join is not 
> brought into memory but streamed through instead. Optimization reduces the 
> amount of memory used which means you can avoid spilling the data and also 
> should be able to scale your query to larger data volumes.
> To take advantage of this optimization, make sure that the table with the 
> largest number of tuples per key is the last table in your query. In some of 
> our tests we saw 10x performance improvement as the result of this 
> optimization.
> {quote}
> We are not doing this in Tez: both tables are materialized as
> InternalCachedBag.
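The following minimal Java sketch shows the per-key join step under this optimization. It is not Pig's implementation. The class `StreamingJoinSketch`, the `Tagged` holder and the method `joinKey` are hypothetical. The sketch assumes that the values for one key arrive ordered by input index (for example, through a secondary sort on the input tag), so that every earlier input is fully buffered before the first tuple of the last input appears. Only the earlier inputs are held in memory. Each tuple of the last input is joined against those buffers and then dropped, so it is never materialized into a bag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Pig's code: inner-join the values of ONE key across
 * numInputs join inputs. Inputs 0..numInputs-2 are buffered in memory and the
 * last input is streamed. Assumes values arrive ordered by input index.
 */
public class StreamingJoinSketch {

    /** A value tagged with the index of the join input it came from. */
    public static final class Tagged {
        final int input;
        final String value;

        public Tagged(int input, String value) {
            this.input = input;
            this.value = value;
        }
    }

    /** Joins one key's values, passing each joined row to out; returns how many values were buffered. */
    public static int joinKey(Iterator<Tagged> values, int numInputs, Consumer<List<String>> out) {
        int last = numInputs - 1;
        // One in-memory buffer per non-last input.
        List<List<String>> buffers = new ArrayList<>();
        for (int i = 0; i < last; i++) {
            buffers.add(new ArrayList<>());
        }
        int buffered = 0;
        while (values.hasNext()) {
            Tagged t = values.next();
            if (t.input < last) {
                buffers.get(t.input).add(t.value);
                buffered++;
            } else {
                // Last input: never buffered, joined against the buffers and then dropped.
                emitCross(buffers, 0, new ArrayList<>(), t.value, out);
            }
        }
        return buffered;
    }

    // Emits every combination of one value per buffer, followed by the streamed value.
    // An empty buffer yields no rows, which matches inner-join semantics.
    private static void emitCross(List<List<String>> buffers, int depth, List<String> prefix,
                                  String streamed, Consumer<List<String>> out) {
        if (depth == buffers.size()) {
            List<String> row = new ArrayList<>(prefix);
            row.add(streamed);
            out.accept(row);
            return;
        }
        for (String v : buffers.get(depth)) {
            prefix.add(v);
            emitCross(buffers, depth + 1, prefix, streamed, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // One key: input 0 has 2 tuples; input 1 (the last, with the most tuples) has 3.
        List<Tagged> values = List.of(
                new Tagged(0, "a1"), new Tagged(0, "a2"),
                new Tagged(1, "b1"), new Tagged(1, "b2"), new Tagged(1, "b3"));
        List<List<String>> rows = new ArrayList<>();
        int buffered = joinKey(values.iterator(), 2, rows::add);
        System.out.println(rows.size() + " rows, " + buffered + " values buffered");
    }
}
```

With two tuples on the first input and three on the last, the sketch prints `6 rows, 2 values buffered`. The tuples of the last input are never held in memory together, which is why the documentation advises placing the input with the most tuples per key last.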



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
