Rohini Palaniswamy created PIG-5323:
---------------------------------------

             Summary: Implement LastInputStreamingOptimizer in Tez
                 Key: PIG-5323
                 URL: https://issues.apache.org/jira/browse/PIG-5323
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.18.0


http://pig.apache.org/docs/r0.17.0/perf.html#join-optimizations
{quote}
Optimization for regular joins ensures that the last table in the join is not 
brought into memory but streamed through instead. Optimization reduces the 
amount of memory used which means you can avoid spilling the data and also 
should be able to scale your query to larger data volumes.

To take advantage of this optimization, make sure that the table with the 
largest number of tuples per key is the last table in your query. In some of 
our tests we saw 10x performance improvement as the result of this optimization.
{quote}

This optimization is not applied in Tez: both tables in the join are 
materialized as InternalCachedBag, so the last table is not streamed through.
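
As a rough illustration of the difference (not Pig or Tez code), the sketch below joins the rows for a single key in two ways: buffering both inputs before emitting anything, and buffering only the first input while streaming the last one. The function names and the tuple-based row representation are hypothetical, chosen only for this example, and a plain Python list stands in for a bag.

```python
from itertools import product
from typing import Iterable, Iterator, Tuple

Row = Tuple  # hypothetical row type: a plain tuple of field values


def join_materialized(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Join the rows of one key after buffering BOTH inputs."""
    left_bag = list(left)
    right_bag = list(right)  # the last input is fully buffered as well
    for l, r in product(left_bag, right_bag):
        yield l + r


def join_streaming_last(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Join the rows of one key, buffering only the first input."""
    left_bag = list(left)
    for r in right:  # the last input is consumed one row at a time
        for l in left_bag:
            yield l + r


if __name__ == "__main__":
    left = [("k", 1), ("k", 2)]
    right = (("k", name) for name in ("a", "b", "c"))  # a lazy source
    for row in join_streaming_last(left, right):
        print(row)
```

Both functions produce the same set of joined rows, in a different order. In the streaming variant, memory use for a key is bounded by the size of the first input, which is why the documentation quoted above recommends putting the table with the most tuples per key last.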



