-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68891/#review209325
-----------------------------------------------------------




src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
Line 956 (original), 961 (patched)
<https://reviews.apache.org/r/68891/#comment293648>

    " as the partially serialized size in memory is "
    
    "Partially" because we stop serializing once the size crosses spillThreshold.
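
    A minimal sketch of why the logged size is only partial, assuming the
    writer buffers splits in memory and gives up once the buffer exceeds a
    threshold. The class, method and parameter names below are hypothetical
    stand-ins and are not taken from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class PartialSerializationSketch {

    /**
     * Writes payloads into an in-memory buffer in order and stops as soon as
     * the buffer size exceeds spillThreshold. The returned size is therefore
     * only a partial size whenever the threshold was crossed.
     * (Hypothetical helper; the real patch serializes input splits.)
     */
    static int serializeUntilThreshold(List<byte[]> payloads, int spillThreshold) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (byte[] payload : payloads) {
            buffer.write(payload, 0, payload.length);
            if (buffer.size() > spillThreshold) {
                // Remaining payloads are never serialized in memory.
                break;
            }
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        List<byte[]> payloads = Arrays.asList(new byte[10], new byte[10], new byte[10]);
        int size = serializeUntilThreshold(payloads, 15);
        System.out.println("... as the partially serialized size in memory is " + size);
    }
}
```

    With three 10-byte payloads and a threshold of 15, the loop stops after
    the second payload, so the reported size is less than the full 30 bytes.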



src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 76 (patched)
<https://reviews.apache.org/r/68891/#comment293639>

    Remove boolean groupSplits, int targetTasks and also all of the corresponding javadoc



src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 144 (patched)
<https://reviews.apache.org/r/68891/#comment293644>

    Iterables.transform is overkill here when everything is already in memory. Can we just create a new ArrayList<TaskLocationHint>(newFormatSplits.length) and add the items to it in a for loop?
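
    A rough sketch of the suggested loop. Split and LocationHint are
    hypothetical stand-ins for the split type and for Tez's
    TaskLocationHint, so that the example compiles on its own; the actual
    conversion in the patch may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocationHintLoopSketch {

    // Hypothetical stand-in for an input split that knows its preferred hosts.
    static final class Split {
        final String[] hosts;
        Split(String... hosts) { this.hosts = hosts; }
    }

    // Hypothetical stand-in for Tez's TaskLocationHint.
    static final class LocationHint {
        final List<String> hosts;
        LocationHint(List<String> hosts) { this.hosts = hosts; }
    }

    /** Builds the hint list eagerly with a pre-sized ArrayList and a plain loop. */
    static List<LocationHint> toLocationHints(Split[] splits) {
        List<LocationHint> hints = new ArrayList<LocationHint>(splits.length);
        for (Split split : splits) {
            hints.add(new LocationHint(Arrays.asList(split.hosts)));
        }
        return hints;
    }

    public static void main(String[] args) {
        Split[] splits = { new Split("host1", "host2"), new Split("host3") };
        System.out.println(toLocationHints(splits).size());
    }
}
```

    Because the array is already in memory, the eager loop avoids the lazy
    view that Iterables.transform returns and sizes the list once up front.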



src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 205 (patched)
<https://reviews.apache.org/r/68891/#comment293649>

    && i != (inputSplits.length - 1)



src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezJobSplitWriter.java
Lines 137 (patched)
<https://reviews.apache.org/r/68891/#comment293651>

    Log the total serialized size here.
    
    "Size of serialized job.split file is " + out.getPos()


- Rohini Palaniswamy


On Oct. 2, 2018, 5:19 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68891/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2018, 5:19 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> [PIG-5359] Reduce time spent in split serialization
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java f292487f0 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/LoaderProcessor.java 7a12df784 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/MRToTezHelper.java b604d9f18 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/SerializationInfo.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezJobSplitWriter.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68891/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Satish Saley
> 
>
