-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68891/#review209325
-----------------------------------------------------------


src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
Line 956 (original), 961 (patched)
<https://reviews.apache.org/r/68891/#comment293648>

    " as the partially serialized size in memory is "

    "Partially" because we stop serializing once it crosses spillThreshold.


src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 76 (patched)
<https://reviews.apache.org/r/68891/#comment293639>

    Remove boolean groupSplits, int targetTasks and also all of the corresponding javadoc.


src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 144 (patched)
<https://reviews.apache.org/r/68891/#comment293644>

    Iterables.transform is overkill here when everything is already in memory. Can we just do

    List<TaskLocationHint> = new ArrayList<TaskLocationHint>(newFormatSplits.length);

    and do a for loop to add items to the list?


src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java
Lines 205 (patched)
<https://reviews.apache.org/r/68891/#comment293649>

    && i != (inputSplits.length - 1)


src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezJobSplitWriter.java
Lines 137 (patched)
<https://reviews.apache.org/r/68891/#comment293651>

    Log the total serialized size here.

    "Size of serialized job.split file is " + out.getPos()


- Rohini Palaniswamy


On Oct. 2, 2018, 5:19 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68891/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2018, 5:19 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> [PIG-5359] Reduce time spent in split serialization
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java f292487f0 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/LoaderProcessor.java 7a12df784 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/MRToTezHelper.java b604d9f18 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/SerializationInfo.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezInputHelper.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezJobSplitWriter.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68891/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Satish Saley
> 
>
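
For reference, the loop suggested in the comment on TezInputHelper.java line 144 might look roughly like the sketch below. It is only an illustration, not code from the patch: `LocationHintSketch`, `Hint`, and `buildHints` are hypothetical names, `Hint` stands in for Tez's TaskLocationHint so the example compiles without Tez on the classpath, and the `String[][]` input shape is an assumption about how split locations are available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LocationHintSketch {

    /** Hypothetical stand-in for Tez's TaskLocationHint; holds only host names. */
    static final class Hint {
        final Set<String> hosts;

        Hint(Set<String> hosts) {
            this.hosts = hosts;
        }
    }

    /**
     * Builds one hint per split with a plain loop instead of a lazy transform.
     * splitLocations[i] is the host list of split i (an assumed input shape).
     */
    static List<Hint> buildHints(String[][] splitLocations) {
        // The number of splits is already known, so presize the list.
        List<Hint> hints = new ArrayList<Hint>(splitLocations.length);
        for (String[] locations : splitLocations) {
            hints.add(new Hint(new LinkedHashSet<String>(Arrays.asList(locations))));
        }
        return hints;
    }

    public static void main(String[] args) {
        String[][] locations = { {"host1", "host2"}, {"host3"} };
        List<Hint> hints = buildHints(locations);
        System.out.println(hints.size() + " hints, first has " + hints.get(0).hosts);
    }
}
```

Since the splits are already materialized in an array, the eager list costs no more memory than the lazy view would once it is iterated, and a presized ArrayList avoids resizing.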