[ https://issues.apache.org/jira/browse/PIG-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036346#comment-14036346 ]
Daniel Dai commented on PIG-4020:
---------------------------------

In short, processInput() in MR never results in the endOfAllInput flag being set, but in Tez that is no longer true.

In MR, we run the pipeline once per key. For a given key, we keep pulling from the bottom of the pipeline until we see an EOP. endOfAllInput is not set during this process. In cleanup, we set the endOfAllInput flag and pull the pipeline again.

In Tez, we run the pipeline once per task. During processing, we keep pulling the input, and endOfAllInput will be set while we are pulling.

So in MR, after processInput (POSplit:214), we don't need to check whether endOfAllInput is set, but in Tez we do. If it is set, we need to keep pulling until the pipeline is empty to finish processing the data, instead of simply returning an EOP, which would lose the later part of the data.

> Fix tez e2e tests MapPartialAgg_[2-4], StreamingPerformance_[6-7]
> -----------------------------------------------------------------
>
>                 Key: PIG-4020
>                 URL: https://issues.apache.org/jira/browse/PIG-4020
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.14.0
>
>         Attachments: PIG-4020-1.patch
>
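
To illustrate the Tez case described in the comment, here is a minimal, self-contained sketch. It is not Pig code and not the attached patch: Input, PairSum, run and the checkEndOfAllInput switch are hypothetical names invented for this example, and null stands in for an EOP. It only models the situation where the input sets endOfAllInput during a pull while the operator still holds data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EndOfAllInputSketch {

    /** Hypothetical task input: sets endOfAllInput during the pull that finds it exhausted. */
    static final class Input {
        private final Iterator<Integer> records;
        boolean endOfAllInput = false;

        Input(List<Integer> data) { this.records = data.iterator(); }

        /** Returns the next record, or null (standing in for EOP) when none is left. */
        Integer processInput() {
            if (records.hasNext()) {
                return records.next();
            }
            endOfAllInput = true;
            return null;
        }
    }

    /** Hypothetical operator that sums records in pairs and holds a partial sum in between. */
    static final class PairSum {
        private final boolean checkEndOfAllInput;
        private int partial = 0;
        private int held = 0;

        PairSum(boolean checkEndOfAllInput) { this.checkEndOfAllInput = checkEndOfAllInput; }

        /** Returns the next output, or null (EOP) when the operator has nothing more to emit. */
        Integer getNext(Input in) {
            while (true) {
                Integer rec = in.processInput();
                if (rec == null) {
                    // The input returned EOP. Held state is flushed only if the flag is checked.
                    if (checkEndOfAllInput && in.endOfAllInput && held > 0) {
                        int out = partial;
                        partial = 0;
                        held = 0;
                        return out;
                    }
                    return null;
                }
                partial += rec;
                held++;
                if (held == 2) {
                    int out = partial;
                    partial = 0;
                    held = 0;
                    return out;
                }
            }
        }
    }

    /** Pulls the operator until it returns EOP, the way a once-per-task driver would. */
    static List<Integer> run(boolean checkEndOfAllInput, List<Integer> data) {
        Input in = new Input(data);
        PairSum op = new PairSum(checkEndOfAllInput);
        List<Integer> out = new ArrayList<>();
        for (Integer r = op.getNext(in); r != null; r = op.getNext(in)) {
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(run(false, data)); // [3, 7]    : the trailing 5 is lost
        System.out.println(run(true, data));  // [3, 7, 5] : held data is drained before EOP
    }
}
```

With the check disabled, the operator returns EOP as soon as the input does, and the record it was still holding never comes out. With the check enabled, it keeps emitting until it is empty, which is the behaviour the comment says the Tez path needs.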