[ 
https://issues.apache.org/jira/browse/PIG-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036346#comment-14036346
 ] 

Daniel Dai commented on PIG-4020:
---------------------------------

In short, processInput() in MR never results in the endOfAllInput flag being 
set, but in Tez that is no longer true.

In MR, we run the pipeline once per key. For a particular key, we keep 
pulling from the bottom of the pipeline until we see an EOP. endOfAllInput 
will not be set during this process. In cleanup, we set the endOfAllInput 
flag and pull the pipeline again. 

In Tez, we run the pipeline once per task. During the process, we keep pulling 
the input, and endOfAllInput will be set while we are pulling. 

So in MR, after processInput (POSplit:214), we don't need to check whether 
endOfAllInput is set, but in Tez we do. If it is set, we need to keep pulling 
until the pipeline is empty to finish processing the data, instead of simply 
returning an EOP, which would lose the later part of the data.
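
To illustrate the Tez case, here is a minimal toy sketch in Java. It is not 
Pig's actual POSplit code: the names Input, PairSum, and run are hypothetical, 
null stands in for EOP, and the buffering operator is only a rough stand-in 
for something like partial aggregation. It assumes the input sets 
endOfAllInput as a side effect of pulling, as described above for Tez.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model of the behavior described above; not Pig code.
class EndOfAllInputSketch {

    /** Toy input that sets endOfAllInput as a side effect of pulling (Tez-like). */
    static final class Input {
        private final Iterator<Integer> it;
        boolean endOfAllInput = false;

        Input(List<Integer> records) {
            this.it = records.iterator();
        }

        /** Returns the next record, or null (standing in for EOP) when exhausted. */
        Integer pull() {
            if (it.hasNext()) {
                return it.next();
            }
            endOfAllInput = true;
            return null;
        }
    }

    /** Toy buffering operator: emits the sum of every two records it sees. */
    static final class PairSum {
        private Integer held = null;

        /** Buffers the first record of a pair; emits the sum on the second. */
        Integer accept(int v) {
            if (held == null) {
                held = v;
                return null;
            }
            int sum = held + v;
            held = null;
            return sum;
        }

        /** Emits whatever is still buffered, or null if nothing is. */
        Integer flush() {
            Integer rest = held;
            held = null;
            return rest;
        }
    }

    /** Runs the toy pipeline once over all records, as one task would. */
    static List<Integer> run(List<Integer> records, boolean checkEndOfAllInput) {
        Input in = new Input(records);
        PairSum op = new PairSum();
        List<Integer> out = new ArrayList<>();
        while (true) {
            Integer rec = in.pull();
            if (rec == null) {
                // Without this check we return on EOP and drop buffered data.
                if (checkEndOfAllInput && in.endOfAllInput) {
                    Integer rest = op.flush();
                    if (rest != null) {
                        out.add(rest);
                    }
                }
                return out;
            }
            Integer sum = op.accept(rec);
            if (sum != null) {
                out.add(sum);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3);
        System.out.println(run(records, false)); // [3]    (the trailing 3 is lost)
        System.out.println(run(records, true));  // [3, 3] (buffered record flushed)
    }
}
```

In this toy model, returning on EOP without the check loses the last buffered 
record, which is the same shape of data loss described above.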

> Fix tez e2e tests MapPartialAgg_[2-4], StreamingPerformance_[6-7]
> -----------------------------------------------------------------
>
>                 Key: PIG-4020
>                 URL: https://issues.apache.org/jira/browse/PIG-4020
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.14.0
>
>         Attachments: PIG-4020-1.patch
>
>



