GitHub user guozhangwang opened a pull request:

    https://github.com/apache/kafka/pull/2485

    KAFKA-3896: Fix KStream-KStream leftJoin

    The transient duplicates come from a flawed design of the left join test
    itself. To skip partially joined results such as `A:null`, it may have the
    producer send to source stream one twice, and it relies on all of the
    following conditions holding for the test to pass:
    
    1. `receiveMessages` happens to have fetched all the produced results and
       committed offsets.
    2. The Streams app happens to have finished sending all result data.
    3. The consumer used in `receiveMessages` happens to get all messages in a
       single `poll()`.
    
    If any of the above is not true, the test fails.
    
    This PR fixes the test by adding a filter right after the left join that
    drops partially joined results. It also does some minor cleanup of the
    integration test utils.
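To illustrate where partial results such as `A:null` come from and what a filter placed right after the join removes, here is a minimal plain-Java model. It is a sketch under stated assumptions, not the Kafka Streams API or the code in this PR: it ignores windows, timestamps and serdes, and the names `LeftJoinFilterSketch`, `Event`, `Joined` and `FULLY_JOINED` are hypothetical. In the model, a left record with no match yet is emitted with a `null` right side, and a right record arriving later joins the buffered left record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical plain-Java model (no Kafka dependency) of a stream-stream left join
// that emits partial results such as "A:null", followed by a filter dropping them.
public class LeftJoinFilterSketch {

    /** A record arriving on either the left or the right input stream. */
    static final class Event {
        final boolean fromLeft;
        final String key;
        final String value;

        Event(boolean fromLeft, String key, String value) {
            this.fromLeft = fromLeft;
            this.key = key;
            this.value = value;
        }
    }

    /** A join output; right == null marks a partial (left-only) result. */
    static final class Joined {
        final String left;
        final String right;

        Joined(String left, String right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left + ":" + right; // a partial result renders as "A:null"
        }
    }

    /**
     * Simplified left join with an unbounded window (an assumption of this model):
     * a left record joins every buffered right record with the same key, or emits a
     * partial result if there is none yet; a right record joins buffered left records.
     */
    static List<Joined> leftJoin(List<Event> arrivals) {
        List<Event> leftBuffer = new ArrayList<>();
        List<Event> rightBuffer = new ArrayList<>();
        List<Joined> out = new ArrayList<>();
        for (Event e : arrivals) {
            if (e.fromLeft) {
                boolean matched = false;
                for (Event r : rightBuffer) {
                    if (r.key.equals(e.key)) {
                        out.add(new Joined(e.value, r.value));
                        matched = true;
                    }
                }
                if (!matched) {
                    out.add(new Joined(e.value, null)); // partial result
                }
                leftBuffer.add(e);
            } else {
                for (Event l : leftBuffer) {
                    if (l.key.equals(e.key)) {
                        out.add(new Joined(l.value, e.value));
                    }
                }
                rightBuffer.add(e);
            }
        }
        return out;
    }

    /** Filter applied right after the join: keep only fully joined results. */
    static final Predicate<Joined> FULLY_JOINED = j -> j.right != null;

    static List<String> render(List<Joined> results, Predicate<Joined> keep) {
        return results.stream().filter(keep).map(Joined::toString).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Left "A" arrives before its right-side match "a".
        List<Event> arrivals = List.of(new Event(true, "k1", "A"), new Event(false, "k1", "a"));
        List<Joined> joined = leftJoin(arrivals);
        System.out.println(render(joined, j -> true));    // [A:null, A:a]
        System.out.println(render(joined, FULLY_JOINED)); // [A:a]
    }
}
```

In a real topology the same idea would be expressed with `KStream#filter` on the joined stream; the exact predicate used in this PR is not shown here.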

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guozhangwang/kafka K3896-duplicate-join-results

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2485
    
----
commit 186c32075bb8cab217dddeb23aa341fb3ac5e5d0
Author: Guozhang Wang <wangg...@gmail.com>
Date:   2017-02-02T06:51:57Z

    fix left join

----


