GitHub user guozhangwang opened a pull request: https://github.com/apache/kafka/pull/2485
KAFKA-3896: Fix KStream-KStream leftJoin

The transient duplicates are caused by the poor design of the left join test itself: to ignore partially joined results such as `A:null`, it lets the producer potentially send twice to source stream one, and it passes only if all of the following conditions hold:

1. `receiveMessages` happens to have fetched all the produced results and committed the offsets.
2. The streams app happens to have finished sending all result data.
3. The consumer used in `receiveMessages` gets all messages in a single poll().

If any of these conditions does not hold, the test fails.

Fixed the test by adding a filter right after the left join to filter out partially joined results.

Minor cleanup on integration test utils.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guozhangwang/kafka K3896-duplicate-join-results

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2485

----
commit 186c32075bb8cab217dddeb23aa341fb3ac5e5d0
Author: Guozhang Wang <wangg...@gmail.com>
Date:   2017-02-02T06:51:57Z

    fix left join

----
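
To illustrate the idea behind the fix, here is a minimal sketch in plain Java. It does not use the Kafka Streams API and is not the code from this pull request. It assumes a simplified model in which a left record with no match yet yields `left:null`, and each later right record yields `left:right`. The class name `LeftJoinFilterSketch` and its helpers are hypothetical. In the Kafka Streams DSL the equivalent would presumably be a `filter` chained directly after `leftJoin`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LeftJoinFilterSketch {

    // Mimics a value joiner of the form "left:right". A missing right-hand
    // value is rendered as the literal text "null" by string concatenation.
    static String join(String left, String right) {
        return left + ":" + right;
    }

    // Simplified model (an assumption, not Kafka's implementation): the left
    // record first produces a partial result, then one joined result for
    // every right record that arrives afterwards.
    static List<String> leftJoinOutputs(String left, List<String> laterRights) {
        List<String> out = new ArrayList<>();
        out.add(join(left, null));          // partial result, e.g. "A:null"
        for (String right : laterRights) {
            out.add(join(left, right));     // complete result, e.g. "A:B"
        }
        return out;
    }

    // Hypothetical predicate: keep only results whose right side is present.
    static final Predicate<String> IS_COMPLETE = v -> !v.endsWith(":null");

    // The "filter right after the left join" step.
    static List<String> dropPartial(List<String> joined) {
        return joined.stream().filter(IS_COMPLETE).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = leftJoinOutputs("A", List.of("B"));
        System.out.println(raw);              // prints [A:null, A:B]
        System.out.println(dropPartial(raw)); // prints [A:B]
    }
}
```

With the partial results filtered out, the expected output no longer depends on whether `A:null` happened to be produced or consumed before `A:B`, so the test does not need all three timing conditions to hold.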