Jamie Brandon created KAFKA-12608:
-------------------------------------

Summary: Simple identity pipeline sometimes loses data
Key: KAFKA-12608
URL: https://issues.apache.org/jira/browse/KAFKA-12608
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.7.0
Environment: https://github.com/jamii/streaming-consistency/blob/c1f504e73141405ee6cd0c7f217604d643babf81/pkgs.nix
[nix-shell:~/streaming-consistency/kafka-streams]$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-ga)
OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode)

[nix-shell:~/streaming-consistency/kafka-streams]$ nix-info
system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10, channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

Reporter: Jamie Brandon


I'm running a very simple streams program that reads records from one topic into a table and then writes the stream back into another topic. In about 1 in 5 runs, some of the output records are missing. They tend to form a single contiguous range, as if a single batch was dropped somewhere.

{code:bash}
$ wc -l tmp/*transactions
 999514 tmp/accepted_transactions
 1000000 tmp/transactions
 1999514 total
$ cat tmp/transactions | cut -d',' -f 1 | cut -d' ' -f 2 > in
$ cat tmp/accepted_transactions | cut -d',' -f 1 | cut -d':' -f 2 > out
$ diff in out | wc -l
487
$ diff in out | head
25313,25798d25312
< 25312
< 25313
< 25314
< 25315
< 25316
< 25317
< 25318
< 25319
< 25320
$ diff in out | tail
< 25788
< 25789
< 25790
< 25791
< 25792
< 25793
< 25794
< 25795
< 25796
< 25797
{code}

I've run the consumer multiple times to confirm that the records are actually missing from the topic and that it wasn't just a hiccup in the consumer.

The repo linked above has instructions in the readme on how to reproduce the exact versions used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
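The `diff` output in the description suggests a single contiguous gap. A quicker way to check this across runs might be a small helper like the one below. This is only a sketch: `missing_ranges` is a hypothetical name, not part of the linked repo, and it assumes `in` and `out` each hold one integer ID per line, as produced by the `cut` commands in the description.

```shell
# missing_ranges IN OUT
# Prints one "first-last count" line for every contiguous run of IDs
# that appear in IN but not in OUT. (Hypothetical helper, for illustration.)
missing_ranges() {
    # Pass 1: remember every ID seen in OUT, then emit IDs from IN that
    # were never seen. Comparing FILENAME to ARGV[1] (rather than the
    # usual NR == FNR test) keeps this correct when OUT is empty.
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         !($1 in seen)       { print $1 }' "$2" "$1" |
    # Order numerically and drop duplicate IDs so runs can be detected.
    sort -n -u |
    # Pass 2: collapse consecutive IDs into ranges.
    awk 'NR == 1        { start = prev = $1; next }
         $1 == prev + 1 { prev = $1; next }
                        { print start "-" prev, prev - start + 1
                          start = prev = $1 }
         END            { if (NR > 0) print start "-" prev, prev - start + 1 }'
}
```

Running `missing_ranges in out` after a bad run and getting exactly one output line would support the observation that the lost records form one contiguous range. Several lines would point to more than one gap.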