Jamie Brandon created KAFKA-12608:
-------------------------------------
Summary: Simple identity pipeline sometimes loses data
Key: KAFKA-12608
URL: https://issues.apache.org/jira/browse/KAFKA-12608
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.7.0
Environment:
https://github.com/jamii/streaming-consistency/blob/c1f504e73141405ee6cd0c7f217604d643babf81/pkgs.nix
[nix-shell:~/streaming-consistency/kafka-streams]$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-ga)
OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode)
[nix-shell:~/streaming-consistency/kafka-streams]$ nix-info
system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10,
channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs:
/nix/var/nix/profiles/per-user/root/channels/nixos
Reporter: Jamie Brandon
I'm running a very simple streams program that reads records from one topic
into a table and then writes the stream back into another topic. In about 1 in
5 runs, some of the output records are missing. They tend to form a single
contiguous range, as if a single batch was dropped somewhere.
{code:bash}
$ wc -l tmp/*transactions
999514 tmp/accepted_transactions
1000000 tmp/transactions
1999514 total
$ cat tmp/transactions | cut -d',' -f 1 | cut -d' ' -f 2 > in
$ cat tmp/accepted_transactions | cut -d',' -f 1 | cut -d':' -f 2 > out
$ diff in out | wc -l
487
$ diff in out | head
25313,25798d25312
< 25312
< 25313
< 25314
< 25315
< 25316
< 25317
< 25318
< 25319
< 25320
$ diff in out | tail
< 25788
< 25789
< 25790
< 25791
< 25792
< 25793
< 25794
< 25795
< 25796
< 25797
{code}
I've checked running the consumer multiple times to make sure that the records
are actually missing from the topic and it wasn't just a hiccup in the
consumer.
The repo linked above has instructions in the readme on how to reproduce the
exact versions used.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)