Hi, we are seeing data loss (i.e., dropped records) when running Debezium on
Kafka Connect with the Apicurio Schema Registry. Specifically, we have
observed multiple times that a single record is dropped when we get this
exception (full stack trace:
<https://gist.github.com/twthorn/917bf3cc576f2b486dde04b16a60d681>).


Failed to send HTTP request to endpoint:
http://schema-registry.service.prod-us-east-1-dw1.consul:8080/apis/ccompat/v6/subjects/prod
.<keyspace>.<table>-key/versions?normalize=false


This exception is raised by the Kafka Connect worker, which receives it
from the Confluent schema registry client. It appears to be a transient
network blip: afterwards the worker logs no further errors and continues
processing data without issue. However, we lose exactly one record, and
that record was received almost exactly one minute before the exception is
logged. We have observed this behavior, with the same timeline, on
different days several weeks apart. Our key Kafka settings are below (see
full configs here
<https://gist.github.com/twthorn/78c2ac329a46ce1baa820753daad47dd>); a
sketch of error-handling settings we are considering follows the list:


 "producer.batch.size=524288"
 "producer.linger.ms=100"
 "producer.acks=-1"
 "producer.compression.type=snappy"
 "producer.buffer.memory=268435456"
 "config.storage.replication.factor=4"
 "offset.storage.replication.factor=4"
 "status.storage.replication.factor=4"
 "scheduled.rebalance.max.delay.ms=180000"


Other version info:


   - Kafka Version 3.8.1
   - Confluent version (e.g., for kafka-schema-registry-client,
   kafka-schema-registry-converter, etc.) 7.5.2
   - Avro Version 1.11.4

Questions we have:

   - Are there any known issues with the schema registry interacting with
   Kafka Connect that could cause data loss?
   - If we drop a record, does that mean the offsets stored by the Kafka
   Connect worker source task are incorrect? I.e., are we committing
   offsets for data that we have not yet finished sending to Kafka?
   - Are there any recommended debugging steps to root-cause this issue?
   (A sketch of one reproduction idea is below.)
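
On the last point, here is a minimal, untested sketch of a probe we could
run to try to reproduce the blip independently of Connect, using the same
Confluent client (7.5.2) against the same ccompat endpoint; the class
name, loop parameters, and polling interval are hypothetical placeholders:

 import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
 import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

 public class RegistryBlipProbe {
     public static void main(String[] args) throws Exception {
         // Same ccompat base URL the converter uses.
         String url =
             "http://schema-registry.service.prod-us-east-1-dw1.consul:8080/apis/ccompat/v6";
         SchemaRegistryClient client = new CachedSchemaRegistryClient(url, 100);

         // getAllSubjects() is a pass-through HTTP call (not cached by the
         // client), so each iteration exercises the same network path as
         // the register call that failed. Log each failure with its latency
         // to correlate with the ~1 minute gap we see before the worker
         // logs the exception.
         for (int i = 0; i < 3600; i++) {
             long start = System.currentTimeMillis();
             try {
                 client.getAllSubjects();
             } catch (Exception e) {
                 System.out.printf("failed after %d ms: %s%n",
                         System.currentTimeMillis() - start, e);
             }
             Thread.sleep(1000);
         }
     }
 }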

Thank you for the help.
