Luke Chen created KAFKA-19460:
---------------------------------
Summary: fetch result might have size < fetch.min.bytes even if
data is available in replica
Key: KAFKA-19460
URL: https://issues.apache.org/jira/browse/KAFKA-19460
Project: Kafka
Issue Type: Improvement
Reporter: Luke Chen
The documentation for
"[fetch.min.bytes|https://kafka.apache.org/documentation/#consumerconfigs_fetch.min.bytes]"
says:
??The minimum amount of data the server should return for a fetch request. If
insufficient data is available the request will wait for that much data to
accumulate before answering the request.??
This leads users to believe that the returned records will always total at
least fetch.min.bytes when there is sufficient data in the replica. But even
when sufficient data is available in the replica, the returned records size
can still be < fetch.min.bytes.
For example:
# Config
fetch.max.bytes=1500
max.partition.fetch.bytes=1000
fetch.min.bytes=1100
fetch.max.wait.ms=500
# Topic foo has 2 partitions, and each partition contains 1 record of size
1000 bytes.
# When a consumer fetches data from these 2 partitions, the fetch starts with
foo-0 and reads 1000 bytes of data, leaving 500 bytes before fetch.max.bytes
is reached.
# When fetching foo-1, only 500 bytes are still available to be fetched, and
the first batch in foo-1 is 1000 bytes, which is greater than 500, so it is
not fetched.
# In the end, the total returned size is 1000 bytes, which is less than
fetch.min.bytes, and the response is sent without waiting for
`fetch.max.wait.ms` to expire. This is because the check found that the total
size available in the replicas is more than "fetch.min.bytes", so there is no
wait for "fetch.max.wait.ms".
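The steps above can be sketched as a small simulation. This is only a simplified model of the behavior described in this issue, not the broker's actual code: the function name `simulate_fetch` and its parameters are made up for illustration, each partition is assumed to hold a single batch, and details such as the rule that the first batch of the first non-empty partition is returned even when oversized are ignored.

```python
def simulate_fetch(batch_sizes, fetch_min_bytes, fetch_max_bytes,
                   max_partition_fetch_bytes):
    """Simplified model of the example in this issue (hypothetical helper).

    batch_sizes: size in bytes of the single available batch in each
                 partition, in the order the partitions are fetched.
    Returns (returned_bytes, waits).
    """
    # Assumption: the wait decision looks at the total bytes available
    # across all requested partitions, not at what will actually fit.
    available = sum(batch_sizes)
    waits = available < fetch_min_bytes

    returned = 0
    remaining = fetch_max_bytes
    for size in batch_sizes:
        # A partition may use at most its own limit and whatever is left
        # of the overall fetch.max.bytes budget.
        limit = min(max_partition_fetch_bytes, remaining)
        if size <= limit:
            returned += size
            remaining -= size
        # Otherwise the whole batch is skipped; batches are not split.
    return returned, waits


# Values from the example: foo-0 and foo-1 each hold one 1000-byte batch.
result = simulate_fetch([1000, 1000], fetch_min_bytes=1100,
                        fetch_max_bytes=1500, max_partition_fetch_bytes=1000)
print(result)  # (1000, False): 1000 bytes returned, no wait
```

Under these assumptions, 2000 bytes are available, so the request does not wait, yet only the 1000 bytes from foo-0 fit in the response.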
I think the logic is correct. We just need to update the doc to make this
clear to users. We might also need to check the `replica.fetch.min.bytes`
config.