I clarified some ambiguous wording in the unit test output, and I also opened
a bug in JIRA <https://issues.apache.org/jira/browse/KAFKA-13772> with a
pull request <https://issues.apache.org/jira/browse/KAFKA-13772>.

Thank you for your help and suggestions!

On Sat, Mar 26, 2022 at 16:18, Feiyan Yu <dbtr...@gmail.com> wrote:

> Thank you, I will make it an issue.
>
> On Sat, Mar 26, 2022 at 16:03, Luke Chen <show...@gmail.com> wrote:
>
>> Hi Feiyan,
>>
>> It did look like a bug. Could you open a bug in JIRA here
>> <https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13735?filter=allopenissues>?
>> One thing I don't fully understand from the unit test is that you
>> output "error distributions: $tpsWithoutResize".
>> I thought the bug was about the uneven partition distribution after
>> resizing the fetcher threads, so what do you mean by "error distribution"?
>> Could you please also elaborate in the JIRA ticket?
>>
>> Also, you're welcome to submit a PR for it, since you already have a fix and
>> test ready! :)
>>
>> Thank you.
>> Luke
>>
>>
>>
>> On Sat, Mar 26, 2022 at 2:56 PM Feiyan Yu <dbtr...@gmail.com> wrote:
>>
>> > Howdy!
>> >
>> > I found that the method for resizing the fetcher threads has a potential
>> > bug that leads to an imbalanced partition distribution across threads
>> > after the thread count is changed through dynamic configuration.
>> >
>> > To investigate, I added a simple unit test method "testResize" to
>> > "AbstractFetcherThreadTest". It simulates resizing the replica fetchers
>> > from 10 threads to 60 threads, starting with 10 topics of 100
>> > partitions each. The test asserts that after resizing, all partitions
>> > are redistributed correctly based on the new thread count. However, the
>> > test failed: when I compared "fetcherThreadMap" against the fetcherId
>> > computed for each topic-partition, the fetcherIds did not match. The unit
>> > test I added is in this commit
>> > (https://github.com/yufeiyan1220/kafka/commit/eb99b7499b416cdeb44c2ccd3ea55a1e38ea3d60),
>> > and the standard output of the unit test is in the attachment.
>> >
>> > I suspect the cause is that "addFetcherForPartitions", which may add
>> > new fetchers to "fetcherThreadMap", is called inside the block that
>> > iterates over "fetcherThreadMap". The iterator then skips some fetchers,
>> > so those fetchers keep their old topic partitions, which produces the
>> > imbalanced partition distribution after resizing.
>> >
>> > To solve this, I use a mutable map to collect all partitions and their
>> > fetch offsets, and add them back only after the iteration has finished. I
>> > made another commit
>> > (https://github.com/yufeiyan1220/kafka/commit/0a793dfca2ab9b8e8b45ba5693359960f3e306eb),
>> > and the new resize method passes the unit test.
>> >
>> > I'm not sure whether this is an issue that the Kafka community needs to
>> > fix, but for me it affects fetching efficiency when I deploy Kafka across
>> > regions with high network latency.
>> >
>> > I'd really appreciate hearing from you!
>> >
>> >
>>
>
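
For anyone following the thread, here is a minimal Python sketch of the
collect-then-re-add idea described in the original report. It is not Kafka's
Scala code: the helper names (java_string_hash, fetcher_id, assign, resize,
misplaced) are hypothetical, and the hash-modulo assignment of a
topic-partition to a fetcher is an assumption made for illustration. The
scenario mirrors the test in the report: 10 topics of 100 partitions each,
resized from 10 fetchers to 60.

```python
def java_string_hash(s):
    """Deterministic string hash (hypothetical helper, Java-style, 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def fetcher_id(topic, partition, num_fetchers):
    """Assumed assignment rule: hash of the topic-partition modulo thread count."""
    return (31 * java_string_hash(topic) + partition) % num_fetchers


def assign(partitions, num_fetchers):
    """Build {fetcher id: {(topic, partition): offset}} for a thread count."""
    threads = {}
    for (topic, partition), offset in partitions.items():
        fid = fetcher_id(topic, partition, num_fetchers)
        threads.setdefault(fid, {})[(topic, partition)] = offset
    return threads


def resize(threads, new_num_fetchers):
    """Drain every fetcher first, then re-add; the map is never grown mid-iteration."""
    pending = {}
    # Iterate over a snapshot of the keys so removals cannot disturb the loop.
    for fid in list(threads):
        pending.update(threads.pop(fid))
    # Only now, with the iteration finished, redistribute all partitions.
    return assign(pending, new_num_fetchers)


def misplaced(threads, num_fetchers):
    """Partitions owned by a fetcher other than the one the rule selects."""
    return [tp for fid, owned in threads.items() for tp in owned
            if fetcher_id(tp[0], tp[1], num_fetchers) != fid]


# 10 topics with 100 partitions each, all starting at offset 0.
partitions = {(f"topic-{t}", p): 0 for t in range(10) for p in range(100)}
threads = assign(partitions, 10)
resized = resize(threads, 60)
```

Calling misplaced(resized, 60) lets you check that every partition sits on
the fetcher the assignment rule selects, which is the same comparison the
unit test makes between "fetcherThreadMap" and the computed fetcherId.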
