Re: Potential bugs in resizing kafka thread

2022-03-26 Thread Luke Chen
Hi Feiyan,

It does look like a bug. Could you open a bug in JIRA here:
https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13735?filter=allopenissues ?
One thing I don't fully understand from the unit test: you output
"error distributions: $tpsWithoutResize".
I thought the bug was about the uneven partition distribution after
resizing the fetcher threads, so what do you mean by "error distribution"?
Could you please also elaborate in the JIRA ticket?

Also, you're welcome to submit a PR for it, since you already have the fix
and test ready! :)

Thank you.
Luke



On Sat, Mar 26, 2022 at 2:56 PM Feiyan Yu  wrote:

> Howdy!
>
> I found that the method for resizing the fetcher threads has a potential bug
> that leads to an imbalanced partition distribution across threads after
> changing the thread number via dynamic configuration.
>
> To figure it out, I added a primitive unit test method "testResize" in
> "AbstractFetcherThreadTest" to simulate the replica fetcher resizing
> process from 10 threads to 60 threads, with originally 10 topics of 100
> partitions each. I designed the test to check that after the resizing
> process, all partitions are redistributed correctly based on the new
> thread number. However, the test failed: when I compared the
> "fetcherThreadMap" against the expected fetcherId for each topic-partition,
> the fetcherIds mismatched! The unit test I added is in this commit (
> https://github.com/yufeiyan1220/kafka/commit/eb99b7499b416cdeb44c2ccd3ea55a1e38ea3d60),
> and the standard output of the unit test is shown in the attachment.
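>
> For illustration, the check boils down to something like the following Scala
> sketch. The hash-modulo formula below is my assumption about how fetcher ids
> are assigned, and "expectedFetcherId" is just an illustrative helper, not a
> real Kafka method:
>
>     import org.apache.kafka.common.TopicPartition
>     import org.apache.kafka.common.utils.Utils
>
>     // Assumed fetcher-id assignment: hash the topic-partition and take the
>     // result modulo the (new) number of fetcher threads.
>     def expectedFetcherId(tp: TopicPartition, numFetchers: Int): Int =
>       Utils.abs(31 * tp.topic.hashCode + tp.partition) % numFetchers
>
>     // After resizing to 60 threads, every partition recorded in
>     // "fetcherThreadMap" should sit on the thread whose fetcher id equals
>     // expectedFetcherId(tp, 60); the mismatches are what the test reports.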
>
> I suspect this is because the method "addFetcherForPartitions", which may
> add new fetchers to "fetcherThreadMap", is called inside the block that
> iterates over "fetcherThreadMap". The iterator can then skip some fetchers,
> and those fetchers keep their topic partitions, which produces the
> imbalanced partition distribution after resizing.
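>
> Here is a minimal, self-contained Scala sketch of that suspected pattern. It
> is a toy model rather than the actual Kafka code: the map stands in for
> "fetcherThreadMap" and the reassignment loop for "addFetcherForPartitions":
>
>     import scala.collection.mutable
>
>     // Toy model: fetcher id -> the partitions it currently owns.
>     val fetchers = mutable.HashMap(
>       0 -> Set("t0-0", "t0-1"),
>       1 -> Set("t1-0", "t1-1"))
>     val newFetcherCount = 4
>     def newFetcherId(partition: String): Int =
>       math.abs(31 * partition.hashCode) % newFetcherCount
>
>     // Anti-pattern: new keys are inserted while the same map is being
>     // iterated. The behaviour of a mutable.HashMap iterator under such
>     // modification is undefined, so some of the original fetchers may never
>     // be visited and silently keep their old partitions.
>     fetchers.foreach { case (id, partitions) =>
>       fetchers.remove(id)
>       partitions.foreach { p =>
>         val target = newFetcherId(p)
>         fetchers(target) = fetchers.getOrElse(target, Set.empty) + p
>       }
>     }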
>
> To solve this issue I use a mutable map to collect all partitions and their
> fetch offsets, and then add them back once the iteration is finished. I made
> another commit (
> https://github.com/yufeiyan1220/kafka/commit/0a793dfca2ab9b8e8b45ba5693359960f3e306eb),
> and the new resize method passes the unit test.
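>
> Sketched against the same toy model as above, the fix has roughly this shape
> (again illustrative only; the real commit also carries each partition's fetch
> offset and operates on "fetcherThreadMap"):
>
>     // Collect every partition first, without touching the map that is being
>     // iterated (reusing "fetchers" and "newFetcherId" from the sketch above).
>     val allPartitions = mutable.Set[String]()
>     fetchers.foreach { case (_, partitions) => allPartitions ++= partitions }
>     fetchers.clear()
>
>     // Only once the iteration is finished, hand every partition to the
>     // fetcher chosen by the new thread count.
>     allPartitions.foreach { p =>
>       val target = newFetcherId(p)
>       fetchers(target) = fetchers.getOrElse(target, Set.empty) + p
>     }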
>
> I'm not sure whether this is an issue the Kafka community needs to fix, but
> for me it affects fetching efficiency when I deploy Kafka across regions
> with high network latency.
>
> I'd really appreciate hearing from you!
>
>


Re: Potential bugs in resizing kafka thread

2022-03-26 Thread Feiyan Yu
Thank you, I will make it an issue.

On Sat, Mar 26, 2022 at 16:03, Luke Chen wrote:

> Hi Feiyan,
>
> It does look like a bug. Could you open a bug in JIRA here:
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13735?filter=allopenissues ?
> One thing I don't fully understand from the unit test: you output
> "error distributions: $tpsWithoutResize".
> I thought the bug was about the uneven partition distribution after
> resizing the fetcher threads, so what do you mean by "error distribution"?
> Could you please also elaborate in the JIRA ticket?
>
> Also, you're welcome to submit a PR for it, since you already have the fix
> and test ready! :)
>
> Thank you.
> Luke
>
>
>
> On Sat, Mar 26, 2022 at 2:56 PM Feiyan Yu  wrote:
>
> > Howdy!
> >
> > I found that the method for resizing the fetcher threads has a potential bug
> > that leads to an imbalanced partition distribution across threads after
> > changing the thread number via dynamic configuration.
> >
> > To figure it out, I added a primitive unit test method "testResize" in
> > "AbstractFetcherThreadTest" to simulate the replica fetcher resizing
> > process from 10 threads to 60 threads, with originally 10 topics of 100
> > partitions each. I designed the test to check that after the resizing
> > process, all partitions are redistributed correctly based on the new
> > thread number. However, the test failed: when I compared the
> > "fetcherThreadMap" against the expected fetcherId for each topic-partition,
> > the fetcherIds mismatched! The unit test I added is in this commit (
> > https://github.com/yufeiyan1220/kafka/commit/eb99b7499b416cdeb44c2ccd3ea55a1e38ea3d60),
> > and the standard output of the unit test is shown in the attachment.
> >
> > I suspect this is because the method "addFetcherForPartitions", which may
> > add new fetchers to "fetcherThreadMap", is called inside the block that
> > iterates over "fetcherThreadMap". The iterator can then skip some fetchers,
> > and those fetchers keep their topic partitions, which produces the
> > imbalanced partition distribution after resizing.
> >
> > To solve this issue I use a mutable map to collect all partitions and their
> > fetch offsets, and then add them back once the iteration is finished. I made
> > another commit (
> > https://github.com/yufeiyan1220/kafka/commit/0a793dfca2ab9b8e8b45ba5693359960f3e306eb),
> > and the new resize method passes the unit test.
> >
> > I'm not sure whether this is an issue the Kafka community needs to fix, but
> > for me it affects fetching efficiency when I deploy Kafka across regions
> > with high network latency.
> >
> > I'd really appreciate hearing from you!
> >
> >
>


Re: [VOTE] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-26 Thread David Jacot
+1 (binding). Thanks for the KIP.

Best,
David

On Fri, Mar 25, 2022 at 07:11, Luke Chen wrote:

> Hi Sergio,
>
> Thanks for the KIP!
> +1(binding) from me.
>
> Thank you.
> Luke
>
> On Fri, Mar 25, 2022 at 1:40 PM Sergio Daniel Troiano
>  wrote:
>
> > Hi lads,
> >
> > I would like to start the vote on this:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> >
> > As extra information, we are already using the patch in our company; thanks
> > to this patch and another extra script we wrote (maybe I will start another
> > KIP later), we have started saving plenty of money.
> >
> > Best regards.
> > Sergio Troiano
> >
>


[jira] [Created] (KAFKA-13772) Unevenly distribute topic partitions for threads after resizing thread number

2022-03-26 Thread Feiyan Yu (Jira)
Feiyan Yu created KAFKA-13772:
-

 Summary: Unevenly distribute topic partitions for threads after 
resizing thread number 
 Key: KAFKA-13772
 URL: https://issues.apache.org/jira/browse/KAFKA-13772
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.0.1, 1.1.0
Reporter: Feiyan Yu
 Fix For: 3.3.0


The method 'resizeThreadPool' is supposed to redistribute all topic partitions
to threads based on the new thread number, and they should be distributed
evenly.
 
However, the resizing process adds fetchers to 'fetcherThreadMap' while
iterating over all the fetchers in that same map, which can cause the iterator
to skip some fetchers; the skipped fetchers keep their old topic partition
assignments.
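
A minimal Scala illustration of the underlying pitfall (not Kafka code): growing
a mutable map while it is being iterated is undefined behaviour, so entries can
be skipped or visited twice.

    import scala.collection.mutable

    val m = mutable.HashMap((1 to 10).map(i => i -> s"fetcher-$i"): _*)
    var visited = 0
    m.foreach { case (k, _) =>
      if (k <= 10) {                        // only react to the original keys
        visited += 1
        m(k + 100) = s"fetcher-${k + 100}"  // insert new entries mid-iteration
      }
    }
    // 'visited' is not guaranteed to be 10: depending on rehashing, some of
    // the original entries may be skipped, mirroring the fetchers that keep
    // their old partition assignment.
    println(visited)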
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Potential bugs in resizing kafka thread

2022-03-26 Thread Feiyan Yu
I fixed some ambiguous expressions in the output of the unit test, and also
opened a bug in JIRA (https://issues.apache.org/jira/browse/KAFKA-13772) with
a pull request.

Thank you for your help and suggestions!

On Sat, Mar 26, 2022 at 16:18, Feiyan Yu wrote:

> Thank you, I will make it an issue.
>
> On Sat, Mar 26, 2022 at 16:03, Luke Chen wrote:
>
>> Hi Feiyan,
>>
>> It does look like a bug. Could you open a bug in JIRA here:
>> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13735?filter=allopenissues ?
>> One thing I don't fully understand from the unit test: you output
>> "error distributions: $tpsWithoutResize".
>> I thought the bug was about the uneven partition distribution after
>> resizing the fetcher threads, so what do you mean by "error distribution"?
>> Could you please also elaborate in the JIRA ticket?
>>
>> Also, you're welcome to submit a PR for it, since you already have the fix
>> and test ready! :)
>>
>> Thank you.
>> Luke
>>
>>
>>
>> On Sat, Mar 26, 2022 at 2:56 PM Feiyan Yu  wrote:
>>
>> > Howdy!
>> >
>> > I found that the method for resizing the fetcher threads has a potential
>> > bug that leads to an imbalanced partition distribution across threads
>> > after changing the thread number via dynamic configuration.
>> >
>> > To figure it out, I added a primitive unit test method "testResize" in
>> > "AbstractFetcherThreadTest" to simulate the replica fetcher resizing
>> > process from 10 threads to 60 threads, with originally 10 topics of 100
>> > partitions each. I designed the test to check that after the resizing
>> > process, all partitions are redistributed correctly based on the new
>> > thread number. However, the test failed: when I compared the
>> > "fetcherThreadMap" against the expected fetcherId for each topic-partition,
>> > the fetcherIds mismatched! The unit test I added is in this commit (
>> > https://github.com/yufeiyan1220/kafka/commit/eb99b7499b416cdeb44c2ccd3ea55a1e38ea3d60),
>> > and the standard output of the unit test is shown in the attachment.
>> >
>> > I suspect this is because the method "addFetcherForPartitions", which may
>> > add new fetchers to "fetcherThreadMap", is called inside the block that
>> > iterates over "fetcherThreadMap". The iterator can then skip some fetchers,
>> > and those fetchers keep their topic partitions, which produces the
>> > imbalanced partition distribution after resizing.
>> >
>> > To solve this issue I use a mutable map to collect all partitions and their
>> > fetch offsets, and then add them back once the iteration is finished. I
>> > made another commit (
>> > https://github.com/yufeiyan1220/kafka/commit/0a793dfca2ab9b8e8b45ba5693359960f3e306eb),
>> > and the new resize method passes the unit test.
>> >
>> > I'm not sure whether this is an issue the Kafka community needs to fix, but
>> > for me it affects fetching efficiency when I deploy Kafka across regions
>> > with high network latency.
>> >
>> > I'd really appreciate hearing from you!
>> >
>> >
>>
>


Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

2022-03-26 Thread Luke Chen
Hi Mickael,

Thanks for the KIP!
It's indeed a pain point for Kafka admins.

I have some comments:
1. Typo in motivation section: When administrators [when to] remove brokers
from a cluster,
2. If different `replica.placer.class.name` configs are set in all
controllers, I think only the config for the "active controller" will take
effect, right?
3. Could you explain more about how the proposal fixes some of the scenarios
you listed, e.g. the newly added broker case? How would we know a broker is
newly added? I guess it's by checking the broker load via some metrics
dynamically, right?


Thank you.
Luke

On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan  wrote:

> Thanks Mickael, this makes sense to me! I've been wanting something like
> this in order to decommission a broker without new partitions getting
> accidentally assigned to it.
>
> Ryanne
>
> On Thu, Mar 17, 2022, 5:56 AM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > I'd like to start a new discussion on KIP-660. I originally wrote this
> > KIP in 2020 and the initial discussion
> > (https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > raised some concerns, especially around KRaft (which did not exist at
> > that time) and scalability.
> >
> > Since then, we got a new KRaft controller, so I've been able to revisit
> > this KIP. I kept the KIP number as it's essentially the same idea, but
> > the proposal is significantly different:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> >
> > Please take a look and let me know if you have any feedback.
> >
> > Thanks,
> > Mickael
> >
>


Re: [VOTE] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-26 Thread John Roesler
Thanks for the KIP, Sergio!

I’m +1 (binding)

Thanks,
John

On Sat, Mar 26, 2022, at 03:32, David Jacot wrote:
> +1 (binding). Thanks for the KIP.
>
> Best,
> David
>
> On Fri, Mar 25, 2022 at 07:11, Luke Chen wrote:
>
>> Hi Sergio,
>>
>> Thanks for the KIP!
>> +1(binding) from me.
>>
>> Thank you.
>> Luke
>>
>> On Fri, Mar 25, 2022 at 1:40 PM Sergio Daniel Troiano
>>  wrote:
>>
>> > Hi lads,
>> >
>> > I would like to start the vote on this:
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
>> >
>> > As extra information, we are already using the patch in our company;
>> > thanks to this patch and another extra script we wrote (maybe I will start
>> > another KIP later), we have started saving plenty of money.
>> >
>> > Best regards.
>> > Sergio Troiano
>> >
>>


Re: [VOTE] KIP-794: Strictly Uniform Sticky Partitioner

2022-03-26 Thread Lucas Bradstreet
Hi Artem,

Thank you for all the work on this. I think it'll be quite impactful.

+1 non-binding from me.

Lucas

On Wed, Mar 23, 2022 at 8:27 PM Luke Chen  wrote:

> Hi Artem,
>
> Thanks for the KIP and for your patience during the discussion!
> +1 binding from me.
>
> Luke
>
> On Thu, Mar 24, 2022 at 3:43 AM Ismael Juma  wrote:
>
> > Thanks for the KIP and for taking the time to address all the feedback.
> > +1 (binding)
> >
> > Ismael
> >
> > On Mon, Mar 21, 2022 at 4:52 PM Artem Livshits
> >  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a vote on
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
> > > .
> > >
> > > -Artem
> > >
> >
>