Re: Request to be added to kafka contributors list
Hi Matthias,

I tried signing out and signing in, but I still can't find the "Assign" button. My JIRA ID is fanyan; could you help me set it up again?

Best,
Fan

From: Matthias J. Sax
Sent: Saturday, May 18, 2024 4:06
To: users@kafka.apache.org
Subject: Re: Re: Request to be added to kafka contributors list

Did you sign out and sign in again?

On 5/17/24 9:49 AM, Yang Fan wrote:
> Thanks Matthias,
>
> I still can't find the "Assign to me" button beside Assignee and Reporter.
> Could you help me set it again?
>
> Best regards,
> Fan
>
> From: Matthias J. Sax
> Sent: May 17, 2024 2:24
> To: users@kafka.apache.org
> Subject: Re: Request to be added to kafka contributors list
>
> Thanks for reaching out Yang. You should be all set.
>
> -Matthias
>
> On 5/16/24 7:40 AM, Yang Fan wrote:
>> Dear Apache Kafka Team,
>>
>> I hope this email finds you well. My name is Fan Yang, and my JIRA ID is
>> fanyan. I kindly request to be added to the contributors list for Apache
>> Kafka. Being part of this list would allow me to be assigned to JIRA
>> tickets and work on them.
>> Thank you for considering my request.
>> Best regards,
>> Fan
Re: Fwd: Request to be added to kafka contributors list
Hello,

It works like a charm. A few questions:

1. I did the work described in JIRA KAFKA-16707 on a fork/branch of the
   3.7.0 branch of Kafka, but reading the "Contributing Code Changes" guide,
   I feel I should have done it on a branch from trunk of my fork? (If so,
   I'll just create a new branch on my fork, rebase, and re-run the tests,
   of course. I just want to know from which point I should start in order
   to open the PR correctly.)
2. When running "gradlew clean test" on a clean fork of the 3.7.0 branch, I
   get a failure, so I'm wondering how that will be handled when I open the
   PR. Do you know?

Best regards

On 21/05/2024 03:35, Matthias J. Sax wrote:

Done. You should be all set :)

-Matthias

On 5/20/24 10:10 AM, bou...@ulukai.net wrote:

Dear Apache Kafka Team,

I hope I'm posting in the right place: my name is Franck LEDAY, under Apache JIRA ID "handfreezer". I opened an improvement issue, KAFKA-16707, but I failed to assign it to myself. May I ask to be added to the contributors list for Apache Kafka? I have already done the work for the improvement and would like to be assigned to the issue to finish my contribution.

Thank you for considering my request.
Best regards, Franck.
Re: Request for contributor list
I believe it was brendendeluna; that is what I entered for the username on the self-serve page. I am unable to find a Jira ID though. Please let me know if brendendeluna works or if I need something else.

On Mon, May 20, 2024 at 8:35 PM Matthias J. Sax wrote:
> What is your Jira ID?
>
> -Matthias
>
> On 5/20/24 9:55 AM, Brenden Deluna wrote:
> > Hello, I am requesting to be added to the contributor list to take care of
> > some tickets. Thank you.
Re: outerjoin not joining after window
Hello,

I think the issue is related to certain partitions not getting records to advance stream time (because of low volume). Can someone confirm that each partition has its own stream time, and that the stream time for a partition only advances when a record is written to the topic after the window closes?

If I use the repartition method on each input topic to reduce the number of partitions for those streams, how many instances of the application will process records? For example, if the input topics each have 6 partitions, and I use the repartition method to set the number of partitions for the streams to 2, how many instances of the application will process records?

Thanks,
Chad

On Wed, May 1, 2024 at 6:47 PM Matthias J. Sax wrote:
> >>> How do you know this?
> >> First thing we do is write a log message in the value joiner. We don't see
> >> the log message for the missed records.
>
> Well, for left/right join results, the ValueJoiner would only be called
> when the window is closed... And for invalid input (or late records, i.e.,
> records which arrive out-of-order after their window was already closed),
> records would be dropped right away. So you cannot really infer whether a
> record made it into the join or not, or what happened if it did make it
> into the `Processor`.
>
> -> https://kafka.apache.org/documentation/#kafka_streams_task_monitoring
>
> `dropped-records-total` is the name of the metric.
>
> -Matthias
>
> On 5/1/24 11:35 AM, Chad Preisler wrote:
> > Hello,
> >
> > We did some testing in our test environment today. We are seeing some
> > records processed where only one side of the join has a record. So that's
> > good. However, we are still seeing some records get skipped. They never hit
> > the value joiner (we write a log message first thing in the value joiner).
> > During the test we were putting some load on the system, so stream time was
> > advancing. We did notice that the join windows were taking much longer than
> > 30 minutes to close and process records. Thirty minutes is the window plus
> > grace.
> >
> >> How do you know this?
> > First thing we do is write a log message in the value joiner. We don't see
> > the log message for the missed records.
> >
> > I will try pushing the same records locally. However, we don't see any
> > errors in our logs and the stream does process one-sided joins after the
> > skipped record. Do you have any docs on the "dropped records" metric? I did
> > a Google search and didn't find many good results for that.
> >
> > Thanks,
> >
> > Chad
> >
> > On Tue, Apr 30, 2024 at 2:49 PM Matthias J. Sax wrote:
> >
> >>> Thanks for the information. I ran the code using Kafka locally. After
> >>> submitting some records inside and outside of the time window and grace,
> >>> the join performed as expected when running locally.
> >>
> >> That gives some hope :)
> >>
> >>> However, they never get into the join.
> >>
> >> How do you know this?
> >>
> >> Did you check the metric for dropped records? Maybe records are
> >> considered malformed and dropped? Are you using the same records in
> >> production and in your local test?
> >>
> >>> Are there any settings for the stream client that would affect the join?
> >>
> >> Not that I can think of... There is one more internal config, but as
> >> long as data is flowing, it should not impact the result you see.
> >>
> >>> Are there any settings on the broker side that would affect the join?
> >>
> >> No. The join is computed client side. Broker configs should not have any
> >> impact.
> >>
> >>> If I increase the log level for the streams API would that
> >>> shed some light on what is happening?
> >>
> >> I don't think it would help much. The code in question is
> >> org.apache.kafka.streams.kstream.internals.KStreamKStreamJoin -- but it
> >> does not do any logging except a WARN for the already mentioned "dropping
> >> malformed" records, which is also recorded via JMX.
> >>
> >>> WARN: "Skipping record due to null key or value."
> >>
> >> If you can identify a specific record from the input which would produce
> >> an output, but does not, maybe you can try to feed it into your local
> >> test env and try to reproduce the issue?
> >>
> >> -Matthias
> >>
> >> On 4/30/24 11:38 AM, Chad Preisler wrote:
> >>> Matthias,
> >>>
> >>> Thanks for the information. I ran the code using Kafka locally. After
> >>> submitting some records inside and outside of the time window and grace,
> >>> the join performed as expected when running locally.
> >>>
> >>> I'm not sure why the join is not working as expected when running against
> >>> our actual brokers. We are peeking at the records for the streams and we
> >>> are seeing the records get pulled. However, they never get into the join.
> >>> It's been over 24 hours since the expected records were created, and there
> >>> has been plenty of traffic to advance the stream time. Only records that
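For readers skimming the thread, here is a minimal sketch of the kind of windowed outer join under discussion, written against the Kafka Streams DSL. Topic names and serdes are hypothetical, and the 25-minute window with 5-minute grace is an assumption (the thread only says window plus grace totals thirty minutes):

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

public class OuterJoinSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical co-partitioned input topics.
        KStream<String, String> left =
                builder.stream("left-topic", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> right =
                builder.stream("right-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // 25-minute window plus 5-minute grace = the thirty minutes mentioned above (assumed split).
        JoinWindows windows = JoinWindows.ofTimeDifferenceAndGrace(
                Duration.ofMinutes(25), Duration.ofMinutes(5));

        // For left/right-only results, the ValueJoiner fires only after the window
        // closes -- and the window closes only when stream time advances, which
        // requires new records arriving on the input partitions.
        left.outerJoin(
                        right,
                        (lv, rv) -> lv + "/" + rv, // log here to see which records arrive
                        windows,
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
                .to("joined-topic");
    }
}

This matches the behavior Matthias describes: a one-sided result is emitted only once window plus grace has passed in stream time, not in wall-clock time.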
Re: outerjoin not joining after window
See one small edit below...

On Tue, May 21, 2024 at 10:25 AM Chad Preisler wrote:
> Hello,
>
> I think the issue is related to certain partitions not getting records to
> advance stream time (because of low volume). Can someone confirm that each
> partition has its own stream time and that the stream time for a partition
> only advances when a record is written to the partition after the window
> closes?
>
> If I use the repartition method on each input topic to reduce the number
> of partitions for those streams, how many instances of the application will
> process records? For example, if the input topics each have 6 partitions,
> and I use the repartition method to set the number of partitions for the
> streams to 2, how many instances of the application will process records?
>
> Thanks,
> Chad
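As a minimal sketch of the `repartition()` call Chad is asking about (topic and processor names are hypothetical): `Repartitioned#withNumberOfPartitions` sets the partition count of the internal repartition topic, and since Kafka Streams creates one task per partition, everything downstream of that step is divided into that many tasks:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Repartitioned;

public class RepartitionSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical 6-partition input topic.
        KStream<String, String> input =
                builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Route through an internal repartition topic with only 2 partitions.
        // Reading the 6 source partitions still spreads across all instances,
        // but downstream of this point there are only 2 tasks.
        KStream<String, String> narrowed = input.repartition(
                Repartitioned.<String, String>as("narrowed-by-2")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String())
                        .withNumberOfPartitions(2));

        narrowed.to("output-topic");
    }
}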
Request to be added to kafka contributors list
Dear Apache Kafka Team,

I hope I'm posting in the right place: my name is Franck LEDAY, under Apache JIRA ID "handfreezer". I opened an improvement issue, KAFKA-16707, but I failed to assign it to myself. May I ask to be added to the contributors list for Apache Kafka? I have already done the work for the improvement and would like to be assigned to the issue to finish my contribution.

Thank you for considering my request.
Best regards, Franck.
Re: Fwd: Request to be added to kafka contributors list
Hi Franck,

Thank you for contributing to Apache Kafka!

1. Git is generally permissive of this, as long as there are no merge
conflicts. If you have merge conflicts with `trunk`, you will need to
resolve them before a committer can merge your changes, so rebasing on
trunk before opening the PR is a good idea :)

2. Are you on an M1 mac, with a recent (>11) JDK? I've been experiencing
some consistent failures recently [1] and haven't figured it out yet. You
may also be getting a flaky failure: a test which is nondeterministic and
sometimes fails. We are constantly trying to burn down the list of flaky
tests [2], but there are still some around.

As far as how this impacts the PR: You should find and resolve all of the
deterministic failures that you introduce in the PR, and do your best to
check whether you introduced any flakiness. You can look for tickets
mentioning those failures, or ask a committer for more information.

Hope this helps,
Greg Harris

[1] https://issues.apache.org/jira/browse/KAFKA-16701
[2] https://issues.apache.org/jira/browse/KAFKA-16701?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20flaky-test

On Tue, May 21, 2024 at 6:43 AM Franck wrote:
>
> Hello,
>
> It works like a charm. A few questions:
>
> 1. I did the work described in JIRA KAFKA-16707 on a fork/branch of the
> 3.7.0 branch of Kafka, but reading the "Contributing Code Changes" guide,
> I feel I should have done it on a branch from trunk of my fork? (If so,
> I'll just create a new branch on my fork, rebase, and re-run the tests;
> I just want to know from which point I should start in order to open the
> PR correctly.)
> 2. When running "gradlew clean test" on a clean fork of the 3.7.0 branch,
> I get a failure, so I'm wondering how that will be handled when I open
> the PR. Do you know?
>
> Best regards
Re: outerjoin not joining after window
After reviewing the logs, I think I understand what happens with the repartition topics. It looks like they will be assigned to one or more instances. In my example I ran three instances of the application (A, B, C), and the two repartition topics got assigned to A and B. The six partitions from the input topics got split evenly across all three running instances A, B, and C. Since the repartitioned streams are what I'm joining on, I guess the join will run on two instances, and any input-topic processing will run across all three. Is that correct?

I would still like clarification regarding some records appearing to not get processed: I think the issue is related to certain partitions not getting records to advance stream time (because of low volume). Can someone confirm that each partition has its own stream time, and that the stream time for a partition only advances when a record is written to the partition after the window closes?

On Tue, May 21, 2024 at 10:27 AM Chad Preisler wrote:
> See one small edit below...
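Since the thread keeps pointing at the `dropped-records-total` metric, here is a minimal sketch of reading it programmatically from a running client via `KafkaStreams#metrics()`; the metric is task-level and is also visible over JMX, and the surrounding application code is assumed:

import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class DroppedRecordsCheck {

    // Records dropped as malformed or late never reach the ValueJoiner,
    // so a climbing dropped-records-total is one place "missing" join
    // results can hide.
    static void logDroppedRecords(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("dropped-records-total".equals(name.name())) {
                System.out.printf("task %s dropped-records-total = %s%n",
                        name.tags().get("task-id"), entry.getValue().metricValue());
            }
        }
    }
}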
Re: Request to be added to kafka contributors list
Ok. Hopefully it's working now. Sorry for the hiccup.

-Matthias

On 5/21/24 1:14 AM, Fan Yang wrote:
> Hi Matthias,
>
> I tried signing out and signing in, but I still can't find the "Assign"
> button. My JIRA ID is fanyan; could you help me set it up again?
>
> Best,
> Fan
Re: Fwd: Request to be added to kafka contributors list
Hello,

1/ For sure.
2/ After rebasing my code change, I'll run the full test suite. To answer the question: I'm on Debian 12 with OpenJDK 17.0.10 on my dev machine.

Best regards

On 2024-05-21 16:46, Greg Harris wrote:
> Hi Franck,
>
> Thank you for contributing to Apache Kafka!
Re: Request to be added to kafka contributors list
It's working now. Thank you Matthias!

From: Matthias J. Sax
Sent: Wednesday, May 22, 2024 2:58
To: users@kafka.apache.org
Subject: Re: Request to be added to kafka contributors list

Ok. Hopefully it's working now. Sorry for the hiccup.

-Matthias

On 5/21/24 1:14 AM, Fan Yang wrote:
> Hi Matthias,
>
> I tried signing out and signing in, but I still can't find the "Assign"
> button. My JIRA ID is fanyan; could you help me set it up again?
>
> Best,
> Fan