Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API
Hey Mickael,

Great KIP! I have one question.

You mentioned "DescribeLogDirs is usually a low volume API. This change should not significantly affect the latency of this API." and "That would allow to easily validate whether disk operations (like a resize), or topic deletion (log deletion only happen after a short delay) have completed." I wonder whether there is an existing metric or API that allows administrators to determine whether a resize is needed. If administrators use this API to make that decision, would it become a high-volume API? I understand we don't want it to be high-volume, because it is already costly: the response includes the per-partition details under `"name": "Topics"`.

Cong

On Thu, Apr 7, 2022 at 2:17 AM Mickael Maison wrote:
> Hi,
>
> I wrote a small KIP to expose the total and usable space of logdirs
> via the DescribeLogDirs API:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API
>
> Please take a look and let me know if you have any feedback.
>
> Thanks,
> Mickael
Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API
Thanks for the explanation. I think the question is: if we already have disk utilization monitoring in our environment, what is the use case for KIP-827? That monitoring can already do the job. Is there anything I missed?

Thanks,
Cong

On Tue, May 31, 2022 at 2:57 AM Mickael Maison wrote:
> Hi Cong,
>
> Kafka does not expose disk utilization metrics. This is something you
> need to provide in your environment. You definitely should have a
> mechanism for exposing metrics from your Kafka broker hosts and you
> should absolutely monitor disk usage and have appropriate alerts.
>
> Thanks,
> Mickael
>
> On Thu, May 26, 2022 at 7:34 PM Jun Rao wrote:
> >
> > Hi, Igor,
> >
> > Thanks for the reply.
> >
> > I agree that this KIP could be useful for improving the tool for moving
> > data across disks. It would be useful to clarify the main motivation of
> > the KIP. Also, DescribeLogDirsResponse already includes the size of each
> > partition on a disk. So, it seems that UsableBytes is redundant since
> > it's derivable.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, May 26, 2022 at 3:30 AM Igor Soarez wrote:
> > >
> > > Hi,
> > >
> > > This can also be quite useful to make better use of existing functionality
> > > in the Kafka API: moving replicas between log directories via
> > > ALTER_REPLICA_LOG_DIRS. If usable space information is also available, the
> > > caller can make better decisions using the same API. It means a more
> > > consistent way of interacting with Kafka to manage replica locations
> > > within a broker without having to correlate Kafka metrics with information
> > > from the Kafka API.
> > >
> > > --
> > > Igor
> > >
> > > On Wed, May 25, 2022, at 8:16 PM, Jun Rao wrote:
> > > > Hi, Mickael,
> > > >
> > > > Thanks for the KIP. Since this is mostly for monitoring and alerting,
> > > > could we expose them as metrics instead of as part of the API? We already
> > > > have a size metric per log. Perhaps we could extend that to add used/total
> > > > metrics per disk?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, May 19, 2022 at 10:21 PM Raman Verma wrote:
> > > > >
> > > > > Hello Mickael,
> > > > >
> > > > > Thanks for the KIP.
> > > > >
> > > > > I see that the API response contains some information about each partition.
> > > > > ```
> > > > > { "name": "PartitionSize", "type": "int64", "versions": "0+",
> > > > >   "about": "The size of the log segments in this partition in bytes." }
> > > > > ```
> > > > > Can this be summed up to provide the used space in a `log.dir`?
> > > > > This would also be specific to a `log.dir` (for the case where multiple
> > > > > log.dirs are hosted on the same underlying device).
> > > > >
> > > > > On Thu, May 19, 2022 at 10:21 AM Cong Ding wrote:
> > > > > > [...]
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Raman Verma
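Raman's question (summing `PartitionSize` per `log.dir`) and Igor's replica-placement use case can be illustrated with a minimal Python sketch. It assumes a hand-built dict that loosely mirrors the DescribeLogDirs response layout; `TotalBytes` and `UsableBytes` are stand-ins for the fields under discussion, the sample numbers are made up, and `used_bytes_per_log_dir` and `pick_destination` are hypothetical helpers, not part of any Kafka client.

```python
# Hypothetical sample data, loosely shaped like a DescribeLogDirs response.
# "TotalBytes" and "UsableBytes" stand in for the fields discussed in KIP-827.
RESPONSE = {
    "Results": [
        {
            "LogDir": "/data/kafka-1",
            "TotalBytes": 1_000_000,
            "UsableBytes": 600_000,
            "Topics": [
                {"Name": "orders", "Partitions": [
                    {"PartitionIndex": 0, "PartitionSize": 250_000},
                    {"PartitionIndex": 1, "PartitionSize": 150_000},
                ]},
            ],
        },
        {
            "LogDir": "/data/kafka-2",
            "TotalBytes": 1_000_000,
            "UsableBytes": 950_000,
            "Topics": [
                {"Name": "payments", "Partitions": [
                    {"PartitionIndex": 0, "PartitionSize": 50_000},
                ]},
            ],
        },
    ]
}


def used_bytes_per_log_dir(response):
    """Sum PartitionSize over all partitions reported for each log dir."""
    used = {}
    for result in response["Results"]:
        used[result["LogDir"]] = sum(
            partition["PartitionSize"]
            for topic in result["Topics"]
            for partition in topic["Partitions"]
        )
    return used


def pick_destination(response, replica_size, exclude):
    """Hypothetical helper: choose the log dir (other than `exclude`) with the
    most usable space that can still fit `replica_size` bytes, or None."""
    candidates = [
        result for result in response["Results"]
        if result["LogDir"] != exclude and result["UsableBytes"] >= replica_size
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda result: result["UsableBytes"])["LogDir"]
```

Because the sum is computed per `LogDir` entry, it stays specific to each log dir even when several log dirs share one device, which is the distinction Raman points out; figures read from the underlying filesystem would instead describe the shared device.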
Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API
Thank you, Mickael.

One more question: are you imagining that this tooling/automation would call the API at a very low frequency, since high-frequency calls to it are prohibitively expensive? Can you give some examples of low-frequency use cases? I can think of some valid high-frequency use cases, but I had a hard time coming up with low-frequency ones. The one you give in the KIP is validating whether disk resize operations have completed. However, this looks like a high-frequency use case to me, because we need to keep monitoring disk usage before and after resizing.

Cong

On Fri, Jun 3, 2022 at 5:22 AM Mickael Maison wrote:
>
> Hi Cong,
>
> Maybe some people can do without this KIP.
> But in many cases, especially around tooling and automation, it's
> useful to be able to retrieve disk utilization values via the Kafka
> API rather than interfacing with a metrics system.
>
> Does that clarify the motivation?
>
> Thanks,
> Mickael
>
> On Wed, Jun 1, 2022 at 7:10 PM Cong Ding wrote:
> > [...]
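To make the frequency question concrete, one possible pattern for a tool verifying a resize is to poll with exponential backoff, so the number of calls stays small. This is a sketch under assumptions, not something proposed in the thread: `fetch_total_bytes` is a hypothetical callable (for example, a wrapper around one DescribeLogDirs request for one log dir), the default intervals are arbitrary, and the clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time


def wait_for_resize(fetch_total_bytes, expected_total, *, initial_interval=30.0,
                    max_interval=600.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the reported total size reaches expected_total.

    The interval doubles after each miss (capped at max_interval).
    Returns the number of calls made, or raises TimeoutError if the next
    wait would pass the deadline.
    """
    deadline = clock() + timeout
    interval = initial_interval
    calls = 0
    while True:
        calls += 1
        if fetch_total_bytes() >= expected_total:
            return calls
        if clock() + interval > deadline:
            raise TimeoutError(f"resize not observed after {calls} calls")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

Doubling the interval trades detection latency for fewer calls; capping it at `max_interval` bounds how late a completed resize is noticed.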
Jira contributor request
Hello,

I would like to become a contributor in JIRA. Would you please grant me permission to do so?

Jira ID: ccding

Best Regards,
Cong
[jira] [Created] (KAFKA-13603) empty active segment can trigger recovery after clean shutdown and restart
Cong Ding created KAFKA-13603:

Summary: empty active segment can trigger recovery after clean shutdown and restart
Key: KAFKA-13603
URL: https://issues.apache.org/jira/browse/KAFKA-13603
Project: Kafka
Issue Type: Bug
Reporter: Cong Ding

Within a LogSegment, the TimeIndex and OffsetIndex are lazy indices that are not created on disk until they are accessed for the first time. If the active segment is empty at the time of a clean shutdown, the disk will have only the log file and no index files. However, the log recovery logic expects an offset index file on disk for each segment; otherwise, the segment is considered corrupted. As a result, restarting after a clean shutdown can trigger recovery. To address this, we should create the index files for empty active segments during clean shutdown.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
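The failure mode and the proposed fix can be simulated in a few lines of Python. This is a hypothetical model, not Kafka's code: `needs_recovery` encodes the rule stated in the report (a segment without an offset index file counts as corrupted), and `clean_shutdown` optionally creates empty index files, as the issue proposes. File names follow Kafka's convention of naming segment files after their base offset, zero-padded to 20 digits.

```python
import tempfile
from pathlib import Path

INDEX_EXTENSIONS = ("index", "timeindex")


def segment_files(log_dir, base_offset):
    """Map extension -> path; segment files are named after the base offset,
    zero-padded to 20 digits (e.g. 00000000000000000000.log)."""
    stem = f"{base_offset:020d}"
    return {ext: Path(log_dir) / f"{stem}.{ext}" for ext in ("log",) + INDEX_EXTENSIONS}


def clean_shutdown(log_dir, base_offset, create_missing_indexes):
    """Simulate closing an empty active segment whose lazy indexes were never
    accessed: only the .log file exists unless the proposed fix is applied."""
    files = segment_files(log_dir, base_offset)
    files["log"].touch()
    if create_missing_indexes:
        for ext in INDEX_EXTENSIONS:
            files[ext].touch()  # proposed fix: materialize empty index files


def needs_recovery(log_dir, base_offset):
    """Rule from the report: a segment without an offset index file on disk
    is considered corrupted, which triggers recovery on restart."""
    return not segment_files(log_dir, base_offset)["index"].exists()


def demo():
    """Return (needs recovery without the fix, needs recovery with the fix)."""
    results = []
    for fix in (False, True):
        with tempfile.TemporaryDirectory() as log_dir:
            clean_shutdown(log_dir, 0, create_missing_indexes=fix)
            results.append(needs_recovery(log_dir, 0))
    return tuple(results)
```

Calling `demo()` shows the difference: without the fix the empty segment is flagged for recovery, with it the segment passes the check.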
[jira] [Created] (KAFKA-13149) Null Pointer Exception when record==null at produce path
Cong Ding created KAFKA-13149:

Summary: Null Pointer Exception when record==null at produce path
Key: KAFKA-13149
URL: https://issues.apache.org/jira/browse/KAFKA-13149
Project: Kafka
Issue Type: Bug
Components: log
Reporter: Cong Ding

In production, we have seen the following exception:
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.kafka.common.record.Record.hasMagic(byte)" because "record" is null
{code}
It is triggered by [https://github.com/apache/kafka/blob/bfc57aa4ddcd719fc4a646c2ac09d4979c076455/core/src/main/scala/kafka/log/LogValidator.scala#L191] when the broker handles a produce request. The cause is that [https://github.com/apache/kafka/blob/bfc57aa4ddcd719fc4a646c2ac09d4979c076455/clients/src/main/java/org/apache/kafka/common/record/DefaultRecord.java#L294-L296] returns null for the record, possibly because of a misbehaving client. Instead, the broker should throw an invalid record exception and notify the client.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
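A minimal Python sketch of the behavior the issue asks for: the validation loop checks for a missing record and raises an invalid-record error instead of dereferencing it. The 1-byte length-prefixed format, `read_record`, `validate_records`, and `InvalidRecordError` are all hypothetical stand-ins; in this toy decoder, `None` signals a record that the buffer cannot fully supply, which is only one assumed way such a null could arise.

```python
class InvalidRecordError(Exception):
    """Hypothetical stand-in for Kafka's InvalidRecordException."""


def read_record(buffer, pos):
    """Toy decoder for a 1-byte length prefix followed by the body.
    Returns (record, next_pos), or (None, pos) if the buffer cannot supply a
    complete record; that None plays the role of the null in the report."""
    if pos >= len(buffer):
        return None, pos
    size = buffer[pos]
    body = buffer[pos + 1:pos + 1 + size]
    if len(body) < size:
        return None, pos
    return body, pos + 1 + size


def validate_records(buffer, expected_count):
    """Decode expected_count records, failing with InvalidRecordError
    (instead of crashing on None) when a record cannot be read."""
    records, pos = [], 0
    for i in range(expected_count):
        record, pos = read_record(buffer, pos)
        if record is None:
            raise InvalidRecordError(f"record {i} is missing or truncated at byte {pos}")
        records.append(record)
    return records
```

Raising a dedicated error keeps the failure on the request that caused it, so the broker can report it to the client rather than surfacing a NullPointerException.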
[jira] [Created] (KAFKA-13315) log layer exception during shutdown that caused an unclean shutdown
Cong Ding created KAFKA-13315:

Summary: log layer exception during shutdown that caused an unclean shutdown
Key: KAFKA-13315
URL: https://issues.apache.org/jira/browse/KAFKA-13315
Project: Kafka
Issue Type: Bug
Reporter: Cong Ding

We have seen an exception caused by shutting down the scheduler before shutting down LogManager. While LogManager was closing partitions one by one, the scheduler triggered the deletion of old segments due to retention. However, those segments could already have been closed by LogManager; the resulting failure marked the log dir as offline, so the clean shutdown marker was not written. Ultimately, the broker would take hours to restart.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
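The sequence described in this report can be modeled deterministically. In this hypothetical Python simulation, retention fires after LogManager has closed the first segment; deleting the closed segment fails, the log dir goes offline, and no clean shutdown marker is written. The `guard` flag, which makes retention skip its work once shutdown has started, is one possible mitigation chosen for illustration, not necessarily the fix adopted for this issue. All class and function names are invented.

```python
class Segment:
    def __init__(self, name):
        self.name = name
        self.closed = False

    def delete(self):
        # Deleting a segment that was already closed fails, as in the report.
        if self.closed:
            raise RuntimeError(f"segment {self.name} is already closed")


class LogDir:
    def __init__(self, segments):
        self.segments = segments
        self.offline = False
        self.clean_shutdown_marker = False
        self.shutting_down = False


def retention_task(log_dir, guard):
    """Hypothetical retention job: delete the oldest segment.
    With guard=True it does nothing once shutdown has started."""
    if guard and log_dir.shutting_down:
        return
    try:
        log_dir.segments[0].delete()
    except RuntimeError:
        log_dir.offline = True  # the failure takes the log dir offline


def shutdown(log_dir, guard):
    """Close segments one by one; retention fires after the first close.
    The clean shutdown marker is written only if the dir is still online."""
    log_dir.shutting_down = True
    for i, segment in enumerate(log_dir.segments):
        segment.closed = True
        if i == 0:
            retention_task(log_dir, guard)
    if not log_dir.offline:
        log_dir.clean_shutdown_marker = True
```

Without the guard, the simulated log dir ends up offline with no clean shutdown marker, which is the state that forces a slow restart; with the guard, the marker is written.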