Thanks for replying. I still have a few lingering questions/comments.

*Reg#1* Understood. I checked and the underlying system call is statvfs for
unix systems which should be ok to call here.
*Reg#2* Fair point. I checked again and yes, log.dir always means local
storage even when tiered storage is enabled.
*Reg#3* The rationale for adding these new (size) fields to the
`DescribeLogDirs` is to allow the administrator to monitor or perhaps take
automated action based on results. Doesn't monitoring the number of file
descriptors fall in the same category of use cases? I am assuming that we
want to add the size information in the API response because JVM makes it
possible to get this information in a platform agnostic manner which is not
true for open file descriptors, correct?
*Reg#4* Agree.
*New#5*: As an FYI, Java FileStore API breaks on large storage sizes. See:
https://bugs.openjdk.java.net/browse/JDK-8162520. ElasticSearch has been
hit by these limitations in the past. For JDK 11, you will probably have to
add defensive checks such as
https://github.com/opensearch-project/OpenSearch/blob/b74d71fb747cc2873d4c2ffae825944da4d06e1b/server/src/main/java/org/opensearch/monitor/fs/FsProbe.java#L148.
The documentation of the API mentioned in KIP will also be modified to
account for this edge case.
*New#6*: Can you please provide an example where the return for these APIs
would be UNKNOWN_SPACE? Doesn't JVM guarantee that this API will definitely
return results (else it throws an IOException)? I would propose that we get
rid of default since JVM guarantees that this would work on all platforms.
If it doesn't then it's a bug and should be uncovered via an exception.

Also, I would like to volunteer to code review (of course, it would be
non-binding) your implementation once this KIP is approved.

Regards,
Divij Vaidya

On Fri, Apr 8, 2022 at 11:35 AM Mickael Maison <mickael.mai...@gmail.com>
wrote:

> Hi Divij,
>
> Thanks for taking a look!
>
> 1. In order to retrieve the sizes, the plan is to use getTotalSpace()
> and getUsableSpace() from java.nio.file.FileStore. The implementations
> may vary depending on the filesystem but these calls typically don't
> depend on the size of storage but instead just return metadata the
> filesystem maintains.
> 2. I'm not an expert on KIP-405, so correct me if I'm wrong. As far as
> I understand brokers will still have local log dirs and remote volumes
> are not counted as log dirs. KIP-405 does not mention updating the
> DescribeLogDirs API. So I don't think this KIP needs to do anything
> special to be compatible with KIP-405. On the other hand, I wonder if
> KIP-405 should update DescribeLogDirs to provide details about the
> location of replicas.
> 3. Counting files can be a slow operation as it requires exploring all
> paths recursively to find all files. Administrators should
> definitively monitor file descriptors via host metrics but I'm not
> sure it's something we want to expose via the Kafka API. As mentioned
> it could be slow to compute and files are not really a Kafka concept.
> 4. DescribeLogDirs is usually a low volume API. This change should not
> significantly affect the latency of this API.
>
> Thanks,
> Mickael
>
>
>
>
> On Thu, Apr 7, 2022 at 1:41 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
> >
> > Hi Mickael
> >
> > Thanks for starting this. It is a very useful feature.
> >
> > Some initial thoughts (I am new to Kafka so please excuse if these are
> > naive suggestions):
> > 1. What is the impact on latency of the DescribeLogDirs API due to this
> > change? Would calculating the totalSpace from each logdir be a bottleneck
> > for the API? What if we are talking about a large storage size in the
> order
> > of hundred (or tens) of GBs?
> > 2. How does this fit in with RemoteStorage (KIP-405)? I think integration
> > with KIP-405 is worth discussing in the scope of this KIP. My
> > recommendation will be to add a new API in the RLMM
> > (RemoteLogMetadataManager) called GetLogSize() and leave it upto the
> remote
> > storage to perform a concrete implementation for this
> > interface. DescribeLogDirs could call this interface internally to
> provide
> > the relevant information.
> > 3. Do you think adding the number of files in the directory as part of
> the
> > API response will be useful as well? e.g. a use case where this
> information
> > will be useful is to monitor/alarm the situations when the number of
> files
> > are dangerously reaching the max value of file descriptors configured at
> > the OS.
> > 4. Please add an API latency perf test as part of the release criteria
> for
> > this change. We want to avoid regression.
> >
> > Regards,
> > Divij Vaidya
> >
> >
> >
> > On Thu, Apr 7, 2022 at 11:17 AM Mickael Maison <mickael.mai...@gmail.com
> >
> > wrote:
> >
> > > Hi,
> > >
> > > I wrote a small KIP to expose the total and usable space of logdirs
> > > via the DescribeLogDirs API:
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API
> > >
> > > Please take a look and let me know if you have any feedback.
> > >
> > > Thanks,
> > > Mickael
> > >
>

Reply via email to