Hi Divij,

The new fields default to -1 in the protocol too. So in case a broker
hits an Exception accessing a logdir and sends an error response back,
the client will get -1. For this reason I think UNKNOWN_SPACE is
slightly better than UNSUPPORTED.

Thanks,
Mickael

On Fri, Apr 22, 2022 at 9:35 AM Tom Bentley <tbent...@redhat.com> wrote:
>
> Hi Mickael,
>
> Thanks for the KIP, I can see this would be useful.
>
> I guess you could have used optional tagged fields, rather than bumping the
> version, but then again I don't see it being particularly advantageous in
> this case either.
>
> Kind regards,
>
> Tom
>
> On Tue, 19 Apr 2022 at 10:23, Divij Vaidya <divijvaidy...@gmail.com> wrote:
>
> > I have a minor suggestion below but overall KIP looks good to me to start a
> > vote.
> >
> > *Reg#6* Would you consider replacing UNKNOWN_SPACE with UNSUPPORTED?
> > UNSUPPORTED tells the user explicitly that the value is missing due to
> > client/server version mismatch whereas with UNKNOWN_SPACE, the user is left
> > wondering whether it is a problem with underlying storage not providing
> > space information or something else.
> >
> > Divij Vaidya
> >
> >
> >
> > On Fri, Apr 15, 2022 at 3:40 PM Mickael Maison <mickael.mai...@gmail.com>
> > wrote:
> >
> > > Hi Luke,
> > >
> > > 7. I've updated the KIP to clarify these sizes are in bytes.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Fri, Apr 15, 2022 at 12:16 PM Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for the KIP!
> > > > This is a good improvement.
> > > >
> > > > (3) +1 for not adding the number of files in the directory. Counting
> > the
> > > > file numbers should be slow.
> > > > (7) Could you make the fields clear in `DescribeLogDirsResponse`, to
> > > > mention the returned number is size in Byte (or not?)
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Fri, Apr 15, 2022 at 5:27 PM Mickael Maison <
> > mickael.mai...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for the feedback.
> > > > >
> > > > > 3. Yes that's right. Also the number of file descriptors is really
> > not
> > > > > a property of log directories. Administrators typically tracked that
> > > > > count per process and for the whole operating system.
> > > > >
> > > > > 5. That's a good point, I've updated the KIP to mention sizes will be
> > > > > capped to Long.MAX_VALUE even if the actual storage is larger.
> > > > >
> > > > > 6. Brokers would never return UNKNOWN_SPACE. When new clients query
> > > > > older brokers via the admin API, the admin client will use
> > > > > UNKNOWN_SPACE to indicate these values weren't provided by brokers.
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > > > >
> > > > > On Fri, Apr 8, 2022 at 5:00 PM Divij Vaidya <divijvaidy...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks for replying. I still have a few lingering
> > questions/comments.
> > > > > >
> > > > > > *Reg#1* Understood. I checked and the underlying system call is
> > > statvfs
> > > > > for
> > > > > > unix systems which should be ok to call here.
> > > > > > *Reg#2* Fair point. I checked again and yes, log.dir always means
> > > local
> > > > > > storage even when tiered storage is enabled.
> > > > > > *Reg#3* The rationale for adding these new (size) fields to the
> > > > > > `DescribeLogDirs` is to allow the administrator to monitor or
> > perhaps
> > > > > take
> > > > > > automated action based on results. Doesn't monitoring the number of
> > > file
> > > > > > descriptors fall in the same category of use cases? I am assuming
> > > that we
> > > > > > want to add the size information in the API response because JVM
> > > makes it
> > > > > > possible to get this information in a platform agnostic manner
> > which
> > > is
> > > > > not
> > > > > > true for open file descriptors, correct?
> > > > > > *Reg#4* Agree.
> > > > > > *New#5*: As an FYI, Java FileStore API breaks on large storage
> > sizes.
> > > > > See:
> > > > > > https://bugs.openjdk.java.net/browse/JDK-8162520. ElasticSearch
> > has
> > > been
> > > > > > hit by these limitations in the past. For JDK 11, you will probably
> > > have
> > > > > to
> > > > > > add defensive checks such as
> > > > > >
> > > > >
> > >
> > https://github.com/opensearch-project/OpenSearch/blob/b74d71fb747cc2873d4c2ffae825944da4d06e1b/server/src/main/java/org/opensearch/monitor/fs/FsProbe.java#L148
> > > > > .
> > > > > > The documentation of the API mentioned in KIP will also be modified
> > > to
> > > > > > account for this edge case.
> > > > > > *New#6*: Can you please provide an example where the return for
> > these
> > > > > APIs
> > > > > > would be UNKNOWN_SPACE? Doesn't JVM guarantee that this API will
> > > > > definitely
> > > > > > return results (else it throws an IOException)? I would propose
> > that
> > > we
> > > > > get
> > > > > > rid of default since JVM guarantees that this would work on all
> > > > > platforms.
> > > > > > If it doesn't then it's a bug and should be uncovered via an
> > > exception.
> > > > > >
> > > > > > Also, I would like to volunteer to code review (of course, it would
> > > be
> > > > > > non-binding) your implementation once this KIP is approved.
> > > > > >
> > > > > > Regards,
> > > > > > Divij Vaidya
> > > > > >
> > > > > > On Fri, Apr 8, 2022 at 11:35 AM Mickael Maison <
> > > mickael.mai...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Divij,
> > > > > > >
> > > > > > > Thanks for taking a look!
> > > > > > >
> > > > > > > 1. In order to retrieve the sizes, the plan is to use
> > > getTotalSpace()
> > > > > > > and getUsableSpace() from java.nio.file.FileStore. The
> > > implementations
> > > > > > > may vary depending on the filesystem but these calls typically
> > > don't
> > > > > > > depend on the size of storage but instead just return metadata
> > the
> > > > > > > filesystem maintains.
> > > > > > > 2. I'm not an expert on KIP-405, so correct me if I'm wrong. As
> > > far as
> > > > > > > I understand brokers will still have local log dirs and remote
> > > volumes
> > > > > > > are not counted as log dirs. KIP-405 does not mention updating
> > the
> > > > > > > DescribeLogDirs API. So I don't think this KIP needs to do
> > anything
> > > > > > > special to be compatible with KIP-405. On the other hand, I
> > wonder
> > > if
> > > > > > > KIP-405 should update DescribeLogDirs to provide details about
> > the
> > > > > > > location of replicas.
> > > > > > > 3. Counting files can be a slow operation as it requires
> > exploring
> > > all
> > > > > > > paths recursively to find all files. Administrators should
> > > > > > > definitively monitor file descriptors via host metrics but I'm
> > not
> > > > > > > sure it's something we want to expose via the Kafka API. As
> > > mentioned
> > > > > > > it could be slow to compute and files are not really a Kafka
> > > concept.
> > > > > > > 4. DescribeLogDirs is usually a low volume API. This change
> > should
> > > not
> > > > > > > significantly affect the latency of this API.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mickael
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 7, 2022 at 1:41 PM Divij Vaidya <
> > > divijvaidy...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Mickael
> > > > > > > >
> > > > > > > > Thanks for starting this. It is a very useful feature.
> > > > > > > >
> > > > > > > > Some initial thoughts (I am new to Kafka so please excuse if
> > > these
> > > > > are
> > > > > > > > naive suggestions):
> > > > > > > > 1. What is the impact on latency of the DescribeLogDirs API due
> > > to
> > > > > this
> > > > > > > > change? Would calculating the totalSpace from each logdir be a
> > > > > bottleneck
> > > > > > > > for the API? What if we are talking about a large storage size
> > > in the
> > > > > > > order
> > > > > > > > of hundred (or tens) of GBs?
> > > > > > > > 2. How does this fit in with RemoteStorage (KIP-405)? I think
> > > > > integration
> > > > > > > > with KIP-405 is worth discussing in the scope of this KIP. My
> > > > > > > > recommendation will be to add a new API in the RLMM
> > > > > > > > (RemoteLogMetadataManager) called GetLogSize() and leave it
> > upto
> > > the
> > > > > > > remote
> > > > > > > > storage to perform a concrete implementation for this
> > > > > > > > interface. DescribeLogDirs could call this interface internally
> > > to
> > > > > > > provide
> > > > > > > > the relevant information.
> > > > > > > > 3. Do you think adding the number of files in the directory as
> > > part
> > > > > of
> > > > > > > the
> > > > > > > > API response will be useful as well? e.g. a use case where this
> > > > > > > information
> > > > > > > > will be useful is to monitor/alarm the situations when the
> > > number of
> > > > > > > files
> > > > > > > > are dangerously reaching the max value of file descriptors
> > > > > configured at
> > > > > > > > the OS.
> > > > > > > > 4. Please add an API latency perf test as part of the release
> > > > > criteria
> > > > > > > for
> > > > > > > > this change. We want to avoid regression.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Divij Vaidya
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Apr 7, 2022 at 11:17 AM Mickael Maison <
> > > > > mickael.mai...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I wrote a small KIP to expose the total and usable space of
> > > logdirs
> > > > > > > > > via the DescribeLogDirs API:
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API
> > > > > > > > >
> > > > > > > > > Please take a look and let me know if you have any feedback.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Mickael
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> >

Reply via email to