I understand your point but I think we could potentially improve the user experience here by sending different error codes for the following two different situations. Situation 1: "when broker hits an Exception accessing a logdir" -> UNKNOWN_SPACE Situation 2: "when feature is not supported" -> UNSUPPORTED Different error codes will help the customer understand whether the failure was due to a server/client mismatch (situation 2) or was it a genuine exception while trying to find the value (situation 1).
Or, are you saying that we don't have a way to distinguish between situation 1 and situation 2 from code perspective and hence, we will send UNKNOWN in both cases? Divij Vaidya On Mon, May 2, 2022 at 6:24 PM Mickael Maison <mickael.mai...@gmail.com> wrote: > Hi Divij, > > The new fields default to -1 in the protocol too. So in case a broker > hits an Exception accessing a logdir and sends an error response back, > the client will get -1. For this reason I think UNKNOWN_SPACE is > slightly better than UNSUPPORTED. > > Thanks, > Mickael > > On Fri, Apr 22, 2022 at 9:35 AM Tom Bentley <tbent...@redhat.com> wrote: > > > > Hi Mickael, > > > > Thanks for the KIP, I can see this would be useful. > > > > I guess you could have used optional tagged fields, rather than bumping > the > > version, but then again I don't see it being particularly advantageous in > > this case either. > > > > Kind regards, > > > > Tom > > > > On Tue, 19 Apr 2022 at 10:23, Divij Vaidya <divijvaidy...@gmail.com> > wrote: > > > > > I have a minor suggestion below but overall KIP looks good to me to > start a > > > vote. > > > > > > *Reg#6* Would you consider replacing UNKNOWN_SPACE with UNSUPPORTED? > > > UNSUPPORTED tells the user explicitly that the value is missing due to > > > client/server version mismatch whereas with UNKNOWN_SPACE, the user is > left > > > wondering whether it is a problem with underlying storage not providing > > > space information or something else. > > > > > > Divij Vaidya > > > > > > > > > > > > On Fri, Apr 15, 2022 at 3:40 PM Mickael Maison < > mickael.mai...@gmail.com> > > > wrote: > > > > > > > Hi Luke, > > > > > > > > 7. I've updated the KIP to clarify these sizes are in bytes. > > > > > > > > Thanks, > > > > Mickael > > > > > > > > On Fri, Apr 15, 2022 at 12:16 PM Luke Chen <show...@gmail.com> > wrote: > > > > > > > > > > Hi Mickael, > > > > > > > > > > Thanks for the KIP! > > > > > This is a good improvement. > > > > > > > > > > (3) +1 for not adding the number of files in the directory. > Counting > > > the > > > > > file numbers should be slow. > > > > > (7) Could you make the fields clear in `DescribeLogDirsResponse`, > to > > > > > mention the returned number is size in Byte (or not?) > > > > > > > > > > Thank you. > > > > > Luke > > > > > > > > > > On Fri, Apr 15, 2022 at 5:27 PM Mickael Maison < > > > mickael.mai...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > Thanks for the feedback. > > > > > > > > > > > > 3. Yes that's right. Also the number of file descriptors is > really > > > not > > > > > > a property of log directories. Administrators typically tracked > that > > > > > > count per process and for the whole operating system. > > > > > > > > > > > > 5. That's a good point, I've updated the KIP to mention sizes > will be > > > > > > capped to Long.MAX_VALUE even if the actual storage is larger. > > > > > > > > > > > > 6. Brokers would never return UNKNOWN_SPACE. When new clients > query > > > > > > older brokers via the admin API, the admin client will use > > > > > > UNKNOWN_SPACE to indicate these values weren't provided by > brokers. > > > > > > > > > > > > Thanks, > > > > > > Mickael > > > > > > > > > > > > On Fri, Apr 8, 2022 at 5:00 PM Divij Vaidya < > divijvaidy...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > Thanks for replying. I still have a few lingering > > > questions/comments. > > > > > > > > > > > > > > *Reg#1* Understood. I checked and the underlying system call is > > > > statvfs > > > > > > for > > > > > > > unix systems which should be ok to call here. > > > > > > > *Reg#2* Fair point. I checked again and yes, log.dir always > means > > > > local > > > > > > > storage even when tiered storage is enabled. > > > > > > > *Reg#3* The rationale for adding these new (size) fields to the > > > > > > > `DescribeLogDirs` is to allow the administrator to monitor or > > > perhaps > > > > > > take > > > > > > > automated action based on results. Doesn't monitoring the > number of > > > > file > > > > > > > descriptors fall in the same category of use cases? I am > assuming > > > > that we > > > > > > > want to add the size information in the API response because > JVM > > > > makes it > > > > > > > possible to get this information in a platform agnostic manner > > > which > > > > is > > > > > > not > > > > > > > true for open file descriptors, correct? > > > > > > > *Reg#4* Agree. > > > > > > > *New#5*: As an FYI, Java FileStore API breaks on large storage > > > sizes. > > > > > > See: > > > > > > > https://bugs.openjdk.java.net/browse/JDK-8162520. > ElasticSearch > > > has > > > > been > > > > > > > hit by these limitations in the past. For JDK 11, you will > probably > > > > have > > > > > > to > > > > > > > add defensive checks such as > > > > > > > > > > > > > > > > > > > > > https://github.com/opensearch-project/OpenSearch/blob/b74d71fb747cc2873d4c2ffae825944da4d06e1b/server/src/main/java/org/opensearch/monitor/fs/FsProbe.java#L148 > > > > > > . > > > > > > > The documentation of the API mentioned in KIP will also be > modified > > > > to > > > > > > > account for this edge case. > > > > > > > *New#6*: Can you please provide an example where the return for > > > these > > > > > > APIs > > > > > > > would be UNKNOWN_SPACE? Doesn't JVM guarantee that this API > will > > > > > > definitely > > > > > > > return results (else it throws an IOException)? I would propose > > > that > > > > we > > > > > > get > > > > > > > rid of default since JVM guarantees that this would work on all > > > > > > platforms. > > > > > > > If it doesn't then it's a bug and should be uncovered via an > > > > exception. > > > > > > > > > > > > > > Also, I would like to volunteer to code review (of course, it > would > > > > be > > > > > > > non-binding) your implementation once this KIP is approved. > > > > > > > > > > > > > > Regards, > > > > > > > Divij Vaidya > > > > > > > > > > > > > > On Fri, Apr 8, 2022 at 11:35 AM Mickael Maison < > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Divij, > > > > > > > > > > > > > > > > Thanks for taking a look! > > > > > > > > > > > > > > > > 1. In order to retrieve the sizes, the plan is to use > > > > getTotalSpace() > > > > > > > > and getUsableSpace() from java.nio.file.FileStore. The > > > > implementations > > > > > > > > may vary depending on the filesystem but these calls > typically > > > > don't > > > > > > > > depend on the size of storage but instead just return > metadata > > > the > > > > > > > > filesystem maintains. > > > > > > > > 2. I'm not an expert on KIP-405, so correct me if I'm wrong. > As > > > > far as > > > > > > > > I understand brokers will still have local log dirs and > remote > > > > volumes > > > > > > > > are not counted as log dirs. KIP-405 does not mention > updating > > > the > > > > > > > > DescribeLogDirs API. So I don't think this KIP needs to do > > > anything > > > > > > > > special to be compatible with KIP-405. On the other hand, I > > > wonder > > > > if > > > > > > > > KIP-405 should update DescribeLogDirs to provide details > about > > > the > > > > > > > > location of replicas. > > > > > > > > 3. Counting files can be a slow operation as it requires > > > exploring > > > > all > > > > > > > > paths recursively to find all files. Administrators should > > > > > > > > definitively monitor file descriptors via host metrics but > I'm > > > not > > > > > > > > sure it's something we want to expose via the Kafka API. As > > > > mentioned > > > > > > > > it could be slow to compute and files are not really a Kafka > > > > concept. > > > > > > > > 4. DescribeLogDirs is usually a low volume API. This change > > > should > > > > not > > > > > > > > significantly affect the latency of this API. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Apr 7, 2022 at 1:41 PM Divij Vaidya < > > > > divijvaidy...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Mickael > > > > > > > > > > > > > > > > > > Thanks for starting this. It is a very useful feature. > > > > > > > > > > > > > > > > > > Some initial thoughts (I am new to Kafka so please excuse > if > > > > these > > > > > > are > > > > > > > > > naive suggestions): > > > > > > > > > 1. What is the impact on latency of the DescribeLogDirs > API due > > > > to > > > > > > this > > > > > > > > > change? Would calculating the totalSpace from each logdir > be a > > > > > > bottleneck > > > > > > > > > for the API? What if we are talking about a large storage > size > > > > in the > > > > > > > > order > > > > > > > > > of hundred (or tens) of GBs? > > > > > > > > > 2. How does this fit in with RemoteStorage (KIP-405)? I > think > > > > > > integration > > > > > > > > > with KIP-405 is worth discussing in the scope of this KIP. > My > > > > > > > > > recommendation will be to add a new API in the RLMM > > > > > > > > > (RemoteLogMetadataManager) called GetLogSize() and leave it > > > upto > > > > the > > > > > > > > remote > > > > > > > > > storage to perform a concrete implementation for this > > > > > > > > > interface. DescribeLogDirs could call this interface > internally > > > > to > > > > > > > > provide > > > > > > > > > the relevant information. > > > > > > > > > 3. Do you think adding the number of files in the > directory as > > > > part > > > > > > of > > > > > > > > the > > > > > > > > > API response will be useful as well? e.g. a use case where > this > > > > > > > > information > > > > > > > > > will be useful is to monitor/alarm the situations when the > > > > number of > > > > > > > > files > > > > > > > > > are dangerously reaching the max value of file descriptors > > > > > > configured at > > > > > > > > > the OS. > > > > > > > > > 4. Please add an API latency perf test as part of the > release > > > > > > criteria > > > > > > > > for > > > > > > > > > this change. We want to avoid regression. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Divij Vaidya > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Apr 7, 2022 at 11:17 AM Mickael Maison < > > > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > I wrote a small KIP to expose the total and usable space > of > > > > logdirs > > > > > > > > > > via the DescribeLogDirs API: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API > > > > > > > > > > > > > > > > > > > > Please take a look and let me know if you have any > feedback. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >